From: наб <nabijaczleweli@nabijaczleweli.xyz>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: linux-man@vger.kernel.org
Subject: Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets
Date: Thu, 20 Apr 2023 17:05:53 +0200
Message-ID: <tffsidfrewe4yxhfr534jsznrbv526x2ilrk7k7vfpwabpmxne@5bilrttama4e>
In-Reply-To: <09c3c0bf-79e4-1cdf-8aa7-b82155aa3f47@gmail.com>
Hi!
On Thu, Apr 20, 2023 at 04:10:04PM +0200, Alejandro Colomar wrote:
> On 4/20/23 15:02, наб wrote:
> > --- a/man3/regex.3
> > +++ b/man3/regex.3
> > @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX.
> > .SS Match offsets
> > Unless
> > .B REG_NOSUB
> > -was set for the compilation of the pattern buffer, it is possible to
> > -obtain match addressing information.
> > -.I pmatch
> > -must be dimensioned to have at least
> > -.I nmatch
> > -elements.
> > -These are filled in by
> > +was passed to
> > +.BR regcomp (),
> > +it is possible to
> > +obtain the locations of matches within
> > +.IR string :
> > .BR regexec ()
> > -with substring match addresses.
> > -The offsets of the subexpression starting at the
> > -.IR i th
> > -open parenthesis are stored in
> > -.IR pmatch[i] .
> > -The entire regular expression's match addresses are stored in
> > -.IR pmatch[0] .
> > -(Note that to return the offsets of
> > -.I N
> > -subexpression matches,
> > +fills
> > .I nmatch
> > -must be at least
> > -.IR N+1 .)
> > -Any unused structure elements will contain the value \-1.
> > +elements of
> > +.I pmatch
> > +with results:
> > +.I pmatch[0]
> > +corresponds to the entire match,
> I still don't understand this. Does REG_NOSUB also affect pmatch[0]?
> I would have expected that it would only affect *sub*matches, that is, [>0].
Let's consult the manual:
  REG_NOSUB  Do not report position of matches. [...]
  REG_NOSUB  Compile for matching that need only report success or
             failure, not what was matched.  (4.4BSD)
and POSIX:
  REG_NOSUB  Report only success or fail in regexec().
  REG_NOSUB  Report only success/fail in regexec( ).
(Yes, POSIX describes it twice, and words it differently each time.)
POSIX says it better, I think.
And, indeed:
$ cat a.c
#include <regex.h>
#include <stdio.h>
int main(int c, char ** v) {
	regex_t r;
	regcomp(&r, v[1], 0);
	regmatch_t dt = {0, 3};
	printf("%d\n", regexec(&r, v[2], 1, &dt, REG_STARTEND));
	printf("%d, %d\n", (int)dt.rm_so, (int)dt.rm_eo);
}
$ cc a.c -oac
$ ./ac 'c$' 'abcdef'
0
2, 3
$ sed 's/0)/REG_NOSUB)/' a.c | cc -xc - -oac
$ ./ac 'c$' 'abcdef'
0
0, 3
...and I've just realised why you're asking: I think you're reading too
much (and ahistorically) into the "SUB" bit;
heretofore I'd assumed it stood for "substitution", which I think is fair.
Actually, let's consult POSIX.2 (Draft 11.2):
  Table B-8: regcomp() cflags Argument (lines 591, 596):
    REG_NOSUB  Report only success/fail in regexec().
  B.5 C Binding for Regular Expression Matching, B.5.2 Description
  (lines 609-611):
    If the REG_NOSUB flag was not set in cflags, then regcomp() shall set
    re_nsub to the number of parenthesized subexpressions [delimited by
    \( \) in basic regular expressions or ( ) in extended regular
    expressions] found in pattern.
Both are essentially as in present-day POSIX.
B.5.5 Rationale, History of Decisions Made (lines 791-800):
  The working group has rejected, at least for now, the inclusion of a
  regsub() function that would be used to do substitutions for a matched
  regular expression. While such a routine would be useful to some
  applications, its utility would be much more limited than the matching
  function described here. Both regular expression parsing and
  substitution are possible to implement without support other than that
  required by the C Standard {7}, but matching is much more complex than
  substituting. The only "difficult" part of substitution, given the
  information supplied by regexec(), is finding the next character in a
  string when there can be multibyte characters. That is a much wider
  issue, and one that needs a more general solution.
Same section, lines 803-819:
  In Draft 9, the interface was modified so that the matched substrings
  rm_sp and rm_ep are in a separate regmatch_t structure instead of in
  regex_t. This allows a single compiled regular expression to be used
  simultaneously in several contexts; in main() and a signal handler,
  perhaps, or in multiple threads of lightweight processes. (The preg
  argument to regexec() is declared with type const, so the
  implementation is not permitted to use the structure to store
  intermediate results.) It also allows an application to request an
  arbitrary number of substrings from a regular expression. (Previous
  versions reported only ten substrings.) The number of subexpressions in
  the regular expression is reported in re_nsub in preg. With this change
  to regexec(), consideration was given to dropping the REG_NOSUB flag,
  since the user can now specify this with a zero nmatch argument to
  regexec(). However, keeping REG_NOSUB allows an implementation to use a
  different (perhaps more efficient) algorithm if it knows in regcomp()
  that no subexpressions need be reported. The implementation is only
  required to fill in pmatch if nmatch is not zero and if REG_NOSUB is not
  specified. Note that the size_t type, as defined in the C Standard {7},
  is unsigned, so the description of regexec() does not need to address
  negative values of nmatch.
So yes, there was a substitution interface, and it got cut.
But the name is actually a holdover from before Draft 9, when the ten
subexpression slots lived in regex_t: REG_NOSUB meant
"don't allocate for ten subexpressions in regex_t".
I think changing our description to
  REG_NOSUB  Only report overall success. regexec() will only use pmatch
             for REG_STARTEND, and ignore nmatch.
may make that more obvious.
Best,
наб