linux-man.vger.kernel.org archive mirror
From: наб <nabijaczleweli@nabijaczleweli.xyz>
To: Alejandro Colomar <alx.manpages@gmail.com>
Cc: linux-man@vger.kernel.org
Subject: Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets
Date: Thu, 20 Apr 2023 17:05:53 +0200
Message-ID: <tffsidfrewe4yxhfr534jsznrbv526x2ilrk7k7vfpwabpmxne@5bilrttama4e>
In-Reply-To: <09c3c0bf-79e4-1cdf-8aa7-b82155aa3f47@gmail.com>

Hi!

On Thu, Apr 20, 2023 at 04:10:04PM +0200, Alejandro Colomar wrote:
> On 4/20/23 15:02, наб wrote:
> > --- a/man3/regex.3
> > +++ b/man3/regex.3
> > @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX.
> >  .SS Match offsets
> >  Unless
> >  .B REG_NOSUB
> > -was set for the compilation of the pattern buffer, it is possible to
> > -obtain match addressing information.
> > -.I pmatch
> > -must be dimensioned to have at least
> > -.I nmatch
> > -elements.
> > -These are filled in by
> > +was passed to
> > +.BR regcomp (),
> > +it is possible to
> > +obtain the locations of matches within
> > +.IR string :
> >  .BR regexec ()
> > -with substring match addresses.
> > -The offsets of the subexpression starting at the
> > -.IR i th
> > -open parenthesis are stored in
> > -.IR pmatch[i] .
> > -The entire regular expression's match addresses are stored in
> > -.IR pmatch[0] .
> > -(Note that to return the offsets of
> > -.I N
> > -subexpression matches,
> > +fills
> >  .I nmatch
> > -must be at least
> > -.IR N+1 .)
> > -Any unused structure elements will contain the value \-1.
> > +elements of
> > +.I pmatch
> > +with results:
> > +.I pmatch[0]
> > +corresponds to the entire match,
> I still don't understand this.  Does REG_NOSUB also affect pmatch[0]?
> I would have expected that it would only affect *sub*matches, that is, [>0].

Let's consult the manual:
  REG_NOSUB  Do not report position of matches. [...]
  REG_NOSUB  Compile for matching that need only report success or
             failure, not what was matched.                    (4.4BSD)
and POSIX:
  REG_NOSUB  Report only success or fail in regexec().
  REG_NOSUB  Report only success/fail in regexec( ).
(yes; the two times it describes it, it's written differently).

POSIX says it better I think.

And, indeed:
	$ cat a.c
	#include <regex.h>
	#include <stdio.h>
	int main(int c, char ** v) {
		regex_t r;
		regcomp(&r, v[1], 0);
		regmatch_t dt = {0, 3};
		printf("%d\n", regexec(&r, v[2], 1, &dt, REG_STARTEND));
		printf("%d, %d\n", (int)dt.rm_so, (int)dt.rm_eo);
	}

	$ cc a.c -oac
	$ ./ac 'c$' 'abcdef'
	0
	2, 3

	$ sed 's/0)/REG_NOSUB)/' a.c | cc -xc - -oac
	$ ./ac 'c$' 'abcdef'
	0
	0, 3


...and I've just realised why you're asking ‒ I think you're reading too
much (and ahistorically) into the "SUB" bit;
heretofore I've assumed this is for "substitution", which I think is fair.

Actually, let's consult POSIX.2 (Draft 11.2):
  591     Table B-8  − regcomp() cflags Argument
  596  REG_NOSUB  Report only success/fail in regexec().
B.5 C Binding for Regular Expression Matching, B.5.2 Description:
  609  If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to
  610  the number of parenthesized subexpressions [delimited by \( \) in basic regular
  611  expressions or ( ) in extended regular expressions] found in pattern.
Both read the same in present-day POSIX.
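
For illustration, a minimal sketch of how re_nsub is meant to be used when
REG_NOSUB is not set: size pmatch as re_nsub + 1, with slot 0 for the whole
match. The helper name match_offsets() and its out/cap parameters are
hypothetical, not part of any interface discussed here.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: match the ERE `pattern` against `string` and copy
 * up to `cap` offset pairs into `out`, whole match first.
 * Returns the number of pairs copied, or -1 on error or no match. */
int match_offsets(const char *pattern, const char *string,
                  regmatch_t *out, size_t cap)
{
	regex_t re;
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	/* re_nsub counts parenthesized subexpressions; +1 for pmatch[0]. */
	size_t n = re.re_nsub + 1;
	regmatch_t *pm = malloc(n * sizeof(*pm));
	if (pm == NULL) {
		regfree(&re);
		return -1;
	}

	int ret = -1;
	if (regexec(&re, string, n, pm, 0) == 0) {
		if (n > cap)
			n = cap;
		for (size_t i = 0; i < n; i++)
			out[i] = pm[i];
		ret = (int)n;
	}

	free(pm);
	regfree(&re);
	return ret;
}
```

Called with "(b)(c)" and "abcdef", this should report three pairs: the whole
match at bytes 1 to 3, then one pair per parenthesized group.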

B.5.5 Rationale, History of Decisions Made:
  791  The working group has rejected, at least for now, the inclusion of a regsub() func-
  792  tion that would be used to do substitutions for a matched regular expression.
  793  While such a routine would be useful to some applications, its utility would be
  794  much more limited than the matching function described here. Both regular
  795  expression parsing and substitution are possible to implement without support
  796  other than that required by the C Standard {7}, but matching is much more com-
  797  plex than substituting. The only ‘‘difficult’’ part of substitution, given the infor-
  798  mation supplied by regexec(), is finding the next character in a string when there
  799  can be multibyte characters. That is a much wider issue, and one that needs a
  800  more general solution.

  803  In Draft 9, the interface was modified so that the matched substrings rm_sp and
  804  rm_ep are in a separate regmatch_t structure instead of in regex_t. This allows a
  805  single compiled regular expression to be used simultaneously in several contexts;
  806  in main() and a signal handler, perhaps, or in multiple threads of lightweight
  807  processes. (The preg argument to regexec() is declared with type const, so the
  808  implementation is not permitted to use the structure to store intermediate
  809  results.) It also allows an application to request an arbitrary number of sub-
  810  strings from a regular expression. (Previous versions reported only ten sub-
  811  strings.) The number of subexpressions in the regular expression is reported in
  812  re_nsub in preg. With this change to regexec(), consideration was given to drop-
  813  ping the REG_NOSUB flag, since the user can now specify this with a zero nmatch
  814  argument to regexec(). However, keeping REG_NOSUB allows an implementation
  815  to use a different (perhaps more efficient) algorithm if it knows in regcomp() that
  816  no subexpressions need be reported. The implementation is only required to fill
  817  in pmatch if nmatch is not zero and if REG_NOSUB is not specified. Note that the
  818  size_t type, as defined in the C Standard {7}, is unsigned, so the description of
  819  regexec() does not need to address negative values of nmatch.

So: yes, there was a substitution interface that got cut.
The name is actually a hold-over from
"don't allocate for ten subexpressions in regex_t".
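
The rationale's remark that a zero nmatch requests the same thing can be
sketched as below. This is only an illustration: matches() is a hypothetical
helper, and its cflags parameter exists solely so that both spellings can be
tried side by side.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if `string` matches the ERE `pattern`, 0 if it
 * does not, -1 if the pattern fails to compile.
 * `cflags` is OR-ed in, e.g. 0 or REG_NOSUB. */
int matches(const char *pattern, const char *string, int cflags)
{
	regex_t re;
	if (regcomp(&re, pattern, REG_EXTENDED | cflags) != 0)
		return -1;

	/* nmatch == 0: no offsets are requested, so pmatch may be NULL. */
	int rc = regexec(&re, string, 0, NULL, 0);

	regfree(&re);
	return rc == 0;
}
```

With or without REG_NOSUB the caller gets the same yes/no answer; the flag
only tells regcomp() up front that offsets will never be wanted.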

I think changing our description to
  REG_NOSUB  Only report overall success. regexec() will only use pmatch
             for REG_STARTEND, and ignore nmatch.
may make that more obvious.

Best,
наб


