* [PATCH 1/2] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND
@ 2023-04-19 17:47 наб
  2023-04-19 17:48 ` [PATCH 2/2] regex.3: improve REG_STARTEND наб
  2023-04-19 19:51 ` [PATCH 1/2] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND Alejandro Colomar
  0 siblings, 2 replies; 143+ messages in thread
From: наб @ 2023-04-19 17:47 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: linux-man

[-- Attachment #1: Type: text/plain, Size: 1059 bytes --]

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
Also note that in
  int regexec(const regex_t *restrict preg, const char *restrict string,
              size_t nmatch, regmatch_t pmatch[restrict .nmatch],
              int eflags);
pmatch is [1] if nmatch is 0 if eflags&REG_STARTEND.
Or, more succinctly,
  regmatch_t pmatch[restrict !!(.eflags & REG_STARTEND) ?: .nmatch],

Doesn't really matter, and that's a much worse signature than what's
currently there, but.

 man3/regex.3 | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/man3/regex.3 b/man3/regex.3
index e8fed5147..d54d6024c 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -82,7 +82,9 @@ and
 .I pmatch
 arguments to
 .BR regexec ()
-are ignored if the pattern buffer supplied was compiled with this flag set.
+are only used for
+.B REG_STARTEND
+if the pattern buffer supplied was compiled with this flag set.
 .TP
 .B REG_NEWLINE
 Match-any-character operators don't match a newline.
-- 
2.30.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related [flat|nested] 143+ messages in thread
* [PATCH 2/2] regex.3: improve REG_STARTEND 2023-04-19 17:47 [PATCH 1/2] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND наб @ 2023-04-19 17:48 ` наб 2023-04-19 20:23 ` Alejandro Colomar 2023-04-19 19:51 ` [PATCH 1/2] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND Alejandro Colomar 1 sibling, 1 reply; 143+ messages in thread From: наб @ 2023-04-19 17:48 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1652 bytes --] Explicitly spell out the ranges involved. The original wording always confused me, but it's actually very sane. Also change the [0]. to -> here to make more obvious the point that pmatch is used as a pointer-to-object, not array in this scenario. Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change R_NOTEOL? No. That's weird and confusing. String largeness doesn't matter, known-lengthness does. Explicitly spell out the influence on returned matches (relative to string, not start of range). Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index d54d6024c..2c8b87aca 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -141,23 +141,20 @@ compilation flag above). .TP .B REG_STARTEND -Use -.I pmatch[0] -on the input string, starting at byte -.I pmatch[0].rm_so -and ending before byte -.IR pmatch[0].rm_eo . +Match +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo ) +instead of +.RI [ string ", " string " + \fBstrlen\fP(" string )). This allows matching embedded NUL bytes and avoids a .BR strlen (3) -on large strings. -It does not use +on known-length strings. .I nmatch -on input, and does not change -.B REG_NOTBOL -or -.B REG_NEWLINE -processing. +is not consulted for this purpose. 
+If any matches are returned, they're relative to +.IR string , +not +.IR string " + " pmatch->rm_so . This flag is a BSD extension, not present in POSIX. .SS Byte offsets Unless -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH 2/2] regex.3: improve REG_STARTEND 2023-04-19 17:48 ` [PATCH 2/2] regex.3: improve REG_STARTEND наб @ 2023-04-19 20:23 ` Alejandro Colomar 2023-04-19 21:20 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-19 20:23 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2184 bytes --] Hi наб! On 4/19/23 19:48, наб wrote: > Explicitly spell out the ranges involved. The original wording always > confused me, but it's actually very sane. > > Also change the [0]. to -> here to make more obvious the point that > pmatch is used as a pointer-to-object, not array in this scenario. > > Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change > R_NOTEOL? No. That's weird and confusing. > > String largeness doesn't matter, known-lengthness does. > > Explicitly spell out the influence on returned matches > (relative to string, not start of range). > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 23 ++++++++++------------- > 1 file changed, 10 insertions(+), 13 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index d54d6024c..2c8b87aca 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -141,23 +141,20 @@ compilation flag > above). > .TP > .B REG_STARTEND > -Use > -.I pmatch[0] > -on the input string, starting at byte > -.I pmatch[0].rm_so > -and ending before byte > -.IR pmatch[0].rm_eo . > +Match > +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo ) > +instead of > +.RI [ string ", " string " + \fBstrlen\fP(" string )). Hmmm, I like this! Let's see if I understand it. pmatch[] is normally [[gnu::access(write_only, 4, 3)]] but if ((.eflags & REG_STARTEND) != 0) it's [1] and [[gnu::access(read_write, 4)]]? > This allows matching embedded NUL bytes > and avoids a > .BR strlen (3) > -on large strings. > -It does not use > +on known-length strings. 
> .I nmatch > -on input, and does not change > -.B REG_NOTBOL > -or > -.B REG_NEWLINE > -processing. > +is not consulted for this purpose. > +If any matches are returned, they're relative to > +.IR string , > +not > +.IR string " + " pmatch->rm_so . How are such matches returned? In pmatch[>0]? Or how? Cheers, Alex > This flag is a BSD extension, not present in POSIX. > .SS Byte offsets > Unless -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH 2/2] regex.3: improve REG_STARTEND
  2023-04-19 20:23 ` Alejandro Colomar
@ 2023-04-19 21:20 ` наб
  2023-04-19 21:45 ` Alejandro Colomar
  ` (9 more replies)
  0 siblings, 10 replies; 143+ messages in thread
From: наб @ 2023-04-19 21:20 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man

[-- Attachment #1: Type: text/plain, Size: 2486 bytes --]

Hi!

On Wed, Apr 19, 2023 at 10:23:29PM +0200, Alejandro Colomar wrote:
> On 4/19/23 19:48, наб wrote:
> > diff --git a/man3/regex.3 b/man3/regex.3
> > index d54d6024c..2c8b87aca 100644
> > --- a/man3/regex.3
> > +++ b/man3/regex.3
> > @@ -141,23 +141,20 @@ compilation flag
> >  above).
> >  .TP
> >  .B REG_STARTEND
> > -Use
> > -.I pmatch[0]
> > -on the input string, starting at byte
> > -.I pmatch[0].rm_so
> > -and ending before byte
> > -.IR pmatch[0].rm_eo .
> > +Match
> > +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo )
> > +instead of
> > +.RI [ string ", " string " + \fBstrlen\fP(" string )).
> Hmmm, I like this!
>
> Let's see if I understand it. pmatch[] is normally
> [[gnu::access(write_only, 4, 3)]]
> but if ((.eflags & REG_STARTEND) != 0) it's [1] and
> [[gnu::access(read_write, 4)]]?
I fucked the ternary in my previous mail I think, soz;
I don't know if it's gnu::anything, but you could model it as
  {
      if(eflags & REG_STARTEND)
          read(pmatch, 1);

      if(!(preg->flags & REG_NOSUB))  // as "set" in regcomp()
          write(pmatch, nmatch);
  }

I.e. pmatch[nmatch] must be a writable array, unless REG_NOSUB,
and also, additively, *pmatch must be readable if REG_STARTEND.

> > This allows matching embedded NUL bytes
> > and avoids a
> > .BR strlen (3)
> > -on large strings.
> > -It does not use
> > +on known-length strings.
> >  .I nmatch
> > -on input, and does not change
> > -.B REG_NOTBOL
> > -or
> > -.B REG_NEWLINE
> > -processing.
> > +is not consulted for this purpose.
> > +If any matches are returned, they're relative to
> > +.IR string ,
> > +not
> > +.IR string " + " pmatch->rm_so .
> How are such matches returned? In pmatch[>0]? Or how?
In the usual way in pmatch[0..nmatch].

I guess the "nmatch isn't taken into account" thing is confusing,
because REG_STARTEND just adds a read. regexec() can be modelled as
  {
      const char * start, * end;
      if(eflags & REG_STARTEND) {
          start = string + pmatch->rm_so;
          end   = string + pmatch->rm_eo;
      } else {
          start = string;
          end   = string + strlen(string);
      }

      // match stuff in [start, end)
  }

And that's the /only/ effect REG_STARTEND has
(+ matches are returned relative to string, not to start,
 but that's consistent, and they just got decoupled;
 it bears noting it there since it's not what I expected to happen).

I'll sleep on this and post something I hate less tomorrow.

Best,

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
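The regexec() model in the message above can be tried against a real
libc. What follows is only a sketch, assuming a libc that implements
REG_STARTEND (glibc and the BSDs do; it is not in POSIX). match_range()
is a made-up helper name, not something from the patch or from
<regex.h>.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any libc): search only
 * string[start..end) instead of string[0..strlen(string)).
 * Returns the regexec() result; on success *m holds the whole match.
 */
static int match_range(const regex_t *re, const char *string,
                       size_t start, size_t end, regmatch_t *m)
{
	assert(start <= end);

	/* Input: regexec() reads the range from pmatch[0]. */
	m->rm_so = (regoff_t)start;
	m->rm_eo = (regoff_t)end;

	/* Output: the same object is overwritten with the match. */
	return regexec(re, string, 1, m, REG_STARTEND);
}
```

With "b+" compiled as an ERE and string "aabbbcc_bb", searching the
range [5, 10) should report rm_so = 8 and rm_eo = 10, that is, offsets
counted from string and not from string + 5.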
* Re: [PATCH 2/2] regex.3: improve REG_STARTEND 2023-04-19 21:20 ` наб @ 2023-04-19 21:45 ` Alejandro Colomar 2023-04-19 23:23 ` [PATCH v2 1/9] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND наб ` (8 subsequent siblings) 9 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-19 21:45 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2805 bytes --] Hi! On 4/19/23 23:20, наб wrote: > Hi! > > On Wed, Apr 19, 2023 at 10:23:29PM +0200, Alejandro Colomar wrote: >> On 4/19/23 19:48, наб wrote: >>> diff --git a/man3/regex.3 b/man3/regex.3 >>> index d54d6024c..2c8b87aca 100644 >>> --- a/man3/regex.3 >>> +++ b/man3/regex.3 >>> @@ -141,23 +141,20 @@ compilation flag >>> above). >>> .TP >>> .B REG_STARTEND >>> -Use >>> -.I pmatch[0] >>> -on the input string, starting at byte >>> -.I pmatch[0].rm_so >>> -and ending before byte >>> -.IR pmatch[0].rm_eo . >>> +Match >>> +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo ) >>> +instead of >>> +.RI [ string ", " string " + \fBstrlen\fP(" string )). >> Hmmm, I like this! >> >> Let's see if I understand it. pmatch[] is normally >> [[gnu::access(write_only, 4, 3)]] >> but if ((.eflags & REG_STARTEND) != 0) it's [1] and >> [[gnu::access(read_write, 4)]]? > I fucked the ternary in my previous mail I think, soz; > I don't know if it's gnu::anything, but you could model it as > { > if(eflags & REG_STARTEND) > read(pmatch, 1); > > if(!(preg->flags & REG_NOSUB)) // as "set" in regcomp() > write(pmatch, nmatch); > } > > I.e. pmatch[nmatch] must be a writable array, unless REG_NOSUB, > and also, additively, *pmatch must be readable if REG_STARTEND. Ahh, now it's clear to me (I think). :) > >>> This allows matching embedded NUL bytes >>> and avoids a >>> .BR strlen (3) >>> -on large strings. >>> -It does not use >>> +on known-length strings. 
>>> .I nmatch >>> -on input, and does not change >>> -.B REG_NOTBOL >>> -or >>> -.B REG_NEWLINE >>> -processing. >>> +is not consulted for this purpose. >>> +If any matches are returned, they're relative to >>> +.IR string , >>> +not >>> +.IR string " + " pmatch->rm_so . >> How are such matches returned? In pmatch[>0]? Or how? > In the usual way in pmatch[0..nmatch]. > > I guess the "nmatch isn't taken into account" thing is confusing, > because REG_STARTEND just adds a read. regexec() can be modelled as > { > const char * start, * end; > if(eflags & REG_STARTEND) { > start = string + pmatch->rm_so; > end = string + pmatch->rm_eo; > } else { > start = string; > end = string + strlen(string); > } > > // match stuff in [start, end) > } > > And that's the /only/ effect REG_STARTEND has > (+ matches are returned relative to string, not to start, > but that's consistent, and they just got decoupled; > it bears noting it there since it's not what I expected to happen). > > I'll sleep on this and post something I hate less tomorrow. Sure; good night! Best, Alex > > Best, -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v2 1/9] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND 2023-04-19 21:20 ` наб 2023-04-19 21:45 ` Alejandro Colomar @ 2023-04-19 23:23 ` наб 2023-04-20 11:21 ` Alejandro Colomar 2023-04-19 23:23 ` [PATCH v2 2/9] regex.3: improve REG_STARTEND наб ` (7 subsequent siblings) 9 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-19 23:23 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1436 bytes --] In the regexec() signature regmatch_t pmatch[restrict .nmatch], is a simplification. It's actually regmatch_t pmatch[restrict ((.preg->flags & REG_NOSUB) ? 0 : .nmatch) ?: !!(.eflags & REG_STARTEND)], But speccing that would be insane. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- By the end, I think I get to a regex(3) that I don't dread opening (and that has all the info I'd want. who knew there was re_nsub?)! The main issues here are (a) it's full of standardese, entire paragraphs lifted from POSIX, or very close to that, and the POSIX dialect is hostile to human life^W^Wbeing effectively used and (b) what reads like 30 years of people adding stuff without having read any other part of the document. Almost everything repeats at least once. Funny moments outlined as they come in the messages. man3/regex.3 | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index e8fed5147..d77aac2e7 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -80,9 +80,11 @@ The .I nmatch and .I pmatch -arguments to .BR regexec () -are ignored if the pattern buffer supplied was compiled with this flag set. +arguments will be ignored for this purpose (but +.I pmatch +may still be used for +.BR REG_STARTEND ). .TP .B REG_NEWLINE Match-any-character operators don't match a newline. 
-- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
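The REG_NOSUB plus REG_STARTEND combination this patch documents looks
roughly like this in practice. A minimal sketch, assuming a libc with
REG_STARTEND (glibc or a BSD libc) and a regex_t compiled with
REG_NOSUB; range_has_match() is a hypothetical name chosen for
illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: does re match anywhere in string[start..end)?
 * Intended for a regex_t that was compiled with REG_NOSUB.
 */
static bool range_has_match(const regex_t *re, const char *string,
                            size_t start, size_t end)
{
	/*
	 * pmatch is input only here: regexec() reads the range from it.
	 * With REG_NOSUB no match offsets are reported, so nmatch is 0
	 * even though pmatch must still point to a readable object.
	 */
	regmatch_t range = { .rm_so = (regoff_t)start,
	                     .rm_eo = (regoff_t)end };

	assert(start <= end);
	return regexec(re, string, 0, &range, REG_STARTEND) == 0;
}
```

Passing nmatch = 0 here is the case the commit message describes:
pmatch behaves as a one-element array purely because of REG_STARTEND.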
* Re: [PATCH v2 1/9] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND 2023-04-19 23:23 ` [PATCH v2 1/9] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND наб @ 2023-04-20 11:21 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 11:21 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1922 bytes --] Hi! On 4/20/23 01:23, наб wrote: > In the regexec() signature > regmatch_t pmatch[restrict .nmatch], > is a simplification. It's actually > regmatch_t pmatch[restrict > ((.preg->flags & REG_NOSUB) ? 0 : .nmatch) ?: > !!(.eflags & REG_STARTEND)], > > But speccing that would be insane. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Patch applied. Thanks! BTW, I capitalized the subject, as is house practice of using proper English sentences for the subject (after the page prefix), with the exception of not using the trailing period (which I know Branden disapproves :p). Cheers, Alex > --- > By the end, I think I get to a regex(3) that I don't dread opening > (and that has all the info I'd want. who knew there was re_nsub?)! > > The main issues here are (a) it's full of standardese, entire paragraphs > lifted from POSIX, or very close to that, and the POSIX dialect is > hostile to human life^W^Wbeing effectively used and (b) what reads like > 30 years of people adding stuff without having read any other part of > the document. Almost everything repeats at least once. > > Funny moments outlined as they come in the messages. > > man3/regex.3 | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index e8fed5147..d77aac2e7 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -80,9 +80,11 @@ The > .I nmatch > and > .I pmatch > -arguments to > .BR regexec () > -are ignored if the pattern buffer supplied was compiled with this flag set. 
> +arguments will be ignored for this purpose (but > +.I pmatch > +may still be used for > +.BR REG_STARTEND ). > .TP > .B REG_NEWLINE > Match-any-character operators don't match a newline. -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v2 2/9] regex.3: improve REG_STARTEND 2023-04-19 21:20 ` наб 2023-04-19 21:45 ` Alejandro Colomar 2023-04-19 23:23 ` [PATCH v2 1/9] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND наб @ 2023-04-19 23:23 ` наб 2023-04-20 10:00 ` G. Branden Robinson 2023-06-02 0:12 ` Alejandro Colomar 2023-04-19 23:23 ` [PATCH v2 3/9] regex.3: ffix наб ` (6 subsequent siblings) 9 siblings, 2 replies; 143+ messages in thread From: наб @ 2023-04-19 23:23 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1835 bytes --] Explicitly spell out the ranges involved. The original wording always confused me, but it's actually very sane. Also change the [0]. to -> here to make more obvious the point that pmatch is used as a pointer-to-object, not array in this scenario. Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change R_NOTEOL? No. That's weird and confusing. String largeness doesn't matter, known-lengthness does. Explicitly spell out the influence on returned matches (relative to string, not start of range). Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index d77aac2e7..74f19945d 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -141,23 +141,30 @@ compilation flag above). .TP .B REG_STARTEND -Use -.I pmatch[0] -on the input string, starting at byte -.I pmatch[0].rm_so -and ending before byte -.IR pmatch[0].rm_eo . +Match +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo ) +instead of +.RI [ string ", " string " + \fBstrlen\fP(" string )). This allows matching embedded NUL bytes and avoids a .BR strlen (3) -on large strings. -It does not use +on known-length strings. +.I pmatch +must point to a valid readable object. 
+If any matches are returned +.RB ( REG_NOSUB +wasn't passed to +.BR regcomp (), +the match succeeded, and .I nmatch -on input, and does not change -.B REG_NOTBOL -or -.B REG_NEWLINE -processing. +> 0), they overwrite +.I pmatch +as usual, and the +.B Byte offsets +remain relative to +.IR string +(not +.IR string " + " pmatch->rm_so ). This flag is a BSD extension, not present in POSIX. .SS Byte offsets Unless -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
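Because the returned offsets stay relative to string, a match's rm_eo
can be fed straight back in as the next rm_so. A sketch of that loop
over a known-length buffer that may contain NUL bytes, assuming a libc
with REG_STARTEND (glibc or a BSD libc); find_all() is a hypothetical
helper, not part of any regex API.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: find non-overlapping matches of re in
 * buf[0..len), which may contain NUL bytes.  Stores up to cap match
 * offsets in out[] and returns the total number of matches found.
 * Simplification: an empty match at the very end of the buffer is
 * not reported.
 */
static size_t find_all(const regex_t *re, const char *buf, size_t len,
                       regmatch_t *out, size_t cap)
{
	size_t n = 0;
	regoff_t pos = 0;

	assert(out != NULL || cap == 0);

	while ((size_t)pos < len) {
		/* Range to search: [pos, len), read by regexec(). */
		regmatch_t m = { .rm_so = pos, .rm_eo = (regoff_t)len };

		if (regexec(re, buf, 1, &m, REG_STARTEND) != 0)
			break;

		if (n < cap)
			out[n] = m;	/* offsets are relative to buf */
		n++;

		/* Resume after the match; step by one if it was empty. */
		pos = (m.rm_eo > m.rm_so) ? m.rm_eo : m.rm_eo + 1;
	}
	return n;
}
```

No strlen(3) is involved anywhere, which is the "known-length strings"
point of the patch; the caller supplies len.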
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
  2023-04-19 23:23 ` [PATCH v2 2/9] regex.3: improve REG_STARTEND наб
@ 2023-04-20 10:00 ` G. Branden Robinson
  2023-04-20 11:13 ` наб
  2023-06-02 0:12 ` Alejandro Colomar
  1 sibling, 1 reply; 143+ messages in thread
From: G. Branden Robinson @ 2023-04-20 10:00 UTC (permalink / raw)
  To: наб; +Cc: Alejandro Colomar (man-pages), linux-man

[-- Attachment #1: Type: text/plain, Size: 299 bytes --]

Hi наб,

At 2023-04-20T01:23:14+0200, наб wrote:
> +> 0), they overwrite
> +.I pmatch
> +as usual, and the
> +.B Byte offsets
> +remain relative to
> +.IR string
> +(not
> +.IR string " + " pmatch->rm_so ).

I don't think "byte" needs to be capitalized here.

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
  2023-04-20 10:00 ` G. Branden Robinson
@ 2023-04-20 11:13 ` наб
  2023-04-20 18:33 ` G. Branden Robinson
  0 siblings, 1 reply; 143+ messages in thread
From: наб @ 2023-04-20 11:13 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Alejandro Colomar (man-pages), linux-man

[-- Attachment #1: Type: text/plain, Size: 404 bytes --]

Hi!

On Thu, Apr 20, 2023 at 05:00:59AM -0500, G. Branden Robinson wrote:
> At 2023-04-20T01:23:14+0200, наб wrote:
> > +> 0), they overwrite
> > +.I pmatch
> > +as usual, and the
> > +.B Byte offsets
> > +remain relative to
> > +.IR string
> I don't think "byte" needs to be capitalized here.
I'm using it as a Sx and the section is capitalised,
so I think this should also be?

Best,

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
  2023-04-20 11:13 ` наб
@ 2023-04-20 18:33 ` G. Branden Robinson
  2023-04-20 22:29 ` Alejandro Colomar
  0 siblings, 1 reply; 143+ messages in thread
From: G. Branden Robinson @ 2023-04-20 18:33 UTC (permalink / raw)
  To: наб; +Cc: Alejandro Colomar (man-pages), linux-man

[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]

At 2023-04-20T13:13:29+0200, наб wrote:
> On Thu, Apr 20, 2023 at 05:00:59AM -0500, G. Branden Robinson wrote:
> > At 2023-04-20T01:23:14+0200, наб wrote:
> > > +> 0), they overwrite
> > > +.I pmatch
> > > +as usual, and the
> > > +.B Byte offsets
> > > +remain relative to
> > > +.IR string
> > I don't think "byte" needs to be capitalized here.
> I'm using it as a Sx and the section is capitalised,
> so I think this should also be?

[Note for non-mdoc(7) speakers: `Sx` is its macro for (sub)section
heading cross references. man(7) doesn't have an equivalent, though if
there is demand, I'm happy to implement one. :D]

Nothing I can see in man-pages(7) suggests that references to
(sub)section headings should be in an unusual typeface. The norm in
English is usually to quote them.

It's also unusual to pun a (sub)section heading name as an ordinary
noun phrase this way. So in this case I would neither capitalize _nor_
embolden the phrase.

After a piece of domain-specific jargon has been introduced in
technical writing (usually with italics), it is not thereafter
specially marked. In long-form works, it may get a cross reference
after it in parentheses or a footnote if it hasn't been mentioned for
dozens of pages and the reader requires a reminder. I don't think
regex(3) is large enough to warrant that consideration, and "byte
offset" seems to have the meaning that a programmer already familiar
with the individual terms would infer.

Just the usual style coaching, not a NAK.

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND 2023-04-20 18:33 ` G. Branden Robinson @ 2023-04-20 22:29 ` Alejandro Colomar 2023-04-21 5:00 ` G. Branden Robinson 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 22:29 UTC (permalink / raw) To: G. Branden Robinson, наб; +Cc: linux-man, groff [-- Attachment #1.1: Type: text/plain, Size: 739 bytes --] Hi Branden, On 4/20/23 20:33, G. Branden Robinson wrote: > [Note for non-mdoc(7) speakers: `Sx` is its macro for (sub)section > heading cross references. man(7) doesn't have an equivalent, though if > there is demand, I'm happy to implement one. :D] I've been delaying my global switch to non-shouting sexion headings, due to not having a clear idea of how to refer to them. Having a macro that does that for me, and ensures that the appropriate formatting is applied might be a good solution. It would also please the info(1) people, so that the few references we have to those would be linked. Cheers, Alex -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
  2023-04-20 22:29 ` Alejandro Colomar
@ 2023-04-21 5:00 ` G. Branden Robinson
  2023-04-21 8:06 ` a straw-man `SR` man(7) macro for (sub)section cross references (was: [PATCH v2 2/9] regex.3: improve REG_STARTEND) G. Branden Robinson
  2023-04-21 11:07 ` [PATCH v2 2/9] regex.3: improve REG_STARTEND Alejandro Colomar
  0 siblings, 2 replies; 143+ messages in thread
From: G. Branden Robinson @ 2023-04-21 5:00 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: наб, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

Hi Alex,

At 2023-04-21T00:29:12+0200, Alejandro Colomar wrote:
> On 4/20/23 20:33, G. Branden Robinson wrote:
> > [Note for non-mdoc(7) speakers: `Sx` is its macro for (sub)section
> > heading cross references. man(7) doesn't have an equivalent, though
> > if there is demand, I'm happy to implement one. :D]
>
> I've been delaying my global switch to non-shouting sexion headings,
> due to not having a clear idea of how to refer to them.

Fair.

> Having a macro that does that for me, and ensures that the appropriate
> formatting is applied might be a good solution.

Well, I have three ideas.

1. Mark them up the way the groff man pages do, in typographer's
   quotes.

   See \[lq]Match offsets\[rq] in
   .MR regex 3 .

2. I could implement the `Q` quotation macro for man(7) that I've been
   making noise about for a while.[1] Of course, you'd be waiting for
   the next release _after_ groff 1.23.0 for it...

   See
   .Q "Match offsets"
   in
   .MR regex 3 .

3. I could implement a macro explicitly tuned to the problem of
   (sub)section cross references. I didn't see anybody come up with a
   good way to shoehorn this functionality into `MR`, so I suggest the
   following.

   .SR section-or-subsection-title [page-topic page-section [trailing-punct]

   See
   .SR "Match offsets" regex 3 .
   .
   Also see
   .SR Bugs
   below.

   In this design, if argument 2 is present, argument 3 is mandatory.

   The foregoing would render as

   See “Match offsets” (regex(3)).
   Also see “Bugs” below.

On devices supporting hyperlinks, "Match offsets" would be a hyperlink
with a to-be-determined anchor reference. "regex(3)" would be a
hyperlink as with the `MR` macro today. "Bugs" would be a hyperlink
with a to-be-determined anchor reference within the current document.
(OSC 8 support for this may require some thought, or maybe we'd just
handle them like external page references.)

I trust the tradeoffs involved with each of the above solutions are
readily apparent.

> It would also please the info(1) people, so that the few references we
> have to those would be linked.

What's the URL format for hyperlinks into Info documents? How is the
existing .UR/.UE inadequate?

Regards,
Branden

[1] https://mail.gnu.org/archive/html/groff/2022-12/msg00078.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
* a straw-man `SR` man(7) macro for (sub)section cross references (was: [PATCH v2 2/9] regex.3: improve REG_STARTEND)
  2023-04-21 5:00 ` G. Branden Robinson
@ 2023-04-21 8:06 ` G. Branden Robinson
  2023-04-21 11:07 ` [PATCH v2 2/9] regex.3: improve REG_STARTEND Alejandro Colomar
  1 sibling, 0 replies; 143+ messages in thread
From: G. Branden Robinson @ 2023-04-21 8:06 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: наб, linux-man, groff

[-- Attachment #1: Type: text/plain, Size: 1383 bytes --]

[self-follow-up; updated subject]

At 2023-04-21T00:07:21-0500, G. Branden Robinson wrote:
> 3. I could implement a macro explicitly tuned to the problem of
> (sub)section cross references. I didn't see anybody come up with a
> good way to shoehorn this functionality into `MR`, so I suggest the
> following.
>
> .SR section-or-subsection-title [page-topic page-section [trailing-punct]

On second thought, I think it would be better to have matched brackets
here. And more seriously, to permute the argument order to feel more
parallel to `MR` (as well as `ME` and `UE`).

    .SR section-or-subsection-title [trailing-punct [page-topic page-section]]

Updating the example:

    See
    .SR "Match offsets" . regex 3
    .
    Also see
    .SR Bugs
    below.

In this design, if argument 3 is present, argument 4 is mandatory. This
would need to be a pretty hard requirement. Maybe the default section,
if unspecified, would be "UNKNOWN". This is rude but doesn't penalize
the user any more than the document author does. (There is also
precedent in mdoc(7)'s setup macros.) We don't want to `ab`ort page
rendering for these errors because that will adversely affect innocent
users who are simply trying to read documentation.

The foregoing would render as

    See “Match offsets” (regex(3)). Also see “Bugs” below.

Regards,
Branden

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
  2023-04-21 5:00 ` G. Branden Robinson
  2023-04-21 8:06 ` a straw-man `SR` man(7) macro for (sub)section cross references (was: [PATCH v2 2/9] regex.3: improve REG_STARTEND) G. Branden Robinson
@ 2023-04-21 11:07 ` Alejandro Colomar
  1 sibling, 0 replies; 143+ messages in thread
From: Alejandro Colomar @ 2023-04-21 11:07 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: наб, linux-man, groff

[-- Attachment #1.1: Type: text/plain, Size: 5952 bytes --]

Hi Branden!

On 4/21/23 07:00, G. Branden Robinson wrote:
> Hi Alex,
>
> At 2023-04-21T00:29:12+0200, Alejandro Colomar wrote:
>> On 4/20/23 20:33, G. Branden Robinson wrote:
>>> [Note for non-mdoc(7) speakers: `Sx` is its macro for (sub)section
>>> heading cross references. man(7) doesn't have an equivalent, though
>>> if there is demand, I'm happy to implement one. :D]
>>
>> I've been delaying my global switch to non-shouting sexion headings,
>> due to not having a clear idea of how to refer to them.
>
> Fair.
>
>> Having a macro that does that for me, and ensures that the appropriate
>> formatting is applied might be a good solution.
>
> Well, I have three ideas.
>
> 1. Mark them up the way the groff man pages do, in typographer's
> quotes.
>
> See \[lq]Match offsets\[rq] in
> .MR regex 3 .

Not bad, but if we can have some macro that hides these details, and
even lets users tune their favourite formatting, that may be nicer.
As a bonus, it adds hyperlinking abilities. :-)

>
> 2. I could implement the `Q` quotation macro for man(7) that I've been
> making noise about for a while.[1] Of course, you'd be waiting for
> the next release _after_ groff 1.23.0 for it...
>
> See
> .Q "Match offsets"
> in
> .MR regex 3 .

I'm not yet convinced by a general need for .Q. Since the single use
I've needed so far for it is in section references, I guess a .SR
macro is more appropriate.

>
> 3. I could implement a macro explicitly tuned to the problem of
> (sub)section cross references. I didn't see anybody come up with a
> good way to shoehorn this functionality into `MR`, so I suggest the
> following.

Agree; extending .MR for that seems not easy.

[... fixed in reply; below ...]

>
> On devices supporting hyperlinks, "Match offsets" would be a hyperlink
> with a to-be-determined anchor reference. "regex(3)" would be a
> hyperlink as with the `MR` macro today. "Bugs" would be a hyperlink
> with a to-be-determined anchor reference within the current document.
> (OSC 8 support for this may require some thought, or maybe we'd just
> handle them like external page references.)
>
> I trust the tradeoffs involved with each of the above solutions are
> readily apparent.
>
>> It would also please the info(1) people, so that the few references we
>> have to those would be linked.
>
> What's the URL format for hyperlinks into Info documents?

You ask me about how info(1) works? :D

info(1) is to me as unknown as ed(1). At least I can quit them both
with q. There's not much more I know of either.

> How is the
> existing .UR/.UE inadequate?

I meant more that man(7) would have capabilities similar to info
documents. It would only be that the current implementation of man(1)
is not powerful enough to do what info(1) does, but I guess it would be
conceivable to implement an info-like system that got man(7) source.
Similar to what this lsp(1) proposed to the list recently could do.

>
> Regards,
> Branden
>
> [1] https://mail.gnu.org/archive/html/groff/2022-12/msg00078.html

On 4/21/23 10:06, G. Branden Robinson wrote:
> [self-follow-up; updated subject]
>
> At 2023-04-21T00:07:21-0500, G. Branden Robinson wrote:
>> 3. I could implement a macro explicitly tuned to the problem of
>> (sub)section cross references. I didn't see anybody come up with a
>> good way to shoehorn this functionality into `MR`, so I suggest the
>> following.
>>
>> .SR section-or-subsection-title [page-topic page-section [trailing-punct]
>
> On second thought, I think it would be better to have matched brackets
> here. And more seriously, to permute the argument order to feel more
> parallel to `MR` (as well as `ME` and `UE`).
>
> .SR section-or-subsection-title [trailing-punct [page-topic page-section]]

I like this one most, by far. However, I wonder what happens when
there are conflicting names for subsections. This doesn't happen
often, but certainly happens. Should we disambiguate by specifying the
section and subsection in separate arguments?

    .SR section-or-subsection-title [trailing-punct [page-topic page-section] [section-title]]

I guess that will be hard to implement, but should be doable, since
it's unambiguous. It also answers what to do when the chapter is not
specified: it would be interpreted as a section instead, so author's
fault.

Some draft examples:

    See
    .SR Description

        See “Description”

    See
    .SR Description .

        See “Description”.

    See
    .SR Compilation . Description

        See “Compilation” (“Description”).

    See
    .SR Description . regex 3

        See “Description” (regex(3)).

    See
    .SR Compilation . regex 3 Description

        See “Compilation” (regex(3) “Description”).

Further arguments ignored.

The complex thing would be that the meaning of the 3rd arg depends on
having a 4th one, but it's not so bad.

Does it make sense to you?

Cheers,
Alex

>
> Updating the example:
>
> See
> .SR "Match offsets" . regex 3
> .
> Also see
> .SR Bugs
> below.
>
> In this design, if argument 3 is present, argument 4 is mandatory. This
> would need to be a pretty hard requirement. Maybe the default section,
> if unspecified, would be "UNKNOWN". This is rude but doesn't penalize
> the user any more than the document author does. (There is also
> precedent in mdoc(7)'s setup macros.) We don't want to `ab`ort page
> rendering for these errors because that will adversely affect innocent
> users who are simply trying to read documentation.
>
> The foregoing would render as
>
> See “Match offsets” (regex(3)). Also see “Bugs” below.
>
> Regards,
> Branden

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
From: Alejandro Colomar @ 2023-06-02 0:12 UTC
To: наб; +Cc: linux-man, G. Branden Robinson

Hi!

On 4/20/23 01:23, наб wrote:
> Explicitly spell out the ranges involved. The original wording always
> confused me, but it's actually very sane.
>
> Also change the [0]. to -> here to make more obvious the point that
> pmatch is used as a pointer-to-object, not array in this scenario.
>
> Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change
> R_NOTEOL? No. That's weird and confusing.
>
> String largeness doesn't matter, known-lengthness does.
>
> Explicitly spell out the influence on returned matches
> (relative to string, not start of range).
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>

I forgot about this patch set. Could you please resend anything that
is still pending? Thanks!

> ---
>  man3/regex.3 | 33 ++++++++++++++++++++-------------
>  1 file changed, 20 insertions(+), 13 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index d77aac2e7..74f19945d 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -141,23 +141,30 @@ compilation flag
>  above).
>  .TP
>  .B REG_STARTEND
> -Use
> -.I pmatch[0]
> -on the input string, starting at byte
> -.I pmatch[0].rm_so
> -and ending before byte
> -.IR pmatch[0].rm_eo .
> +Match
> +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo )
> +instead of
> +.RI [ string ", " string " + \fBstrlen\fP(" string )).
>  This allows matching embedded NUL bytes
>  and avoids a
>  .BR strlen (3)
> -on large strings.
> -It does not use
> +on known-length strings.
> +.I pmatch
> +must point to a valid readable object.
> +If any matches are returned
> +.RB ( REG_NOSUB
> +wasn't passed to
> +.BR regcomp (),
> +the match succeeded, and
>  .I nmatch
> -on input, and does not change
> -.B REG_NOTBOL
> -or
> -.B REG_NEWLINE
> -processing.
> +> 0), they overwrite
> +.I pmatch
> +as usual, and the
> +.B Byte offsets

I'm still unsure about this. Please do whatever you prefer, and
let's discuss again after you send the patch(es).

Cheers,
Alex

> +remain relative to
> +.IR string
> +(not
> +.IR string " + " pmatch->rm_so ).
>  This flag is a BSD extension, not present in POSIX.
>  .SS Byte offsets
>  Unless

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
From: наб @ 2023-06-02 0:49 UTC
To: Alejandro Colomar; +Cc: linux-man, G. Branden Robinson

On Fri, Jun 02, 2023 at 02:12:27AM +0200, Alejandro Colomar wrote:
> I forgot about this patch set. Could you please resend anything that
> is still pending? Thanks!

I did too, but that's because you applied it. This particular patch is
164297a322b5dee6addff9ad4acb224302ab6e7d and the whole set is
0d120a3c76b4446b194a54387ce0e7a84b208bfd^..e894e84af353727082420c48b3cbea566a0f7692
from the looks of it?
* Re: [PATCH v2 2/9] regex.3: improve REG_STARTEND
From: Alejandro Colomar @ 2023-06-03 17:30 UTC
To: наб; +Cc: linux-man, G. Branden Robinson

On 6/2/23 02:49, наб wrote:
> On Fri, Jun 02, 2023 at 02:12:27AM +0200, Alejandro Colomar wrote:
>> I forgot about this patch set. Could you please resend anything that
>> is still pending? Thanks!
> I did too, but that's because you applied it. This particular patch is
> 164297a322b5dee6addff9ad4acb224302ab6e7d and the whole set is
> 0d120a3c76b4446b194a54387ce0e7a84b208bfd^..e894e84af353727082420c48b3cbea566a0f7692
> from the looks of it?

Yep. For some reason I had marked it as unread to check it later;
probably I just forgot to change that after receiving the revision
of the patch.

Thanks!
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v2 3/9] regex.3: ffix
From: наб @ 2023-04-19 23:23 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

We never bold POSIX, not even anywhere else on this page.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index 74f19945d..5aaf42caa 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -61,11 +61,11 @@ of zero or more of the following:
 .TP
 .B REG_EXTENDED
 Use
-.B POSIX
+POSIX
 Extended Regular Expression syntax when interpreting
 .IR regex .
 If not set,
-.B POSIX
+POSIX
 Basic Regular Expression syntax is used.
 .TP
 .B REG_ICASE
-- 
2.30.2
* Re: [PATCH v2 3/9] regex.3: ffix
From: Alejandro Colomar @ 2023-04-20 11:23 UTC
To: наб; +Cc: linux-man

Hi!

On 4/20/23 01:23, наб wrote:
> We never bold POSIX, not even anywhere else on this page.
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>

Patch applied.

Thanks,
Alex

> ---
>  man3/regex.3 | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index 74f19945d..5aaf42caa 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -61,11 +61,11 @@ of zero or more of the following:
>  .TP
>  .B REG_EXTENDED
>  Use
> -.B POSIX
> +POSIX
>  Extended Regular Expression syntax when interpreting
>  .IR regex .
>  If not set,
> -.B POSIX
> +POSIX
>  Basic Regular Expression syntax is used.
>  .TP
>  .B REG_ICASE

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v2 4/9] regex.3: wfix
From: наб @ 2023-04-19 23:23 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

"Not in POSIX.2", so is it in POSIX.1-2008? POSIX.1-2001?
(or any other combination of standards from this millennium
not mentioned on this page?) It's not: just say POSIX.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index 5aaf42caa..b6e574b4d 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -289,7 +289,7 @@ Unknown character class name.
 .TP
 .B REG_EEND
 Nonspecific error.
-This is not defined by POSIX.2.
+This is not defined by POSIX.
 .TP
 .B REG_EESCAPE
 Trailing backslash.
@@ -303,7 +303,7 @@ occurs prior to the starting point.
 .TP
 .B REG_ESIZE
 Compiled regular expression requires a pattern buffer larger than 64\ kB.
-This is not defined by POSIX.2.
+This is not defined by POSIX.
 .TP
 .B REG_ESPACE
 The regex routines ran out of memory.
-- 
2.30.2
* Re: [PATCH v2 4/9] regex.3: wfix
From: Alejandro Colomar @ 2023-04-20 11:27 UTC
To: наб; +Cc: linux-man

Hi!

On 4/20/23 01:23, наб wrote:
> "Not in POSIX.2", so is it in POSIX.1-2008? POSIX.1-2001?
> (or any other combination of standards from this millennium
> not mentioned on this page?) It's not: just say POSIX.
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>

Patch applied (with some added double-spaces to the log). Thanks!

Cheers,
Alex

> ---
>  man3/regex.3 | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index 5aaf42caa..b6e574b4d 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -289,7 +289,7 @@ Unknown character class name.
>  .TP
>  .B REG_EEND
>  Nonspecific error.
> -This is not defined by POSIX.2.
> +This is not defined by POSIX.
>  .TP
>  .B REG_EESCAPE
>  Trailing backslash.
> @@ -303,7 +303,7 @@ occurs prior to the starting point.
>  .TP
>  .B REG_ESIZE
>  Compiled regular expression requires a pattern buffer larger than 64\ kB.
> -This is not defined by POSIX.2.
> +This is not defined by POSIX.
>  .TP
>  .B REG_ESPACE
>  The regex routines ran out of memory.

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v2 5/9] regex.3: ffix
From: наб @ 2023-04-19 23:23 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

Use "bitwise OR" instead of "bitwise-\fBor\fP". No other page spells it
like this. The other weirdo contenders are
  $ git grep bitwise | grep RI
  man2/adjtimex.2:.RI bitwise- or
  man2/open.2:.RI bitwise- or 'd

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3 | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index b6e574b4d..fa2669544 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -56,7 +56,7 @@ pattern buffer.
 .PP
 .I cflags
 is the
-.RB bitwise- or
+bitwise OR
 of zero or more of the following:
 .TP
 .B REG_EXTENDED
@@ -121,7 +121,7 @@ and
 are used to provide information regarding the location of any matches.
 .I eflags
 is the
-.RB bitwise- or
+bitwise OR
 of zero or more of the following flags:
 .TP
 .B REG_NOTBOL
-- 
2.30.2
* Re: [PATCH v2 5/9] regex.3: ffix
From: Alejandro Colomar @ 2023-04-20 11:28 UTC
To: наб; +Cc: linux-man

Hi!

On 4/20/23 01:23, наб wrote:
> Use "bitwise OR" instead of "bitwise-\fBor\fP". No other page spells it
> like this. The other weirdo contenders are
>   $ git grep bitwise | grep RI
>   man2/adjtimex.2:.RI bitwise- or
>   man2/open.2:.RI bitwise- or 'd

Please check also those, and maybe fix them in the same patch :)

Cheers,
Alex

>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
>  man3/regex.3 | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index b6e574b4d..fa2669544 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -56,7 +56,7 @@ pattern buffer.
>  .PP
>  .I cflags
>  is the
> -.RB bitwise- or
> +bitwise OR
>  of zero or more of the following:
>  .TP
>  .B REG_EXTENDED
> @@ -121,7 +121,7 @@ and
>  are used to provide information regarding the location of any matches.
>  .I eflags
>  is the
> -.RB bitwise- or
> +bitwise OR
>  of zero or more of the following flags:
>  .TP
>  .B REG_NOTBOL

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: наб @ 2023-04-20 12:12 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

Use "bitwise OR" instead of "bitwise-or" (with fonts).
No other pages spell it like this.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
Range-diff against v2:
1: 1ccffe37b < -: --------- regex.3: ffix
-: --------- > 1: 830173bb5 adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix

idk if this did anything

 man2/adjtimex.2 | 2 +-
 man2/clone.2 | 2 +-
 man2/mprotect.2 | 2 +-
 man2/open.2 | 2 +-
 man2/syscall.2 | 2 +-
 man3/regex.3 | 4 ++--
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/man2/adjtimex.2 b/man2/adjtimex.2
index 523347de2..40b05cb0e 100644
--- a/man2/adjtimex.2
+++ b/man2/adjtimex.2
@@ -90,7 +90,7 @@ the constants used for
 .BR ntp_adjtime ()
 are equivalent but differently named.)
 It is a bit mask containing a
-.RI bitwise- or
+bitwise OR
 combination of zero or more of the following bits:
 .TP
 .B ADJ_OFFSET
diff --git a/man2/clone.2 b/man2/clone.2
index 42ee3fee8..ec43841eb 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -413,7 +413,7 @@ mask in the remainder of this page.
 .PP
 The
 .I flags
-mask is specified as a bitwise-OR of zero or more of
+mask is specified as a bitwise OR of zero or more of
 the constants listed below.
 Except as noted below, these flags are available
 (and have the same effect) in both
diff --git a/man2/mprotect.2 b/man2/mprotect.2
index 52c14da05..5a829dafe 100644
--- a/man2/mprotect.2
+++ b/man2/mprotect.2
@@ -43,7 +43,7 @@ signal for the process.
 .I prot
 is a combination of the following access flags:
 .B PROT_NONE
-or a bitwise-or of the other values in the following list:
+or a bitwise OR of the other values in the following list:
 .TP
 .B PROT_NONE
 The memory cannot be accessed at all.
diff --git a/man2/open.2 b/man2/open.2
index 77c06b55d..b5aff887c 100644
--- a/man2/open.2
+++ b/man2/open.2
@@ -123,7 +123,7 @@ respectively.
 .PP
 In addition, zero or more file creation flags and file status flags
 can be
-.RI bitwise- or 'd
+bitwise ORed
 in
 .IR flags .
 The
diff --git a/man2/syscall.2 b/man2/syscall.2
index 3eba62182..55233ac51 100644
--- a/man2/syscall.2
+++ b/man2/syscall.2
@@ -235,7 +235,7 @@ nuances:
 In order to indicate that a system call is called under the x32 ABI,
 an additional bit,
 .BR __X32_SYSCALL_BIT ,
-is bitwise-ORed with the system call number.
+is bitwise ORed with the system call number.
 The ABI used by a process affects some process behaviors,
 including signal handling or system call restarting.
 .IP \[bu]
diff --git a/man3/regex.3 b/man3/regex.3
index 3b504a4d5..3ee58f61d 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -56,7 +56,7 @@ pattern buffer.
 .PP
 .I cflags
 is the
-.RB bitwise- or
+bitwise OR
 of zero or more of the following:
 .TP
 .B REG_EXTENDED
@@ -121,7 +121,7 @@ and
 are used to provide information regarding the location of any matches.
 .I eflags
 is the
-.RB bitwise- or
+bitwise OR
 of zero or more of the following flags:
 .TP
 .B REG_NOTBOL
-- 
2.30.2
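The patch above only changes how "bitwise OR" is typeset, but for
regex.3 it may help to see what "the bitwise OR of zero or more of the
following" means for cflags and eflags in code. The following is a
minimal sketch; matches() is a hypothetical helper written for this
illustration, and only standard POSIX flags are used.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern with the given cflags and run it
 * against text with the given eflags.  Both arguments are bit masks
 * built by ORing flag constants together; 0 means "no flags".
 * Returns 1 on match, 0 on no match, -1 if the pattern fails to compile.
 */
int matches(const char *pattern, int cflags, const char *text, int eflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB is ORed in because no match offsets are requested. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, eflags);
    regfree(&re);
    return rc == 0;
}
```

A call such as matches("^h(e|a)llo$", REG_EXTENDED | REG_ICASE, "HALLO", 0)
combines two compilation flags, while passing 0 for cflags selects
POSIX Basic Regular Expression syntax with no options.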
* Re: [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: Alejandro Colomar @ 2023-04-20 12:52 UTC
To: наб; +Cc: linux-man

On 4/20/23 14:12, наб wrote:
> Use "bitwise OR" instead of "bitwise-or" (with fonts).
> No other pages spell it like this.
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>

Patch applied. Thanks.

> ---
> Range-diff against v2:
> 1: 1ccffe37b < -: --------- regex.3: ffix
> -: --------- > 1: 830173bb5 adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix

I rewrote the subject to:

	man*/: ffix, wfix

>
> idk if this did anything

Heh, it didn't do much. What happened is that the patches are so
different, that git thinks you just removed one patch, and wrote
a different one from scratch. Anyway, I find it useful most of
the time.

Cheers,
Alex

>
>  man2/adjtimex.2 | 2 +-
>  man2/clone.2 | 2 +-
>  man2/mprotect.2 | 2 +-
>  man2/open.2 | 2 +-
>  man2/syscall.2 | 2 +-
>  man3/regex.3 | 4 ++--
>  6 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/man2/adjtimex.2 b/man2/adjtimex.2
> index 523347de2..40b05cb0e 100644
> --- a/man2/adjtimex.2
> +++ b/man2/adjtimex.2
> @@ -90,7 +90,7 @@ the constants used for
>  .BR ntp_adjtime ()
>  are equivalent but differently named.)
>  It is a bit mask containing a
> -.RI bitwise- or
> +bitwise OR
>  combination of zero or more of the following bits:
>  .TP
>  .B ADJ_OFFSET
> diff --git a/man2/clone.2 b/man2/clone.2
> index 42ee3fee8..ec43841eb 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -413,7 +413,7 @@ mask in the remainder of this page.
>  .PP
>  The
>  .I flags
> -mask is specified as a bitwise-OR of zero or more of
> +mask is specified as a bitwise OR of zero or more of
>  the constants listed below.
>  Except as noted below, these flags are available
>  (and have the same effect) in both
> diff --git a/man2/mprotect.2 b/man2/mprotect.2
> index 52c14da05..5a829dafe 100644
> --- a/man2/mprotect.2
> +++ b/man2/mprotect.2
> @@ -43,7 +43,7 @@ signal for the process.
>  .I prot
>  is a combination of the following access flags:
>  .B PROT_NONE
> -or a bitwise-or of the other values in the following list:
> +or a bitwise OR of the other values in the following list:
>  .TP
>  .B PROT_NONE
>  The memory cannot be accessed at all.
> diff --git a/man2/open.2 b/man2/open.2
> index 77c06b55d..b5aff887c 100644
> --- a/man2/open.2
> +++ b/man2/open.2
> @@ -123,7 +123,7 @@ respectively.
>  .PP
>  In addition, zero or more file creation flags and file status flags
>  can be
> -.RI bitwise- or 'd
> +bitwise ORed
>  in
>  .IR flags .
>  The
> diff --git a/man2/syscall.2 b/man2/syscall.2
> index 3eba62182..55233ac51 100644
> --- a/man2/syscall.2
> +++ b/man2/syscall.2
> @@ -235,7 +235,7 @@ nuances:
>  In order to indicate that a system call is called under the x32 ABI,
>  an additional bit,
>  .BR __X32_SYSCALL_BIT ,
> -is bitwise-ORed with the system call number.
> +is bitwise ORed with the system call number.
>  The ABI used by a process affects some process behaviors,
>  including signal handling or system call restarting.
>  .IP \[bu]
> diff --git a/man3/regex.3 b/man3/regex.3
> index 3b504a4d5..3ee58f61d 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -56,7 +56,7 @@ pattern buffer.
>  .PP
>  .I cflags
>  is the
> -.RB bitwise- or
> +bitwise OR
>  of zero or more of the following:
>  .TP
>  .B REG_EXTENDED
> @@ -121,7 +121,7 @@ and
>  are used to provide information regarding the location of any matches.
>  .I eflags
>  is the
> -.RB bitwise- or
> +bitwise OR
>  of zero or more of the following flags:
>  .TP
>  .B REG_NOTBOL

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: Alejandro Colomar @ 2023-04-20 13:03 UTC
To: наб; +Cc: linux-man

On 4/20/23 14:52, Alejandro Colomar wrote:
> On 4/20/23 14:12, наб wrote:
>> Use "bitwise OR" instead of "bitwise-or" (with fonts).
>> No other pages spell it like this.
>>
>> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
>
> Patch applied. Thanks.
>
>> ---
>> Range-diff against v2:
>> 1: 1ccffe37b < -: --------- regex.3: ffix
>> -: --------- > 1: 830173bb5 adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
>
> I rewrote the subject to:
>
> 	man*/: ffix, wfix
>
>>
>> idk if this did anything
>
> Heh, it didn't do much. What happened is that the patches are so
> different, that git thinks you just removed one patch, and wrote
> a different one from scratch. Anyway, I find it useful most of
> the time.
>
> Cheers,
> Alex
>
>>
>>  man2/adjtimex.2 | 2 +-
>>  man2/clone.2 | 2 +-
>>  man2/mprotect.2 | 2 +-
>>  man2/open.2 | 2 +-
>>  man2/syscall.2 | 2 +-
>>  man3/regex.3 | 4 ++--
>>  6 files changed, 7 insertions(+), 7 deletions(-)
>>
>> diff --git a/man2/adjtimex.2 b/man2/adjtimex.2
>> index 523347de2..40b05cb0e 100644
>> --- a/man2/adjtimex.2
>> +++ b/man2/adjtimex.2
>> @@ -90,7 +90,7 @@ the constants used for

BTW, another thing you might find useful is this:

$ cat ~/.config/git/attributes
*.[1-8]* diff=man

And then in your .gitconfig:

[diff "man"]
	xfuncname = "^\\.S[SH] .*$"

You may want to use a regex that also works for mdoc(7).

This produces the following hunks:

@@ -90,7 +90,7 @@ .SH DESCRIPTION

>>  .BR ntp_adjtime ()
>>  are equivalent but differently named.)
>>  It is a bit mask containing a
>> -.RI bitwise- or
>> +bitwise OR
>>  combination of zero or more of the following bits:
>>  .TP
>>  .B ADJ_OFFSET
>> diff --git a/man2/clone.2 b/man2/clone.2
>> index 42ee3fee8..ec43841eb 100644
>> --- a/man2/clone.2
>> +++ b/man2/clone.2
>> @@ -413,7 +413,7 @@ mask in the remainder of this page.

@@ -413,7 +413,7 @@ .SS The flags mask

>>  .PP
>>  The
>>  .I flags
>> -mask is specified as a bitwise-OR of zero or more of
>> +mask is specified as a bitwise OR of zero or more of
>>  the constants listed below.
>>  Except as noted below, these flags are available
>>  (and have the same effect) in both
>> diff --git a/man2/mprotect.2 b/man2/mprotect.2
>> index 52c14da05..5a829dafe 100644
>> --- a/man2/mprotect.2
>> +++ b/man2/mprotect.2
>> @@ -43,7 +43,7 @@ signal for the process.

@@ -43,7 +43,7 @@ .SH DESCRIPTION

>>  .I prot
>>  is a combination of the following access flags:
>>  .B PROT_NONE
>> -or a bitwise-or of the other values in the following list:
>> +or a bitwise OR of the other values in the following list:
>>  .TP
>>  .B PROT_NONE
>>  The memory cannot be accessed at all.
>> diff --git a/man2/open.2 b/man2/open.2
>> index 77c06b55d..b5aff887c 100644
>> --- a/man2/open.2
>> +++ b/man2/open.2
>> @@ -123,7 +123,7 @@ respectively.

@@ -123,7 +123,7 @@ .SH DESCRIPTION

>>  .PP
>>  In addition, zero or more file creation flags and file status flags
>>  can be
>> -.RI bitwise- or 'd
>> +bitwise ORed
>>  in
>>  .IR flags .
>>  The
>> diff --git a/man2/syscall.2 b/man2/syscall.2
>> index 3eba62182..55233ac51 100644
>> --- a/man2/syscall.2
>> +++ b/man2/syscall.2
>> @@ -235,7 +235,7 @@ nuances:

@@ -235,7 +235,7 @@ .SS Architecture calling conventions

>>  In order to indicate that a system call is called under the x32 ABI,
>>  an additional bit,
>>  .BR __X32_SYSCALL_BIT ,
>> -is bitwise-ORed with the system call number.
>> +is bitwise ORed with the system call number.
>>  The ABI used by a process affects some process behaviors,
>>  including signal handling or system call restarting.
>>  .IP \[bu]
>> diff --git a/man3/regex.3 b/man3/regex.3
>> index 3b504a4d5..3ee58f61d 100644
>> --- a/man3/regex.3
>> +++ b/man3/regex.3
>> @@ -56,7 +56,7 @@ pattern buffer.

@@ -56,7 +56,7 @@ .SS POSIX regex compiling

>>  .PP
>>  .I cflags
>>  is the
>> -.RB bitwise- or
>> +bitwise OR
>>  of zero or more of the following:
>>  .TP
>>  .B REG_EXTENDED
>> @@ -121,7 +121,7 @@ and

@@ -121,7 +121,7 @@ .SS POSIX regex matching

>>  are used to provide information regarding the location of any matches.
>>  .I eflags
>>  is the
>> -.RB bitwise- or
>> +bitwise OR
>>  of zero or more of the following flags:
>>  .TP
>>  .B REG_NOTBOL
>

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: наб @ 2023-04-20 14:13 UTC
To: Alejandro Colomar; +Cc: linux-man

Hi!

On Thu, Apr 20, 2023 at 03:03:24PM +0200, Alejandro Colomar wrote:
> >> diff --git a/man2/adjtimex.2 b/man2/adjtimex.2
> >> index 523347de2..40b05cb0e 100644
> >> --- a/man2/adjtimex.2
> >> +++ b/man2/adjtimex.2
> >> @@ -90,7 +90,7 @@ the constants used for
> BTW, another thing you might find useful is this:
>
> $ cat ~/.config/git/attributes
> *.[1-8]* diff=man
>
> And then in your .gitconfig:
>
> [diff "man"]
> 	xfuncname = "^\\.S[SH] .*$"
That's great tech, thanks.

> You may want to use a regex that also works for mdoc(7).
mdoc uses .Sh and .Ss, so:
	xfuncname = "^\\.S[SHsh] .*"

Best,
* Re: [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: Alejandro Colomar @ 2023-04-20 14:19 UTC
To: наб; +Cc: linux-man

Hi!

On 4/20/23 16:13, наб wrote:
> Hi!
>
> On Thu, Apr 20, 2023 at 03:03:24PM +0200, Alejandro Colomar wrote:
>>>> diff --git a/man2/adjtimex.2 b/man2/adjtimex.2
>>>> index 523347de2..40b05cb0e 100644
>>>> --- a/man2/adjtimex.2
>>>> +++ b/man2/adjtimex.2
>>>> @@ -90,7 +90,7 @@ the constants used for
>> BTW, another thing you might find useful is this:
>>
>> $ cat ~/.config/git/attributes
>> *.[1-8]* diff=man
>>
>> And then in your .gitconfig:
>>
>> [diff "man"]
>> 	xfuncname = "^\\.S[SH] .*$"
> That's great tech, thanks.
>
>> You may want to use a regex that also works for mdoc(7).
> mdoc uses .Sh and .Ss, so:
> 	xfuncname = "^\\.S[SHsh] .*"

Thanks! I improved my config file :-) [1]

Best,
Alex

[1]: <http://www.alejandro-colomar.es/src/alx/alx/config.git/commit/?id=4e772e3e3fe0785d773cf702b115dfc3d20d90d5>

>
> Best,

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
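The xfuncname pattern agreed on above can be tried out with the very
API that regex.3 documents. The sketch below assumes that git treats
xfuncname as a POSIX extended regular expression (as gitattributes(5)
describes) and that the config parser reduces "\\." to "\."; the C
string literal happens to need the same escaping. is_section_heading()
is a hypothetical name used only for this illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: report whether a source line would be chosen as
 * a hunk header by the pattern from the thread, i.e. whether it is a
 * man(7) .SH/.SS or mdoc(7) .Sh/.Ss heading followed by a title.
 * Returns 1 if it matches, 0 if not, -1 if the pattern fails to compile.
 */
int is_section_heading(const char *line)
{
    /* After C unescaping this is the regex  ^\.S[SHsh] .*$  */
    static const char pattern[] = "^\\.S[SHsh] .*$";
    regex_t re;
    int rc;

    assert(line != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Lines such as ".SH DESCRIPTION" or ".Ss Byte offsets" are accepted, while
".TP", ".B REG_STARTEND", and a bare ".SH" without a title are not, which
is why the hunk headers shown earlier carry the enclosing section title.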
* Re: [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: G. Branden Robinson @ 2023-04-20 18:42 UTC
To: Alejandro Colomar; +Cc: наб, linux-man

At 2023-04-20T15:03:24+0200, Alejandro Colomar wrote:
> BTW, another thing you might find useful is this:
>
> $ cat ~/.config/git/attributes
> *.[1-8]* diff=man
>
>
> And then in your .gitconfig:
>
> [diff "man"]
> 	xfuncname = "^\\.S[SH] .*$"

Nice trick! How on Earth have I been living without this?

> You may want to use a regex that also works for mdoc(7).

I reckon you could sweep up mdoc(7) pages as well with:

	xfuncname = "^\\.S[HShs] .*$"

> >> .BR ntp_adjtime ()
> >> are equivalent but differently named.)
> >> It is a bit mask containing a
> >> -.RI bitwise- or
> >> +bitwise OR
> >> combination of zero or more of the following bits:

Discussion of Boolean-algebraic operations is common enough among
programmers that it might be a good idea to settle on a specific style
recommendation for typesetting them.

I think either quotation (e.g., \[lq]or\[rq]) or shouting capitals (OR)
are tolerable, the latter only because the few operators commonly
mentioned have very short names (you don't see EQUIVALENCE much).

I would counsel against changing the type face for them (i.e., no bold,
no italics).

Regards,
Branden
* Re: [PATCH v3 5/9] adjtimex.2, clone.2, mprotect.2, open.2, syscall.2, regex.3: ffix, wfix
From: Alejandro Colomar @ 2023-04-20 22:40 UTC
To: G. Branden Robinson; +Cc: наб, linux-man

Hi Branden,

On 4/20/23 20:42, G. Branden Robinson wrote:
> At 2023-04-20T15:03:24+0200, Alejandro Colomar wrote:
>> BTW, another thing you might find useful is this:
>>
>> $ cat ~/.config/git/attributes
>> *.[1-8]* diff=man
>>
>>
>> And then in your .gitconfig:
>>
>> [diff "man"]
>>     xfuncname = "^\\.S[SH] .*$"
>
> Nice trick! How on Earth have I been living without this?

I don't remember how I found this obscure git(1) configuration. I think
I was reviewing some patch at work and the hunk was complete garbage,
and we pulled some threads... Itchy and Scratchy :)

>
>> You may want to use a regex that also works for mdoc(7).
>
> I reckon you could sweep up mdoc(7) pages as well with:
>
>     xfuncname = "^\\.S[HShs] .*$"

Already fixed(:

<http://www.alejandro-colomar.es/src/alx/alx/config.git/commit/?id=4e772e3e3fe0785d773cf702b115dfc3d20d90d5>

>
>>>> .BR ntp_adjtime ()
>>>> are equivalent but differently named.)
>>>> It is a bit mask containing a
>>>> -.RI bitwise- or
>>>> +bitwise OR
>>>> combination of zero or more of the following bits:
>
> Discussion of Boolean-algebraic operations is common enough among
> programmers that it might be a good idea to settle on a specific style
> recommendation for typesetting them.
>
> I think either quotation (e.g., \[lq]or\[rq]) or shouting capitals (OR)
> are tolerable, the latter only because the few operators commonly
> mentioned have very short names (you don't see EQUIVALENCE much).

I'm used to seeing uppercase OR, AND, NAND, XOR, and similar names in
Electronics. I'll vote for that.

>
> I would counsel against changing the type face for them (i.e., no bold,
> no italics).
>
> Regards,
> Branden

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v2 6/9] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: move in with regex.3
From: наб @ 2023-04-19 23:25 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

They're inextricably linked, not cross-referenced at all,
and not used anywhere else.

Now that they (realistically) exist to the reader, add a note
on how big nmatch can be; POSIX even says "The application developer
should note that there is probably no reason for using a value of
nmatch that is larger than preg−>re_nsub+1.".

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3              | 66 ++++++++++++++++++++++++++++-----------
 man3type/regex_t.3type    | 64 +------------------------------------
 man3type/regmatch_t.3type |  2 +-
 man3type/regoff_t.3type   |  2 +-
 4 files changed, 51 insertions(+), 83 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index fa2669544..b95b3c3b0 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -15,7 +15,7 @@ regcomp, regexec, regerror, regfree \- POSIX regex functions
 Standard C library
 .RI ( libc ", " \-lc )
 .SH SYNOPSIS
-.nf
+.EX
 .B #include <regex.h>
 .PP
 .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex ,
@@ -29,7 +29,21 @@ Standard C library
 .BI " char " errbuf "[restrict ." errbuf_size "], \
 size_t " errbuf_size );
 .BI "void regfree(regex_t *" preg );
-.fi
+.PP
+.B typedef struct {
+.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */"
+.B } regex_t;
+.PP
+.B typedef struct {
+.BR " regoff_t rm_so;" " /* Byte offset from start of string"
+ to start of substring */
+.BR " regoff_t rm_eo;" " /* Byte offset from start of string to"
+ the first character after the end of
+ substring */
+.B } regmatch_t;
+.PP
+.BR typedef " /* ... */ " regoff_t;
+.EE
 .SH DESCRIPTION
 .SS POSIX regex compiling
 .BR regcomp ()
@@ -54,6 +68,21 @@ must always be supplied with the address of a
 .BR regcomp ()-initialized
 pattern buffer.
 .PP
+After
+.BR regcomp ()
+succeeds,
+.I preg->re_nsub
+holds the number of subexpressions in
+.IR regex .
+Thus, a value of
+.I preg->re_nsub
++ 1
+passed as
+.I nmatch
+to
+.BR regexec ()
+is sufficient to capture all matches.
+.PP
 .I cflags
 is the
 bitwise OR
@@ -192,22 +221,6 @@ must be at least
 .IR N+1 .)
 Any unused structure elements will contain the value \-1.
 .PP
-The
-.I regmatch_t
-structure which is the type of
-.I pmatch
-is defined in
-.IR <regex.h> .
-.PP
-.in +4n
-.EX
-typedef struct {
- regoff_t rm_so;
- regoff_t rm_eo;
-} regmatch_t;
-.EE
-.in
-.PP
 Each
 .I rm_so
 element that is not \-1 indicates the start offset of the next largest
@@ -216,6 +229,14 @@ The relative
 .I rm_eo
 element indicates the end offset of the match,
 which is the offset of the first character after the matching text.
+.PP
+.I regoff_t
+is a signed integer type
+capable of storing the largest value that can be stored in either an
+.I ptrdiff_t
+type or a
+.I ssize_t
+type.
 .SS POSIX error reporting
 .BR regerror ()
 is used to turn the error codes that can be returned by both
@@ -338,6 +359,15 @@ T} Thread safety MT-Safe
 POSIX.1-2008.
 .SH HISTORY
 POSIX.1-2001.
+.PP
+Prior to POSIX.1-2008,
+.I regoff_t
+was required to be
+capable of storing the largest value that can be stored in either an
+.I off_t
+type or a
+.I ssize_t
+type.
 .SH EXAMPLES
 .EX
 #include <stdint.h>
diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type
index 176d2c7a6..c0daaf0ff 100644
--- a/man3type/regex_t.3type
+++ b/man3type/regex_t.3type
@@ -1,63 +1 @@
-.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org>
-.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com>
-.\"
-.\" SPDX-License-Identifier: Linux-man-pages-copyleft
-.\"
-.\"
-.TH regex_t 3type (date) "Linux man-pages (unreleased)"
-.SH NAME
-regex_t, regmatch_t, regoff_t
-\- regular expression matching
-.SH LIBRARY
-Standard C library
-.RI ( libc )
-.SH SYNOPSIS
-.EX
-.B #include <regex.h>
-.PP
-.B typedef struct {
-.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */"
-.B } regex_t;
-.PP
-.B typedef struct {
-.BR " regoff_t rm_so;" " /* Byte offset from start of string"
- to start of substring */
-.BR " regoff_t rm_eo;" " /* Byte offset from start of string to"
- the first character after the end of
- substring */
-.B } regmatch_t;
-.PP
-.BR typedef " /* ... */ " regoff_t;
-.EE
-.SH DESCRIPTION
-.TP
-.I regex_t
-This is a structure type used in regular expression matching.
-It holds a compiled regular expression,
-compiled with
-.BR regcomp (3).
-.TP
-.I regmatch_t
-This is a structure type used in regular expression matching.
-.TP
-.I regoff_t
-It is a signed integer type
-capable of storing the largest value that can be stored in either an
-.I ptrdiff_t
-type or a
-.I ssize_t
-type.
-.SH STANDARDS
-POSIX.1-2008.
-.SH HISTORY
-POSIX.1-2001.
-.PP
-Prior to POSIX.1-2008,
-the type was
-capable of storing the largest value that can be stored in either an
-.I off_t
-type or a
-.I ssize_t
-type.
-.SH SEE ALSO
-.BR regex (3)
+.so man3/regex.3
diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type
index dc78f2cf2..c0daaf0ff 100644
--- a/man3type/regmatch_t.3type
+++ b/man3type/regmatch_t.3type
@@ -1 +1 @@
-.so man3type/regex_t.3type
+.so man3/regex.3
diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type
index dc78f2cf2..c0daaf0ff 100644
--- a/man3type/regoff_t.3type
+++ b/man3type/regoff_t.3type
@@ -1 +1 @@
-.so man3type/regex_t.3type
+.so man3/regex.3
-- 
2.30.2
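For readers following the thread, here is a minimal sketch of the nmatch sizing that the patch above documents: after a successful regcomp(), an array of re_nsub + 1 entries is enough to capture the whole match and every subexpression. This is not part of the patch; match_all_groups() is a hypothetical helper invented for illustration, and the use of REG_EXTENDED is an arbitrary choice.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): compile `pattern` as an ERE,
 * match it against `subject`, and return a calloc()ed array of
 * re_nsub + 1 regmatch_t entries; entry 0 is the whole match.
 * The number of entries is stored in *count.
 * Returns NULL on compile error, no match, or allocation failure.
 */
static regmatch_t *
match_all_groups(const char *pattern, const char *subject, size_t *count)
{
    regex_t     re;
    regmatch_t  *pm;
    size_t      n;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* One slot for the overall match plus one per subexpression. */
    n = re.re_nsub + 1;
    pm = calloc(n, sizeof(*pm));
    if (pm != NULL && regexec(&re, subject, n, pm, 0) != 0) {
        free(pm);
        pm = NULL;
    }
    regfree(&re);

    if (pm != NULL)
        *count = n;
    return pm;
}
```

The caller owns the returned array and frees it with free(); the offsets in it are byte offsets from the start of `subject`.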
* Re: [PATCH v2 6/9] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: move in with regex.3
From: Alejandro Colomar @ 2023-04-20 11:31 UTC
To: наб; +Cc: linux-man

Hi!

On 4/20/23 01:25, наб wrote:
> They're inextricably linked, not cross-referenced at all,
> and not used anywhere else.
>
> Now that they (realistically) exist to the reader, add a note

I prefer if the text movement is done in a separate commit that does
the minimum, so that git(1) has it easier to follow the changes.

Also, this is a big change. Could you please move it closer to the
end of the patch set?

Thanks,
Alex

> on how big nmatch can be; POSIX even says "The application developer
> should note that there is probably no reason for using a value of
> nmatch that is larger than preg−>re_nsub+1.".
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
>  man3/regex.3              | 66 ++++++++++++++++++++++++++++-----------
>  man3type/regex_t.3type    | 64 +------------------------------------
>  man3type/regmatch_t.3type |  2 +-
>  man3type/regoff_t.3type   |  2 +-
>  4 files changed, 51 insertions(+), 83 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index fa2669544..b95b3c3b0 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -15,7 +15,7 @@ regcomp, regexec, regerror, regfree \- POSIX regex functions
>  Standard C library
>  .RI ( libc ", " \-lc )
>  .SH SYNOPSIS
> -.nf
> +.EX
>  .B #include <regex.h>
>  .PP
>  .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex ,
> @@ -29,7 +29,21 @@ Standard C library
>  .BI " char " errbuf "[restrict ." errbuf_size "], \
>  size_t " errbuf_size );
>  .BI "void regfree(regex_t *" preg );
> -.fi
> +.PP
> +.B typedef struct {
> +.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */"
> +.B } regex_t;
> +.PP
> +.B typedef struct {
> +.BR " regoff_t rm_so;" " /* Byte offset from start of string"
> + to start of substring */
> +.BR " regoff_t rm_eo;" " /* Byte offset from start of string to"
> + the first character after the end of
> + substring */
> +.B } regmatch_t;
> +.PP
> +.BR typedef " /* ... */ " regoff_t;
> +.EE
>  .SH DESCRIPTION
>  .SS POSIX regex compiling
>  .BR regcomp ()
> @@ -54,6 +68,21 @@ must always be supplied with the address of a
>  .BR regcomp ()-initialized
>  pattern buffer.
>  .PP
> +After
> +.BR regcomp ()
> +succeeds,
> +.I preg->re_nsub
> +holds the number of subexpressions in
> +.IR regex .
> +Thus, a value of
> +.I preg->re_nsub
> ++ 1
> +passed as
> +.I nmatch
> +to
> +.BR regexec ()
> +is sufficient to capture all matches.
> +.PP
>  .I cflags
>  is the
>  bitwise OR
> @@ -192,22 +221,6 @@ must be at least
>  .IR N+1 .)
>  Any unused structure elements will contain the value \-1.
>  .PP
> -The
> -.I regmatch_t
> -structure which is the type of
> -.I pmatch
> -is defined in
> -.IR <regex.h> .
> -.PP
> -.in +4n
> -.EX
> -typedef struct {
> - regoff_t rm_so;
> - regoff_t rm_eo;
> -} regmatch_t;
> -.EE
> -.in
> -.PP
>  Each
>  .I rm_so
>  element that is not \-1 indicates the start offset of the next largest
> @@ -216,6 +229,14 @@ The relative
>  .I rm_eo
>  element indicates the end offset of the match,
>  which is the offset of the first character after the matching text.
> +.PP
> +.I regoff_t
> +is a signed integer type
> +capable of storing the largest value that can be stored in either an
> +.I ptrdiff_t
> +type or a
> +.I ssize_t
> +type.
>  .SS POSIX error reporting
>  .BR regerror ()
>  is used to turn the error codes that can be returned by both
> @@ -338,6 +359,15 @@ T} Thread safety MT-Safe
>  POSIX.1-2008.
>  .SH HISTORY
>  POSIX.1-2001.
> +.PP
> +Prior to POSIX.1-2008,
> +.I regoff_t
> +was required to be
> +capable of storing the largest value that can be stored in either an
> +.I off_t
> +type or a
> +.I ssize_t
> +type.
>  .SH EXAMPLES
>  .EX
>  #include <stdint.h>
> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type
> index 176d2c7a6..c0daaf0ff 100644
> --- a/man3type/regex_t.3type
> +++ b/man3type/regex_t.3type
> @@ -1,63 +1 @@
> -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org>
> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com>
> -.\"
> -.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> -.\"
> -.\"
> -.TH regex_t 3type (date) "Linux man-pages (unreleased)"
> -.SH NAME
> -regex_t, regmatch_t, regoff_t
> -\- regular expression matching
> -.SH LIBRARY
> -Standard C library
> -.RI ( libc )
> -.SH SYNOPSIS
> -.EX
> -.B #include <regex.h>
> -.PP
> -.B typedef struct {
> -.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */"
> -.B } regex_t;
> -.PP
> -.B typedef struct {
> -.BR " regoff_t rm_so;" " /* Byte offset from start of string"
> - to start of substring */
> -.BR " regoff_t rm_eo;" " /* Byte offset from start of string to"
> - the first character after the end of
> - substring */
> -.B } regmatch_t;
> -.PP
> -.BR typedef " /* ... */ " regoff_t;
> -.EE
> -.SH DESCRIPTION
> -.TP
> -.I regex_t
> -This is a structure type used in regular expression matching.
> -It holds a compiled regular expression,
> -compiled with
> -.BR regcomp (3).
> -.TP
> -.I regmatch_t
> -This is a structure type used in regular expression matching.
> -.TP
> -.I regoff_t
> -It is a signed integer type
> -capable of storing the largest value that can be stored in either an
> -.I ptrdiff_t
> -type or a
> -.I ssize_t
> -type.
> -.SH STANDARDS
> -POSIX.1-2008.
> -.SH HISTORY
> -POSIX.1-2001.
> -.PP
> -Prior to POSIX.1-2008,
> -the type was
> -capable of storing the largest value that can be stored in either an
> -.I off_t
> -type or a
> -.I ssize_t
> -type.
> -.SH SEE ALSO
> -.BR regex (3)
> +.so man3/regex.3
> diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type
> index dc78f2cf2..c0daaf0ff 100644
> --- a/man3type/regmatch_t.3type
> +++ b/man3type/regmatch_t.3type
> @@ -1 +1 @@
> -.so man3type/regex_t.3type
> +.so man3/regex.3
> diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type
> index dc78f2cf2..c0daaf0ff 100644
> --- a/man3type/regoff_t.3type
> +++ b/man3type/regoff_t.3type
> @@ -1 +1 @@
> -.so man3type/regex_t.3type
> +.so man3/regex.3

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v4 1/6] regex.3: Fix subsection headings
From: наб @ 2023-04-20 13:02 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
$ git diff v3

But the patches are re-ordered (and a new move-only one added);
--range-diff, humorously, /only/ picks up that one, and doesn't
understand the rest, which is worse than if it failed entirely.

The 3type move is as far back as I could make it I think,
6/6 wants to come after regoff_t deduplication.

 man3/regex.3 | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index 3ee58f61d..637cb2231 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -31,7 +31,7 @@ size_t " errbuf_size );
 .BI "void regfree(regex_t *" preg );
 .fi
 .SH DESCRIPTION
-.SS POSIX regex compiling
+.SS Compilation
 .BR regcomp ()
 is used to compile a regular expression into a form that is suitable
 for subsequent
@@ -110,7 +110,7 @@ whether
 .I eflags
 contains
 .BR REG_NOTEOL .
-.SS POSIX regex matching
+.SS Matching
 .BR regexec ()
 is used to match a null-terminated string
 against the precompiled pattern buffer,
@@ -159,7 +159,7 @@ or
 .B REG_NEWLINE
 processing.
 This flag is a BSD extension, not present in POSIX.
-.SS Byte offsets
+.SS Match offsets
 Unless
 .B REG_NOSUB
 was set for the compilation of the pattern buffer, it is possible to
@@ -209,7 +209,7 @@ The relative
 .I rm_eo
 element indicates the end offset of the match,
 which is the offset of the first character after the matching text.
-.SS POSIX error reporting
+.SS Error reporting
 .BR regerror ()
 is used to turn the error codes that can be returned by both
 .BR regcomp ()
@@ -238,7 +238,7 @@ are nonzero,
 is filled in with the first
 .I "errbuf_size \- 1"
 characters of the error message and a terminating null byte (\[aq]\e0\[aq]).
-.SS POSIX pattern buffer freeing
+.SS Freeing
 Supplying
 .BR regfree ()
 with a precompiled pattern buffer,
-- 
2.30.2
* Re: [PATCH v4 1/6] regex.3: Fix subsection headings
From: Alejandro Colomar @ 2023-04-20 13:13 UTC
To: наб; +Cc: linux-man

On 4/20/23 15:02, наб wrote:
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
> $ git diff v3
>
> But the patches are re-ordered (and a new move-only one added);
> --range-diff, humorously, /only/ picks up that one, and doesn't
> understand the rest, which is worse than if it failed entirely.
>
> The 3type move is as far back as I could make it I think,
> 6/6 wants to come after regoff_t deduplication.
>
>  man3/regex.3 | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index 3ee58f61d..637cb2231 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -31,7 +31,7 @@ size_t " errbuf_size );
>  .BI "void regfree(regex_t *" preg );
>  .fi
>  .SH DESCRIPTION
> -.SS POSIX regex compiling
> +.SS Compilation
>  .BR regcomp ()
>  is used to compile a regular expression into a form that is suitable
>  for subsequent
> @@ -110,7 +110,7 @@ whether
>  .I eflags
>  contains
>  .BR REG_NOTEOL .
> -.SS POSIX regex matching
> +.SS Matching
>  .BR regexec ()
>  is used to match a null-terminated string
>  against the precompiled pattern buffer,
> @@ -159,7 +159,7 @@ or
>  .B REG_NEWLINE
>  processing.
>  This flag is a BSD extension, not present in POSIX.
> -.SS Byte offsets
> +.SS Match offsets

I think it might be a bit clearer as "Subexpression match offsets" or
something like that? What do you think?

>  Unless
>  .B REG_NOSUB
>  was set for the compilation of the pattern buffer, it is possible to
> @@ -209,7 +209,7 @@ The relative
>  .I rm_eo
>  element indicates the end offset of the match,
>  which is the offset of the first character after the matching text.
> -.SS POSIX error reporting
> +.SS Error reporting
>  .BR regerror ()
>  is used to turn the error codes that can be returned by both
>  .BR regcomp ()
> @@ -238,7 +238,7 @@ are nonzero,
>  is filled in with the first
>  .I "errbuf_size \- 1"
>  characters of the error message and a terminating null byte (\[aq]\e0\[aq]).
> -.SS POSIX pattern buffer freeing
> +.SS Freeing
>  Supplying
>  .BR regfree ()
>  with a precompiled pattern buffer,

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: [PATCH v4 1/6] regex.3: Fix subsection headings
From: наб @ 2023-04-20 13:24 UTC
To: Alejandro Colomar; +Cc: linux-man

Hi!

On Thu, Apr 20, 2023 at 03:13:54PM +0200, Alejandro Colomar wrote:
> On 4/20/23 15:02, наб wrote:
> > @@ -159,7 +159,7 @@ or
> >  .B REG_NEWLINE
> >  processing.
> >  This flag is a BSD extension, not present in POSIX.
> > -.SS Byte offsets
> > +.SS Match offsets
> I think it might be a bit clearer as "Subexpression match offsets" or
> something like that? What do you think?
Nah; in a significant amount of scenarios you don't care about
subexpressions at all, and the one thing you're guaranteed to get is,
well, the non-subexpression match.
Saying "Subexpression match offsets" to mean "Match offsets, including
of subexpressions" is more confusing, and which offsets are returned is
explained in running text.

Best,
* Re: [PATCH v4 1/6] regex.3: Fix subsection headings
From: Alejandro Colomar @ 2023-04-20 13:35 UTC
To: наб; +Cc: linux-man

Hi!

On 4/20/23 15:24, наб wrote:
> Hi!
>
> On Thu, Apr 20, 2023 at 03:13:54PM +0200, Alejandro Colomar wrote:
>> On 4/20/23 15:02, наб wrote:
>>> @@ -159,7 +159,7 @@ or
>>>  .B REG_NEWLINE
>>>  processing.
>>>  This flag is a BSD extension, not present in POSIX.
>>> -.SS Byte offsets
>>> +.SS Match offsets
>> I think it might be a bit clearer as "Subexpression match offsets" or
>> something like that? What do you think?
> Nah; in a significant amount of scenarios you don't care about
> subexpressions at all, and the one thing you're guaranteed to get is,
> well, the non-subexpression match.
> Saying "Subexpression match offsets" to mean "Match offsets, including
> of subexpressions" is more confusing, and which offsets are returned is
> explained in running text.

Ahh, sorry; I was myself confused. I thought the section was only about
subexpressions, which is why I found confusing that the title was not
more explicit. Being about the main match + subexp, your title is better.

I'll apply this patch in a moment, after I push my SYNOPSIS patch, based
on your 2/6, since I found there are 2 places where _Nullable should go,
not one.

Best,
Alex

>
> Best,

P.S.: That comma without continuation feels very awkward to me :)

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v5 0/8] regex.3 momento
From: наб @ 2023-04-20 15:35 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

The range diff was again soup,
I think there's something in the interdiff tho.

8/8 may be clearer, may be not.

наб (8):
  regex.3: Desoupify regcomp() description
  regex.3: Desoupify regexec() description
  regex.3: Desoupify regerror() description
  regex.3: Improve REG_STARTEND
  regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link
    regex_t.3type into regex.3
  regex.3: Finalise move of reg*.3type
  regex.3: Destandardeseify Match offsets
  regex.3: Further clarify the sole purpose of REG_NOSUB

 man3/regex.3              | 250 +++++++++++++++++++++-----------------
 man3type/regex_t.3type    |  64 +---------
 man3type/regmatch_t.3type |   2 +-
 man3type/regoff_t.3type   |   2 +-
 4 files changed, 143 insertions(+), 175 deletions(-)

Interdiff against v4:
diff --git a/man3/regex.3 b/man3/regex.3
index 552763940..66d9c6596 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -52,7 +52,7 @@ .SS Compilation
 .BR regexec ()
 searches.
 .PP
-The pattern buffer at
+On success, the pattern buffer at
 .I *preg
 is initialized.
 .I regex
@@ -96,16 +96,14 @@ .SS Compilation
 searches using this pattern buffer will be case insensitive.
 .TP
 .B REG_NOSUB
-Do not report position of matches.
-The
-.I nmatch
-and
-.I pmatch
+Only report overall success:
 .BR regexec ()
-arguments will be ignored for this purpose (but
+will only use
 .I pmatch
-may still be used for
-.BR REG_STARTEND ).
+for
+.BR REG_STARTEND ,
+and ignore
+.IR nmatch .
 .TP
 .B REG_NEWLINE
 Match-any-character operators don't match a newline.
@@ -161,7 +159,7 @@ .SS Matching
 .TP
 .B REG_STARTEND
 Match
-.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo )
+.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo )
 instead of
 .RI [ string ", " string " + \fBstrlen\fP(" string )).
 This allows matching embedded NUL bytes
@@ -183,7 +181,7 @@ .SS Matching
 remain relative to
 .IR string
 (not
-.IR string " + " pmatch->rm_so ).
+.IR string " + " pmatch[0].rm_so ).
 This flag is a BSD extension, not present in POSIX.
 .SS Match offsets
 Unless
@@ -349,6 +347,20 @@ .SH HISTORY
 type or a
 .I ssize_t
 type.
+.SH NOTES
+.I re_nsub
+is only required to be initialized if
+.B REG_NOSUB
+wasn't specified, but all known implementations initialize it regardless.
+.\" glibc, musl, 4.4BSD, illumos
+.PP
+Both
+.I regex_t
+and
+.I regmatch_t
+may (and do) have more members, in any order.
+Always reference them by name.
+.\" illumos has two more start/end pairs and the first one is of pointers
 .SH EXAMPLES
 .EX
 #include <stdint.h>
-- 
2.30.2
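To make the REG_STARTEND wording in the interdiff above concrete, here is a minimal sketch of matching only a byte range of a buffer. It is not taken from the series; match_range() is a hypothetical helper, and it assumes a libc that provides the REG_STARTEND extension (glibc and the BSDs do; the flag is not in POSIX).

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the series): search only the bytes
 * [buf + start, buf + end) of a buffer that need not be null-terminated.
 * On a match, returns 0 and stores the match offsets in *so and *eo;
 * these are relative to buf, not to buf + start.
 * Otherwise returns the nonzero regexec() error code.
 * Assumes REG_STARTEND is available (a BSD extension, not in POSIX).
 */
static int
match_range(const regex_t *re, const char *buf, size_t start, size_t end,
            regoff_t *so, regoff_t *eo)
{
    regmatch_t  pm[1];
    int         ret;

    /* On input, pmatch[0] delimits the range to search. */
    pm[0].rm_so = (regoff_t) start;
    pm[0].rm_eo = (regoff_t) end;

    ret = regexec(re, buf, 1, pm, REG_STARTEND);
    if (ret != 0)
        return ret;

    /* On output, pmatch[0] holds the overall match. */
    *so = pm[0].rm_so;
    *eo = pm[0].rm_eo;
    return 0;
}
```

Passing the range through pm[0] and reading the result back from the same element mirrors the pmatch[0] spelling the interdiff settles on.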
* [PATCH v5 1/8] regex.3: Desoupify regcomp() description
From: наб @ 2023-04-20 15:35 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

Behold:
  regerror() is passed the error code, errcode, the pattern buffer,
  preg, a pointer to a character string buffer, errbuf, and the size
  of the string buffer, errbuf_size.

Absolute soup. This reads to me like an ill-conceived copy from a very
early standard version. It looks fine in source form but is horrific to
read as running text.

Instead, replace all of these with just the descriptions of what they do
with their arguments. What the arguments are is very clearly noted in
big bold in the prototypes.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3 | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index 129c42412..2f6ee816f 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -38,21 +38,13 @@ .SS Compilation
 .BR regexec ()
 searches.
 .PP
-.BR regcomp ()
-is supplied with
-.IR preg ,
-a pointer to a pattern buffer storage area;
-.IR regex ,
-a pointer to the null-terminated string and
-.IR cflags ,
-flags used to determine the type of compilation.
-.PP
-All regular expression searching must be done via a compiled pattern
-buffer, thus
-.BR regexec ()
-must always be supplied with the address of a
-.BR regcomp ()-initialized
-pattern buffer.
+On success, the pattern buffer at
+.I *preg
+is initialized.
+.I regex
+is a null-terminated string.
+The locale must be the same when running
+.BR regexec ().
 .PP
 .I cflags
 is the
-- 
2.30.2
* Re: [PATCH v5 1/8] regex.3: Desoupify regcomp() description
From: Alejandro Colomar @ 2023-04-20 16:37 UTC
To: наб; +Cc: linux-man

Hi наб!

On 4/20/23 17:35, наб wrote:
> Behold:
>   regerror() is passed the error code, errcode, the pattern buffer,
>   preg, a pointer to a character string buffer, errbuf, and the size
>   of the string buffer, errbuf_size.
>
> Absolute soup. This reads to me like an ill-conceived copy from a very

Single space after period is evil. I'd like to point you to this rant
o'mine where I give more details, to not repeat myself too much:

<https://lore.kernel.org/linux-man/9c5c5744-dde0-b333-09e0-ba9d92aa96b1@gmail.com/T/#mb4eb99c9bccb59c6df82c1f6945766c878d85f07>

I've cleaned up those crimes before applying, and then applied this
patch. :)

Cheers,
Alex

> early standard version. It looks fine in source form but is horrific to
> read as running text.
>
> Instead, replace all of these with just the descriptions of what they do
> with their arguments. What the arguments are is very clearly noted in
> big bold in the prototypes.
>
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
> ---
>  man3/regex.3 | 22 +++++++---------------
>  1 file changed, 7 insertions(+), 15 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index 129c42412..2f6ee816f 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -38,21 +38,13 @@ .SS Compilation
>  .BR regexec ()
>  searches.
>  .PP
> -.BR regcomp ()
> -is supplied with
> -.IR preg ,
> -a pointer to a pattern buffer storage area;
> -.IR regex ,
> -a pointer to the null-terminated string and
> -.IR cflags ,
> -flags used to determine the type of compilation.
> -.PP
> -All regular expression searching must be done via a compiled pattern
> -buffer, thus
> -.BR regexec ()
> -must always be supplied with the address of a
> -.BR regcomp ()-initialized
> -pattern buffer.
> +On success, the pattern buffer at
> +.I *preg
> +is initialized.
> +.I regex
> +is a null-terminated string.
> +The locale must be the same when running
> +.BR regexec ().
>  .PP
>  .I cflags
>  is the

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v5 2/8] regex.3: Desoupify regexec() description
From: наб @ 2023-04-20 15:35 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3 | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index 2f6ee816f..ae160c9b3 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -105,12 +105,10 @@ .SS Compilation
 .SS Matching
 .BR regexec ()
 is used to match a null-terminated string
-against the precompiled pattern buffer,
-.IR preg .
-.I nmatch
-and
-.I pmatch
-are used to provide information regarding the location of any matches.
+against the compiled pattern buffer in
+.IR *preg ,
+which must have been initialised with
+.BR regexec ().
 .I eflags
 is the
 bitwise OR
-- 
2.30.2
* [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб 2023-04-20 15:35 ` [PATCH v5 1/8] regex.3: Desoupify regcomp() description наб 2023-04-20 15:35 ` [PATCH v5 2/8] regex.3: Desoupify regexec() description наб @ 2023-04-20 15:35 ` наб 2023-04-20 16:42 ` Alejandro Colomar ` (2 more replies) 2023-04-20 15:35 ` [PATCH v5 4/8] regex.3: Improve REG_STARTEND наб ` (5 subsequent siblings) 8 siblings, 3 replies; 143+ messages in thread From: наб @ 2023-04-20 15:35 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1981 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 46 ++++++++++++++++++++-------------------------- 1 file changed, 20 insertions(+), 26 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index ae160c9b3..c5185549b 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -26,7 +26,7 @@ .SH SYNOPSIS .BI " int " eflags ); .PP .BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , -.BI " char " errbuf "[restrict ." errbuf_size "], \ +.BI " char " errbuf "[restrict ." errbuf_size "], \ size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); .fi @@ -207,34 +207,28 @@ .SS Error reporting .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. -If both -.I errbuf -and +.I errcode +must be the latest error returned from an operation on +.IR preg . +If +.I preg +is a null pointer\(emthe latest error. +.PP +If .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). 
+is +.BR 0 , +the size of the required buffer is returned. +Otherwise, up to +.I errbuf_size +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS Freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +invalidates the pattern buffer at +.IR *preg , +which must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 15:35 ` [PATCH v5 3/8] regex.3: Desoupify regerror() description наб @ 2023-04-20 16:42 ` Alejandro Colomar 2023-04-20 18:50 ` наб 2023-04-20 16:50 ` Alejandro Colomar 2023-04-20 17:23 ` Alejandro Colomar 2 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 16:42 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4153 bytes --] On 4/20/23 17:35, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 46 ++++++++++++++++++++-------------------------- > 1 file changed, 20 insertions(+), 26 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index ae160c9b3..c5185549b 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -26,7 +26,7 @@ .SH SYNOPSIS > .BI " int " eflags ); > .PP > .BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , > -.BI " char " errbuf "[restrict ." errbuf_size "], \ > +.BI " char " errbuf "[restrict ." errbuf_size "], \ See man-pages(7): FORMATTING AND WORDING CONVENTIONS The following subsections note some details for preferred formatting and wording conventions in various sections of the pages in the man‐ pages project. SYNOPSIS [...] In the SYNOPSIS, a long function prototype may need to be continued over to the next line. The continuation line is indented according to the following rules: (1) If there is a single such prototype that needs to be continued, then align the continuation line so that when the page is rendered on a fixed‐width font device (e.g., on an xterm) the continuation line starts just below the start of the argument list in the line above. (Exception: the indentation may be adjusted if necessary to prevent a very long continuation line or a further continuation line where the function prototype is very long.) 
As an example: int tcsetattr(int fd, int optional_actions, const struct termios *termios_p); (2) But, where multiple functions in the SYNOPSIS require continuation lines, and the function names have different lengths, then align all continuation lines to start in the same column. This provides a nicer rendering in PDF output (because the SYNOPSIS uses a variable width font where spaces render narrower than most characters). As an example: int getopt(int argc, char * const argv[], const char *optstring); int getopt_long(int argc, char * const argv[], const char *optstring, const struct option *longopts, int *longindex); > size_t " errbuf_size ); > .BI "void regfree(regex_t *" preg ); > .fi > @@ -207,34 +207,28 @@ .SS Error reporting > .BR regexec () > into error message strings. > .PP > -.BR regerror () > -is passed the error code, > -.IR errcode , > -the pattern buffer, > -.IR preg , > -a pointer to a character string buffer, > -.IR errbuf , > -and the size of the string buffer, > -.IR errbuf_size . > -It returns the size of the > -.I errbuf > -required to contain the null-terminated error message string. > -If both > -.I errbuf > -and > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > +If > +.I preg > +is a null pointer\(emthe latest error. > +.PP > +If > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > +is > +.BR 0 , > +the size of the required buffer is returned. > +Otherwise, up to > +.I errbuf_size > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > -Supplying > .BR regfree () > -with a precompiled pattern buffer, > -.IR preg , > -will free the memory allocated to the pattern buffer by the compiling > -process, > +invalidates the pattern buffer at > +.IR *preg , > +which must have been initialized via > .BR regcomp ().
> .SH RETURN VALUE > .BR regcomp () -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 16:42 ` Alejandro Colomar @ 2023-04-20 18:50 ` наб 0 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 18:50 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 715 bytes --] On Thu, Apr 20, 2023 at 06:42:55PM +0200, Alejandro Colomar wrote: > On 4/20/23 17:35, наб wrote: > > --- a/man3/regex.3 > > +++ b/man3/regex.3 > > @@ -26,7 +26,7 @@ .SH SYNOPSIS > > .BI " int " eflags ); > > .PP > > .BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , > > -.BI " char " errbuf "[restrict ." errbuf_size "], \ > > +.BI " char " errbuf "[restrict ." errbuf_size "], \ > See man-pages(7): I didn't even notice it was matching regexec()/regcomp() since they're in a separate paragraph, it just looks like a formatting error (and makes it so multiple functions aren't as well-delineated as they could be), but sure. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 15:35 ` [PATCH v5 3/8] regex.3: Desoupify regerror() description наб 2023-04-20 16:42 ` Alejandro Colomar @ 2023-04-20 16:50 ` Alejandro Colomar 2023-04-20 17:23 ` Alejandro Colomar 2 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 16:50 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 562 bytes --] On 4/20/23 17:35, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- [...] > -If both > -.I errbuf > -and [...] > .I errbuf_size > -are nonzero, Now that I read this, it seems we should add _Nullable to errbuf too. I'll do that. > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 15:35 ` [PATCH v5 3/8] regex.3: Desoupify regerror() description наб 2023-04-20 16:42 ` Alejandro Colomar 2023-04-20 16:50 ` Alejandro Colomar @ 2023-04-20 17:23 ` Alejandro Colomar 2023-04-20 18:46 ` наб 2 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 17:23 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 839 bytes --] On 4/20/23 17:35, наб wrote: > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > +If > +.I preg > +is a null pointer\(emthe latest error. I don't read that from the POSIX spec. If preg is NULL, then I think any error returned by a call to one of these APIs would be valid. In fact, since these functions are MT-Safe, they can't store any state, which leads me to think that they can't really distinguish between the latest error, and an error returned at a random point in the past, or even the result of csrand_interval(x, y)[1] with appropriate x and y. [1]: <https://github.com/shadow-maint/shadow/blob/c80788a3ac092bc5abfa89ff48060d3f95cd5812/libmisc/csrand.c#L93> -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 17:23 ` Alejandro Colomar @ 2023-04-20 18:46 ` наб 2023-04-20 22:45 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 18:46 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2622 bytes --] Hi! On Thu, Apr 20, 2023 at 07:23:39PM +0200, Alejandro Colomar wrote: > On 4/20/23 17:35, наб wrote: > > +.I errcode > > +must be the latest error returned from an operation on > > +.IR preg . > > +If > > +.I preg > > +is a null pointer\(emthe latest error. > I don't read that from the POSIX spec. Whereas that's precisely where I got it from. > If preg is NULL, then I think any > error returned by a call to one of these APIs would be valid. That's unspecified. > In fact, > since these functions are MT-Safe, they can't store any state, Probably. OTOH, musl raw-dogs mbtowc() in regexec(), so. (I'm pretty sure it's by accident since they do have a mbstate_t and juggle it a lot, but it's never actually used.) > which leads > me to think that they can't really distinguish between the latest error, > and an error returned at a random point in the past, or even the result of > csrand_interval(x, y)[1] with appropriate x and y. Again, probably. But (line numbers from Issue 8 Draft 2.1): 57517 The regerror( ) function provides a mapping from error codes returned by regcomp( ) and 57518 regexec( ) to unspecified printable strings. It generates a string corresponding to the value of the 57519 errcode argument, which the application shall ensure is the last non-zero value returned by 57520 regcomp( ) or regexec( ) with the given value of preg. If errcode is not such a value, the content of 57521 the generated string is unspecified. 
57522 If preg is a null pointer, but errcode is a value returned by a previous call to regexec( ) or regcomp( ), 57523 the regerror( ) still generates an error string corresponding to the value of errcode, but it might not 57524 be as detailed under some implementations. 57525 If the errbuf_size argument is not 0, regerror( ) shall place the generated string into the buffer of 57526 size errbuf_size bytes pointed to by errbuf. If the string (including the terminating null) cannot fit 57527 in the buffer, regerror( ) shall truncate the string and null-terminate the result. 57528 If errbuf_size is 0, regerror( ) shall ignore the errbuf argument, and return the size of the buffer 57529 needed to hold the generated string. In these difficult times I tend to turn to what implementations do: NetBSD, musl, illumos, and glibc, if you subtract REG_ATOI and REG_ITOA, all essentially return lsearch(errors, errcode)->description + all sans NetBSD localise it. None of them even use preg. So yeah, I'll axe that. And split out regfree() from this patch because I missed it. Best, наб [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 18:46 ` наб @ 2023-04-20 22:45 ` Alejandro Colomar 2023-04-20 23:05 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 22:45 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 3461 bytes --] Hi, On 4/20/23 20:46, наб wrote: > Hi! > > On Thu, Apr 20, 2023 at 07:23:39PM +0200, Alejandro Colomar wrote: >> On 4/20/23 17:35, наб wrote: >>> +.I errcode >>> +must be the latest error returned from an operation on >>> +.IR preg . >>> +If >>> +.I preg >>> +is a null pointer\(emthe latest error. >> I don't read that from the POSIX spec. > Whereas that's precisely where I got it from. Here's the quote I think is the most relevant (you also quoted it below): If preg is a null pointer, but errcode is a value returned by a previous call to regexec() or regcomp(), the regerror() still generates an error string corresponding to the value of errcode, but it might not be as detailed under some implementations. > >> If preg is NULL, then I think any >> error returned by a call to one of these APIs would be valid. > That's unspecified. I don't think so. POSIX says a "previous call". It doesn't say the "latest" or "immediately preceding" or similar wording. Don't you understand the same from that paragraph? > >> In fact, >> since these functions are MT-Safe, they can't store any state, > Probably. OTOH, musl raw-dogs mbtowc() in regexec(), so. > (I'm pretty sure it's by accident since they do have a mbstate_t > and juggle it a lot, but it's never actually used.) > >> which leads >> me to think that they can't really distinguish between the latest error, >> and an error returned at a random point in the past, or even the result of >> csrand_interval(x, y)[1] with appropriate x and y. > Again, probably.
But (line numbers from Issue 8 Draft 2.1): > 57517 The regerror( ) function provides a mapping from error codes returned by regcomp( ) and > 57518 regexec( ) to unspecified printable strings. It generates a string corresponding to the value of the > 57519 errcode argument, which the application shall ensure is the last non-zero value returned by > 57520 regcomp( ) or regexec( ) with the given value of preg. If errcode is not such a value, the content of > 57521 the generated string is unspecified. > > 57522 If preg is a null pointer, but errcode is a value returned by a previous call to regexec( ) or regcomp( ), > 57523 the regerror( ) still generates an error string corresponding to the value of errcode, but it might not > 57524 be as detailed under some implementations. > > 57525 If the errbuf_size argument is not 0, regerror( ) shall place the generated string into the buffer of > 57526 size errbuf_size bytes pointed to by errbuf. If the string (including the terminating null) cannot fit > 57527 in the buffer, regerror( ) shall truncate the string and null-terminate the result. > > 57528 If errbuf_size is 0, regerror( ) shall ignore the errbuf argument, and return the size of the buffer > 57529 needed to hold the generated string. > > In these difficult times I tend to turn to what implementations do: > NetBSD, musl, illumos, and glibc, if you subtract REG_ATOI and REG_ITOA, > all essentially return lsearch(errors, errcode)->description > + all sans NetBSD localise it. > None of them even use preg. > > So yeah, I'll axe that. > > > And split out regfree() from this patch because I missed it. Thanks, Alex > > > Best, > наб -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 3/8] regex.3: Desoupify regerror() description 2023-04-20 22:45 ` Alejandro Colomar @ 2023-04-20 23:05 ` наб 0 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 23:05 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 878 bytes --] Hi! On Fri, Apr 21, 2023 at 12:45:16AM +0200, Alejandro Colomar wrote: > On 4/20/23 20:46, наб wrote: > > On Thu, Apr 20, 2023 at 07:23:39PM +0200, Alejandro Colomar wrote: > >> If preg is NULL, then I think any > >> error returned by a call to one of these APIs would be valid. > > That's unspecified. > I don't think so. POSIX says a "previous call". It doesn't say the > "latest" or "immediately preceding" or similar wording. Don't you > understand the same from that paragraph? I read "a previous" as a shorthand for "the last non-zero value returned by regcomp( ) or regexec( )" from above originally; but yeah, now that you mention it, "just any returned error" is a valid read. I think just "must be latest if preg passed" is what ended up in v6 on the grounds of realism; if it's also the precise letter of POSIX then all the better. Best, [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v5 4/8] regex.3: Improve REG_STARTEND 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб ` (2 preceding siblings ...) 2023-04-20 15:35 ` [PATCH v5 3/8] regex.3: Desoupify regerror() description наб @ 2023-04-20 15:35 ` наб 2023-04-20 17:29 ` Alejandro Colomar 2023-04-20 15:36 ` [PATCH v5 5/8] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб ` (4 subsequent siblings) 8 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 15:35 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1700 bytes --] Explicitly spell out the ranges involved. The original wording always confused me, but it's actually very sane. Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change R_NOTEOL? No. That's weird and confusing. String largeness doesn't matter, known-lengthness does. Explicitly spell out the influence on returned matches (relative to string, not start of range). Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index c5185549b..1ce0a3b7e 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -131,23 +131,30 @@ .SS Matching above). .TP .B REG_STARTEND -Use -.I pmatch[0] -on the input string, starting at byte -.I pmatch[0].rm_so -and ending before byte -.IR pmatch[0].rm_eo . +Match +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) +instead of +.RI [ string ", " string " + \fBstrlen\fP(" string )). This allows matching embedded NUL bytes and avoids a .BR strlen (3) -on large strings. -It does not use +on known-length strings. +.I pmatch +must point to a valid readable object. 
+If any matches are returned +.RB ( REG_NOSUB +wasn't passed to +.BR regcomp (), +the match succeeded, and .I nmatch -on input, and does not change -.B REG_NOTBOL -or -.B REG_NEWLINE -processing. +> 0), they overwrite +.I pmatch +as usual, and the +.B Match offsets +remain relative to +.IR string +(not +.IR string " + " pmatch[0].rm_so ). This flag is a BSD extension, not present in POSIX. .SS Match offsets Unless -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v5 4/8] regex.3: Improve REG_STARTEND 2023-04-20 15:35 ` [PATCH v5 4/8] regex.3: Improve REG_STARTEND наб @ 2023-04-20 17:29 ` Alejandro Colomar 2023-04-20 19:30 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 17:29 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2263 bytes --] On 4/20/23 17:35, наб wrote: > Explicitly spell out the ranges involved. The original wording always > confused me, but it's actually very sane. > > Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change > R_NOTEOL? No. That's weird and confusing. > > String largeness doesn't matter, known-lengthness does. > > Explicitly spell out the influence on returned matches > (relative to string, not start of range). > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 33 ++++++++++++++++++++------------- > 1 file changed, 20 insertions(+), 13 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index c5185549b..1ce0a3b7e 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -131,23 +131,30 @@ .SS Matching > above). > .TP > .B REG_STARTEND > -Use > -.I pmatch[0] > -on the input string, starting at byte > -.I pmatch[0].rm_so > -and ending before byte > -.IR pmatch[0].rm_eo . > +Match > +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) > +instead of > +.RI [ string ", " string " + \fBstrlen\fP(" string )). > This allows matching embedded NUL bytes > and avoids a > .BR strlen (3) > -on large strings. > -It does not use > +on known-length strings. > +.I pmatch > +must point to a valid readable object. I think this is redundant, since we showed that [0] is accessed by the function. > +If any matches are returned > +.RB ( REG_NOSUB > +wasn't passed to > +.BR regcomp (), > +the match succeeded, and > .I nmatch > -on input, and does not change > -.B REG_NOTBOL > -or > -.B REG_NEWLINE > -processing. 
> +> 0), they overwrite And of course, nmatch must be at least 1, since otherwise, [0] was not valid, and the whole call would have been UB; right? So that third condition must be true to not invoke UB, so we can omit it too, I think. > +.I pmatch > +as usual, and the > +.B Match offsets > +remain relative to > +.IR string > +(not > +.IR string " + " pmatch[0].rm_so ). > This flag is a BSD extension, not present in POSIX. > .SS Match offsets > Unless -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 4/8] regex.3: Improve REG_STARTEND 2023-04-20 17:29 ` Alejandro Colomar @ 2023-04-20 19:30 ` наб 2023-04-20 19:33 ` наб 2023-04-20 23:01 ` Alejandro Colomar 0 siblings, 2 replies; 143+ messages in thread From: наб @ 2023-04-20 19:30 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1925 bytes --] On Thu, Apr 20, 2023 at 07:29:27PM +0200, Alejandro Colomar wrote: > On 4/20/23 17:35, наб wrote: > > --- a/man3/regex.3 > > +++ b/man3/regex.3 > > @@ -131,23 +131,30 @@ .SS Matching > > above). > > .TP > > .B REG_STARTEND > > -Use > > -.I pmatch[0] > > -on the input string, starting at byte > > -.I pmatch[0].rm_so > > -and ending before byte > > -.IR pmatch[0].rm_eo . > > +Match > > +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) > > +instead of > > +.RI [ string ", " string " + \fBstrlen\fP(" string )). > > This allows matching embedded NUL bytes > > and avoids a > > .BR strlen (3) > > -on large strings. > > -It does not use > > +on known-length strings. > > +.I pmatch > > +must point to a valid readable object. > I think this is redundant, since we showed that [0] is accessed by > the function. Yeah. > > +If any matches are returned > > +.RB ( REG_NOSUB > > +wasn't passed to > > +.BR regcomp (), > > +the match succeeded, and > > .I nmatch > > -on input, and does not change > > -.B REG_NOTBOL > > -or > > -.B REG_NEWLINE > > -processing. > > +> 0), they overwrite > And of course, nmatch must be at least 1, since otherwise, [0] was > not valid, and the whole call would have been UB; right? So that > third condition must be true to not invoke UB, so we can omit it too, > I think. What? idk where you got this from. Per 0d120a3c76b4446b194a54387ce0e7a84b208bfd: In the regexec() signature regmatch_t pmatch[restrict .nmatch], is a simplification. It's actually regmatch_t pmatch[restrict ((.preg->flags & REG_NOSUB) ? 
0 : .nmatch) ?: !!(.eflags & REG_STARTEND)], If REG_STARTEND, pmatch must point to a valid readable object. (Naturally, if you pass in uninitialised memory or a null pointer, then you get UB.) nmatch is not consulted and has no bearing on this. Best, [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 4/8] regex.3: Improve REG_STARTEND 2023-04-20 19:30 ` наб @ 2023-04-20 19:33 ` наб 2023-04-20 23:01 ` Alejandro Colomar 1 sibling, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 19:33 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 557 bytes --] On Thu, Apr 20, 2023 at 09:30:06PM +0200, наб wrote: > If REG_STARTEND, pmatch must point to a valid readable object. > (Naturally, if you pass in uninitialised memory or a null pointer, > then you get UB.) > nmatch is not consulted and has no bearing on this. This is all to say: regexec(&reg, "str", 0, &rm, REG_STARTEND); is valid, looks in ["str"+rm.so, "str"+rm.eo), and doesn't change rm, whereas regexec(&reg, "str", 1, &rm, REG_STARTEND); is valid, looks in ["str"+rm.so, "str"+rm.eo), and will update rm with the match, if any. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 4/8] regex.3: Improve REG_STARTEND 2023-04-20 19:30 ` наб 2023-04-20 19:33 ` наб @ 2023-04-20 23:01 ` Alejandro Colomar 2023-04-21 0:13 ` наб 1 sibling, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 23:01 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2801 bytes --] Hi наб, On 4/20/23 21:30, наб wrote: > On Thu, Apr 20, 2023 at 07:29:27PM +0200, Alejandro Colomar wrote: >> On 4/20/23 17:35, наб wrote: >>> --- a/man3/regex.3 >>> +++ b/man3/regex.3 >>> @@ -131,23 +131,30 @@ .SS Matching >>> above). >>> .TP >>> .B REG_STARTEND >>> -Use >>> -.I pmatch[0] >>> -on the input string, starting at byte >>> -.I pmatch[0].rm_so >>> -and ending before byte >>> -.IR pmatch[0].rm_eo . >>> +Match >>> +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) >>> +instead of >>> +.RI [ string ", " string " + \fBstrlen\fP(" string )). >>> This allows matching embedded NUL bytes >>> and avoids a >>> .BR strlen (3) >>> -on large strings. >>> -It does not use >>> +on known-length strings. >>> +.I pmatch >>> +must point to a valid readable object. >> I think this is redundant, since we showed that [0] is accessed by >> the function. > Yeah. > >>> +If any matches are returned >>> +.RB ( REG_NOSUB >>> +wasn't passed to >>> +.BR regcomp (), >>> +the match succeeded, and >>> .I nmatch >>> -on input, and does not change >>> -.B REG_NOTBOL >>> -or >>> -.B REG_NEWLINE >>> -processing. >>> +> 0), they overwrite >> And of course, nmatch must be at least 1, since otherwise, [0] was >> not valid, and the whole call would have been UB; right? So that >> third condition must be true to not invoke UB, so we can omit it too, >> I think. > What? idk where you got this from. > Per 0d120a3c76b4446b194a54387ce0e7a84b208bfd: > In the regexec() signature > regmatch_t pmatch[restrict .nmatch], > is a simplification. It's actually > regmatch_t pmatch[restrict > ((.preg->flags & REG_NOSUB) ? 
0 : .nmatch) ?: > !!(.eflags & REG_STARTEND)], That is a model that was useful in a commit message to describe more or less what happens. It doesn't need to perfectly describe reality. Since REG_STARTEND is not in POSIX, we can't read what POSIX says, so it's all up to how much implementations want to guarantee. I don't think glibc would like to allow specifying .nmatch as 0 while the function accesses [0]. The fact that the current implementation doesn't open Hell's doors to nasal demons doesn't mean it can't do so in the future. I conceive that _FORTIFY_SOURCE could reasonably check that pmatch[] has at least .nmemb elements, and I don't want to preclude that in the documentation. Cheers, Alex > > If REG_STARTEND, pmatch must point to a valid readable object. > (Naturally, if you pass in uninitialised memory or a null pointer, > then you get UB.) > nmatch is not consulted and has no bearing on this. > > Best, -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v5 4/8] regex.3: Improve REG_STARTEND 2023-04-20 23:01 ` Alejandro Colomar @ 2023-04-21 0:13 ` наб 0 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-21 0:13 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 13218 bytes --] On Fri, Apr 21, 2023 at 01:01:11AM +0200, Alejandro Colomar wrote: > On 4/20/23 21:30, наб wrote: > > On Thu, Apr 20, 2023 at 07:29:27PM +0200, Alejandro Colomar wrote: > >> On 4/20/23 17:35, наб wrote: > >>> --- a/man3/regex.3 > >>> +++ b/man3/regex.3 > >>> @@ -131,23 +131,30 @@ .SS Matching > >>> above). > >>> .TP > >>> .B REG_STARTEND > >>> -Use > >>> -.I pmatch[0] > >>> -on the input string, starting at byte > >>> -.I pmatch[0].rm_so > >>> -and ending before byte > >>> -.IR pmatch[0].rm_eo . > >>> +Match > >>> +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) > >>> +instead of > >>> +.RI [ string ", " string " + \fBstrlen\fP(" string )). > >>> This allows matching embedded NUL bytes > >>> and avoids a > >>> .BR strlen (3) > >>> -on large strings. > >>> -It does not use > >>> +on known-length strings. > >>> +.I pmatch > >>> +must point to a valid readable object. > >> I think this is redundant, since we showed that [0] is accessed by > >> the function. > > Yeah. > > > >>> +If any matches are returned > >>> +.RB ( REG_NOSUB > >>> +wasn't passed to > >>> +.BR regcomp (), > >>> +the match succeeded, and > >>> .I nmatch > >>> -on input, and does not change > >>> -.B REG_NOTBOL > >>> -or > >>> -.B REG_NEWLINE > >>> -processing. > >>> +> 0), they overwrite > >> And of course, nmatch must be at least 1, since otherwise, [0] was > >> not valid, and the whole call would have been UB; right? So that > >> third condition must be true to not invoke UB, so we can omit it too, > >> I think. > > What? idk where you got this from. 
> > Per 0d120a3c76b4446b194a54387ce0e7a84b208bfd: > > In the regexec() signature > > regmatch_t pmatch[restrict .nmatch], > > is a simplification. It's actually > > regmatch_t pmatch[restrict > > ((.preg->flags & REG_NOSUB) ? 0 : .nmatch) ?: > > !!(.eflags & REG_STARTEND)], > That is a model that was useful in a commit message to describe more > or less what happens. It doesn't need to perfectly describe reality. > Since REG_STARTEND is not in POSIX, we can't read what POSIX says, > so it's all up to how much implementations want to guarantee. I > don't think glibc would like to allow specifying .nmatch as 0 while > the function accesses [0]. The fact that the current implementation > doesn't open Hell's doors to nasal demons doesn't mean it can't do > so in the future. I conceive that _FORTIFY_SOURCE could reasonably > check that pmatch[] has at least .nmemb elements, and I don't want > to preclude that in the documentation. What? I don't get this. Who cares what POSIX says about this 4.4BSD extension? 
This interface has been unchanged for over 30 years;
4.4BSD-Lite, /usr/src/lib/libc/regex/regexec.c:
 * @(#)regexec.c 8.1 (Berkeley) 6/4/93

int /* 0 success, REG_NOMATCH failure */
regexec(preg, string, nmatch, pmatch, eflags)
const regex_t *preg;
const char *string;
size_t nmatch;
regmatch_t pmatch[];
int eflags;
{
    register struct re_guts *g = preg->re_g;
#ifdef REDEBUG
# define GOODFLAGS(f) (f)
#else
# define GOODFLAGS(f) ((f)&(REG_NOTBOL|REG_NOTEOL|REG_STARTEND))
#endif

    if (preg->re_magic != MAGIC1 || g->magic != MAGIC2)
        return(REG_BADPAT);
    assert(!(g->iflags&BAD));
    if (g->iflags&BAD) /* backstop for no-debug case */
        return(REG_BADPAT);
    if (eflags != GOODFLAGS(eflags))
        return(REG_INVARG);

    if (g->nstates <= CHAR_BIT*sizeof(states1) && !(eflags&REG_LARGE))
        return(smatcher(g, (char *)string, nmatch, pmatch, eflags));
    else
        return(lmatcher(g, (char *)string, nmatch, pmatch, eflags));
}

4.4BSD-Lite, /usr/src/lib/libc/regex/engine.c:
 * @(#)engine.c 8.1 (Berkeley) 6/4/93

/*
 * The matching engine and friends. This file is #included by regexec.c
 * after suitable #defines of a variety of macros used herein, so that
 * different state representations can be used without duplicating masses
 * of code.
 */

#ifdef SNAMES
#define matcher smatcher
#ifdef LNAMES
#define matcher lmatcher

/*
 - matcher - the actual matching engine
 == static int matcher(register struct re_guts *g, char *string, \
 ==     size_t nmatch, regmatch_t pmatch[], int eflags);
 */
static int /* 0 success, REG_NOMATCH failure */
matcher(g, string, nmatch, pmatch, eflags)
register struct re_guts *g;
char *string;
size_t nmatch;
regmatch_t pmatch[];
int eflags;
{
    register char *endp;
    register int i;
    struct match mv;
    register struct match *m = &mv;
    register char *dp;
    const register sopno gf = g->firststate+1; /* +1 for OEND */
    const register sopno gl = g->laststate;
    char *start;
    char *stop;

    /* simplify the situation where possible */
    if (g->cflags&REG_NOSUB)
        nmatch = 0;
    if (eflags&REG_STARTEND) {
        start = string + pmatch[0].rm_so;
        stop = string + pmatch[0].rm_eo;
    } else {
        start = string;
        stop = start + strlen(start);
    }
    if (stop < start)
        return(REG_INVARG);

    (rest of matcher)

    /* fill in the details if requested */
    if (nmatch > 0) {
        pmatch[0].rm_so = m->coldp - m->offp;
        pmatch[0].rm_eo = endp - m->offp;
    }
    if (nmatch > 1) {
        assert(m->pmatch != NULL);
        for (i = 1; i < nmatch; i++)
            if (i <= m->g->nsub)
                pmatch[i] = m->pmatch[i];
            else {
                pmatch[i].rm_so = -1;
                pmatch[i].rm_eo = -1;
            }
    }

That's what the interface /is/ (also, I was guessing last time from behaviour and wrote the exact same pseudocode; fun).
And, tell you what, musl also does
    if(REG_NOSUB) nmatch = 0;
so does the illumos gate; glibc does

int
regexec (const regex_t *__restrict preg, const char *__restrict string,
         size_t nmatch, regmatch_t pmatch[_REGEX_NELTS (nmatch)], int eflags)
{
  reg_errcode_t err;
  Idx start, length;
  re_dfa_t *dfa = preg->buffer;

  if (eflags & ~(REG_NOTBOL | REG_NOTEOL | REG_STARTEND))
    return REG_BADPAT;

  if (eflags & REG_STARTEND)
    {
      start = pmatch[0].rm_so;
      length = pmatch[0].rm_eo;
    }
  else
    {
      start = 0;
      length = strlen (string);
    }

  lock_lock (dfa->lock);
  if (preg->no_sub)
    err = re_search_internal (preg, string, length, start, length,
                              length, 0, NULL, eflags);
  else
    err = re_search_internal (preg, string, length, start, length,
                              length, nmatch, pmatch, eflags);
  lock_unlock (dfa->lock);
  return err != REG_NOERROR;
}

i.e. it sets nmatch to 0 if REG_NOSUB, but later.

None of them do
    if (eflags & REG_STARTEND && !nmatch)
        ... what now? return an error?
for the sole purpose of... providing an interface that's broken?
nmatch is the amount of matches you care about getting back, and nothing more.

If anything, the POSIX header is (Issue 8 Draft 2.1):
11030 The following shall be declared as functions and may also be defined as macros. Function
11031 prototypes shall be provided.
11032 int regcomp(regex_t *restrict, const char *restrict, int);
11033 size_t regerror(int, const regex_t *restrict, char *restrict, size_t);
11034 int regexec(const regex_t *restrict, const char *restrict, size_t,
11035     regmatch_t [restrict], int);
11036 void regfree(regex_t *);

So you've overconstrained the interface for simplicity, and now you're treating the simplification as a ground truth of..?

And glibc, if anything, would love for you to specify the start and end bounds with REG_STARTEND while also passing nmatch = 0, because it additionally optimises for that case (&& no backrefs).
/And also/, 6.7.6.2 Array declarators says:

Constraints
1 In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.
2 If an identifier is declared as having a variably modified type, it shall be an ordinary identifier (as defined in 6.2.3), have no linkage, and have either block scope or function prototype scope. If an identifier is declared to be an object with static or thread storage duration, it shall not have a variable length array type.

Semantics
3 If, in the declaration "T D1", D1 has one of the forms:
    D [ type-qualifier-list(opt) assignment-expression(opt) ] attribute-specifier-sequence(opt)
    D [ static type-qualifier-list(opt) assignment-expression ] attribute-specifier-sequence(opt)
    D [ type-qualifier-list static assignment-expression ] attribute-specifier-sequence(opt)
    D [ type-qualifier-list(opt) * ] attribute-specifier-sequence(opt)
and the type specified for /ident/ in the declaration "T D" is "derived-declarator-type-list T", then the type specified for /ident/ is "derived-declarator-type-list array of T". The optional attribute specifier sequence appertains to the array. (See 6.7.6.3 for the meaning of the optional type qualifiers and the keyword static.)

Where 6.7.6.3 Function declarators says:
6 A declaration of a parameter as "array of /type/" shall be adjusted to "qualified pointer to /type/", where the /type/ qualifiers (if any) are those specified within the [ and ] of the array type derivation.
If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression.

So even /if/ the declaration was
    int regexec(const regex_t *restrict, const char *restrict, size_t nmatch,
                regmatch_t pmatch[restrict static nmatch], int);
/which it isn't/, not even in glibc (#define _REGEX_NELTS(n) n, or to empty depending on the environment, which means it's a regular variably-modified array type, which means nothing when used in a function prototype), it would /still/ be legal to do any of the below:
    regexec(regp, "", 0, NULL, 0);

    regmatch_t rm;
    regexec(regp, "", 0, &rm, 0);
    regexec(regp, "", 1, &rm, 0);

    regmatch_t rms[999];
    for(int i = 0; i < 999; ++i)
        regexec(regp, "", i, rms, 0);

More to the point, perhaps, 6.7.6.3 continues:
20 EXAMPLE 5 The following are all compatible function prototype declarators.
    double maximum(int n, int m, double a[n][m]);
    double maximum(int n, int m, double a[*][*]);
    double maximum(int n, int m, double a[ ][*]);
    double maximum(int n, int m, double a[ ][m]);
as are:
    void f(double (*restrict a)[5]);
    void f(double a[restrict][5]);
    void f(double a[restrict 3][5]);
    void f(double a[restrict static 3][5]);
(Note that the last declaration also specifies that the argument corresponding to a in any call to f can be expected to be a non-null pointer to the first of at least three arrays of 5 doubles, which the others do not.)

Which the others do not. Well, it's not to the point since there's no static and there'll never be static, but maybe it drives home that unless whatever's inside [] is "restrict" or "static {expr}", it's purely decorative. And even with static, you can always give it more objects.
This is like saying that
    char *strncpy(char dst[restrict .sz], const char *restrict src, size_t sz);
makes
    char dst[256 + 1];
    strncpy(dst, "whatever", 256);
illegal.

(There's also a forward-reffed stanza at 6.9.1.10, but I'm pretty sure it only applies to multi-dimensional VLAs.)

Best, наб [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
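For reference, the REG_STARTEND usage argued over in this message can be sketched in C roughly as follows. This is only an illustration: `match_range` is a hypothetical helper, not part of any libc; it assumes an implementation that provides REG_STARTEND (glibc and the BSDs do, POSIX does not); and it passes nmatch = 1 so that it stays valid under either reading of the nmatch/pmatch question debated above.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a libc function): search only the bytes
 * buf[start, end) for `pattern`.
 * Returns 0 on a match and stores the match offsets, REG_NOMATCH if
 * nothing matched, and -1 if the pattern does not compile or
 * REG_STARTEND is unavailable.
 */
int match_range(const char *pattern, const char *buf,
                size_t start, size_t end, regoff_t *so, regoff_t *eo)
{
#ifdef REG_STARTEND
    regex_t re;
    regmatch_t m;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* On input, pmatch[0] carries the range to search. */
    m.rm_so = (regoff_t)start;
    m.rm_eo = (regoff_t)end;
    rc = regexec(&re, buf, 1, &m, REG_STARTEND);
    regfree(&re);

    if (rc != 0)
        return rc == REG_NOMATCH ? REG_NOMATCH : -1;

    /* On output, pmatch[0] is overwritten with the match itself,
     * with offsets relative to buf rather than to buf + start. */
    *so = m.rm_so;
    *eo = m.rm_eo;
    return 0;
#else
    (void)pattern; (void)buf; (void)start; (void)end; (void)so; (void)eo;
    return -1;
#endif
}
```

Because the range is given explicitly, buf may contain NUL bytes inside [start, end) and no strlen() of buf is needed; the returned offsets are still measured from buf.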
* [PATCH v5 5/8] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб ` (3 preceding siblings ...) 2023-04-20 15:35 ` [PATCH v5 4/8] regex.3: Improve REG_STARTEND наб @ 2023-04-20 15:36 ` наб 2023-04-20 15:36 ` [PATCH v5 6/8] regex.3: Finalise move of reg*.3type наб ` (3 subsequent siblings) 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 15:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 4247 bytes --] Move-only commit. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 30 ++++++++++++++++++ man3type/regex_t.3type | 64 +-------------------------------------- man3type/regmatch_t.3type | 2 +- man3type/regoff_t.3type | 2 +- 4 files changed, 33 insertions(+), 65 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 1ce0a3b7e..897a622d4 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -29,6 +29,20 @@ .SH SYNOPSIS .BI " char " errbuf "[restrict ." errbuf_size "], \ size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); +.PP +.B typedef struct { +.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" +.B } regex_t; +.PP +.B typedef struct { +.BR " regoff_t rm_so;" " /* Byte offset from start of string" + to start of substring */ +.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" + the first character after the end of + substring */ +.B } regmatch_t; +.PP +.BR typedef " /* ... */ " regoff_t; .fi .SH DESCRIPTION .SS Compilation @@ -206,6 +220,14 @@ .SS Match offsets .I rm_eo element indicates the end offset of the match, which is the offset of the first character after the matching text. +.PP +.I regoff_t +It is a signed integer type +capable of storing the largest value that can be stored in either an +.I ptrdiff_t +type or a +.I ssize_t +type. 
.SS Error reporting .BR regerror () is used to turn the error codes that can be returned by both @@ -322,6 +344,14 @@ .SH STANDARDS POSIX.1-2008. .SH HISTORY POSIX.1-2001. +.PP +Prior to POSIX.1-2008, +the type was +capable of storing the largest value that can be stored in either an +.I off_t +type or a +.I ssize_t +type. .SH EXAMPLES .EX #include <stdint.h> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type index 176d2c7a6..c0daaf0ff 100644 --- a/man3type/regex_t.3type +++ b/man3type/regex_t.3type @@ -1,63 +1 @@ -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> -.\" -.\" SPDX-License-Identifier: Linux-man-pages-copyleft -.\" -.\" -.TH regex_t 3type (date) "Linux man-pages (unreleased)" -.SH NAME -regex_t, regmatch_t, regoff_t -\- regular expression matching -.SH LIBRARY -Standard C library -.RI ( libc ) -.SH SYNOPSIS -.EX -.B #include <regex.h> -.PP -.B typedef struct { -.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" -.B } regex_t; -.PP -.B typedef struct { -.BR " regoff_t rm_so;" " /* Byte offset from start of string" - to start of substring */ -.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" - the first character after the end of - substring */ -.B } regmatch_t; -.PP -.BR typedef " /* ... */ " regoff_t; -.EE -.SH DESCRIPTION -.TP -.I regex_t -This is a structure type used in regular expression matching. -It holds a compiled regular expression, -compiled with -.BR regcomp (3). -.TP -.I regmatch_t -This is a structure type used in regular expression matching. -.TP -.I regoff_t -It is a signed integer type -capable of storing the largest value that can be stored in either an -.I ptrdiff_t -type or a -.I ssize_t -type. -.SH STANDARDS -POSIX.1-2008. -.SH HISTORY -POSIX.1-2001. -.PP -Prior to POSIX.1-2008, -the type was -capable of storing the largest value that can be stored in either an -.I off_t -type or a -.I ssize_t -type. 
-.SH SEE ALSO -.BR regex (3) +.so man3/regex.3 diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regmatch_t.3type +++ b/man3type/regmatch_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regoff_t.3type +++ b/man3type/regoff_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* [PATCH v5 6/8] regex.3: Finalise move of reg*.3type 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб ` (4 preceding siblings ...) 2023-04-20 15:36 ` [PATCH v5 5/8] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб @ 2023-04-20 15:36 ` наб 2023-04-20 15:36 ` [PATCH v5 7/8] regex.3: Destandardeseify Match offsets наб ` (2 subsequent siblings) 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 15:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2891 bytes --] They're inextricably linked, not cross-referenced at all, and not used anywhere else. Now that they (realistically) exist to the reader, add a note on how big nmatch can be; POSIX even says "The application developer should note that there is probably no reason for using a value of nmatch that is larger than preg−>re_nsub+1.". Also remove the now-duplicate regmatch_t declaration. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 54 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 20 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 897a622d4..75c810c41 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -15,7 +15,7 @@ .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) .SH SYNOPSIS -.nf +.EX .B #include <regex.h> .PP .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , @@ -43,7 +43,7 @@ .SH SYNOPSIS .B } regmatch_t; .PP .BR typedef " /* ... */ " regoff_t; -.fi +.EE .SH DESCRIPTION .SS Compilation .BR regcomp () @@ -60,6 +60,21 @@ .SS Compilation The locale must be the same when running .BR regexec (). .PP +After +.BR regcomp () +succeeds, +.I preg->re_nsub +holds the number of subexpressions in +.IR regex . +Thus, a value of +.I preg->re_nsub ++ 1 +passed as +.I nmatch +to +.BR regexec () +is sufficient to capture all matches. 
+.PP .I cflags is the bitwise OR @@ -196,22 +211,6 @@ .SS Match offsets .IR N+1 .) Any unused structure elements will contain the value \-1. .PP -The -.I regmatch_t -structure which is the type of -.I pmatch -is defined in -.IR <regex.h> . -.PP -.in +4n -.EX -typedef struct { - regoff_t rm_so; - regoff_t rm_eo; -} regmatch_t; -.EE -.in -.PP Each .I rm_so element that is not \-1 indicates the start offset of the next largest @@ -222,7 +221,7 @@ .SS Match offsets which is the offset of the first character after the matching text. .PP .I regoff_t -It is a signed integer type +is a signed integer type capable of storing the largest value that can be stored in either an .I ptrdiff_t type or a @@ -346,12 +345,27 @@ .SH HISTORY POSIX.1-2001. .PP Prior to POSIX.1-2008, -the type was +.I regoff_t +was required to be capable of storing the largest value that can be stored in either an .I off_t type or a .I ssize_t type. +.SH NOTES +.I re_nsub +is only required to be initialized if +.B REG_NOSUB +wasn't specified, but all known implementations initialize it regardless. +.\" glibc, musl, 4.4BSD, illumos +.PP +Both +.I regex_t +and +.I regmatch_t +may (and do) have more members, in any order. +Always reference them by name. +.\" illumos has two more start/end pairs and the first one is of pointers .SH EXAMPLES .EX #include <stdint.h> -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
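The nmatch note this patch adds (preg->re_nsub + 1 is enough to capture everything) might be used along these lines. A minimal sketch in C, assuming only the POSIX interface; `group_span` is a hypothetical name chosen for illustration.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: find where subexpression `group` (0 = the whole
 * match) of `pattern` matched in `string`.
 * Returns 0 and stores the span on success, REG_NOMATCH if there was no
 * match or the group did not participate, and -1 on any other failure.
 */
int group_span(const char *pattern, const char *string, size_t group,
               regoff_t *so, regoff_t *eo)
{
    regex_t re;
    regmatch_t *pm;
    size_t nmatch;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* One slot for the whole match plus one per subexpression. */
    nmatch = re.re_nsub + 1;
    if (group >= nmatch || (pm = malloc(nmatch * sizeof(*pm))) == NULL) {
        regfree(&re);
        return -1;
    }

    rc = regexec(&re, string, nmatch, pm, 0);
    if (rc == 0 && pm[group].rm_so == -1)
        rc = REG_NOMATCH;       /* matched, but this group was unused */
    if (rc == 0) {
        *so = pm[group].rm_so;
        *eo = pm[group].rm_eo;
    }

    free(pm);
    regfree(&re);
    return (rc == 0 || rc == REG_NOMATCH) ? rc : -1;
}
```

Sizing the array from re_nsub after regcomp() avoids guessing a fixed upper bound for the number of groups.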
* [PATCH v5 7/8] regex.3: Destandardeseify Match offsets 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб ` (5 preceding siblings ...) 2023-04-20 15:36 ` [PATCH v5 6/8] regex.3: Finalise move of reg*.3type наб @ 2023-04-20 15:36 ` наб 2023-04-20 15:36 ` [PATCH v5 8/8] regex.3: Further clarify the sole purpose of REG_NOSUB наб 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 15:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2194 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 75c810c41..ca0ab83df 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -188,37 +188,34 @@ .SS Matching .SS Match offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. -The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) -Any unused structure elements will contain the value \-1. 
+elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first expression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ string " + " rm_so ", " string " + " rm_eo ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
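The reworded Match offsets text above (nmatch elements filled, surplus elements set to -1) can be illustrated with a short C sketch. `filled_slots` and `NSLOTS` are hypothetical names used only for this example.

```c
#include <regex.h>
#include <stddef.h>

#define NSLOTS 4    /* deliberately more slots than the pattern needs */

/*
 * Hypothetical helper: run `pattern` against `string` with NSLOTS
 * pmatch elements and count how many were filled with a real match.
 * Returns that count, or -1 if compilation or matching failed.
 */
int filled_slots(const char *pattern, const char *string,
                 regmatch_t pm[NSLOTS])
{
    regex_t re;
    int rc, n = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, string, NSLOTS, pm, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    /* Unused elements are reported as -1; valid ones describe
     * the range [string + rm_so, string + rm_eo). */
    for (size_t i = 0; i < NSLOTS; i++)
        if (pm[i].rm_so != -1)
            n++;
    return n;
}
```

For a pattern with one subexpression, only pm[0] and pm[1] should come back valid; the remaining elements hold -1 in both fields.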
* [PATCH v5 8/8] regex.3: Further clarify the sole purpose of REG_NOSUB 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб ` (6 preceding siblings ...) 2023-04-20 15:36 ` [PATCH v5 7/8] regex.3: Destandardeseify Match offsets наб @ 2023-04-20 15:36 ` наб 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 15:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 794 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index ca0ab83df..66d9c6596 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -96,16 +96,14 @@ .SS Compilation searches using this pattern buffer will be case insensitive. .TP .B REG_NOSUB -Do not report position of matches. -The -.I nmatch -and -.I pmatch +Only report overall success: .BR regexec () -arguments will be ignored for this purpose (but +will only use .I pmatch -may still be used for -.BR REG_STARTEND ). +for +.BR REG_STARTEND , +and ignore +.IR nmatch . .TP .B REG_NEWLINE Match-any-character operators don't match a newline. -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
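As a small illustration of the REG_NOSUB wording in this patch, a yes/no matcher needs neither nmatch nor pmatch. A minimal C sketch; `matches` is a hypothetical helper name.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: report only whether `string` matches `pattern`.
 * Returns 1 on a match, 0 on no match, -1 on any error.
 */
int matches(const char *pattern, const char *string)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only overall success is reported. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    /* Without REG_STARTEND, pmatch is not used here, so nmatch = 0
     * and a null pmatch are enough. */
    rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```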
* [PATCH v6 0/8] regex.3 momento 2023-04-20 15:35 ` [PATCH v5 0/8] regex.3 momento наб ` (7 preceding siblings ...) 2023-04-20 15:36 ` [PATCH v5 8/8] regex.3: Further clarify the sole purpose of REG_NOSUB наб @ 2023-04-20 19:36 ` наб 2023-04-20 19:36 ` [PATCH v6 1/8] regex.3: Desoupify regexec() description наб ` (8 more replies) 8 siblings, 9 replies; 143+ messages in thread From: наб @ 2023-04-20 19:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 4863 bytes --] Should include all comments; includes Branden's wording. наб (8): regex.3: Desoupify regexec() description regex.3: Desoupify regerror() description regex.3: Desoupify regfree() description regex.3: Improve REG_STARTEND regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 regex.3: Finalise move of reg*.3type regex.3: Destandardeseify Match offsets regex.3: Further clarify the sole purpose of REG_NOSUB man3/regex.3 | 226 ++++++++++++++++++++++---------------- man3type/regex_t.3type | 64 +---------- man3type/regmatch_t.3type | 2 +- man3type/regoff_t.3type | 2 +- 4 files changed, 133 insertions(+), 161 deletions(-) Range-diff against v5: 1: fcb8df21b < -: --------- regex.3: Desoupify regcomp() description 2: 7240de5b7 = 1: 1ad1aa6e9 regex.3: Desoupify regexec() description 3: 108f30cd7 ! 2: 6c4d26f89 regex.3: Desoupify regerror() description @@ Commit message Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> ## man3/regex.3 ## -@@ man3/regex.3: .SH SYNOPSIS - .BI " int " eflags ); - .PP - .BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , --.BI " char " errbuf "[restrict ." errbuf_size "], \ -+.BI " char " errbuf "[restrict ." errbuf_size "], \ - size_t " errbuf_size ); - .BI "void regfree(regex_t *" preg ); - .fi @@ man3/regex.3: .SS Error reporting .BR regexec () into error message strings. 
@@ man3/regex.3: .SS Error reporting -If both -.I errbuf -and ++If ++.I preg ++isn't a null pointer, +.I errcode +must be the latest error returned from an operation on +.IR preg . -+If -+.I preg -+is a null pointer\(emthe latest error. +.PP +If ++.I errbuf_size ++is ++.BR 0 , ++the size of the required buffer is returned. ++Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). -+is -+.BR 0 , -+the size of the required buffer is returned. -+Otherwise, up to -+.I errbuf_size +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS Freeing --Supplying + Supplying .BR regfree () --with a precompiled pattern buffer, --.IR preg , --will free the memory allocated to the pattern buffer by the compiling --process, -+invalidates the pattern buffer at -+.IR *preg , -+which must have been initialized via - .BR regcomp (). - .SH RETURN VALUE - .BR regcomp () -: --------- > 3: 4b7971a5e regex.3: Desoupify regfree() description 4: fd1a104d6 ! 4: 5fb4cc16f regex.3: Improve REG_STARTEND @@ man3/regex.3: .SS Matching -on large strings. -It does not use +on known-length strings. -+.I pmatch -+must point to a valid readable object. +If any matches are returned +.RB ( REG_NOSUB +wasn't passed to @@ man3/regex.3: .SS Matching -processing. +> 0), they overwrite +.I pmatch -+as usual, and the -+.B Match offsets -+remain relative to ++as usual, and the match offsets remain relative to +.IR string +(not +.IR string " + " pmatch[0].rm_so ). 5: 198b7b4fa ! 5: 057a4a522 regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 @@ Commit message ## man3/regex.3 ## @@ man3/regex.3: .SH SYNOPSIS - .BI " char " errbuf "[restrict ." errbuf_size "], \ - size_t " errbuf_size ); + .BI " char " errbuf "[_Nullable restrict ." 
errbuf_size ], + .BI " size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); +.PP +.B typedef struct { 6: c6bc9cfd0 = 6: 60ac1a4d1 regex.3: Finalise move of reg*.3type 7: 59b8294c8 = 7: 3313546db regex.3: Destandardeseify Match offsets 8: 2e199fc3c ! 8: 7fa669481 regex.3: Further clarify the sole purpose of REG_NOSUB @@ man3/regex.3: .SS Compilation -.I nmatch -and -.I pmatch -+Only report overall success: ++Report only overall success. .BR regexec () -arguments will be ignored for this purpose (but -+will only use ++will use only .I pmatch -may still be used for -.BR REG_STARTEND ). +for +.BR REG_STARTEND , -+and ignore ++ignoring +.IR nmatch . .TP .B REG_NEWLINE -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v6 1/8] regex.3: Desoupify regexec() description 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб @ 2023-04-20 19:36 ` наб 2023-04-20 23:24 ` Alejandro Colomar 2023-04-20 19:36 ` [PATCH v6 2/8] regex.3: Desoupify regerror() description наб ` (7 subsequent siblings) 8 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 19:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 713 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index bedb97e87..47fe661d2 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -105,12 +105,10 @@ .SS Compilation .SS Matching .BR regexec () is used to match a null-terminated string -against the precompiled pattern buffer, -.IR preg . -.I nmatch -and -.I pmatch -are used to provide information regarding the location of any matches. +against the compiled pattern buffer in +.IR *preg , +which must have been initialised with +.BR regexec (). .I eflags is the bitwise OR -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v6 1/8] regex.3: Desoupify regexec() description 2023-04-20 19:36 ` [PATCH v6 1/8] regex.3: Desoupify regexec() description наб @ 2023-04-20 23:24 ` Alejandro Colomar 2023-04-21 0:33 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 23:24 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1122 bytes --] Hi nab, On 4/20/23 21:36, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 10 ++++------ > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index bedb97e87..47fe661d2 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -105,12 +105,10 @@ .SS Compilation > .SS Matching > .BR regexec () > is used to match a null-terminated string > -against the precompiled pattern buffer, > -.IR preg . > -.I nmatch > -and > -.I pmatch > -are used to provide information regarding the location of any matches. > +against the compiled pattern buffer in > +.IR *preg , > +which must have been initialised with > +.BR regexec (). This patch removes the nmatch and pmatch info before presumably we add it in a subsequent patch. I prefer if the patch that documents that would go either before this one, or right after this one. Cheers, Alex > .I eflags > is the > bitwise OR -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v6 1/8] regex.3: Desoupify regexec() description 2023-04-20 23:24 ` Alejandro Colomar @ 2023-04-21 0:33 ` наб 2023-04-21 0:49 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 0:33 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1091 bytes --] Hi! On Fri, Apr 21, 2023 at 01:24:16AM +0200, Alejandro Colomar wrote: > On 4/20/23 21:36, наб wrote: > > diff --git a/man3/regex.3 b/man3/regex.3 > > index bedb97e87..47fe661d2 100644 > > --- a/man3/regex.3 > > +++ b/man3/regex.3 > > @@ -105,12 +105,10 @@ .SS Compilation > > .SS Matching > > .BR regexec () > > is used to match a null-terminated string > > -against the precompiled pattern buffer, > > -.IR preg . > > -.I nmatch > > -and > > -.I pmatch > > -are used to provide information regarding the location of any matches. > > +against the compiled pattern buffer in > > +.IR *preg , > > +which must have been initialised with > > +.BR regexec (). > This patch removes the nmatch and pmatch info before presumably we add > it in a subsequent patch. It doesn't and we don't ‒ the documentation for nmatch and pmatch never leaves Match offsets. This patch just kills an extraneous, glib, and inaccurate description in Matching. There's another glib description not ten lines above in REG_NOSUB. You don't need to keep the third one. Best, [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v6 1/8] regex.3: Desoupify regexec() description 2023-04-21 0:33 ` наб @ 2023-04-21 0:49 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 0:49 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1333 bytes --] Hi! On 4/21/23 02:33, наб wrote: > Hi! > > On Fri, Apr 21, 2023 at 01:24:16AM +0200, Alejandro Colomar wrote: >> On 4/20/23 21:36, наб wrote: >>> diff --git a/man3/regex.3 b/man3/regex.3 >>> index bedb97e87..47fe661d2 100644 >>> --- a/man3/regex.3 >>> +++ b/man3/regex.3 >>> @@ -105,12 +105,10 @@ .SS Compilation >>> .SS Matching >>> .BR regexec () >>> is used to match a null-terminated string >>> -against the precompiled pattern buffer, >>> -.IR preg . >>> -.I nmatch >>> -and >>> -.I pmatch >>> -are used to provide information regarding the location of any matches. >>> +against the compiled pattern buffer in >>> +.IR *preg , >>> +which must have been initialised with >>> +.BR regexec (). >> This patch removes the nmatch and pmatch info before presumably we add >> it in a subsequent patch. > It doesn't and we don't ‒ > the documentation for nmatch and pmatch never leaves Match offsets. > > This patch just kills an extraneous, glib, and inaccurate description > in Matching. > > There's another glib description not ten lines above in REG_NOSUB. > You don't need to keep the third one. Ahhh, that's right. Thanks! Patch applied. Cheers, Alex > > Best, -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v6 2/8] regex.3: Desoupify regerror() description 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб 2023-04-20 19:36 ` [PATCH v6 1/8] regex.3: Desoupify regexec() description наб @ 2023-04-20 19:36 ` наб 2023-04-20 19:37 ` [PATCH v6 3/8] regex.3: Desoupify regfree() description наб ` (6 subsequent siblings) 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 19:36 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1317 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 47fe661d2..3f1529583 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -207,27 +207,23 @@ .SS Error reporting .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. -If both -.I errbuf -and +If +.I preg +isn't a null pointer, +.I errcode +must be the latest error returned from an operation on +.IR preg . +.PP +If +.I errbuf_size +is +.BR 0 , +the size of the required buffer is returned. +Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS Freeing Supplying .BR regfree () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
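For illustration, the errbuf_size == 0 size query described in the patch above can be used to size a buffer exactly. This is only a minimal sketch, not code from the thread: regerror_dup() and compile_error_message() are hypothetical helper names; only regcomp(), regerror(), and regfree() come from <regex.h>.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of <regex.h>): return a malloc()ed copy of
 * the message for errcode, or NULL if the allocation fails.
 */
char *
regerror_dup(int errcode, const regex_t *preg)
{
	/* errbuf_size == 0: nothing is copied, only the required size
	   (terminating null byte included) is returned. */
	size_t  size = regerror(errcode, preg, NULL, 0);
	char    *buf = malloc(size);

	if (buf == NULL)
		return NULL;

	/* The buffer is exactly large enough, so nothing gets truncated. */
	regerror(errcode, preg, buf, size);
	return buf;
}

/*
 * Hypothetical helper: compile pattern and return NULL if that worked,
 * or a malloc()ed error message if it didn't.  (NULL is also returned if
 * malloc() fails; a real caller would want to tell the two apart.)
 */
char *
compile_error_message(const char *pattern, int cflags)
{
	regex_t  re;
	int      err = regcomp(&re, pattern, cflags);

	if (err == 0) {
		regfree(&re);
		return NULL;
	}

	/* err is the latest error returned from an operation on &re. */
	return regerror_dup(err, &re);
}
```

A caller would do something like msg = compile_error_message("[a", 0), print msg if it is non-null, and free(msg) afterwards.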
* [PATCH v6 3/8] regex.3: Desoupify regfree() description 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб 2023-04-20 19:36 ` [PATCH v6 1/8] regex.3: Desoupify regexec() description наб 2023-04-20 19:36 ` [PATCH v6 2/8] regex.3: Desoupify regerror() description наб @ 2023-04-20 19:37 ` наб 2023-04-20 23:35 ` Alejandro Colomar 2023-04-20 19:37 ` [PATCH v6 4/8] regex.3: Improve REG_STARTEND наб ` (5 subsequent siblings) 8 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 19:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 735 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 3f1529583..e3dd72a74 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -225,12 +225,10 @@ .SS Error reporting .IR errbuf ; the error string is always null-terminated, and truncated to fit. .SS Freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +invalidates the pattern buffer at +.IR *preg , +which must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v6 3/8] regex.3: Desoupify regfree() description 2023-04-20 19:37 ` [PATCH v6 3/8] regex.3: Desoupify regfree() description наб @ 2023-04-20 23:35 ` Alejandro Colomar 2023-04-21 0:27 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 23:35 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1512 bytes --] On 4/20/23 21:37, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 8 +++----- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 3f1529583..e3dd72a74 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -225,12 +225,10 @@ .SS Error reporting > .IR errbuf ; > the error string is always null-terminated, and truncated to fit. > .SS Freeing > -Supplying > .BR regfree () > -with a precompiled pattern buffer, > -.IR preg , > -will free the memory allocated to the pattern buffer by the compiling > -process, > +invalidates the pattern buffer at While this ("invalidates") is true, it omits the most important information: it frees the object. I think it's better to say that it frees (or deallocates) the object and any memory allocated within it, since that already implies invalidating it (due to <https://port70.net/~nsz/c/c11/n1570.html#6.2.4p2> and <https://port70.net/~nsz/c/c11/n1570.html#7.22.3p1>), and also tells why it's necessary to call this function. Otherwise, it's not clear why we should call it. Why would I want to invalidate a buffer? We can call memfrob(3) for that :p Or for secure stuff, arc4random(3). > +.IR *preg , > +which must have been initialized via > .BR regcomp (). 
> .SH RETURN VALUE
> .BR regcomp ()

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: [PATCH v6 3/8] regex.3: Desoupify regfree() description 2023-04-20 23:35 ` Alejandro Colomar @ 2023-04-21 0:27 ` наб 2023-04-21 0:37 ` [PATCH v7 " наб 2023-04-21 0:58 ` [PATCH v6 " Alejandro Colomar 0 siblings, 2 replies; 143+ messages in thread From: наб @ 2023-04-21 0:27 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1709 bytes --] On Fri, Apr 21, 2023 at 01:35:43AM +0200, Alejandro Colomar wrote: > On 4/20/23 21:37, наб wrote: > > diff --git a/man3/regex.3 b/man3/regex.3 > > index 3f1529583..e3dd72a74 100644 > > --- a/man3/regex.3 > > +++ b/man3/regex.3 > > @@ -225,12 +225,10 @@ .SS Error reporting > > .IR errbuf ; > > the error string is always null-terminated, and truncated to fit. > > .SS Freeing > > -Supplying > > .BR regfree () > > -with a precompiled pattern buffer, > > -.IR preg , > > -will free the memory allocated to the pattern buffer by the compiling > > -process, > > +invalidates the pattern buffer at > While this ("invalidates") is true, it omits the most important information: > it frees the object. It doesn't. > I think it's better to say that it frees (or > deallocates) the object and any memory allocated within it, since that > already implies invalidating it (due to > <https://port70.net/~nsz/c/c11/n1570.html#6.2.4p2> and > <https://port70.net/~nsz/c/c11/n1570.html#7.22.3p1>), For the precise reasons listed here: the regex_t object continues to exist. regcomp() doesn't allocate *preg, and regfree() doesn't deallocate it. > and also tells why > it's necessary to call this function. Otherwise, it's not clear why we > should call it. Why would I want to invalidate a buffer? Admittedly, it does also "free any memory allocated by regcomp( ) associated with preg." (Issue 8 Draft 2.1), yeah. Maybe it's my neurosis that I consider "may no longer be passed to regexec()" the primary effect here. 
Updated to
  regfree() invalidates the pattern buffer at *preg, freeing any
  associated memory; *preg must have been initialized via regcomp().

Best,
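To make the distinction concrete (regfree() releases what regcomp() allocated, but the regex_t object itself continues to exist), here is a minimal sketch. The helper names matches() and reuse_demo() are made up for this example and are not from the thread or from <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: compile pattern into *preg, match it against string,
 * and regfree() it again.
 * Returns 1 on a match, 0 on no match, -1 if the pattern didn't compile.
 */
int
matches(regex_t *preg, const char *pattern, const char *string)
{
	int  ret;

	if (regcomp(preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;  /* regcomp() failed: nothing to pass to regfree() */

	ret = regexec(preg, string, 0, NULL, 0);

	/* Frees the memory regcomp() associated with *preg.  The regex_t
	   itself is the caller's object and is not deallocated here. */
	regfree(preg);

	return ret == 0;
}

/*
 * One regex_t with automatic storage, initialized and deinitialized twice.
 * After regfree() it may not be passed to regexec(), but it may be handed
 * to regcomp() again.
 */
int
reuse_demo(void)
{
	regex_t  re;

	return matches(&re, "^a+$", "aaa") == 1
	    && matches(&re, "^b+$", "aaa") == 0;
}
```

The same would hold for a regex_t with static storage or one embedded in a larger struct: its lifetime is independent of regcomp()/regfree().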
* [PATCH v7 3/8] regex.3: Desoupify regfree() description 2023-04-21 0:27 ` наб @ 2023-04-21 0:37 ` наб 2023-04-21 0:58 ` [PATCH v6 " Alejandro Colomar 1 sibling, 0 replies; 143+ messages in thread From: наб @ 2023-04-21 0:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1195 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Range-diff against v6: 1: 4b7971a5e ! 1: 2632fe5c8 regex.3: Desoupify regfree() description @@ man3/regex.3: .SS Error reporting -process, +invalidates the pattern buffer at +.IR *preg , -+which must have been initialized via ++freeing any associated memory; ++.I *preg ++must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () man3/regex.3 | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 3f1529583..46a4a12b9 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -225,12 +225,12 @@ .SS Error reporting .IR errbuf ; the error string is always null-terminated, and truncated to fit. .SS Freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +invalidates the pattern buffer at +.IR *preg , +freeing any associated memory; +.I *preg +must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v6 3/8] regex.3: Desoupify regfree() description 2023-04-21 0:27 ` наб 2023-04-21 0:37 ` [PATCH v7 " наб @ 2023-04-21 0:58 ` Alejandro Colomar 2023-04-21 1:24 ` [PATCH v7a " наб 1 sibling, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 0:58 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2318 bytes --] On 4/21/23 02:27, наб wrote: > On Fri, Apr 21, 2023 at 01:35:43AM +0200, Alejandro Colomar wrote: >> On 4/20/23 21:37, наб wrote: >>> diff --git a/man3/regex.3 b/man3/regex.3 >>> index 3f1529583..e3dd72a74 100644 >>> --- a/man3/regex.3 >>> +++ b/man3/regex.3 >>> @@ -225,12 +225,10 @@ .SS Error reporting >>> .IR errbuf ; >>> the error string is always null-terminated, and truncated to fit. >>> .SS Freeing >>> -Supplying >>> .BR regfree () >>> -with a precompiled pattern buffer, >>> -.IR preg , >>> -will free the memory allocated to the pattern buffer by the compiling >>> -process, >>> +invalidates the pattern buffer at >> While this ("invalidates") is true, it omits the most important information: >> it frees the object. > It doesn't. You're right. It frees memory within the object. :/ > >> I think it's better to say that it frees (or >> deallocates) the object and any memory allocated within it, since that >> already implies invalidating it (due to >> <https://port70.net/~nsz/c/c11/n1570.html#6.2.4p2> and >> <https://port70.net/~nsz/c/c11/n1570.html#7.22.3p1>), > For the precise reasons listed here: > the regex_t object continues to exist. > regcomp() doesn't allocate *preg, and regfree() doesn't deallocate it. > >> and also tells why >> it's necessary to call this function. Otherwise, it's not clear why we >> should call it. Why would I want to invalidate a buffer? > Admittedly, it does also "free any memory allocated by regcomp( ) > associated with preg." (Issue 8 Draft 2.1), yeah. Yep. 
> Maybe it's my neurosis that I consider "may no longer be passed to
> regexec()" the primary effect here.

:)

I wish GCC had an attribute for ensuring that in the -fanalyzer.  But
[[gnu::malloc()]] only works for returned pointers, and not for pointers
initialized via a parameter, nor for returned integers.

>
> Updated to
>   regfree() invalidates the pattern buffer at *preg, freeing any
>   associated memory; *preg must have been initialized via regcomp().

How about deinitializes?  Since regcomp(3) "initializes" the pattern
buffer, it makes sense to use complementary wording.

Cheers,
Alex

>
> Best,

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
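As a sketch of the limitation mentioned above: the malloc attribute with a deallocator argument only attaches to a function that returns the pointer, so one workaround is to wrap the regcomp()/regfree() pair in an allocate/release pair. Everything here is an assumption made for illustration: re_new(), re_delete(), and the DEALLOC_WITH() macro are hypothetical names, the attribute form needs GCC 11 or later (hence the guard), and what -fanalyzer actually diagnoses with it depends on the GCC version.

```c
#include <regex.h>
#include <stdlib.h>

/* The deallocator form of the attribute is a GCC 11+ feature; elsewhere
   the (hypothetical) macro expands to nothing. */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
# define DEALLOC_WITH(f)  __attribute__((malloc(f, 1)))
#else
# define DEALLOC_WITH(f)
#endif

/* The deallocator has to be declared before the attribute names it. */
void re_delete(regex_t *re);

DEALLOC_WITH(re_delete)
regex_t *re_new(const char *pattern, int cflags);

/* Hypothetical wrapper: heap-allocate a regex_t and compile pattern into it.
   Returns NULL if either the allocation or the compilation fails. */
regex_t *
re_new(const char *pattern, int cflags)
{
	regex_t  *re = malloc(sizeof(*re));

	if (re == NULL)
		return NULL;
	if (regcomp(re, pattern, cflags) != 0) {
		free(re);
		return NULL;
	}
	return re;
}

/* Hypothetical wrapper: deinitialize the pattern buffer, then release the
   object that held it.  A null pointer is accepted, as with free(). */
void
re_delete(regex_t *re)
{
	if (re == NULL)
		return;
	regfree(re);
	free(re);
}
```

The cost is an extra heap allocation per pattern, which the plain regcomp()/regfree() interface avoids.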
* [PATCH v7a 3/8] regex.3: Desoupify regfree() description 2023-04-21 0:58 ` [PATCH v6 " Alejandro Colomar @ 2023-04-21 1:24 ` наб 2023-04-21 1:55 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 1:24 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1480 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Range-diff against v6: 1: 1ad1aa6e9 < -: --------- regex.3: Desoupify regexec() description 2: 6c4d26f89 < -: --------- regex.3: Desoupify regerror() description 3: 4b7971a5e ! 1: 5706f1892 regex.3: Desoupify regfree() description @@ man3/regex.3: .SS Error reporting -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, -+invalidates the pattern buffer at ++deinitializes the pattern buffer at +.IR *preg , -+which must have been initialized via ++freeing any associated memory; ++.I *preg ++must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () man3/regex.3 | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 3f1529583..ffdd98376 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -225,12 +225,12 @@ .SS Error reporting .IR errbuf ; the error string is always null-terminated, and truncated to fit. .SS Freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +deinitializes the pattern buffer at +.IR *preg , +freeing any associated memory; +.I *preg +must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v7a 3/8] regex.3: Desoupify regfree() description 2023-04-21 1:24 ` [PATCH v7a " наб @ 2023-04-21 1:55 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 1:55 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1739 bytes --] On 4/21/23 03:24, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Patch applied. Thanks! Alex > --- > Range-diff against v6: > 1: 1ad1aa6e9 < -: --------- regex.3: Desoupify regexec() description > 2: 6c4d26f89 < -: --------- regex.3: Desoupify regerror() description > 3: 4b7971a5e ! 1: 5706f1892 regex.3: Desoupify regfree() description > @@ man3/regex.3: .SS Error reporting > -.IR preg , > -will free the memory allocated to the pattern buffer by the compiling > -process, > -+invalidates the pattern buffer at > ++deinitializes the pattern buffer at > +.IR *preg , > -+which must have been initialized via > ++freeing any associated memory; > ++.I *preg > ++must have been initialized via > .BR regcomp (). > .SH RETURN VALUE > .BR regcomp () > > man3/regex.3 | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 3f1529583..ffdd98376 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -225,12 +225,12 @@ .SS Error reporting > .IR errbuf ; > the error string is always null-terminated, and truncated to fit. > .SS Freeing > -Supplying > .BR regfree () > -with a precompiled pattern buffer, > -.IR preg , > -will free the memory allocated to the pattern buffer by the compiling > -process, > +deinitializes the pattern buffer at > +.IR *preg , > +freeing any associated memory; > +.I *preg > +must have been initialized via > .BR regcomp (). 
> .SH RETURN VALUE
> .BR regcomp ()

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v6 4/8] regex.3: Improve REG_STARTEND 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб ` (2 preceding siblings ...) 2023-04-20 19:37 ` [PATCH v6 3/8] regex.3: Desoupify regfree() description наб @ 2023-04-20 19:37 ` наб 2023-04-20 23:15 ` Alejandro Colomar 2023-04-20 19:37 ` [PATCH v6 5/8] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб ` (4 subsequent siblings) 8 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 19:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1636 bytes --] Explicitly spell out the ranges involved. The original wording always confused me, but it's actually very sane. Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change R_NOTEOL? No. That's weird and confusing. String largeness doesn't matter, known-lengthness does. Explicitly spell out the influence on returned matches (relative to string, not start of range). Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index e3dd72a74..a9bec59a9 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -131,23 +131,26 @@ .SS Matching above). .TP .B REG_STARTEND -Use -.I pmatch[0] -on the input string, starting at byte -.I pmatch[0].rm_so -and ending before byte -.IR pmatch[0].rm_eo . +Match +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) +instead of +.RI [ string ", " string " + \fBstrlen\fP(" string )). This allows matching embedded NUL bytes and avoids a .BR strlen (3) -on large strings. -It does not use +on known-length strings. +If any matches are returned +.RB ( REG_NOSUB +wasn't passed to +.BR regcomp (), +the match succeeded, and .I nmatch -on input, and does not change -.B REG_NOTBOL -or -.B REG_NEWLINE -processing. 
+> 0), they overwrite
+.I pmatch
+as usual, and the match offsets remain relative to
+.IR string
+(not
+.IR string " + " pmatch[0].rm_so ).
 This flag is a BSD extension, not present in POSIX.
 .SS Match offsets
 Unless
-- 
2.30.2
* Re: [PATCH v6 4/8] regex.3: Improve REG_STARTEND 2023-04-20 19:37 ` [PATCH v6 4/8] regex.3: Improve REG_STARTEND наб @ 2023-04-20 23:15 ` Alejandro Colomar 2023-04-21 0:39 ` [PATCH v7 " наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 23:15 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2459 bytes --] On 4/20/23 21:37, наб wrote: > Explicitly spell out the ranges involved. The original wording always > confused me, but it's actually very sane. > > Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change > R_NOTEOL? No. That's weird and confusing. > > String largeness doesn't matter, known-lengthness does. > > Explicitly spell out the influence on returned matches > (relative to string, not start of range). > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 29 ++++++++++++++++------------- > 1 file changed, 16 insertions(+), 13 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index e3dd72a74..a9bec59a9 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -131,23 +131,26 @@ .SS Matching > above). > .TP > .B REG_STARTEND > -Use > -.I pmatch[0] > -on the input string, starting at byte > -.I pmatch[0].rm_so > -and ending before byte > -.IR pmatch[0].rm_eo . > +Match > +.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) > +instead of > +.RI [ string ", " string " + \fBstrlen\fP(" string )). See man-pages(7): Expressions, if not written on a separate indented line, should be specified in italics. Again, the use of nonbreaking spaces may be appropriate if the expression is inlined with normal text. strlen(string) is an expression, not a man page reference, so it should go in full italics. The + is also part of the expression, so it should also go in italics. I suggest: .RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) .RI [ string , " string + strlen(string)" ). 
> This allows matching embedded NUL bytes
> and avoids a
> .BR strlen (3)
> -on large strings.
> -It does not use
> +on known-length strings.
> +If any matches are returned
> +.RB ( REG_NOSUB
> +wasn't passed to
> +.BR regcomp (),
> +the match succeeded, and
> .I nmatch
> -on input, and does not change
> -.B REG_NOTBOL
> -or
> -.B REG_NEWLINE
> -processing.
> +> 0), they overwrite
> +.I pmatch
> +as usual, and the match offsets remain relative to
> +.IR string
> +(not
> +.IR string " + " pmatch[0].rm_so ).

Similar stuff here.

> This flag is a BSD extension, not present in POSIX.
> .SS Match offsets
> Unless

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-20 23:15 ` Alejandro Colomar @ 2023-04-21 0:39 ` наб 2023-04-21 1:42 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 0:39 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2558 bytes --] Explicitly spell out the ranges involved. The original wording always confused me, but it's actually very sane. Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change R_NOTEOL? No. That's weird and confusing. String largeness doesn't matter, known-lengthness does. Explicitly spell out the influence on returned matches (relative to string, not start of range). Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Range-diff against v6: 1: 4b7971a5e < -: --------- regex.3: Desoupify regfree() description 2: 5fb4cc16f ! 1: ed050649b regex.3: Improve REG_STARTEND @@ man3/regex.3: .SS Matching -and ending before byte -.IR pmatch[0].rm_eo . +Match -+.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) ++.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) +instead of -+.RI [ string ", " string " + \fBstrlen\fP(" string )). ++.RI [ string , " string + strlen(string)" ). This allows matching embedded NUL bytes and avoids a .BR strlen (3) @@ man3/regex.3: .SS Matching +as usual, and the match offsets remain relative to +.IR string +(not -+.IR string " + " pmatch[0].rm_so ). ++.IR "string + pmatch[0].rm_so" ). This flag is a BSD extension, not present in POSIX. .SS Match offsets Unless man3/regex.3 | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 46a4a12b9..099c2c17f 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -131,23 +131,26 @@ .SS Matching above). 
 .TP
 .B REG_STARTEND
-Use
-.I pmatch[0]
-on the input string, starting at byte
-.I pmatch[0].rm_so
-and ending before byte
-.IR pmatch[0].rm_eo .
+Match
+.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" )
+instead of
+.RI [ string , " string + strlen(string)" ).
 This allows matching embedded NUL bytes
 and avoids a
 .BR strlen (3)
-on large strings.
-It does not use
+on known-length strings.
+If any matches are returned
+.RB ( REG_NOSUB
+wasn't passed to
+.BR regcomp (),
+the match succeeded, and
 .I nmatch
-on input, and does not change
-.B REG_NOTBOL
-or
-.B REG_NEWLINE
-processing.
+> 0), they overwrite
+.I pmatch
+as usual, and the match offsets remain relative to
+.IR string
+(not
+.IR "string + pmatch[0].rm_so" ).
 This flag is a BSD extension, not present in POSIX.
 .SS Match offsets
 Unless
-- 
2.30.2
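The range semantics spelled out in this patch can be sketched as follows. This is a hedged example, not code from the thread: search_range() is a hypothetical helper, and it assumes a libc that provides the REG_STARTEND extension (glibc and the BSDs do; it is not in POSIX).

```c
#include <regex.h>
#include <stddef.h>

#ifndef REG_STARTEND
# error "this sketch assumes a libc that provides REG_STARTEND"
#endif

/*
 * Hypothetical helper: search for an ERE in [buf + start, buf + end)
 * rather than in [buf, buf + strlen(buf)).
 * Returns 0 and stores the match in *m on success; otherwise returns the
 * regcomp() or regexec() error code (REG_NOMATCH if nothing matched).
 */
int
search_range(const char *pattern, const char *buf,
             regoff_t start, regoff_t end, regmatch_t *m)
{
	regex_t  re;
	int      ret;

	ret = regcomp(&re, pattern, REG_EXTENDED);
	if (ret != 0)
		return ret;

	/* On input, pmatch[0] delimits the range to search. */
	m->rm_so = start;
	m->rm_eo = end;

	/* nmatch is 1, so on success *m is overwritten with the match;
	   its offsets are relative to buf, not to buf + start. */
	ret = regexec(&re, buf, 1, m, REG_STARTEND);

	regfree(&re);
	return ret;
}
```

For instance, with the 11-byte buffer "foo\0bar foo" and the range [1, 11), searching for "foo" should skip the first occurrence (it starts before the range), step over the embedded NUL, and report the second occurrence with offsets counted from the start of the buffer.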
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 0:39 ` [PATCH v7 " наб @ 2023-04-21 1:42 ` Alejandro Colomar 2023-04-21 2:16 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 1:42 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4566 bytes --] Hi! On 4/21/23 02:39, наб wrote: > Explicitly spell out the ranges involved. The original wording always > confused me, but it's actually very sane. > > Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change > R_NOTEOL? No. That's weird and confusing. > > String largeness doesn't matter, known-lengthness does. > > Explicitly spell out the influence on returned matches > (relative to string, not start of range). > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Patch applied. > --- > Range-diff against v6: > 1: 4b7971a5e < -: --------- regex.3: Desoupify regfree() description > 2: 5fb4cc16f ! 1: ed050649b regex.3: Improve REG_STARTEND > @@ man3/regex.3: .SS Matching > -and ending before byte > -.IR pmatch[0].rm_eo . > +Match > -+.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) > ++.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) > +instead of > -+.RI [ string ", " string " + \fBstrlen\fP(" string )). > ++.RI [ string , " string + strlen(string)" ). > This allows matching embedded NUL bytes > and avoids a > .BR strlen (3) > @@ man3/regex.3: .SS Matching > +as usual, and the match offsets remain relative to > +.IR string > +(not > -+.IR string " + " pmatch[0].rm_so ). > ++.IR "string + pmatch[0].rm_so" ). > This flag is a BSD extension, not present in POSIX. > .SS Match offsets > Unless > > man3/regex.3 | 29 ++++++++++++++++------------- > 1 file changed, 16 insertions(+), 13 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 46a4a12b9..099c2c17f 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -131,23 +131,26 @@ .SS Matching > above). 
> .TP > .B REG_STARTEND > -Use > -.I pmatch[0] > -on the input string, starting at byte > -.I pmatch[0].rm_so > -and ending before byte > -.IR pmatch[0].rm_eo . > +Match > +.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) > +instead of > +.RI [ string , " string + strlen(string)" ). > This allows matching embedded NUL bytes > and avoids a > .BR strlen (3) > -on large strings. > -It does not use > +on known-length strings. > +If any matches are returned > +.RB ( REG_NOSUB > +wasn't passed to > +.BR regcomp (), > +the match succeeded, and > .I nmatch > -on input, and does not change > -.B REG_NOTBOL > -or > -.B REG_NEWLINE > -processing. > +> 0), they overwrite > +.I pmatch > +as usual, and the match offsets remain relative to > +.IR string Minor glitch: s/IR/I/ I fixed it. BTW, don't know if you knew, but you can run some linters to check these accidents by yourself. $ make lint check -t >/dev/null $ echo .IR foo >> man3/regex.3 $ make lint check -k LINT (mandoc) .tmp/man/man3/regex.3.lint-man.mandoc.touch LINT (tbl comment) .tmp/man/man3/regex.3.lint-man.tbl.touch PRECONV .tmp/man/man3/regex.3.tbl TBL .tmp/man/man3/regex.3.eqn EQN .tmp/man/man3/regex.3.cat.troff TROFF .tmp/man/man3/regex.3.cat.grotty an.tmac:man3/regex.3:376: style: .IR expects at least 2 arguments, got 1 found style problems; aborting make: *** [share/mk/build/catman.mk:80: .tmp/man/man3/regex.3.cat.grotty] Error 1 make: *** Deleting file '.tmp/man/man3/regex.3.cat.grotty' make: Target 'check' not remade because of errors. $ git restore -p diff --git a/man3/regex.3 b/man3/regex.3 index e91504986..4840edb83 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -373,3 +373,4 @@ .SH SEE ALSO .PP The glibc manual section, .I "Regular Expressions" +.IR foo (1/1) Discard this hunk from worktree [y,n,q,a,d,e,?]? 
y
alx@asus5775:~/src/linux/man-pages/man-pages/main$ make lint check -k
LINT (mandoc) .tmp/man/man3/regex.3.lint-man.mandoc.touch
LINT (tbl comment) .tmp/man/man3/regex.3.lint-man.tbl.touch
PRECONV .tmp/man/man3/regex.3.tbl
TBL .tmp/man/man3/regex.3.eqn
EQN .tmp/man/man3/regex.3.cat.troff
TROFF .tmp/man/man3/regex.3.cat.grotty
GROTTY .tmp/man/man3/regex.3.cat
COL .tmp/man/man3/regex.3.cat.grep
GREP .tmp/man/man3/regex.3.check-catman.touch

If you want to read more about this, see the CONTRIBUTING file, or the
Makefile itself (or rather, themselves).

Cheers,
Alex

> +(not
> +.IR "string + pmatch[0].rm_so" ).
> This flag is a BSD extension, not present in POSIX.
> .SS Match offsets
> Unless

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 1:42 ` Alejandro Colomar @ 2023-04-21 2:16 ` наб 2023-04-21 9:45 ` Alejandro Colomar 2023-04-21 10:19 ` Jakub Wilk 0 siblings, 2 replies; 143+ messages in thread From: наб @ 2023-04-21 2:16 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 9433 bytes --] Hi! On Fri, Apr 21, 2023 at 03:42:48AM +0200, Alejandro Colomar wrote: > On 4/21/23 02:39, наб wrote: > > Explicitly spell out the ranges involved. The original wording always > > confused me, but it's actually very sane. > > > > Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change > > R_NOTEOL? No. That's weird and confusing. > > > > String largeness doesn't matter, known-lengthness does. > > > > Explicitly spell out the influence on returned matches > > (relative to string, not start of range). > > > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > > Patch applied. > > > --- > > Range-diff against v6: > > 1: 4b7971a5e < -: --------- regex.3: Desoupify regfree() description > > 2: 5fb4cc16f ! 1: ed050649b regex.3: Improve REG_STARTEND > > @@ man3/regex.3: .SS Matching > > -and ending before byte > > -.IR pmatch[0].rm_eo . > > +Match > > -+.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) > > ++.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) > > +instead of > > -+.RI [ string ", " string " + \fBstrlen\fP(" string )). > > ++.RI [ string , " string + strlen(string)" ). > > This allows matching embedded NUL bytes > > and avoids a > > .BR strlen (3) > > @@ man3/regex.3: .SS Matching > > +as usual, and the match offsets remain relative to > > +.IR string > > +(not > > -+.IR string " + " pmatch[0].rm_so ). > > ++.IR "string + pmatch[0].rm_so" ). > > This flag is a BSD extension, not present in POSIX. 
> > .SS Match offsets > > Unless > > > > man3/regex.3 | 29 ++++++++++++++++------------- > > 1 file changed, 16 insertions(+), 13 deletions(-) > > > > diff --git a/man3/regex.3 b/man3/regex.3 > > index 46a4a12b9..099c2c17f 100644 > > --- a/man3/regex.3 > > +++ b/man3/regex.3 > > @@ -131,23 +131,26 @@ .SS Matching > > above). > > .TP > > .B REG_STARTEND > > -Use > > -.I pmatch[0] > > -on the input string, starting at byte > > -.I pmatch[0].rm_so > > -and ending before byte > > -.IR pmatch[0].rm_eo . > > +Match > > +.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) > > +instead of > > +.RI [ string , " string + strlen(string)" ). > > This allows matching embedded NUL bytes > > and avoids a > > .BR strlen (3) > > -on large strings. > > -It does not use > > +on known-length strings. > > +If any matches are returned > > +.RB ( REG_NOSUB > > +wasn't passed to > > +.BR regcomp (), > > +the match succeeded, and > > .I nmatch > > -on input, and does not change > > -.B REG_NOTBOL > > -or > > -.B REG_NEWLINE > > -processing. > > +> 0), they overwrite > > +.I pmatch > > +as usual, and the match offsets remain relative to > > +.IR string > > Minor glitch: s/IR/I/ > > I fixed it. BTW, don't know if you knew, but you can run some linters > to check these accidents by yourself. $ make check # ... GREP .tmp/man/man1/memusage.1.check-catman.touch .tmp/man/man1/memusage.1.cat.grep:132: Memory usage summary: heap total: 45200, heap peak: 6440, stack peak: 224 .tmp/man/man1/memusage.1.cat.grep:135: realloc| 40 44800 0 (nomove:40, dec:19, free:0) make: *** [share/mk/check/catman.mk:36: .tmp/man/man1/memusage.1.check-catman.touch] Error 1 $ make lint SED .tmp/man/man2/add_key.2.d/add_key.c LINT (checkpatch) .tmp/man/man2/add_key.2.d/add_key.lint-c.checkpatch.touch bash: line 1: checkpatch: command not found make: *** [share/mk/lint/c.mk:64: .tmp/man/man2/add_key.2.d/add_key.lint-c.checkpatch.touch] Error 127 git grep checkpatch first says I want checkpatch(1). 
No such manual exists, at least in Debian. Then it reveals I actually want checkpatch.pl from a linux checkout. Probably call it [scripts/]checkpatch.pl then? Then it reveals CHECKPATCH := checkpatch which means that just export CHECKPATCH=~/store/code/linux/scripts/checkpatch.pl doesn't work, and I need to pass it as an argument (should be ?=). The same for all the other linters. $ make -j25 CHECKPATCH=~/store/code/linux/scripts/checkpatch.pl lint # ... LINT (mandoc) .tmp/man/man1/pldd.1.lint-man.mandoc.touch mandoc: man1/getent.1:6:14: WARNING: cannot parse date, using it verbatim: (date) # (same what feels like every page; bullseye mandoc 1.14.5-1) If I pass MANDOC=~/code/voreutils/mandoc (recent(ish, it was recent last year) CVS, + some patches I forgot that fixed some egregious formatting errors): LINT (mandoc) .tmp/man/man5/ftpusers.5.lint-man.mandoc.touch LINT (mandoc) .tmp/man/man5/gai.conf.5.lint-man.mandoc.touch LINT (mandoc) .tmp/man/man5/group.5.lint-man.mandoc.touch LINT (mandoc) .tmp/man/man5/host.conf.5.lint-man.mandoc.touch mandoc: man5/erofs.5:78:2: ERROR: skipping end of block that is not open: RE mandoc: man5/erofs.5:79:2: WARNING: skipping paragraph macro: IP empty mandoc: man5/erofs.5:78:2: WARNING: skipping paragraph macro: br at the end of SS And it passes! 
Those are the only errors I saw, even on the version with IR\ string$ When I ran with 2>&1 | less to make sure, I got /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable /etc/bash.bashrc: line 7: PS1: unbound variable SED .tmp/man/man2/add_key.2.d/add_key.c SED .tmp/man/man2/bind.2.d/bind.c SED .tmp/man/man2/chown.2.d/chown.c SED .tmp/man/man2/clock_getres.2.d/clock_getres.c SED .tmp/man/man2/clone.2.d/clone.c SED .tmp/man/man2/close_range.2.d/close_range.c SED .tmp/man/man2/copy_file_range.2.d/copy_file_range.c SED .tmp/man/man2/eventfd.2.d/eventfd.c and indeed Makefile:SHELL := /usr/bin/env bash -Eeuo pipefail and $ sed -n 6,7p /etc/bash.bashrc # If not running interactively, don't do anything [ -z "$PS1" ] && return (That should be ${PS1-}. What's even funnier is that $ sed -n 14p /etc/bash.bashrc if [ -z "${debian_chroot:-}" ] && [ -r /etc/debian_chroot ]; then) $ make -j25 CHECKPATCH=~/store/code/linux/scripts/checkpatch.pl lint MANDOC=: CLANG-TIDY=: LINT (checkpatch) .tmp/man/man3/_Generic.3.d/_Generic.lint-c.checkpatch.touch ERROR:ASSIGN_IN_IF: do not use assignment in if condition #17: FILE: .tmp/man/man3const/EXIT_SUCCESS.3const.d/EXIT_SUCCESS.c:17: + if ((fp = fopen(argv[1], "r")) == NULL) { Do not use assignments in if condition. 
Example:: if ((foo = bar(...)) < BAZ) { should be written as:: foo = bar(...); if (foo < BAZ) { total: 1 errors, 0 warnings, 0 checks, 29 lines checked make: *** [share/mk/lint/c.mk:64: .tmp/man/man3const/EXIT_SUCCESS.3const.d/EXIT_SUCCESS.lint-c.checkpatch.touch] Error 1 make: *** Waiting for unfinished jobs.... CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis #17: FILE: .tmp/man/man3/dl_iterate_phdr.3.d/dl_iterate_phdr.c:17: + printf("Name: \"%s\" (%d segments)\n", info->dlpi_name, + info->dlpi_phnum); CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis #33: FILE: .tmp/man/man3/dl_iterate_phdr.3.d/dl_iterate_phdr.c:33: + printf(" %2zu: [%14p; memsz:%7jx] flags: %#jx; ", j, + (void *) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr), total: 0 errors, 0 warnings, 2 checks, 54 lines checked make: *** [share/mk/lint/c.mk:63: .tmp/man/man3/dl_iterate_phdr.3.d/dl_iterate_phdr.lint-c.checkpatch.touch] Error 1 WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'closeSocketPair', this function's name, in a string #230: FILE: .tmp/man/man2/seccomp_unotify.2.d/seccomp_unotify.c:230: + err(EXIT_FAILURE, "closeSocketPair-close-0"); Embedded function names are less appropriate to use as refactoring can cause function renaming. Prefer the use of "%s", __func__ to embedded function names. Note that this does not work with -f (--file) checkpatch option as it depends on patch context providing the function name. 
WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'closeSocketPair', this function's name, in a string #232: FILE: .tmp/man/man2/seccomp_unotify.2.d/seccomp_unotify.c:232: + err(EXIT_FAILURE, "closeSocketPair-close-1"); total: 0 errors, 2 warnings, 0 checks, 612 lines checked make: *** [share/mk/lint/c.mk:63: .tmp/man/man2/seccomp_unotify.2.d/seccomp_unotify.lint-c.checkpatch.touch] Error 1 (more pages) I'm not sure I agree with the ASSIGN_IN_IF case, but I'm assuming there's a mechanism to kill the lints you don't care about; linux cdc9718d5e590d6905361800b938b93f2b66818e. This continues until I've disabled every linter. I'm assuming you have specific versions that work for you, but, well. Best, наб [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 2:16 ` наб @ 2023-04-21 9:45 ` Alejandro Colomar 2023-04-21 12:13 ` наб 2023-04-21 10:19 ` Jakub Wilk 1 sibling, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 9:45 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 14390 bytes --] Hi! On 4/21/23 04:16, наб wrote: > Hi! > > On Fri, Apr 21, 2023 at 03:42:48AM +0200, Alejandro Colomar wrote: >> On 4/21/23 02:39, наб wrote: >>> Explicitly spell out the ranges involved. The original wording always >>> confused me, but it's actually very sane. >>> >>> Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change >>> R_NOTEOL? No. That's weird and confusing. >>> >>> String largeness doesn't matter, known-lengthness does. >>> >>> Explicitly spell out the influence on returned matches >>> (relative to string, not start of range). >>> >>> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> >> >> Patch applied. >> >>> --- >>> Range-diff against v6: >>> 1: 4b7971a5e < -: --------- regex.3: Desoupify regfree() description >>> 2: 5fb4cc16f ! 1: ed050649b regex.3: Improve REG_STARTEND >>> @@ man3/regex.3: .SS Matching >>> -and ending before byte >>> -.IR pmatch[0].rm_eo . >>> +Match >>> -+.RI [ string " + " pmatch[0].rm_so ", " string " + " pmatch[0].rm_eo ) >>> ++.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) >>> +instead of >>> -+.RI [ string ", " string " + \fBstrlen\fP(" string )). >>> ++.RI [ string , " string + strlen(string)" ). >>> This allows matching embedded NUL bytes >>> and avoids a >>> .BR strlen (3) >>> @@ man3/regex.3: .SS Matching >>> +as usual, and the match offsets remain relative to >>> +.IR string >>> +(not >>> -+.IR string " + " pmatch[0].rm_so ). >>> ++.IR "string + pmatch[0].rm_so" ). >>> This flag is a BSD extension, not present in POSIX. 
>>> .SS Match offsets >>> Unless >>> >>> man3/regex.3 | 29 ++++++++++++++++------------- >>> 1 file changed, 16 insertions(+), 13 deletions(-) >>> >>> diff --git a/man3/regex.3 b/man3/regex.3 >>> index 46a4a12b9..099c2c17f 100644 >>> --- a/man3/regex.3 >>> +++ b/man3/regex.3 >>> @@ -131,23 +131,26 @@ .SS Matching >>> above). >>> .TP >>> .B REG_STARTEND >>> -Use >>> -.I pmatch[0] >>> -on the input string, starting at byte >>> -.I pmatch[0].rm_so >>> -and ending before byte >>> -.IR pmatch[0].rm_eo . >>> +Match >>> +.RI [ "string + pmatch[0].rm_so" , " string + pmatch[0].rm_eo" ) >>> +instead of >>> +.RI [ string , " string + strlen(string)" ). >>> This allows matching embedded NUL bytes >>> and avoids a >>> .BR strlen (3) >>> -on large strings. >>> -It does not use >>> +on known-length strings. >>> +If any matches are returned >>> +.RB ( REG_NOSUB >>> +wasn't passed to >>> +.BR regcomp (), >>> +the match succeeded, and >>> .I nmatch >>> -on input, and does not change >>> -.B REG_NOTBOL >>> -or >>> -.B REG_NEWLINE >>> -processing. >>> +> 0), they overwrite >>> +.I pmatch >>> +as usual, and the match offsets remain relative to >>> +.IR string >> >> Minor glitch: s/IR/I/ >> >> I fixed it. BTW, don't know if you knew, but you can run some linters >> to check these accidents by yourself. > > > $ make check > # ... > GREP .tmp/man/man1/memusage.1.check-catman.touch > .tmp/man/man1/memusage.1.cat.grep:132: Memory usage summary: heap total: 45200, heap peak: 6440, stack peak: 224 > .tmp/man/man1/memusage.1.cat.grep:135: realloc| 40 44800 0 (nomove:40, dec:19, free:0) > make: *** [share/mk/check/catman.mk:36: .tmp/man/man1/memusage.1.check-catman.touch] Error 1 That means the line goes beyond the 80-column margin in rendered pages. There are pages where code examples go beyond that limit, and I can only live with it :(. Ideally, that test should pass in every page, but in some cases it's impossible. I know the name of the test is horrible. Feel free to suggest alternatives. 
Maybe something like 'CHECK (80-col) $@' would do. > > > $ make lint > SED .tmp/man/man2/add_key.2.d/add_key.c > LINT (checkpatch) .tmp/man/man2/add_key.2.d/add_key.lint-c.checkpatch.touch > bash: line 1: checkpatch: command not found > make: *** [share/mk/lint/c.mk:64: .tmp/man/man2/add_key.2.d/add_key.lint-c.checkpatch.touch] Error 127 > > git grep checkpatch first says I want checkpatch(1). > No such manual exists, at least in Debian. Nope; that manual page probably only exists on my servers :) <http://www.alejandro-colomar.es/src/alx/linux/checkpatch.git/> > Then it reveals I actually want checkpatch.pl from a linux checkout. > Probably call it [scripts/]checkpatch.pl then? The thing is, I suggested to the checkpatch.pl maintainers (privately; I hate that I can't point to a list archive) splitting checkpatch.pl out into a standalone project that can be packaged separately and has its own git history. That way it would be directly useful to many other projects that follow coding styles similar to the kernel's. I prepared a proof of concept in that repo, but we agreed that it would be better if the entire history from the Linux git repository were kept, so I need to learn how to extract a few files from a git repo with their history (I know how to do that for a single file or directory, but cherry-picking files is more complex, and I haven't looked deeply into it yet). So I need to do that work before trying to host that repo in <kernel.org>. Feel free to check out that repo, but keep in mind that I will rewrite the entire history when I learn how to do it. > > Then it reveals > CHECKPATCH := checkpatch For me it's in $ which checkpatch /usr/local/bin/checkpatch And it's a modified version to be nicer to non-kernel projects. > which means that just > export CHECKPATCH=~/store/code/linux/scripts/checkpatch.pl > doesn't work, and I need to pass it as an argument (should be ?=). > The same for all the other linters.
Yeah; feel free to send patches :) > > $ make -j25 CHECKPATCH=~/store/code/linux/scripts/checkpatch.pl lint > # ... > LINT (mandoc) .tmp/man/man1/pldd.1.lint-man.mandoc.touch > mandoc: man1/getent.1:6:14: WARNING: cannot parse date, using it verbatim: (date) > # (same what feels like every page; bullseye mandoc 1.14.5-1) If you only want to run $CHECKPATCH, you can run `make lint-c-checkpatch`. For a complete set of targets, see `make help`. (I know; I should have told you before, but that way I learnt some stuff that might have passed inadvertently.) > > If I pass MANDOC=~/code/voreutils/mandoc (recent(ish, it was recent last > year) CVS, + some patches I forgot that fixed some egregious formatting > errors): > LINT (mandoc) .tmp/man/man5/ftpusers.5.lint-man.mandoc.touch > LINT (mandoc) .tmp/man/man5/gai.conf.5.lint-man.mandoc.touch > LINT (mandoc) .tmp/man/man5/group.5.lint-man.mandoc.touch > LINT (mandoc) .tmp/man/man5/host.conf.5.lint-man.mandoc.touch > mandoc: man5/erofs.5:78:2: ERROR: skipping end of block that is not open: RE > mandoc: man5/erofs.5:79:2: WARNING: skipping paragraph macro: IP empty > mandoc: man5/erofs.5:78:2: WARNING: skipping paragraph macro: br at the end of SS I see the same errors; feel free to send patches :) $ make lint check -t $ touch man5/erofs.5 $ make lint check -k LINT (mandoc) .tmp/man/man5/erofs.5.lint-man.mandoc.touch mandoc: man5/erofs.5:78:2: ERROR: skipping end of block that is not open: RE mandoc: man5/erofs.5:79:2: WARNING: skipping paragraph macro: IP empty mandoc: man5/erofs.5:78:2: WARNING: skipping paragraph macro: br at the end of SS make: *** [share/mk/lint/man.mk:33: .tmp/man/man5/erofs.5.lint-man.mandoc.touch] Error 1 LINT (tbl comment) .tmp/man/man5/erofs.5.lint-man.tbl.touch make: Target 'lint' not remade because of errors. 
PRECONV .tmp/man/man5/erofs.5.tbl TBL .tmp/man/man5/erofs.5.eqn EQN .tmp/man/man5/erofs.5.cat.troff TROFF .tmp/man/man5/erofs.5.cat.grotty an.tmac:man5/erofs.5:18: style: use of deprecated macro: .PD an.tmac:man5/erofs.5:24: style: use of deprecated macro: .PD an.tmac:man5/erofs.5:50: style: .BR expects at least 2 arguments, got 1 an.tmac:man5/erofs.5:78: style: unbalanced .RE found style problems; aborting make: *** [share/mk/build/catman.mk:80: .tmp/man/man5/erofs.5.cat.grotty] Error 1 make: *** Deleting file '.tmp/man/man5/erofs.5.cat.grotty' make: Target 'check' not remade because of errors. > > And it passes! Do you mean that make doesn't recognize the error? > Those are the only errors I saw, even on the version with > IR\ string$ > > When I ran with 2>&1 | less to make sure, I got > /etc/bash.bashrc: line 7: PS1: unbound variable So it seems. > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > /etc/bash.bashrc: line 7: PS1: unbound variable > SED .tmp/man/man2/add_key.2.d/add_key.c > SED .tmp/man/man2/bind.2.d/bind.c > SED .tmp/man/man2/chown.2.d/chown.c > SED .tmp/man/man2/clock_getres.2.d/clock_getres.c > SED .tmp/man/man2/clone.2.d/clone.c > SED .tmp/man/man2/close_range.2.d/close_range.c > SED .tmp/man/man2/copy_file_range.2.d/copy_file_range.c > SED .tmp/man/man2/eventfd.2.d/eventfd.c > and indeed > Makefile:SHELL := /usr/bin/env bash -Eeuo pipefail > and > $ sed -n 6,7p /etc/bash.bashrc > # If not running 
interactively, don't do anything > [ -z "$PS1" ] && return I have the same bashrc (Debian Sid here), and have this same line. Why is it failing only for you? Maybe I modified something in my startup scripts? Maybe you did? > > (That should be ${PS1-}. What's even funnier is that Should we call debbugs? :) > $ sed -n 14p /etc/bash.bashrc > if [ -z "${debian_chroot:-}" ] && [ -r /etc/debian_chroot ]; then) Huh! > > > $ make -j25 CHECKPATCH=~/store/code/linux/scripts/checkpatch.pl lint MANDOC=: CLANG-TIDY=: > LINT (checkpatch) .tmp/man/man3/_Generic.3.d/_Generic.lint-c.checkpatch.touch > ERROR:ASSIGN_IN_IF: do not use assignment in if condition > #17: FILE: .tmp/man/man3const/EXIT_SUCCESS.3const.d/EXIT_SUCCESS.c:17: > + if ((fp = fopen(argv[1], "r")) == NULL) { > > Do not use assignments in if condition. > Example:: > > if ((foo = bar(...)) < BAZ) { > > should be written as:: > > foo = bar(...); > if (foo < BAZ) { > > total: 1 errors, 0 warnings, 0 checks, 29 lines checked > make: *** [share/mk/lint/c.mk:64: .tmp/man/man3const/EXIT_SUCCESS.3const.d/EXIT_SUCCESS.lint-c.checkpatch.touch] Error 1 > make: *** Waiting for unfinished jobs.... Hmmm, yes, I see that same error; this page is recent, so I probably never run the linters on it yet. :/ Thanks for the catch! Fixed. > CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis > #17: FILE: .tmp/man/man3/dl_iterate_phdr.3.d/dl_iterate_phdr.c:17: > + printf("Name: \"%s\" (%d segments)\n", info->dlpi_name, > + info->dlpi_phnum); This page has so many warnings, that I probably missed these valid ones. Alignment seems performed by a schoolchild that can't follow lines while painting :p. Fixed. 
> > CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis > #33: FILE: .tmp/man/man3/dl_iterate_phdr.3.d/dl_iterate_phdr.c:33: > + printf(" %2zu: [%14p; memsz:%7jx] flags: %#jx; ", j, > + (void *) (info->dlpi_addr + info->dlpi_phdr[j].p_vaddr), > > total: 0 errors, 0 warnings, 2 checks, 54 lines checked > make: *** [share/mk/lint/c.mk:63: .tmp/man/man3/dl_iterate_phdr.3.d/dl_iterate_phdr.lint-c.checkpatch.touch] Error 1 > WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'closeSocketPair', this function's name, in a string > #230: FILE: .tmp/man/man2/seccomp_unotify.2.d/seccomp_unotify.c:230: > + err(EXIT_FAILURE, "closeSocketPair-close-0"); > > Embedded function names are less appropriate to use as > refactoring can cause function renaming. Prefer the use of > "%s", __func__ to embedded function names. > > Note that this does not work with -f (--file) checkpatch option > as it depends on patch context providing the function name. > > WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to using 'closeSocketPair', this function's name, in a string > #232: FILE: .tmp/man/man2/seccomp_unotify.2.d/seccomp_unotify.c:232: > + err(EXIT_FAILURE, "closeSocketPair-close-1"); I've seen this one, and thought of fixing it, but I'm not yet sure how to do it so that the page is consistent with itself. So far I've not done anything. > > total: 0 errors, 2 warnings, 0 checks, 612 lines checked > make: *** [share/mk/lint/c.mk:63: .tmp/man/man2/seccomp_unotify.2.d/seccomp_unotify.lint-c.checkpatch.touch] Error 1 > > (more pages) > > > I'm not sure I agree with the ASSIGN_IN_IF case, I do agree with it; it's just that I don't run these often; especially some linters that have many warnings in current pages, I tend to ignore them. But they're still useful sometimes. > but I'm assuming > there's a mechanism to kill the lints you don't are about; > linux cdc9718d5e590d6905361800b938b93f2b66818e. 
I disable the lints in the Makefile, so whatever you see is probably because it's a wanted warning, or because the linter recently added it. However, fixing all pages would be impossible :(. > > > This continues until I've disabled every linter. > I'm assuming you have specific versions that work for you, > but, well. No; I do see a lot of noise too. The thing is it's still useful for linting specific pages: $ make lint check -t >/dev/null # ignore everything $ make lint check -W man5/erofs.5 # lint only that page Cheers, Alex > > > Best, > наб -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
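On the `CHECKPATCH := checkpatch` point in this exchange, here is a minimal sketch of why exporting the variable has no effect while passing it as a make argument does, and why `?=` would change that. It assumes GNU make (for `$(info)`) and `mktemp` are available. `show_value` and `TOOL` are hypothetical names used only for illustration, with `TOOL` standing in for `CHECKPATCH`; none of this is taken from the man-pages build system.

```shell
#!/bin/sh
# Sketch: environment vs. command-line overrides for ":=" and "?=" in GNU make.
# show_value and TOOL are hypothetical; TOOL stands in for CHECKPATCH.

# show_value OPERATOR [env|arg]
# Writes a throwaway makefile that assigns TOOL with OPERATOR, then prints
# the value make ends up with when "override" is supplied the given way.
show_value() {
    op=$1
    how=${2-}
    mk=$(mktemp) || return 1
    # The makefile has no recipe lines, so no tab characters are needed.
    printf 'TOOL %s default\n$(info $(TOOL))\nall: ;\n' "$op" > "$mk"
    case $how in
    env) TOOL=override make -s -f "$mk" ;;   # value from the environment
    arg) make -s -f "$mk" TOOL=override ;;   # value from the command line
    *)   make -s -f "$mk" ;;                 # no override at all
    esac | head -n 1
    rm -f "$mk"
}

printf ':= with environment  -> %s\n' "$(show_value ':=' env)"
printf ':= with argument     -> %s\n' "$(show_value ':=' arg)"
printf '?= with environment  -> %s\n' "$(show_value '?=' env)"
```

With `:=`, the assignment in the makefile takes precedence over the environment, so only the command-line form changes the value; with `?=`, the makefile assigns only when the variable is not already defined, so the exported value is kept.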
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 9:45 ` Alejandro Colomar @ 2023-04-21 12:13 ` наб 2023-04-21 12:21 ` Alejandro Colomar 2023-04-21 12:23 ` Alejandro Colomar 0 siblings, 2 replies; 143+ messages in thread From: наб @ 2023-04-21 12:13 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 673 bytes --] On Fri, Apr 21, 2023 at 11:45:07AM +0200, Alejandro Colomar wrote: > On 4/21/23 04:16, наб wrote: > > And it passes! > Do you mean that make doesn't recognize the error? I mean that > > Those are the only errors I saw, even on the version with > > IR\ string$ so, even if I'd run the linter pass, it wouldn't've found the line you originally pointed out. > I have the same bashrc (Debian Sid here), and have this same > line. Why is it failing only for you? Maybe I modified > something in my startup scripts? Maybe you did? Unlikely. What if you do make ... 2>&1 | less? Or this is an unrelated bullseye bash bug that's fixed in bookworm. Best, [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 12:13 ` наб @ 2023-04-21 12:21 ` Alejandro Colomar 2023-04-21 12:23 ` Alejandro Colomar 1 sibling, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 12:21 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1126 bytes --] On 4/21/23 14:13, наб wrote: > On Fri, Apr 21, 2023 at 11:45:07AM +0200, Alejandro Colomar wrote: >> On 4/21/23 04:16, наб wrote: >>> And it passes! >> Do you mean that make doesn't recognize the error? > I mean that >>> Those are the only errors I saw, even on the version with >>> IR\ string$ > so, even if I'd run the linter pass, it wouldn't've found the line you > originally pointed out. > >> I have the same bashrc (Debian Sid here), and have this same >> line. Why is it failing only for you? Maybe I modified >> something in my startup scripts? Maybe you did? > Unlikely. $ grep PS1.*return /etc/bash.bashrc [ -z "$PS1" ] && return > What if you do make ... 2>&1 | less? Nothing bad. I edited ~/.bash_aliases, but I don't think I have anything there that would work around this issue. I'm puzzled. > > Or this is an unrelated bullseye bash bug that's fixed in bookworm. No idea; it could be. I don't have any bullseye to test it. Cheers, Alex > > Best, -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 12:13 ` наб 2023-04-21 12:21 ` Alejandro Colomar @ 2023-04-21 12:23 ` Alejandro Colomar 1 sibling, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 12:23 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 968 bytes --] On 4/21/23 14:13, наб wrote: > On Fri, Apr 21, 2023 at 11:45:07AM +0200, Alejandro Colomar wrote: >> On 4/21/23 04:16, наб wrote: >>> And it passes! >> Do you mean that make doesn't recognize the error? > I mean that >>> Those are the only errors I saw, even on the version with >>> IR\ string$ > so, even if I'd run the linter pass, it wouldn't've found the line you > originally pointed out. Yep; you probably need groff-1.23 for that (not yet released, but there's an rc4 that you can build from source). :) Cheers > >> I have the same bashrc (Debian Sid here), and have this same >> line. Why is it failing only for you? Maybe I modified >> something in my startup scripts? Maybe you did? > Unlikely. What if you do make ... 2>&1 | less? > > Or this is an unrelated bullseye bash bug that's fixed in bookworm. > > Best, -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 2:16 ` наб 2023-04-21 9:45 ` Alejandro Colomar @ 2023-04-21 10:19 ` Jakub Wilk 2023-04-21 10:22 ` Alejandro Colomar 2023-04-21 11:34 ` наб 1 sibling, 2 replies; 143+ messages in thread From: Jakub Wilk @ 2023-04-21 10:19 UTC (permalink / raw) To: наб; +Cc: Alejandro Colomar, linux-man * наб <nabijaczleweli@nabijaczleweli.xyz>, 2023-04-21 04:16: >/etc/bash.bashrc: line 7: PS1: unbound variable How come? bash is not supposed to read bashrc if the shell is non-interactive (unless you instruct it otherwise). >Makefile:SHELL := /usr/bin/env bash -Eeuo pipefail Unrelated, but what is /usr/bin/env for? -- Jakub Wilk ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 10:19 ` Jakub Wilk @ 2023-04-21 10:22 ` Alejandro Colomar 2023-04-21 10:44 ` Jakub Wilk 2023-04-21 11:34 ` наб 1 sibling, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 10:22 UTC (permalink / raw) To: Jakub Wilk; +Cc: linux-man, наб [-- Attachment #1.1: Type: text/plain, Size: 1448 bytes --] Hi Jakub! On 4/21/23 12:19, Jakub Wilk wrote: > * наб <nabijaczleweli@nabijaczleweli.xyz>, 2023-04-21 04:16: >> /etc/bash.bashrc: line 7: PS1: unbound variable > > How come? bash is not supposed to read bashrc if the shell is > non-interactive (unless you instruct it otherwise). > >> Makefile:SHELL := /usr/bin/env bash -Eeuo pipefail > > Unrelated, but what is /usr/bin/env for? $ git blame -- Makefile | grep bin/env 26061fbd33 (Alejandro Colomar 2022-06-19 19:55:58 +0200 31) SHELL := /usr/bin/env bash -Eeuo pipefail $ git show 26061fbd33 commit 26061fbd337fbcfb6255def88ef4f0573c090702 Author: Alejandro Colomar <alx@kernel.org> Date: Sun Jun 19 19:55:58 2022 +0200 Makefile: SHELL: Use a portable bash Reported-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> diff --git a/Makefile b/Makefile index 9beca11de..cb1466370 100644 --- a/Makefile +++ b/Makefile @@ -28,7 +28,7 @@ # ######################################################################## -SHELL := /bin/bash -Eeuo pipefail +SHELL := /usr/bin/env bash -Eeuo pipefail MAKEFLAGS += --no-print-directory This helps in systems where bash(1) is not a system command (probably MacOS, and maybe others). Cheers, Alex -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 10:22 ` Alejandro Colomar @ 2023-04-21 10:44 ` Jakub Wilk 2023-04-21 11:16 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: Jakub Wilk @ 2023-04-21 10:44 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man, наб * Alejandro Colomar <alx.manpages@gmail.com>, 2023-04-21 12:22: >-SHELL := /bin/bash -Eeuo pipefail >+SHELL := /usr/bin/env bash -Eeuo pipefail > > > MAKEFLAGS += --no-print-directory > > >This helps in systems where bash(1) is not a system command (probably >MacOS, and maybe others). Yeah, but why not use simply SHELL = bash ... ? -- Jakub Wilk ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 10:44 ` Jakub Wilk @ 2023-04-21 11:16 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:16 UTC (permalink / raw) To: Jakub Wilk; +Cc: linux-man, наб, bug-make [-- Attachment #1.1: Type: text/plain, Size: 759 bytes --] Hi Jakub, On 4/21/23 12:44, Jakub Wilk wrote: > * Alejandro Colomar <alx.manpages@gmail.com>, 2023-04-21 12:22: >> -SHELL := /bin/bash -Eeuo pipefail >> +SHELL := /usr/bin/env bash -Eeuo pipefail >> >> >> MAKEFLAGS += --no-print-directory >> >> >> This helps in systems where bash(1) is not a system command (probably >> MacOS, and maybe others). > > Yeah, but why not use simply > > SHELL = bash ... > > ? I couldn't find documentation that guarantees that that should work, so we used shebang style, which will work for sure. I CCd bug-make@, in case they can confirm what is safe and what is not. Thanks, Alex -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 10:19 ` Jakub Wilk 2023-04-21 10:22 ` Alejandro Colomar @ 2023-04-21 11:34 ` наб 2023-04-21 12:46 ` Jakub Wilk 1 sibling, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 11:34 UTC (permalink / raw) To: Jakub Wilk; +Cc: Alejandro Colomar, linux-man [-- Attachment #1: Type: text/plain, Size: 693 bytes --] On Fri, Apr 21, 2023 at 12:19:57PM +0200, Jakub Wilk wrote: > * наб <nabijaczleweli@nabijaczleweli.xyz>, 2023-04-21 04:16: > > /etc/bash.bashrc: line 7: PS1: unbound variable > How come? bash is not supposed to read bashrc if the shell is > non-interactive (unless you instruct it otherwise). No clue, surprised me as well, esp. since I didn't see any funny bash flags to force interactivity. Should be protected against -u regardless. > > Makefile:SHELL := /usr/bin/env bash -Eeuo pipefail > Unrelated, but what is /usr/bin/env for? Oddly, SHELL look-up appears to only be defined for DOS: https://www.gnu.org/software/make/manual/html_node/Choosing-the-Shell.html Best, [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v7 4/8] regex.3: Improve REG_STARTEND 2023-04-21 11:34 ` наб @ 2023-04-21 12:46 ` Jakub Wilk 0 siblings, 0 replies; 143+ messages in thread From: Jakub Wilk @ 2023-04-21 12:46 UTC (permalink / raw) To: наб; +Cc: Alejandro Colomar, linux-man * наб <nabijaczleweli@nabijaczleweli.xyz>, 2023-04-21 13:34: >>>/etc/bash.bashrc: line 7: PS1: unbound variable >>How come? bash is not supposed to read bashrc if the shell is >>non-interactive (unless you instruct it otherwise). >No clue, surprised me as well, esp. since I didn't see any funny bash >flags to force interactivity. I did some googling, which led me to this: https://lists.debian.org/Ywohi2WEtK+TtquZ@wooledge.org I can reproduce the bug in unstable: $ (SSH_CLIENT=moo bash -uc true) /etc/bash.bashrc: line 7: PS1: unbound variable What is this I don't even. -- Jakub Wilk ^ permalink raw reply [flat|nested] 143+ messages in thread
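The `${PS1-}` suggestion from earlier in the thread can be sketched in isolation. The two functions below are hypothetical helpers (the names are not from bash.bashrc) that repeat the quoted `[ -z "$PS1" ]` test with nounset active, once unguarded and once with the default-expansion guard. Why bash reads bash.bashrc at all in the reproduction above is a separate question that this sketch does not model.

```shell
#!/bin/sh
# Sketch: an unguarded "$PS1" versus "${PS1-}" when nounset (set -u) is active.
# bashrc_check_unguarded and bashrc_check_guarded are hypothetical names.

# Mirrors the quoted bash.bashrc test. The body is a subshell, so the
# nounset failure terminates only that subshell and not the caller.
bashrc_check_unguarded() (
    set -u
    unset PS1
    [ -z "$PS1" ] && echo "not interactive"
)

# Same test with the default-expansion guard suggested in the thread:
# "${PS1-}" expands to an empty string when PS1 is unset.
bashrc_check_guarded() (
    set -u
    unset PS1
    [ -z "${PS1-}" ] && echo "not interactive"
)

bashrc_check_unguarded || echo "unguarded check failed with status $?"
bashrc_check_guarded
```

The unguarded call makes the shell print its unbound-variable diagnostic on standard error and return a non-zero status without printing anything on standard output; the guarded call prints "not interactive".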
* [PATCH v6 5/8] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб ` (3 preceding siblings ...) 2023-04-20 19:37 ` [PATCH v6 4/8] regex.3: Improve REG_STARTEND наб @ 2023-04-20 19:37 ` наб 2023-04-20 19:37 ` [PATCH v6 6/8] regex.3: Finalise move of reg*.3type наб ` (3 subsequent siblings) 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 19:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 4267 bytes --] Move-only commit. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 30 ++++++++++++++++++ man3type/regex_t.3type | 64 +-------------------------------------- man3type/regmatch_t.3type | 2 +- man3type/regoff_t.3type | 2 +- 4 files changed, 33 insertions(+), 65 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index a9bec59a9..2b886eb77 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -29,6 +29,20 @@ .SH SYNOPSIS .BI " char " errbuf "[_Nullable restrict ." errbuf_size ], .BI " size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); +.PP +.B typedef struct { +.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" +.B } regex_t; +.PP +.B typedef struct { +.BR " regoff_t rm_so;" " /* Byte offset from start of string" + to start of substring */ +.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" + the first character after the end of + substring */ +.B } regmatch_t; +.PP +.BR typedef " /* ... */ " regoff_t; .fi .SH DESCRIPTION .SS Compilation @@ -202,6 +216,14 @@ .SS Match offsets .I rm_eo element indicates the end offset of the match, which is the offset of the first character after the matching text. +.PP +.I regoff_t +It is a signed integer type +capable of storing the largest value that can be stored in either an +.I ptrdiff_t +type or a +.I ssize_t +type. 
.SS Error reporting .BR regerror () is used to turn the error codes that can be returned by both @@ -318,6 +340,14 @@ .SH STANDARDS POSIX.1-2008. .SH HISTORY POSIX.1-2001. +.PP +Prior to POSIX.1-2008, +the type was +capable of storing the largest value that can be stored in either an +.I off_t +type or a +.I ssize_t +type. .SH EXAMPLES .EX #include <stdint.h> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type index 176d2c7a6..c0daaf0ff 100644 --- a/man3type/regex_t.3type +++ b/man3type/regex_t.3type @@ -1,63 +1 @@ -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> -.\" -.\" SPDX-License-Identifier: Linux-man-pages-copyleft -.\" -.\" -.TH regex_t 3type (date) "Linux man-pages (unreleased)" -.SH NAME -regex_t, regmatch_t, regoff_t -\- regular expression matching -.SH LIBRARY -Standard C library -.RI ( libc ) -.SH SYNOPSIS -.EX -.B #include <regex.h> -.PP -.B typedef struct { -.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" -.B } regex_t; -.PP -.B typedef struct { -.BR " regoff_t rm_so;" " /* Byte offset from start of string" - to start of substring */ -.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" - the first character after the end of - substring */ -.B } regmatch_t; -.PP -.BR typedef " /* ... */ " regoff_t; -.EE -.SH DESCRIPTION -.TP -.I regex_t -This is a structure type used in regular expression matching. -It holds a compiled regular expression, -compiled with -.BR regcomp (3). -.TP -.I regmatch_t -This is a structure type used in regular expression matching. -.TP -.I regoff_t -It is a signed integer type -capable of storing the largest value that can be stored in either an -.I ptrdiff_t -type or a -.I ssize_t -type. -.SH STANDARDS -POSIX.1-2008. -.SH HISTORY -POSIX.1-2001. -.PP -Prior to POSIX.1-2008, -the type was -capable of storing the largest value that can be stored in either an -.I off_t -type or a -.I ssize_t -type. 
-.SH SEE ALSO -.BR regex (3) +.so man3/regex.3 diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regmatch_t.3type +++ b/man3type/regmatch_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regoff_t.3type +++ b/man3type/regoff_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* [PATCH v6 6/8] regex.3: Finalise move of reg*.3type 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб ` (4 preceding siblings ...) 2023-04-20 19:37 ` [PATCH v6 5/8] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб @ 2023-04-20 19:37 ` наб 2023-04-20 19:37 ` [PATCH v6 7/8] regex.3: Destandardeseify Match offsets наб ` (2 subsequent siblings) 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 19:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2891 bytes --] They're inextricably linked, not cross-referenced at all, and not used anywhere else. Now that they (realistically) exist to the reader, add a note on how big nmatch can be; POSIX even says "The application developer should note that there is probably no reason for using a value of nmatch that is larger than preg−>re_nsub+1.". Also remove the now-duplicate regmatch_t declaration. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 54 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 20 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 2b886eb77..2e9bb13ff 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -15,7 +15,7 @@ .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) .SH SYNOPSIS -.nf +.EX .B #include <regex.h> .PP .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , @@ -43,7 +43,7 @@ .SH SYNOPSIS .B } regmatch_t; .PP .BR typedef " /* ... */ " regoff_t; -.fi +.EE .SH DESCRIPTION .SS Compilation .BR regcomp () @@ -60,6 +60,21 @@ .SS Compilation The locale must be the same when running .BR regexec (). .PP +After +.BR regcomp () +succeeds, +.I preg->re_nsub +holds the number of subexpressions in +.IR regex . +Thus, a value of +.I preg->re_nsub ++ 1 +passed as +.I nmatch +to +.BR regexec () +is sufficient to capture all matches. 
+.PP .I cflags is the bitwise OR @@ -192,22 +207,6 @@ .SS Match offsets .IR N+1 .) Any unused structure elements will contain the value \-1. .PP -The -.I regmatch_t -structure which is the type of -.I pmatch -is defined in -.IR <regex.h> . -.PP -.in +4n -.EX -typedef struct { - regoff_t rm_so; - regoff_t rm_eo; -} regmatch_t; -.EE -.in -.PP Each .I rm_so element that is not \-1 indicates the start offset of the next largest @@ -218,7 +217,7 @@ .SS Match offsets which is the offset of the first character after the matching text. .PP .I regoff_t -It is a signed integer type +is a signed integer type capable of storing the largest value that can be stored in either an .I ptrdiff_t type or a @@ -342,12 +341,27 @@ .SH HISTORY POSIX.1-2001. .PP Prior to POSIX.1-2008, -the type was +.I regoff_t +was required to be capable of storing the largest value that can be stored in either an .I off_t type or a .I ssize_t type. +.SH NOTES +.I re_nsub +is only required to be initialized if +.B REG_NOSUB +wasn't specified, but all known implementations initialize it regardless. +.\" glibc, musl, 4.4BSD, illumos +.PP +Both +.I regex_t +and +.I regmatch_t +may (and do) have more members, in any order. +Always reference them by name. +.\" illumos has two more start/end pairs and the first one is of pointers .SH EXAMPLES .EX #include <stdint.h> -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
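For readers following along, the re_nsub note in the patch above can be sketched in C roughly as below. This is only an illustration, not part of the patch: print_groups() is a made-up helper name, and the pattern is compiled as an ERE purely for convenience.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the patch): compile `pattern` as an ERE,
 * match it against `subject`, and print every group that took part in the
 * match.  Returns the number of groups printed, or -1 on compile failure,
 * allocation failure, or no match. */
static int print_groups(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* re_nsub counts parenthesized subexpressions; one more slot is
     * needed for the overall match stored in pmatch[0]. */
    size_t nmatch = re.re_nsub + 1;
    regmatch_t *pmatch = malloc(nmatch * sizeof *pmatch);
    if (pmatch == NULL) {
        regfree(&re);
        return -1;
    }

    int found = -1;
    if (regexec(&re, subject, nmatch, pmatch, 0) == 0) {
        found = 0;
        for (size_t i = 0; i < nmatch; i++) {
            if (pmatch[i].rm_so == -1)
                continue;   /* this group did not participate */
            printf("group %zu: %.*s\n", i,
                   (int)(pmatch[i].rm_eo - pmatch[i].rm_so),
                   subject + pmatch[i].rm_so);
            found++;
        }
    }

    free(pmatch);
    regfree(&re);
    return found;
}
```

Sizing the array from re_nsub avoids guessing a fixed upper bound; a larger nmatch would only add slots that are filled with -1.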
* [PATCH v6 7/8] regex.3: Destandardeseify Match offsets 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб ` (5 preceding siblings ...) 2023-04-20 19:37 ` [PATCH v6 6/8] regex.3: Finalise move of reg*.3type наб @ 2023-04-20 19:37 ` наб 2023-04-20 19:37 ` [PATCH v6 8/8] regex.3: Further clarify the sole purpose of REG_NOSUB наб 2023-04-21 2:01 ` [PATCH v6 0/8] regex.3 momento Alejandro Colomar 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 19:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2194 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 2e9bb13ff..7b91f5b30 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -184,37 +184,34 @@ .SS Matching .SS Match offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. -The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) -Any unused structure elements will contain the value \-1. 
+elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first expression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ string " + " rm_so ", " string " + " rm_eo ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
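The reworded Match offsets text above reads to me as "each valid pmatch entry is the half-open range [string + rm_so, string + rm_eo)". A small sketch under that reading follows; group_text() and the fixed NMATCH of 4 are assumptions made for the example, not anything the page prescribes.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy the text of group `idx` of the first match of
 * `pattern` (compiled as an ERE) in `subject` into `out`.  Returns the
 * length copied, or -1 if there is no match, the group is unused, or the
 * text does not fit. */
static long group_text(const char *pattern, const char *subject,
                       size_t idx, char *out, size_t outsz)
{
    enum { NMATCH = 4 };    /* arbitrary size chosen for this sketch */
    regmatch_t pmatch[NMATCH];
    regex_t re;
    long len = -1;

    if (idx >= NMATCH || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* Groups beyond NMATCH are discarded; unused slots hold -1. */
    if (regexec(&re, subject, NMATCH, pmatch, 0) == 0 &&
        pmatch[idx].rm_so != -1) {
        /* The match is [subject + rm_so, subject + rm_eo). */
        len = (long)(pmatch[idx].rm_eo - pmatch[idx].rm_so);
        if ((size_t)len < outsz) {
            memcpy(out, subject + pmatch[idx].rm_so, (size_t)len);
            out[len] = '\0';
        } else {
            len = -1;
        }
    }

    regfree(&re);
    return len;
}
```

Because the end offset is exclusive, rm_eo - rm_so is the match length directly, with no off-by-one adjustment.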
* [PATCH v6 8/8] regex.3: Further clarify the sole purpose of REG_NOSUB 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб ` (6 preceding siblings ...) 2023-04-20 19:37 ` [PATCH v6 7/8] regex.3: Destandardeseify Match offsets наб @ 2023-04-20 19:37 ` наб 2023-04-21 2:01 ` [PATCH v6 0/8] regex.3 momento Alejandro Colomar 8 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 19:37 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 792 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 7b91f5b30..4c450bd7f 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -96,16 +96,14 @@ .SS Compilation searches using this pattern buffer will be case insensitive. .TP .B REG_NOSUB -Do not report position of matches. -The -.I nmatch -and -.I pmatch +Report only overall success. .BR regexec () -arguments will be ignored for this purpose (but +will use only .I pmatch -may still be used for -.BR REG_STARTEND ). +for +.BR REG_STARTEND , +ignoring +.IR nmatch . .TP .B REG_NEWLINE Match-any-character operators don't match a newline. -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
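To make the REG_NOSUB wording above concrete, here is a rough sketch of the one case where pmatch is still read: a yes/no match over a byte range with REG_STARTEND and nmatch of 0. It assumes a libc that provides REG_STARTEND (glibc and the BSDs do; it is not in POSIX), and range_matches() is a name invented for the example.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: report whether `pattern` (compiled as an ERE)
 * matches anywhere inside [buf + from, buf + to).
 * Assumes REG_STARTEND is available, which POSIX does not guarantee. */
static bool range_matches(const char *pattern, const char *buf,
                          size_t from, size_t to)
{
    regex_t re;

    /* REG_NOSUB: only overall success is reported. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    /* pmatch[0] is read as the input range; nmatch is not consulted. */
    regmatch_t range = { .rm_so = (regoff_t)from, .rm_eo = (regoff_t)to };
    bool ok = regexec(&re, buf, 0, &range, REG_STARTEND) == 0;

    regfree(&re);
    return ok;
}
```

Note that pmatch is a pointer to a single object here, which is the point the series makes by writing pmatch->rm_so rather than pmatch[0].rm_so.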
* Re: [PATCH v6 0/8] regex.3 momento 2023-04-20 19:36 ` [PATCH v6 0/8] regex.3 momento наб ` (7 preceding siblings ...) 2023-04-20 19:37 ` [PATCH v6 8/8] regex.3: Further clarify the sole purpose of REG_NOSUB наб @ 2023-04-21 2:01 ` Alejandro Colomar 2023-04-21 2:48 ` [PATCH v8 0/5] " наб 8 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 2:01 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 5670 bytes --] Hey наб! On 4/20/23 21:36, наб wrote: > Should include all comments; includes Branden's wording. I'm going to sleep. Would you please rebase and send tomorrow whatever I didn't yet apply? I've got a mess of mailbox by now =) Let's see what I find in the git-log(1)... > > наб (8): > regex.3: Desoupify regexec() description Applied. > regex.3: Desoupify regerror() description Not yet it seems; please resend. > regex.3: Desoupify regfree() description Applied. > regex.3: Improve REG_STARTEND Applied. > regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link > regex_t.3type into regex.3 > regex.3: Finalise move of reg*.3type Both not yet; please resend. > regex.3: Destandardeseify Match offsets Not yet; please resend. > regex.3: Further clarify the sole purpose of REG_NOSUB And not yet; please resend. Cheers, Alex > > man3/regex.3 | 226 ++++++++++++++++++++++---------------- > man3type/regex_t.3type | 64 +---------- > man3type/regmatch_t.3type | 2 +- > man3type/regoff_t.3type | 2 +- > 4 files changed, 133 insertions(+), 161 deletions(-) > > Range-diff against v5: > 1: fcb8df21b < -: --------- regex.3: Desoupify regcomp() description > 2: 7240de5b7 = 1: 1ad1aa6e9 regex.3: Desoupify regexec() description > 3: 108f30cd7 ! 
2: 6c4d26f89 regex.3: Desoupify regerror() description > @@ Commit message > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > > ## man3/regex.3 ## > -@@ man3/regex.3: .SH SYNOPSIS > - .BI " int " eflags ); > - .PP > - .BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , > --.BI " char " errbuf "[restrict ." errbuf_size "], \ > -+.BI " char " errbuf "[restrict ." errbuf_size "], \ > - size_t " errbuf_size ); > - .BI "void regfree(regex_t *" preg ); > - .fi > @@ man3/regex.3: .SS Error reporting > .BR regexec () > into error message strings. > @@ man3/regex.3: .SS Error reporting > -If both > -.I errbuf > -and > ++If > ++.I preg > ++isn't a null pointer, > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > -+If > -+.I preg > -+is a null pointer\(emthe latest error. > +.PP > +If > ++.I errbuf_size > ++is > ++.BR 0 , > ++the size of the required buffer is returned. > ++Otherwise, up to > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > -+is > -+.BR 0 , > -+the size of the required buffer is returned. > -+Otherwise, up to > -+.I errbuf_size > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > --Supplying > + Supplying > .BR regfree () > --with a precompiled pattern buffer, > --.IR preg , > --will free the memory allocated to the pattern buffer by the compiling > --process, > -+invalidates the pattern buffer at > -+.IR *preg , > -+which must have been initialized via > - .BR regcomp (). > - .SH RETURN VALUE > - .BR regcomp () > -: --------- > 3: 4b7971a5e regex.3: Desoupify regfree() description > 4: fd1a104d6 ! 4: 5fb4cc16f regex.3: Improve REG_STARTEND > @@ man3/regex.3: .SS Matching > -on large strings. > -It does not use > +on known-length strings. 
> -+.I pmatch > -+must point to a valid readable object. > +If any matches are returned > +.RB ( REG_NOSUB > +wasn't passed to > @@ man3/regex.3: .SS Matching > -processing. > +> 0), they overwrite > +.I pmatch > -+as usual, and the > -+.B Match offsets > -+remain relative to > ++as usual, and the match offsets remain relative to > +.IR string > +(not > +.IR string " + " pmatch[0].rm_so ). > 5: 198b7b4fa ! 5: 057a4a522 regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 > @@ Commit message > > ## man3/regex.3 ## > @@ man3/regex.3: .SH SYNOPSIS > - .BI " char " errbuf "[restrict ." errbuf_size "], \ > - size_t " errbuf_size ); > + .BI " char " errbuf "[_Nullable restrict ." errbuf_size ], > + .BI " size_t " errbuf_size ); > .BI "void regfree(regex_t *" preg ); > +.PP > +.B typedef struct { > 6: c6bc9cfd0 = 6: 60ac1a4d1 regex.3: Finalise move of reg*.3type > 7: 59b8294c8 = 7: 3313546db regex.3: Destandardeseify Match offsets > 8: 2e199fc3c ! 8: 7fa669481 regex.3: Further clarify the sole purpose of REG_NOSUB > @@ man3/regex.3: .SS Compilation > -.I nmatch > -and > -.I pmatch > -+Only report overall success: > ++Report only overall success. > .BR regexec () > -arguments will be ignored for this purpose (but > -+will only use > ++will use only > .I pmatch > -may still be used for > -.BR REG_STARTEND ). > +for > +.BR REG_STARTEND , > -+and ignore > ++ignoring > +.IR nmatch . > .TP > .B REG_NEWLINE -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v8 0/5] regex.3 momento 2023-04-21 2:01 ` [PATCH v6 0/8] regex.3 momento Alejandro Colomar @ 2023-04-21 2:48 ` наб 2023-04-21 2:48 ` [PATCH v8 1/5] regex.3: Desoupify regerror() description наб ` (5 more replies) 0 siblings, 6 replies; 143+ messages in thread From: наб @ 2023-04-21 2:48 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1766 bytes --] As a pull.rebase = true enjoyer, it was very easy (indeed, git pull and axe the single-line conflict + empty commit), and it's what I've been doing the entire time; recommend it. 5/5 remains a toss-up for me. Apply it if you think it's better, don't if you don't. https://bugs.debian.org/1034658 наб (5): regex.3: Desoupify regerror() description regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 regex.3: Finalise move of reg*.3type regex.3: Destandardeseify Match offsets regex.3: Further clarify the sole purpose of REG_NOSUB man3/regex.3 | 179 +++++++++++++++++++++++--------------- man3type/regex_t.3type | 64 +------------- man3type/regmatch_t.3type | 2 +- man3type/regoff_t.3type | 2 +- 4 files changed, 110 insertions(+), 137 deletions(-) No clue where it got this. The interdiff is just the .IR -> .I. Range-diff against v7: 1: 783a16431 ! 1: 4479e1572 regex.3: Desoupify regerror() description @@ man3/regex.3: .SS Error reporting +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. 
.SS Freeing - Supplying .BR regfree () + deinitializes the pattern buffer at 2: 5706f1892 < -: --------- regex.3: Desoupify regfree() description 3: baacf086f < -: --------- regex.3: Improve REG_STARTEND 4: 056c3ff04 = 2: bad307847 regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 5: 44d7b775d = 3: edefa8a5e regex.3: Finalise move of reg*.3type 6: 79641df02 = 4: 500070a5e regex.3: Destandardeseify Match offsets 7: 26d06c07f = 5: b01685c7a regex.3: Further clarify the sole purpose of REG_NOSUB -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v8 1/5] regex.3: Desoupify regerror() description 2023-04-21 2:48 ` [PATCH v8 0/5] " наб @ 2023-04-21 2:48 ` наб 2023-04-21 10:06 ` Alejandro Colomar 2023-04-21 2:48 ` [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб ` (4 subsequent siblings) 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 2:48 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1343 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index d91acc19d..069cc6388 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -210,27 +210,23 @@ .SS Error reporting .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. -If both -.I errbuf -and +If +.I preg +isn't a null pointer, +.I errcode +must be the latest error returned from an operation on +.IR preg . +.PP +If +.I errbuf_size +is +.BR 0 , +the size of the required buffer is returned. +Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS Freeing .BR regfree () deinitializes the pattern buffer at -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v8 1/5] regex.3: Desoupify regerror() description 2023-04-21 2:48 ` [PATCH v8 1/5] regex.3: Desoupify regerror() description наб @ 2023-04-21 10:06 ` Alejandro Colomar 2023-04-21 12:03 ` [PATCH v9] " наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 10:06 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1011 bytes --] On 4/21/23 04:48, наб wrote: > +If > +.I preg > +isn't a null pointer, > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > +.PP > +If > +.I errbuf_size > +is > +.BR 0 , > +the size of the required buffer is returned. I wonder what it returns elsewise from that phrasing. Probably the same, right? Which is confusing. Maybe put that text without a conditional, and only say that if errbuf_size is 0 the buffer is ignored and no copy is performed? > +Otherwise, up to > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > .BR regfree () > deinitializes the pattern buffer at -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v9] regex.3: Desoupify regerror() description 2023-04-21 10:06 ` Alejandro Colomar @ 2023-04-21 12:03 ` наб 2023-04-21 12:26 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 12:03 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2622 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Range-diff against v8: 1: 4479e1572 ! 1: 38109fcc6 regex.3: Desoupify regerror() description @@ man3/regex.3: .SS Error reporting +.IR preg . +.PP +If -+.I errbuf_size -+is -+.BR 0 , -+the size of the required buffer is returned. -+Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). ++isn't 0, up to ++.I errbuf_size +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS Freeing .BR regfree () deinitializes the pattern buffer at +@@ man3/regex.3: .SH RETURN VALUE + returns zero for a successful match or + .B REG_NOMATCH + for failure. ++.PP ++.BR regerror () ++returns the size of the buffer required to hold the string. + .SH ERRORS + The following errors can be returned by + .BR regcomp (): man3/regex.3 | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index d91acc19d..efca582d7 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -210,27 +210,20 @@ .SS Error reporting .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. 
-If both -.I errbuf -and +If +.I preg +isn't a null pointer, +.I errcode +must be the latest error returned from an operation on +.IR preg . +.PP +If .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). +isn't 0, up to +.I errbuf_size +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS Freeing .BR regfree () deinitializes the pattern buffer at @@ -247,6 +240,9 @@ .SH RETURN VALUE returns zero for a successful match or .B REG_NOMATCH for failure. +.PP +.BR regerror () +returns the size of the buffer required to hold the string. .SH ERRORS The following errors can be returned by .BR regcomp (): -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
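With the v9 wording above (regerror() always returns the required size, and copies only if errbuf_size isn't 0), the usual two-call pattern can be sketched as follows. regerror_dup() is a hypothetical helper, not something the patch adds.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * `errcode`, or NULL on allocation failure.  The caller frees it. */
static char *regerror_dup(int errcode, const regex_t *preg)
{
    /* errbuf_size of 0: nothing is copied, only the size is returned.
     * The size includes the terminating null byte. */
    size_t need = regerror(errcode, preg, NULL, 0);
    if (need == 0)
        return NULL;

    char *msg = malloc(need);
    if (msg != NULL)
        regerror(errcode, preg, msg, need);   /* fits, so not truncated */
    return msg;
}
```

A fixed-size buffer also works if truncation is acceptable, since the copied string is always null-terminated.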
* Re: [PATCH v9] regex.3: Desoupify regerror() description 2023-04-21 12:03 ` [PATCH v9] " наб @ 2023-04-21 12:26 ` Alejandro Colomar 2023-04-21 12:27 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 12:26 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 3015 bytes --] On 4/21/23 14:03, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Aaand patch applied! I believe I've got all, right? Cheers, Alex > --- > Range-diff against v8: > 1: 4479e1572 ! 1: 38109fcc6 regex.3: Desoupify regerror() description > @@ man3/regex.3: .SS Error reporting > +.IR preg . > +.PP > +If > -+.I errbuf_size > -+is > -+.BR 0 , > -+the size of the required buffer is returned. > -+Otherwise, up to > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > ++isn't 0, up to > ++.I errbuf_size > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > .BR regfree () > deinitializes the pattern buffer at > +@@ man3/regex.3: .SH RETURN VALUE > + returns zero for a successful match or > + .B REG_NOMATCH > + for failure. > ++.PP > ++.BR regerror () > ++returns the size of the buffer required to hold the string. > + .SH ERRORS > + The following errors can be returned by > + .BR regcomp (): > > man3/regex.3 | 36 ++++++++++++++++-------------------- > 1 file changed, 16 insertions(+), 20 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index d91acc19d..efca582d7 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -210,27 +210,20 @@ .SS Error reporting > .BR regexec () > into error message strings. 
> .PP > -.BR regerror () > -is passed the error code, > -.IR errcode , > -the pattern buffer, > -.IR preg , > -a pointer to a character string buffer, > -.IR errbuf , > -and the size of the string buffer, > -.IR errbuf_size . > -It returns the size of the > -.I errbuf > -required to contain the null-terminated error message string. > -If both > -.I errbuf > -and > +If > +.I preg > +isn't a null pointer, > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > +.PP > +If > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > +isn't 0, up to > +.I errbuf_size > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > .BR regfree () > deinitializes the pattern buffer at > @@ -247,6 +240,9 @@ .SH RETURN VALUE > returns zero for a successful match or > .B REG_NOMATCH > for failure. > +.PP > +.BR regerror () > +returns the size of the buffer required to hold the string. > .SH ERRORS > The following errors can be returned by > .BR regcomp (): -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v9] regex.3: Desoupify regerror() description 2023-04-21 12:26 ` Alejandro Colomar @ 2023-04-21 12:27 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 12:27 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 3251 bytes --] On 4/21/23 14:26, Alejandro Colomar wrote: > On 4/21/23 14:03, наб wrote: >> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > > Aaand patch applied! I believe I've got all, right? Feel free to add yourself to the copyright. You clearly deserve it ;) > > Cheers, > Alex > >> --- >> Range-diff against v8: >> 1: 4479e1572 ! 1: 38109fcc6 regex.3: Desoupify regerror() description >> @@ man3/regex.3: .SS Error reporting >> +.IR preg . >> +.PP >> +If >> -+.I errbuf_size >> -+is >> -+.BR 0 , >> -+the size of the required buffer is returned. >> -+Otherwise, up to >> .I errbuf_size >> -are nonzero, >> -.I errbuf >> -is filled in with the first >> -.I "errbuf_size \- 1" >> -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). >> ++isn't 0, up to >> ++.I errbuf_size >> +bytes are copied to >> +.IR errbuf ; >> +the error string is always null-terminated, and truncated to fit. >> .SS Freeing >> .BR regfree () >> deinitializes the pattern buffer at >> +@@ man3/regex.3: .SH RETURN VALUE >> + returns zero for a successful match or >> + .B REG_NOMATCH >> + for failure. >> ++.PP >> ++.BR regerror () >> ++returns the size of the buffer required to hold the string. >> + .SH ERRORS >> + The following errors can be returned by >> + .BR regcomp (): >> >> man3/regex.3 | 36 ++++++++++++++++-------------------- >> 1 file changed, 16 insertions(+), 20 deletions(-) >> >> diff --git a/man3/regex.3 b/man3/regex.3 >> index d91acc19d..efca582d7 100644 >> --- a/man3/regex.3 >> +++ b/man3/regex.3 >> @@ -210,27 +210,20 @@ .SS Error reporting >> .BR regexec () >> into error message strings. 
>> .PP >> -.BR regerror () >> -is passed the error code, >> -.IR errcode , >> -the pattern buffer, >> -.IR preg , >> -a pointer to a character string buffer, >> -.IR errbuf , >> -and the size of the string buffer, >> -.IR errbuf_size . >> -It returns the size of the >> -.I errbuf >> -required to contain the null-terminated error message string. >> -If both >> -.I errbuf >> -and >> +If >> +.I preg >> +isn't a null pointer, >> +.I errcode >> +must be the latest error returned from an operation on >> +.IR preg . >> +.PP >> +If >> .I errbuf_size >> -are nonzero, >> -.I errbuf >> -is filled in with the first >> -.I "errbuf_size \- 1" >> -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). >> +isn't 0, up to >> +.I errbuf_size >> +bytes are copied to >> +.IR errbuf ; >> +the error string is always null-terminated, and truncated to fit. >> .SS Freeing >> .BR regfree () >> deinitializes the pattern buffer at >> @@ -247,6 +240,9 @@ .SH RETURN VALUE >> returns zero for a successful match or >> .B REG_NOMATCH >> for failure. >> +.PP >> +.BR regerror () >> +returns the size of the buffer required to hold the string. >> .SH ERRORS >> The following errors can be returned by >> .BR regcomp (): > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 2023-04-21 2:48 ` [PATCH v8 0/5] " наб 2023-04-21 2:48 ` [PATCH v8 1/5] regex.3: Desoupify regerror() description наб @ 2023-04-21 2:48 ` наб 2023-04-21 11:55 ` Alejandro Colomar 2023-04-21 2:48 ` [PATCH v8 3/5] regex.3: Finalise move of reg*.3type наб ` (3 subsequent siblings) 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 2:48 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 4267 bytes --] Move-only commit. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 30 ++++++++++++++++++ man3type/regex_t.3type | 64 +-------------------------------------- man3type/regmatch_t.3type | 2 +- man3type/regoff_t.3type | 2 +- 4 files changed, 33 insertions(+), 65 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 069cc6388..f6465d484 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -29,6 +29,20 @@ .SH SYNOPSIS .BI " char " errbuf "[_Nullable restrict ." errbuf_size ], .BI " size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); +.PP +.B typedef struct { +.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" +.B } regex_t; +.PP +.B typedef struct { +.BR " regoff_t rm_so;" " /* Byte offset from start of string" + to start of substring */ +.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" + the first character after the end of + substring */ +.B } regmatch_t; +.PP +.BR typedef " /* ... */ " regoff_t; .fi .SH DESCRIPTION .SS Compilation @@ -202,6 +216,14 @@ .SS Match offsets .I rm_eo element indicates the end offset of the match, which is the offset of the first character after the matching text. +.PP +.I regoff_t +It is a signed integer type +capable of storing the largest value that can be stored in either an +.I ptrdiff_t +type or a +.I ssize_t +type. 
.SS Error reporting .BR regerror () is used to turn the error codes that can be returned by both @@ -320,6 +342,14 @@ .SH STANDARDS POSIX.1-2008. .SH HISTORY POSIX.1-2001. +.PP +Prior to POSIX.1-2008, +the type was +capable of storing the largest value that can be stored in either an +.I off_t +type or a +.I ssize_t +type. .SH EXAMPLES .EX #include <stdint.h> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type index 176d2c7a6..c0daaf0ff 100644 --- a/man3type/regex_t.3type +++ b/man3type/regex_t.3type @@ -1,63 +1 @@ -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> -.\" -.\" SPDX-License-Identifier: Linux-man-pages-copyleft -.\" -.\" -.TH regex_t 3type (date) "Linux man-pages (unreleased)" -.SH NAME -regex_t, regmatch_t, regoff_t -\- regular expression matching -.SH LIBRARY -Standard C library -.RI ( libc ) -.SH SYNOPSIS -.EX -.B #include <regex.h> -.PP -.B typedef struct { -.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" -.B } regex_t; -.PP -.B typedef struct { -.BR " regoff_t rm_so;" " /* Byte offset from start of string" - to start of substring */ -.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" - the first character after the end of - substring */ -.B } regmatch_t; -.PP -.BR typedef " /* ... */ " regoff_t; -.EE -.SH DESCRIPTION -.TP -.I regex_t -This is a structure type used in regular expression matching. -It holds a compiled regular expression, -compiled with -.BR regcomp (3). -.TP -.I regmatch_t -This is a structure type used in regular expression matching. -.TP -.I regoff_t -It is a signed integer type -capable of storing the largest value that can be stored in either an -.I ptrdiff_t -type or a -.I ssize_t -type. -.SH STANDARDS -POSIX.1-2008. -.SH HISTORY -POSIX.1-2001. -.PP -Prior to POSIX.1-2008, -the type was -capable of storing the largest value that can be stored in either an -.I off_t -type or a -.I ssize_t -type. 
-.SH SEE ALSO -.BR regex (3) +.so man3/regex.3 diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regmatch_t.3type +++ b/man3type/regmatch_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regoff_t.3type +++ b/man3type/regoff_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 2023-04-21 2:48 ` [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб @ 2023-04-21 11:55 ` Alejandro Colomar 2023-04-21 11:57 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:55 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1080 bytes --] On 4/21/23 04:48, наб wrote: > Move-only commit. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 30 ++++++++++++++++++ > man3type/regex_t.3type | 64 +-------------------------------------- > man3type/regmatch_t.3type | 2 +- > man3type/regoff_t.3type | 2 +- > 4 files changed, 33 insertions(+), 65 deletions(-) > [...] > diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type > index 176d2c7a6..c0daaf0ff 100644 > --- a/man3type/regex_t.3type > +++ b/man3type/regex_t.3type > @@ -1,63 +1 @@ > -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> > -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> > -.\" > -.\" SPDX-License-Identifier: Linux-man-pages-copyleft > -.\" > -.\" > -.TH regex_t 3type (date) "Linux man-pages (unreleased)" > -.SH NAME > -regex_t, regmatch_t, regoff_t Should we keep the names in regex.3? -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 2023-04-21 11:55 ` Alejandro Colomar @ 2023-04-21 11:57 ` Alejandro Colomar 2023-04-21 11:57 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:57 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1314 bytes --] On 4/21/23 13:55, Alejandro Colomar wrote: > > > On 4/21/23 04:48, наб wrote: >> Move-only commit. >> >> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> >> --- >> man3/regex.3 | 30 ++++++++++++++++++ >> man3type/regex_t.3type | 64 +-------------------------------------- >> man3type/regmatch_t.3type | 2 +- >> man3type/regoff_t.3type | 2 +- >> 4 files changed, 33 insertions(+), 65 deletions(-) >> > [...] > >> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type >> index 176d2c7a6..c0daaf0ff 100644 >> --- a/man3type/regex_t.3type >> +++ b/man3type/regex_t.3type >> @@ -1,63 +1 @@ >> -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> >> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> >> -.\" >> -.\" SPDX-License-Identifier: Linux-man-pages-copyleft >> -.\" >> -.\" >> -.TH regex_t 3type (date) "Linux man-pages (unreleased)" >> -.SH NAME >> -regex_t, regmatch_t, regoff_t > > Should we keep the names in regex.3? Although that probably confuses man(1), since it will believe those are in main section 3, while they are in 3type. Branden, any opinions? > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 2023-04-21 11:57 ` Alejandro Colomar @ 2023-04-21 11:57 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:57 UTC (permalink / raw) To: наб, G. Branden Robinson; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1429 bytes --] (forgot to TO Branden) On 4/21/23 13:57, Alejandro Colomar wrote: > > > On 4/21/23 13:55, Alejandro Colomar wrote: >> >> >> On 4/21/23 04:48, наб wrote: >>> Move-only commit. >>> >>> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> >>> --- >>> man3/regex.3 | 30 ++++++++++++++++++ >>> man3type/regex_t.3type | 64 +-------------------------------------- >>> man3type/regmatch_t.3type | 2 +- >>> man3type/regoff_t.3type | 2 +- >>> 4 files changed, 33 insertions(+), 65 deletions(-) >>> >> [...] >> >>> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type >>> index 176d2c7a6..c0daaf0ff 100644 >>> --- a/man3type/regex_t.3type >>> +++ b/man3type/regex_t.3type >>> @@ -1,63 +1 @@ >>> -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> >>> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> >>> -.\" >>> -.\" SPDX-License-Identifier: Linux-man-pages-copyleft >>> -.\" >>> -.\" >>> -.TH regex_t 3type (date) "Linux man-pages (unreleased)" >>> -.SH NAME >>> -regex_t, regmatch_t, regoff_t >> >> Should we keep the names in regex.3? > > Although that probably confuses man(1), since it will believe those are > in main section 3, while they are in 3type. Branden, any opinions? > >> > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 2:48 ` [PATCH v8 0/5] " наб 2023-04-21 2:48 ` [PATCH v8 1/5] regex.3: Desoupify regerror() description наб 2023-04-21 2:48 ` [PATCH v8 2/5] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 наб @ 2023-04-21 2:48 ` наб 2023-04-21 10:33 ` Alejandro Colomar 2023-04-21 2:49 ` [PATCH v8 4/5] regex.3: Destandardeseify Match offsets наб ` (2 subsequent siblings) 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 2:48 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2891 bytes --] They're inextricably linked, not cross-referenced at all, and not used anywhere else. Now that they (realistically) exist to the reader, add a note on how big nmatch can be; POSIX even says "The application developer should note that there is probably no reason for using a value of nmatch that is larger than preg−>re_nsub+1.". Also remove the now-duplicate regmatch_t declaration. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 54 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 20 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index f6465d484..46fd3adef 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -15,7 +15,7 @@ .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) .SH SYNOPSIS -.nf +.EX .B #include <regex.h> .PP .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , @@ -43,7 +43,7 @@ .SH SYNOPSIS .B } regmatch_t; .PP .BR typedef " /* ... */ " regoff_t; -.fi +.EE .SH DESCRIPTION .SS Compilation .BR regcomp () @@ -60,6 +60,21 @@ .SS Compilation The locale must be the same when running .BR regexec (). .PP +After +.BR regcomp () +succeeds, +.I preg->re_nsub +holds the number of subexpressions in +.IR regex . 
+Thus, a value of +.I preg->re_nsub ++ 1 +passed as +.I nmatch +to +.BR regexec () +is sufficient to capture all matches. +.PP .I cflags is the bitwise OR @@ -192,22 +207,6 @@ .SS Match offsets .IR N+1 .) Any unused structure elements will contain the value \-1. .PP -The -.I regmatch_t -structure which is the type of -.I pmatch -is defined in -.IR <regex.h> . -.PP -.in +4n -.EX -typedef struct { - regoff_t rm_so; - regoff_t rm_eo; -} regmatch_t; -.EE -.in -.PP Each .I rm_so element that is not \-1 indicates the start offset of the next largest @@ -218,7 +217,7 @@ .SS Match offsets which is the offset of the first character after the matching text. .PP .I regoff_t -It is a signed integer type +is a signed integer type capable of storing the largest value that can be stored in either an .I ptrdiff_t type or a @@ -344,12 +343,27 @@ .SH HISTORY POSIX.1-2001. .PP Prior to POSIX.1-2008, -the type was +.I regoff_t +was required to be capable of storing the largest value that can be stored in either an .I off_t type or a .I ssize_t type. +.SH NOTES +.I re_nsub +is only required to be initialized if +.B REG_NOSUB +wasn't specified, but all known implementations initialize it regardless. +.\" glibc, musl, 4.4BSD, illumos +.PP +Both +.I regex_t +and +.I regmatch_t +may (and do) have more members, in any order. +Always reference them by name. +.\" illumos has two more start/end pairs and the first one is of pointers .SH EXAMPLES .EX #include <stdint.h> -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
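The re_nsub note added by this patch can be illustrated with a short sketch that sizes pmatch as preg->re_nsub + 1. The helper name match_all_groups() is made up for this example and is not part of the patch or of <regex.h>; it assumes preg was compiled without REG_NOSUB.

```c
#include <regex.h>
#include <stdlib.h>

/*
 * Hypothetical helper: run regexec() with room for the whole match plus
 * every parenthesized subexpression.  Returns a malloc()ed array of
 * preg->re_nsub + 1 elements, or NULL on no match or allocation failure.
 */
static regmatch_t *match_all_groups(const regex_t *preg, const char *string)
{
    size_t nmatch = preg->re_nsub + 1;  /* element 0 is the entire match */
    regmatch_t *pmatch = calloc(nmatch, sizeof(*pmatch));

    if (pmatch == NULL)
        return NULL;
    if (regexec(preg, string, nmatch, pmatch, 0) != 0) {
        free(pmatch);
        return NULL;
    }
    return pmatch;
}
```

A larger nmatch is harmless but, as the POSIX remark quoted in the commit message says, there is probably no reason to use one.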
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 2:48 ` [PATCH v8 3/5] regex.3: Finalise move of reg*.3type наб @ 2023-04-21 10:33 ` Alejandro Colomar 2023-04-21 10:34 ` Alejandro Colomar [not found] ` <1d2d0aa8-cb28-2d7f-c48b-7a02f907cb5b@gmail.com> 0 siblings, 2 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 10:33 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4242 bytes --] Hi! On 4/21/23 04:48, наб wrote: > They're inextricably linked, not cross-referenced at all, > and not used anywhere else. > > Now that they (realistically) exist to the reader, add a note > on how big nmatch can be; POSIX even says "The application developer > should note that there is probably no reason for using a value of > nmatch that is larger than preg−>re_nsub+1.". > > Also remove the now-duplicate regmatch_t declaration. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Patch applied, with minor tweaks; see below (I guess you approve them). Cheers, Alex > --- > man3/regex.3 | 54 +++++++++++++++++++++++++++++++++------------------- > 1 file changed, 34 insertions(+), 20 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index f6465d484..46fd3adef 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -15,7 +15,7 @@ .SH LIBRARY > Standard C library > .RI ( libc ", " \-lc ) > .SH SYNOPSIS > -.nf > +.EX I've been thinking about this, but am not yet fully convinced. I'll propose you the two alternatives, and let you decide what looks best. (a) Use .nf/.fi for the function prototypes, and .EX/.EE for the types. (b) .EX/.EE for everything, as you did. Please have a look at the PDF versions (you can run `pdfman ./man3/regex.3` after you `source ./scripts/bash_aliases`). If you're going to use it often, I suggest the following in ~/.bash_aliases: if [ -f ~/src/linux/man-pages/man-pages/main/scripts/bash_aliases ]; then . 
~/src/linux/man-pages/man-pages/main/scripts/bash_aliases; fi; I've removed these bits from this patch, since the rest seems uncontroversial to me. > .B #include <regex.h> > .PP > .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , > @@ -43,7 +43,7 @@ .SH SYNOPSIS > .B } regmatch_t; > .PP > .BR typedef " /* ... */ " regoff_t; > -.fi > +.EE > .SH DESCRIPTION > .SS Compilation > .BR regcomp () > @@ -60,6 +60,21 @@ .SS Compilation > The locale must be the same when running > .BR regexec (). > .PP > +After > +.BR regcomp () > +succeeds, > +.I preg->re_nsub > +holds the number of subexpressions in > +.IR regex . > +Thus, a value of > +.I preg->re_nsub > ++ 1 > +passed as > +.I nmatch > +to > +.BR regexec () > +is sufficient to capture all matches. > +.PP > .I cflags > is the > bitwise OR > @@ -192,22 +207,6 @@ .SS Match offsets > .IR N+1 .) > Any unused structure elements will contain the value \-1. > .PP > -The > -.I regmatch_t > -structure which is the type of > -.I pmatch > -is defined in > -.IR <regex.h> . > -.PP > -.in +4n > -.EX > -typedef struct { > - regoff_t rm_so; > - regoff_t rm_eo; > -} regmatch_t; > -.EE > -.in > -.PP > Each > .I rm_so > element that is not \-1 indicates the start offset of the next largest > @@ -218,7 +217,7 @@ .SS Match offsets > which is the offset of the first character after the matching text. > .PP > .I regoff_t > -It is a signed integer type > +is a signed integer type > capable of storing the largest value that can be stored in either an > .I ptrdiff_t > type or a > @@ -344,12 +343,27 @@ .SH HISTORY > POSIX.1-2001. > .PP > Prior to POSIX.1-2008, > -the type was > +.I regoff_t > +was required to be > capable of storing the largest value that can be stored in either an > .I off_t > type or a > .I ssize_t > type. > +.SH NOTES NOTES is dreaded, and only used when no other section would work. CAVEATS (recently added to the Linux man-pages) is more suitable; I've edited your patch to use it. 
> +.I re_nsub > +is only required to be initialized if > +.B REG_NOSUB > +wasn't specified, but all known implementations initialize it regardless. > +.\" glibc, musl, 4.4BSD, illumos > +.PP > +Both > +.I regex_t > +and > +.I regmatch_t > +may (and do) have more members, in any order. > +Always reference them by name. > +.\" illumos has two more start/end pairs and the first one is of pointers > .SH EXAMPLES > .EX > #include <stdint.h> -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 10:33 ` Alejandro Colomar @ 2023-04-21 10:34 ` Alejandro Colomar 2023-04-21 11:26 ` наб [not found] ` <1d2d0aa8-cb28-2d7f-c48b-7a02f907cb5b@gmail.com> 1 sibling, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 10:34 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4577 bytes --] On 4/21/23 12:33, Alejandro Colomar wrote: > Hi! > > On 4/21/23 04:48, наб wrote: >> They're inextricably linked, not cross-referenced at all, >> and not used anywhere else. >> >> Now that they (realistically) exist to the reader, add a note >> on how big nmatch can be; POSIX even says "The application developer >> should note that there is probably no reason for using a value of >> nmatch that is larger than preg−>re_nsub+1.". >> >> Also remove the now-duplicate regmatch_t declaration. >> >> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > > Patch applied, with minor tweaks; see below (I guess you approve them). > > Cheers, > Alex > >> --- >> man3/regex.3 | 54 +++++++++++++++++++++++++++++++++------------------- >> 1 file changed, 34 insertions(+), 20 deletions(-) >> >> diff --git a/man3/regex.3 b/man3/regex.3 >> index f6465d484..46fd3adef 100644 >> --- a/man3/regex.3 >> +++ b/man3/regex.3 >> @@ -15,7 +15,7 @@ .SH LIBRARY >> Standard C library >> .RI ( libc ", " \-lc ) >> .SH SYNOPSIS >> -.nf >> +.EX > > I've been thinking about this, but am not yet fully convinced. I'll > propose you the two alternatives, and let you decide what looks best. > > (a) Use .nf/.fi for the function prototypes, and .EX/.EE for the > types. > > (b) .EX/.EE for everything, as you did. > > Please have a look at the PDF versions (you can run > `pdfman ./man3/regex.3` after you `source ./scripts/bash_aliases`). 
> > If you're going to use it often, I suggest the following in > ~/.bash_aliases: > > if [ -f ~/src/linux/man-pages/man-pages/main/scripts/bash_aliases ]; then > . ~/src/linux/man-pages/man-pages/main/scripts/bash_aliases; > fi; > > > I've remove these bits from this patch, since the rest seems > uncontroversial to me. But I haven't pushed, so that we can still have it in the same patch if you confirm. > > >> .B #include <regex.h> >> .PP >> .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , >> @@ -43,7 +43,7 @@ .SH SYNOPSIS >> .B } regmatch_t; >> .PP >> .BR typedef " /* ... */ " regoff_t; >> -.fi >> +.EE >> .SH DESCRIPTION >> .SS Compilation >> .BR regcomp () >> @@ -60,6 +60,21 @@ .SS Compilation >> The locale must be the same when running >> .BR regexec (). >> .PP >> +After >> +.BR regcomp () >> +succeeds, >> +.I preg->re_nsub >> +holds the number of subexpressions in >> +.IR regex . >> +Thus, a value of >> +.I preg->re_nsub >> ++ 1 >> +passed as >> +.I nmatch >> +to >> +.BR regexec () >> +is sufficient to capture all matches. >> +.PP >> .I cflags >> is the >> bitwise OR >> @@ -192,22 +207,6 @@ .SS Match offsets >> .IR N+1 .) >> Any unused structure elements will contain the value \-1. >> .PP >> -The >> -.I regmatch_t >> -structure which is the type of >> -.I pmatch >> -is defined in >> -.IR <regex.h> . >> -.PP >> -.in +4n >> -.EX >> -typedef struct { >> - regoff_t rm_so; >> - regoff_t rm_eo; >> -} regmatch_t; >> -.EE >> -.in >> -.PP >> Each >> .I rm_so >> element that is not \-1 indicates the start offset of the next largest >> @@ -218,7 +217,7 @@ .SS Match offsets >> which is the offset of the first character after the matching text. >> .PP >> .I regoff_t >> -It is a signed integer type >> +is a signed integer type >> capable of storing the largest value that can be stored in either an >> .I ptrdiff_t >> type or a >> @@ -344,12 +343,27 @@ .SH HISTORY >> POSIX.1-2001. 
>> .PP >> Prior to POSIX.1-2008, >> -the type was >> +.I regoff_t >> +was required to be >> capable of storing the largest value that can be stored in either an >> .I off_t >> type or a >> .I ssize_t >> type. >> +.SH NOTES > > NOTES is dreaded, and only used when no other section would work. > CAVEATS (recently added to the Linux man-pages) is more suitable; > I've edited your patch to use it. > >> +.I re_nsub >> +is only required to be initialized if >> +.B REG_NOSUB >> +wasn't specified, but all known implementations initialize it regardless. >> +.\" glibc, musl, 4.4BSD, illumos >> +.PP >> +Both >> +.I regex_t >> +and >> +.I regmatch_t >> +may (and do) have more members, in any order. >> +Always reference them by name. >> +.\" illumos has two more start/end pairs and the first one is of pointers >> .SH EXAMPLES >> .EX >> #include <stdint.h> > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 10:34 ` Alejandro Colomar @ 2023-04-21 11:26 ` наб 2023-04-21 11:36 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 11:26 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 169 bytes --] On Fri, Apr 21, 2023 at 12:34:39PM +0200, Alejandro Colomar wrote: > But I haven't pushed, so that we can still have it in the same > patch if you confirm. Yeah, go on. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 11:26 ` наб @ 2023-04-21 11:36 ` Alejandro Colomar 2023-04-21 11:49 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:36 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 355 bytes --] On 4/21/23 13:26, наб wrote: > On Fri, Apr 21, 2023 at 12:34:39PM +0200, Alejandro Colomar wrote: >> But I haven't pushed, so that we can still have it in the same >> patch if you confirm. > Yeah, go on. But do you prefer (a) or (b)? -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 11:36 ` Alejandro Colomar @ 2023-04-21 11:49 ` наб 0 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-21 11:49 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 396 bytes --] On Fri, Apr 21, 2023 at 01:36:19PM +0200, Alejandro Colomar wrote: > On 4/21/23 13:26, наб wrote: > > On Fri, Apr 21, 2023 at 12:34:39PM +0200, Alejandro Colomar wrote: > >> But I haven't pushed, so that we can still have it in the same > >> patch if you confirm. > > Yeah, go on. > But do you prefer (a) or (b)? (a); (b) looks better (imo as a mdoc enjoyer), but blows the A4 margin. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
[parent not found: <1d2d0aa8-cb28-2d7f-c48b-7a02f907cb5b@gmail.com>]
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type [not found] ` <1d2d0aa8-cb28-2d7f-c48b-7a02f907cb5b@gmail.com> @ 2023-04-21 11:57 ` Ralph Corderoy 2023-04-21 11:59 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: Ralph Corderoy @ 2023-04-21 11:57 UTC (permalink / raw) To: linux-man, groff Hi Alejandro, > > (a) Use .nf/.fi for the function prototypes, and .EX/.EE for the > > types. > > > > (b) .EX/.EE for everything, as you did. > > > > Please have a look at the PDF versions ... > Which one looks better to you? I've attached two PDF files The Synopsis should not be in a fixed-width font. -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 11:57 ` Ralph Corderoy @ 2023-04-21 11:59 ` Alejandro Colomar 2023-04-21 12:03 ` Alejandro Colomar 2023-04-21 12:09 ` Ralph Corderoy 0 siblings, 2 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:59 UTC (permalink / raw) To: Ralph Corderoy, linux-man, groff [-- Attachment #1.1: Type: text/plain, Size: 688 bytes --] Hi Ralph, On 4/21/23 13:57, Ralph Corderoy wrote: > Hi Alejandro, > >>> (a) Use .nf/.fi for the function prototypes, and .EX/.EE for the >>> types. >>> >>> (b) .EX/.EE for everything, as you did. >>> >>> Please have a look at the PDF versions > ... >> Which one looks better to you? I've attached two PDF files > > The Synopsis should not be in a fixed-width font. I know and agree most of the time, but when it has structure types with multi-line comments, you see what happens in the first PDFs I sent (mis-aligned comments). Cheers, Alex > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 11:59 ` Alejandro Colomar @ 2023-04-21 12:03 ` Alejandro Colomar 2023-04-21 12:09 ` Ralph Corderoy 1 sibling, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 12:03 UTC (permalink / raw) To: Ralph Corderoy, linux-man, groff [-- Attachment #1.1: Type: text/plain, Size: 912 bytes --] On 4/21/23 13:59, Alejandro Colomar wrote: > Hi Ralph, > > On 4/21/23 13:57, Ralph Corderoy wrote: >> Hi Alejandro, >> >>>> (a) Use .nf/.fi for the function prototypes, and .EX/.EE for the >>>> types. >>>> >>>> (b) .EX/.EE for everything, as you did. >>>> >>>> Please have a look at the PDF versions >> ... >>> Which one looks better to you? I've attached two PDF files >> >> The Synopsis should not be in a fixed-width font. > > I know and agree most of the time, but when it has structure types with > multi-line comments, you see what happens in the first PDFs I sent > (mis-aligned comments). On second thought, maybe the answer is to remove those comments, now that the page better explains what these are in the DESCRIPTION. > > Cheers, > Alex > >> > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 11:59 ` Alejandro Colomar 2023-04-21 12:03 ` Alejandro Colomar @ 2023-04-21 12:09 ` Ralph Corderoy 2023-04-21 12:14 ` Alejandro Colomar 1 sibling, 1 reply; 143+ messages in thread From: Ralph Corderoy @ 2023-04-21 12:09 UTC (permalink / raw) To: linux-man, groff Hi Alejandro, > when it has structure types with multi-line comments, you see what > happens in the first PDFs I sent (mis-aligned comments). Fix the formatting commands in the troff source so the comments are aligned. The man page is troff source for producing beautifully typeset pages. -- Cheers, Ralph. ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v8 3/5] regex.3: Finalise move of reg*.3type 2023-04-21 12:09 ` Ralph Corderoy @ 2023-04-21 12:14 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 12:14 UTC (permalink / raw) To: Ralph Corderoy, linux-man, groff [-- Attachment #1.1.1: Type: text/plain, Size: 769 bytes --] Hi Ralph, On 4/21/23 14:09, Ralph Corderoy wrote: > Hi Alejandro, > >> when it has structure types with multi-line comments, you see what >> happens in the first PDFs I sent (mis-aligned comments). > > Fix the formatting commands in the troff source so the comments are > aligned. The man page is troff source for producing beautifully typeset > pages. I guess that would involve raw troff commands, right? Might be necessary in some other case, but I dodged the bullet this time with <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=b5c5fd34ac4537fc00089c977d8cb72d4de910e6>. See attached PDF. Cheers, Alex > -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #1.1.2: regex.3.rRwFNb --] [-- Type: application/pdf, Size: 39932 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v8 4/5] regex.3: Destandardeseify Match offsets 2023-04-21 2:48 ` [PATCH v8 0/5] " наб ` (2 preceding siblings ...) 2023-04-21 2:48 ` [PATCH v8 3/5] regex.3: Finalise move of reg*.3type наб @ 2023-04-21 2:49 ` наб 2023-04-21 10:36 ` Alejandro Colomar 2023-04-21 2:49 ` [PATCH v8 5/5] regex.3: Further clarify the sole purpose of REG_NOSUB наб 2023-04-21 10:00 ` [PATCH v8 0/5] regex.3 momento Alejandro Colomar 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 2:49 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2194 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 46fd3adef..55fddd88e 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -184,37 +184,34 @@ .SS Matching .SS Match offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. -The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) 
-Any unused structure elements will contain the value \-1. +elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first expression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ string " + " rm_so ", " string " + " rm_eo ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
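The half-open range [string + rm_so, string + rm_eo) introduced by this patch maps directly onto a pointer and a length. A rough sketch of extracting the matched text follows; dup_match() is a hypothetical name used only for illustration, and the code assumes m was filled in by a successful regexec() call on the same string.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy the bytes of one match out of string.
 * Returns a malloc()ed, null-terminated copy, or NULL if the element
 * is unused (-1) or allocation fails.
 */
static char *dup_match(const char *string, const regmatch_t *m)
{
    if (m->rm_so == -1)
        return NULL;                    /* unused element, nothing to copy */

    /* The range is half-open, so its length is simply rm_eo - rm_so. */
    size_t len = (size_t)(m->rm_eo - m->rm_so);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, string + m->rm_so, len);
    out[len] = '\0';
    return out;
}
```

Because rm_eo is the offset of the first byte after the match, no off-by-one adjustment is needed when computing the length.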
* Re: [PATCH v8 4/5] regex.3: Destandardeseify Match offsets 2023-04-21 2:49 ` [PATCH v8 4/5] regex.3: Destandardeseify Match offsets наб @ 2023-04-21 10:36 ` Alejandro Colomar 2023-04-21 12:55 ` [PATCH v9] " наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 10:36 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1047 bytes --] On 4/21/23 04:49, наб wrote: > This section reads like it were (and pretty much is) lifted from POSIX. > That's hard to read, because POSIX is horrendously verbose, as usual. > > Instead, synopsise it into something less formal but more reasonable, > and describe the resulting range with a range instead of a paragraph. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- > 1 file changed, 25 insertions(+), 28 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 46fd3adef..55fddd88e 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -184,37 +184,34 @@ .SS Matching [...] > +Each returned valid > +.RB (non- \-1 ) > +match corresponds to the range > +.RI [ string " + " rm_so ", " string " + " rm_eo ). These be expressions :) > .PP > .I regoff_t > is a signed integer type -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v9] regex.3: Destandardeseify Match offsets 2023-04-21 10:36 ` Alejandro Colomar @ 2023-04-21 12:55 ` наб 2023-04-21 13:15 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 12:55 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2874 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- This is the last one. Range-diff against v8: 1: 4479e1572 < -: --------- regex.3: Desoupify regerror() description 2: bad307847 < -: --------- regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 3: edefa8a5e < -: --------- regex.3: Finalise move of reg*.3type 4: 500070a5e ! 1: 9af6c6b7f regex.3: Destandardeseify Match offsets @@ man3/regex.3: .SS Matching +Each returned valid +.RB (non- \-1 ) +match corresponds to the range -+.RI [ string " + " rm_so ", " string " + " rm_eo ). ++.RI [ "string + rm_so" , " string + rm_eo" ). .PP .I regoff_t is a signed integer type man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 30f2ef318..aae31c1e9 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -179,37 +179,34 @@ .SS Matching .SS Match offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. 
-The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) -Any unused structure elements will contain the value \-1. +elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first expression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ "string + rm_so" , " string + rm_eo" ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v9] regex.3: Destandardeseify Match offsets 2023-04-21 12:55 ` [PATCH v9] " наб @ 2023-04-21 13:15 ` Alejandro Colomar 2023-04-21 13:29 ` [PATCH v9a] " наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 13:15 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 3228 bytes --] On 4/21/23 14:55, наб wrote: > This section reads like it were (and pretty much is) lifted from POSIX. > That's hard to read, because POSIX is horrendously verbose, as usual. > > Instead, synopsise it into something less formal but more reasonable, > and describe the resulting range with a range instead of a paragraph. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > This is the last one. > > Range-diff against v8: > 1: 4479e1572 < -: --------- regex.3: Desoupify regerror() description > 2: bad307847 < -: --------- regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 > 3: edefa8a5e < -: --------- regex.3: Finalise move of reg*.3type > 4: 500070a5e ! 1: 9af6c6b7f regex.3: Destandardeseify Match offsets > @@ man3/regex.3: .SS Matching > +Each returned valid > +.RB (non- \-1 ) > +match corresponds to the range > -+.RI [ string " + " rm_so ", " string " + " rm_eo ). > ++.RI [ "string + rm_so" , " string + rm_eo" ). > .PP > .I regoff_t > is a signed integer type > > man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- > 1 file changed, 25 insertions(+), 28 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 30f2ef318..aae31c1e9 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -179,37 +179,34 @@ .SS Matching > .SS Match offsets > Unless > .B REG_NOSUB > -was set for the compilation of the pattern buffer, it is possible to > -obtain match addressing information. > -.I pmatch > -must be dimensioned to have at least > -.I nmatch > -elements. 
> -These are filled in by > +was passed to > +.BR regcomp (), > +it is possible to > +obtain the locations of matches within > +.IR string : > .BR regexec () > -with substring match addresses. > -The offsets of the subexpression starting at the > -.IR i th > -open parenthesis are stored in > -.IR pmatch[i] . > -The entire regular expression's match addresses are stored in > -.IR pmatch[0] . > -(Note that to return the offsets of > -.I N > -subexpression matches, > +fills > .I nmatch > -must be at least > -.IR N+1 .) > -Any unused structure elements will contain the value \-1. > +elements of > +.I pmatch > +with results: > +.I pmatch[0] > +corresponds to the entire match, > +.I pmatch[1] > +to the first expression, etc. s/expression/subexpression/? > +If there were more matches than > +.IR nmatch , > +they are discarded; > +if fewer, > +unused elements of > +.I pmatch > +are filled with > +.BR \-1 s. > .PP > -Each > -.I rm_so > -element that is not \-1 indicates the start offset of the next largest > -substring match within the string. > -The relative > -.I rm_eo > -element indicates the end offset of the match, > -which is the offset of the first character after the matching text. > +Each returned valid > +.RB (non- \-1 ) > +match corresponds to the range > +.RI [ "string + rm_so" , " string + rm_eo" ). > .PP > .I regoff_t > is a signed integer type -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v9a] regex.3: Destandardeseify Match offsets 2023-04-21 13:15 ` Alejandro Colomar @ 2023-04-21 13:29 ` наб 2023-04-21 13:55 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-21 13:29 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2564 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Range-diff against v9: 4: 80d247ebc ! 1: c3e45d60e regex.3: Destandardeseify Match offsets @@ man3/regex.3: .SS Matching +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] -+to the first expression, etc. ++to the first subexpression, etc. +If there were more matches than +.IR nmatch , +they are discarded; man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 30f2ef318..8efd21d72 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -179,37 +179,34 @@ .SS Matching .SS Match offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. -The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) 
-Any unused structure elements will contain the value \-1. +elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first subexpression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ "string + rm_so" , " string + rm_eo" ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v9a] regex.3: Destandardeseify Match offsets 2023-04-21 13:29 ` [PATCH v9a] " наб @ 2023-04-21 13:55 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 13:55 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2905 bytes --] On 4/21/23 15:29, наб wrote: > This section reads like it were (and pretty much is) lifted from POSIX. > That's hard to read, because POSIX is horrendously verbose, as usual. > > Instead, synopsise it into something less formal but more reasonable, > and describe the resulting range with a range instead of a paragraph. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Patch applied. Thanks, Alex > --- > Range-diff against v9: > 4: 80d247ebc ! 1: c3e45d60e regex.3: Destandardeseify Match offsets > @@ man3/regex.3: .SS Matching > +.I pmatch[0] > +corresponds to the entire match, > +.I pmatch[1] > -+to the first expression, etc. > ++to the first subexpression, etc. > +If there were more matches than > +.IR nmatch , > +they are discarded; > > man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- > 1 file changed, 25 insertions(+), 28 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 30f2ef318..8efd21d72 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -179,37 +179,34 @@ .SS Matching > .SS Match offsets > Unless > .B REG_NOSUB > -was set for the compilation of the pattern buffer, it is possible to > -obtain match addressing information. > -.I pmatch > -must be dimensioned to have at least > -.I nmatch > -elements. > -These are filled in by > +was passed to > +.BR regcomp (), > +it is possible to > +obtain the locations of matches within > +.IR string : > .BR regexec () > -with substring match addresses. > -The offsets of the subexpression starting at the > -.IR i th > -open parenthesis are stored in > -.IR pmatch[i] . 
> -The entire regular expression's match addresses are stored in > -.IR pmatch[0] . > -(Note that to return the offsets of > -.I N > -subexpression matches, > +fills > .I nmatch > -must be at least > -.IR N+1 .) > -Any unused structure elements will contain the value \-1. > +elements of > +.I pmatch > +with results: > +.I pmatch[0] > +corresponds to the entire match, > +.I pmatch[1] > +to the first subexpression, etc. > +If there were more matches than > +.IR nmatch , > +they are discarded; > +if fewer, > +unused elements of > +.I pmatch > +are filled with > +.BR \-1 s. > .PP > -Each > -.I rm_so > -element that is not \-1 indicates the start offset of the next largest > -substring match within the string. > -The relative > -.I rm_eo > -element indicates the end offset of the match, > -which is the offset of the first character after the matching text. > +Each returned valid > +.RB (non- \-1 ) > +match corresponds to the range > +.RI [ "string + rm_so" , " string + rm_eo" ). > .PP > .I regoff_t > is a signed integer type -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
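For readers following along, the applied wording about match offsets can be sketched in C roughly as below. This is only an illustration: the helper names copy_submatch() and match_key_value(), and the pattern itself, are made up for this example and appear nowhere in the patch.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy the text of submatch i, i.e. the range
 * [string + rm_so, string + rm_eo), into out (truncating to fit).
 * Returns the copied length, or -1 if the element is unused (-1). */
static int copy_submatch(const char *string, const regmatch_t *pmatch,
                         size_t i, char *out, size_t outsz)
{
    if (outsz == 0 || pmatch[i].rm_so == -1)
        return -1;
    size_t len = (size_t)(pmatch[i].rm_eo - pmatch[i].rm_so);
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, string + pmatch[i].rm_so, len);
    out[len] = '\0';
    return (int)len;
}

/* Hypothetical demo: pmatch[0] is the entire match, pmatch[1..3] are
 * the three parenthesised subexpressions of the (made-up) pattern. */
static int match_key_value(const char *string, regmatch_t pmatch[4])
{
    regex_t re;
    if (regcomp(&re, "([a-z]+)=([0-9]+)(x)?", REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, string, 4, pmatch, 0);
    regfree(&re);
    return rc;
}
```

Calling match_key_value() on a string such as "  foo=42;" should leave the optional third subexpression unused, so its element is filled with -1s, while the other elements give offsets relative to the start of the string.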
* [PATCH v8 5/5] regex.3: Further clarify the sole purpose of REG_NOSUB
From: наб @ 2023-04-21 2:49 UTC
To: Alejandro Colomar (man-pages); +Cc: linux-man

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
 man3/regex.3 | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/man3/regex.3 b/man3/regex.3
index 55fddd88e..060e8a587 100644
--- a/man3/regex.3
+++ b/man3/regex.3
@@ -96,16 +96,14 @@ .SS Compilation
 searches using this pattern buffer will be case insensitive.
 .TP
 .B REG_NOSUB
-Do not report position of matches.
-The
-.I nmatch
-and
-.I pmatch
+Report only overall success.
 .BR regexec ()
-arguments will be ignored for this purpose (but
+will use only
 .I pmatch
-may still be used for
-.BR REG_STARTEND ).
+for
+.BR REG_STARTEND ,
+ignoring
+.IR nmatch .
 .TP
 .B REG_NEWLINE
 Match-any-character operators don't match a newline.
-- 
2.30.2
* Re: [PATCH v8 5/5] regex.3: Further clarify the sole purpose of REG_NOSUB
From: Alejandro Colomar @ 2023-04-21 11:44 UTC
To: наб; +Cc: linux-man

Hi nab!

On 4/21/23 04:49, наб wrote:
> Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>

Patch applied.

Thanks,
Alex

> ---
>  man3/regex.3 | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/man3/regex.3 b/man3/regex.3
> index 55fddd88e..060e8a587 100644
> --- a/man3/regex.3
> +++ b/man3/regex.3
> @@ -96,16 +96,14 @@ .SS Compilation
>  searches using this pattern buffer will be case insensitive.
>  .TP
>  .B REG_NOSUB
> -Do not report position of matches.
> -The
> -.I nmatch
> -and
> -.I pmatch
> +Report only overall success.
>  .BR regexec ()
> -arguments will be ignored for this purpose (but
> +will use only
>  .I pmatch
> -may still be used for
> -.BR REG_STARTEND ).
> +for
> +.BR REG_STARTEND ,
> +ignoring
> +.IR nmatch .
>  .TP
>  .B REG_NEWLINE
>  Match-any-character operators don't match a newline.

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5
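As a rough illustration of "report only overall success": with REG_NOSUB and no REG_STARTEND, regexec() has no use for nmatch or pmatch, so a caller can pass 0 and NULL. The wrapper name pattern_matches() below is hypothetical, not something from the patch or from libc.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical wrapper: returns 1 on match, 0 on no match,
 * -1 if the pattern fails to compile or regexec() reports an error. */
static int pattern_matches(const char *pattern, const char *string)
{
    regex_t re;
    /* REG_NOSUB: only success or failure is wanted, no offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    /* nmatch and pmatch are not consulted here (REG_STARTEND unset). */
    int rc = regexec(&re, string, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

This is the yes-or-no use case; anything that needs the location of the match has to compile without REG_NOSUB.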
* Re: [PATCH v8 0/5] regex.3 momento 2023-04-21 2:48 ` [PATCH v8 0/5] " наб ` (4 preceding siblings ...) 2023-04-21 2:49 ` [PATCH v8 5/5] regex.3: Further clarify the sole purpose of REG_NOSUB наб @ 2023-04-21 10:00 ` Alejandro Colomar 5 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 10:00 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2424 bytes --] Hi! On 4/21/23 04:48, наб wrote: > As a pull.rebase = true enjoyer, it was very easy > (indeed, git pull and axe the single-line conflict + empty commit), > and it's what I've been doing the entire time; recommend it. Heh, I never run `git pull`. It feels too dangerous. I prefer `git fetch` and then doing manually whatever needs to be done, so I know exactly what goes on. > > 5/5 remains a toss-up for me. Apply it if you think it's better, > don't if you don't. > > https://bugs.debian.org/1034658 :) > > наб (5): > regex.3: Desoupify regerror() description > regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link > regex_t.3type into regex.3 > regex.3: Finalise move of reg*.3type > regex.3: Destandardeseify Match offsets > regex.3: Further clarify the sole purpose of REG_NOSUB > > man3/regex.3 | 179 +++++++++++++++++++++++--------------- > man3type/regex_t.3type | 64 +------------- > man3type/regmatch_t.3type | 2 +- > man3type/regoff_t.3type | 2 +- > 4 files changed, 110 insertions(+), 137 deletions(-) > > No clue where it got this. The interdiff is just the .IR -> .I. > > Range-diff against v7: > 1: 783a16431 ! 1: 4479e1572 regex.3: Desoupify regerror() description > @@ man3/regex.3: .SS Error reporting > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > - Supplying > .BR regfree () > + deinitializes the pattern buffer at This means that the context of the patches changed (due to the rebase), even if the +/- haven't changed themselves. 
Basically what would be "applying with fuzz" when refreshing a patch. > 2: 5706f1892 < -: --------- regex.3: Desoupify regfree() description > 3: baacf086f < -: --------- regex.3: Improve REG_STARTEND The cause is probably that I applied these before it. Cheers, Alex > 4: 056c3ff04 = 2: bad307847 regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move & link regex_t.3type into regex.3 > 5: 44d7b775d = 3: edefa8a5e regex.3: Finalise move of reg*.3type > 6: 79641df02 = 4: 500070a5e regex.3: Destandardeseify Match offsets > 7: 26d06c07f = 5: b01685c7a regex.3: Further clarify the sole purpose of REG_NOSUB -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v4 2/6] regex.3: Desoupify function descriptions 2023-04-20 11:31 ` Alejandro Colomar 2023-04-20 13:02 ` [PATCH v4 1/6] regex.3: Fix subsection headings наб @ 2023-04-20 13:02 ` наб 2023-04-20 14:00 ` Alejandro Colomar 2023-04-20 13:02 ` [PATCH v4 3/6] regex.3: Improve REG_STARTEND наб ` (3 subsequent siblings) 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 13:02 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 3758 bytes --] Behold: regerror() is passed the error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer, errbuf_size. Absolute soup. This reads to me like an ill-conceived copy from a very early standard version. It looks fine in source form but is horrific to read as running text. Instead, replace all of these with just the descriptions of what they do with their arguments. What the arguments are is very clearly noted in big bold in the prototypes. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 80 +++++++++++++++++++++------------------------------- 1 file changed, 32 insertions(+), 48 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 637cb2231..b4feaba19 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -25,8 +25,8 @@ Standard C library .BI " size_t " nmatch ", regmatch_t " pmatch "[restrict ." nmatch ], .BI " int " eflags ); .PP -.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg , -.BI " char " errbuf "[restrict ." errbuf_size "], \ +.BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , +.BI " char " errbuf "[restrict ." errbuf_size "], \ size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); .fi @@ -38,21 +38,13 @@ for subsequent .BR regexec () searches. 
.PP -.BR regcomp () -is supplied with -.IR preg , -a pointer to a pattern buffer storage area; -.IR regex , -a pointer to the null-terminated string and -.IR cflags , -flags used to determine the type of compilation. -.PP -All regular expression searching must be done via a compiled pattern -buffer, thus -.BR regexec () -must always be supplied with the address of a -.BR regcomp ()-initialized -pattern buffer. +The pattern buffer at +.I *preg +is initialized. +.I regex +is a null-terminated string. +The locale must be the same when running +.BR regexec (). .PP .I cflags is the @@ -113,12 +105,10 @@ contains .SS Matching .BR regexec () is used to match a null-terminated string -against the precompiled pattern buffer, -.IR preg . -.I nmatch -and -.I pmatch -are used to provide information regarding the location of any matches. +against the compiled pattern buffer in +.IR *preg , +which must have been initialised with +.BR regexec (). .I eflags is the bitwise OR @@ -217,34 +207,28 @@ and .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. -If both -.I errbuf -and +.I errcode +must be the latest error returned from an operation on +.IR preg . +If +.I preg +is a null pointer\(emthe latest error. +.PP +If +.I errbuf_size +is +.BR 0 , +the size of the required buffer is returned. +Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. 
.SS Freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +invalidates the pattern buffer at +.IR *preg , +which must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v4 2/6] regex.3: Desoupify function descriptions 2023-04-20 13:02 ` [PATCH v4 2/6] regex.3: Desoupify function descriptions наб @ 2023-04-20 14:00 ` Alejandro Colomar 2023-04-20 14:37 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 14:00 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4321 bytes --] Hi! On 4/20/23 15:02, наб wrote: > Behold: > regerror() is passed the error code, errcode, the pattern buffer, > preg, a pointer to a character string buffer, errbuf, and the size > of the string buffer, errbuf_size. > > Absolute soup. This reads to me like an ill-conceived copy from a very > early standard version. It looks fine in source form but is horrific to > read as running text. > > Instead, replace all of these with just the descriptions of what they do > with their arguments. What the arguments are is very clearly noted in > big bold in the prototypes. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Please break this patch into smaller ones. > --- > man3/regex.3 | 80 +++++++++++++++++++++------------------------------- > 1 file changed, 32 insertions(+), 48 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 637cb2231..b4feaba19 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -25,8 +25,8 @@ Standard C library > .BI " size_t " nmatch ", regmatch_t " pmatch "[restrict ." nmatch ], > .BI " int " eflags ); > .PP > -.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg , > -.BI " char " errbuf "[restrict ." errbuf_size "], \ > +.BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , > +.BI " char " errbuf "[restrict ." errbuf_size "], \ > size_t " errbuf_size ); > .BI "void regfree(regex_t *" preg ); > .fi > @@ -38,21 +38,13 @@ for subsequent > .BR regexec () > searches. 
> .PP > -.BR regcomp () > -is supplied with > -.IR preg , > -a pointer to a pattern buffer storage area; > -.IR regex , > -a pointer to the null-terminated string and > -.IR cflags , > -flags used to determine the type of compilation. > -.PP > -All regular expression searching must be done via a compiled pattern > -buffer, thus > -.BR regexec () > -must always be supplied with the address of a > -.BR regcomp ()-initialized > -pattern buffer. > +The pattern buffer at > +.I *preg > +is initialized. I think I prefer avoiding passive voice here. No? It initializes the pattern buffer at *preg? Thanks, Alex > +.I regex > +is a null-terminated string. > +The locale must be the same when running > +.BR regexec (). > .PP > .I cflags > is the > @@ -113,12 +105,10 @@ contains > .SS Matching > .BR regexec () > is used to match a null-terminated string > -against the precompiled pattern buffer, > -.IR preg . > -.I nmatch > -and > -.I pmatch > -are used to provide information regarding the location of any matches. > +against the compiled pattern buffer in > +.IR *preg , > +which must have been initialised with > +.BR regexec (). > .I eflags > is the > bitwise OR > @@ -217,34 +207,28 @@ and > .BR regexec () > into error message strings. > .PP > -.BR regerror () > -is passed the error code, > -.IR errcode , > -the pattern buffer, > -.IR preg , > -a pointer to a character string buffer, > -.IR errbuf , > -and the size of the string buffer, > -.IR errbuf_size . > -It returns the size of the > -.I errbuf > -required to contain the null-terminated error message string. > -If both > -.I errbuf > -and > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > +If > +.I preg > +is a null pointer\(emthe latest error. > +.PP > +If > +.I errbuf_size > +is > +.BR 0 , > +the size of the required buffer is returned. 
> +Otherwise, up to > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS Freeing > -Supplying > .BR regfree () > -with a precompiled pattern buffer, > -.IR preg , > -will free the memory allocated to the pattern buffer by the compiling > -process, > +invalidates the pattern buffer at > +.IR *preg , > +which must have been initialized via > .BR regcomp (). > .SH RETURN VALUE > .BR regcomp () -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v4 2/6] regex.3: Desoupify function descriptions
From: наб @ 2023-04-20 14:37 UTC
To: Alejandro Colomar; +Cc: linux-man

Hi!

On Thu, Apr 20, 2023 at 04:00:40PM +0200, Alejandro Colomar wrote:
> On 4/20/23 15:02, наб wrote:
> > Instead, replace all of these with just the descriptions of what they do
> > with their arguments. What the arguments are is very clearly noted in
> > big bold in the prototypes.
> Please break this patch into smaller ones.

Cracked into one each for regcomp/regexec/regerror.

> > @@ -38,21 +38,13 @@ for subsequent
> >  .BR regexec ()
> >  searches.
> >  .PP
> > -.BR regcomp ()
> > -is supplied with
> > -.IR preg ,
> > -a pointer to a pattern buffer storage area;
> > -.IR regex ,
> > -a pointer to the null-terminated string and
> > -.IR cflags ,
> > -flags used to determine the type of compilation.
> > -.PP
> > -All regular expression searching must be done via a compiled pattern
> > -buffer, thus
> > -.BR regexec ()
> > -must always be supplied with the address of a
> > -.BR regcomp ()-initialized
> > -pattern buffer.
> > +The pattern buffer at
> > +.I *preg
> > +is initialized.
> I think I prefer avoiding passive voice here. No?
> It initializes the pattern buffer at *preg?

I changed it to
  On success, the pattern buffer at *preg is initialized.
Which makes more sense as a post-condition,
and writing it the other way around would be weird
("If it succeeds, it initialises pattern buffer at *preg"? horrendous).

Best,
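The regerror() wording discussed in this sub-thread (a zero errbuf_size returns the size of the required buffer; otherwise the message is copied, truncated to fit, and always null-terminated) suggests the usual two-call pattern. A minimal sketch, assuming the helper names regex_error_string() and compile_error(), which are invented for this example:

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * errcode, or NULL on allocation failure. The caller frees it. */
static char *regex_error_string(int errcode, const regex_t *preg)
{
    /* First call: errbuf_size 0 only asks for the required size,
     * which includes the terminating null byte. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *buf = malloc(need);
    if (buf != NULL)
        regerror(errcode, preg, buf, need); /* second call fills it */
    return buf;
}

/* Hypothetical demo: NULL if pattern compiles, else its error text. */
static char *compile_error(const char *pattern)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc == 0) {
        regfree(&re);
        return NULL;
    }
    return regex_error_string(rc, &re);
}
```

A fixed-size buffer works as well when a truncated message is acceptable, since the result stays null-terminated either way.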
* [PATCH v4 3/6] regex.3: Improve REG_STARTEND 2023-04-20 11:31 ` Alejandro Colomar 2023-04-20 13:02 ` [PATCH v4 1/6] regex.3: Fix subsection headings наб 2023-04-20 13:02 ` [PATCH v4 2/6] regex.3: Desoupify function descriptions наб @ 2023-04-20 13:02 ` наб 2023-04-20 14:04 ` Alejandro Colomar 2023-04-20 13:02 ` [PATCH v4 4/6] regex.3, regex_t.3type: Move regex_t.3type into regex.3 наб ` (2 subsequent siblings) 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 13:02 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1837 bytes --] Explicitly spell out the ranges involved. The original wording always confused me, but it's actually very sane. Also change the [0]. to -> here to make more obvious the point that pmatch is used as a pointer-to-object, not array in this scenario. Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change R_NOTEOL? No. That's weird and confusing. String largeness doesn't matter, known-lengthness does. Explicitly spell out the influence on returned matches (relative to string, not start of range). Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index b4feaba19..00e7e2c6b 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -131,23 +131,30 @@ compilation flag above). .TP .B REG_STARTEND -Use -.I pmatch[0] -on the input string, starting at byte -.I pmatch[0].rm_so -and ending before byte -.IR pmatch[0].rm_eo . +Match +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo ) +instead of +.RI [ string ", " string " + \fBstrlen\fP(" string )). This allows matching embedded NUL bytes and avoids a .BR strlen (3) -on large strings. -It does not use +on known-length strings. +.I pmatch +must point to a valid readable object. 
+If any matches are returned +.RB ( REG_NOSUB +wasn't passed to +.BR regcomp (), +the match succeeded, and .I nmatch -on input, and does not change -.B REG_NOTBOL -or -.B REG_NEWLINE -processing. +> 0), they overwrite +.I pmatch +as usual, and the +.B Match offsets +remain relative to +.IR string +(not +.IR string " + " pmatch->rm_so ). This flag is a BSD extension, not present in POSIX. .SS Match offsets Unless -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v4 3/6] regex.3: Improve REG_STARTEND 2023-04-20 13:02 ` [PATCH v4 3/6] regex.3: Improve REG_STARTEND наб @ 2023-04-20 14:04 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 14:04 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2346 bytes --] On 4/20/23 15:02, наб wrote: > Explicitly spell out the ranges involved. The original wording always > confused me, but it's actually very sane. I like this change. > > Also change the [0]. to -> here to make more obvious the point that > pmatch is used as a pointer-to-object, not array in this scenario. Since at the same time [>0] can be meaningful, I prefer using [0], to note that the first entry is special in the array. -> looks like there's no array at all, but rather just one object. > > Remove "this doesn't change R_NOTBOL & R_NEWLINE" ‒ so does it change > R_NOTEOL? No. That's weird and confusing. > > String largeness doesn't matter, known-lengthness does. Good. > > Explicitly spell out the influence on returned matches > (relative to string, not start of range). > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cheers, Alex > --- > man3/regex.3 | 33 ++++++++++++++++++++------------- > 1 file changed, 20 insertions(+), 13 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index b4feaba19..00e7e2c6b 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -131,23 +131,30 @@ compilation flag > above). > .TP > .B REG_STARTEND > -Use > -.I pmatch[0] > -on the input string, starting at byte > -.I pmatch[0].rm_so > -and ending before byte > -.IR pmatch[0].rm_eo . > +Match > +.RI [ string " + " pmatch->rm_so ", " string " + " pmatch->rm_eo ) > +instead of > +.RI [ string ", " string " + \fBstrlen\fP(" string )). > This allows matching embedded NUL bytes > and avoids a > .BR strlen (3) > -on large strings. > -It does not use > +on known-length strings. 
> +.I pmatch > +must point to a valid readable object. > +If any matches are returned > +.RB ( REG_NOSUB > +wasn't passed to > +.BR regcomp (), > +the match succeeded, and > .I nmatch > -on input, and does not change > -.B REG_NOTBOL > -or > -.B REG_NEWLINE > -processing. > +> 0), they overwrite > +.I pmatch > +as usual, and the > +.B Match offsets > +remain relative to > +.IR string > +(not > +.IR string " + " pmatch->rm_so ). > This flag is a BSD extension, not present in POSIX. > .SS Match offsets > Unless -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v4 4/6] regex.3, regex_t.3type: Move regex_t.3type into regex.3 2023-04-20 11:31 ` Alejandro Colomar ` (2 preceding siblings ...) 2023-04-20 13:02 ` [PATCH v4 3/6] regex.3: Improve REG_STARTEND наб @ 2023-04-20 13:02 ` наб 2023-04-20 13:02 ` [PATCH v4 5/6] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move in with regex.3 наб 2023-04-20 13:02 ` [PATCH v4 6/6] regex.3: Destandardeseify Match offsets наб 5 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 13:02 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 3765 bytes --] Move-only commit. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 30 ++++++++++++++++++++ man3type/regex_t.3type | 63 ------------------------------------------ 2 files changed, 30 insertions(+), 63 deletions(-) delete mode 100644 man3type/regex_t.3type diff --git a/man3/regex.3 b/man3/regex.3 index 00e7e2c6b..615e065de 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -29,6 +29,20 @@ Standard C library .BI " char " errbuf "[restrict ." errbuf_size "], \ size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); +.PP +.B typedef struct { +.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" +.B } regex_t; +.PP +.B typedef struct { +.BR " regoff_t rm_so;" " /* Byte offset from start of string" + to start of substring */ +.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" + the first character after the end of + substring */ +.B } regmatch_t; +.PP +.BR typedef " /* ... */ " regoff_t; .fi .SH DESCRIPTION .SS Compilation @@ -206,6 +220,14 @@ The relative .I rm_eo element indicates the end offset of the match, which is the offset of the first character after the matching text. +.PP +.I regoff_t +It is a signed integer type +capable of storing the largest value that can be stored in either an +.I ptrdiff_t +type or a +.I ssize_t +type. 
.SS Error reporting .BR regerror () is used to turn the error codes that can be returned by both @@ -322,6 +344,14 @@ T} Thread safety MT-Safe POSIX.1-2008. .SH HISTORY POSIX.1-2001. +.PP +Prior to POSIX.1-2008, +the type was +capable of storing the largest value that can be stored in either an +.I off_t +type or a +.I ssize_t +type. .SH EXAMPLES .EX #include <stdint.h> diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type deleted file mode 100644 index 176d2c7a6..000000000 --- a/man3type/regex_t.3type +++ /dev/null @@ -1,63 +0,0 @@ -.\" Copyright (c) 2020-2022 by Alejandro Colomar <alx@kernel.org> -.\" and Copyright (c) 2020 by Michael Kerrisk <mtk.manpages@gmail.com> -.\" -.\" SPDX-License-Identifier: Linux-man-pages-copyleft -.\" -.\" -.TH regex_t 3type (date) "Linux man-pages (unreleased)" -.SH NAME -regex_t, regmatch_t, regoff_t -\- regular expression matching -.SH LIBRARY -Standard C library -.RI ( libc ) -.SH SYNOPSIS -.EX -.B #include <regex.h> -.PP -.B typedef struct { -.BR " size_t re_nsub;" " /* Number of parenthesized subexpressions */" -.B } regex_t; -.PP -.B typedef struct { -.BR " regoff_t rm_so;" " /* Byte offset from start of string" - to start of substring */ -.BR " regoff_t rm_eo;" " /* Byte offset from start of string to" - the first character after the end of - substring */ -.B } regmatch_t; -.PP -.BR typedef " /* ... */ " regoff_t; -.EE -.SH DESCRIPTION -.TP -.I regex_t -This is a structure type used in regular expression matching. -It holds a compiled regular expression, -compiled with -.BR regcomp (3). -.TP -.I regmatch_t -This is a structure type used in regular expression matching. -.TP -.I regoff_t -It is a signed integer type -capable of storing the largest value that can be stored in either an -.I ptrdiff_t -type or a -.I ssize_t -type. -.SH STANDARDS -POSIX.1-2008. -.SH HISTORY -POSIX.1-2001. 
-.PP -Prior to POSIX.1-2008, -the type was -capable of storing the largest value that can be stored in either an -.I off_t -type or a -.I ssize_t -type. -.SH SEE ALSO -.BR regex (3) -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* [PATCH v4 5/6] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move in with regex.3 2023-04-20 11:31 ` Alejandro Colomar ` (3 preceding siblings ...) 2023-04-20 13:02 ` [PATCH v4 4/6] regex.3, regex_t.3type: Move regex_t.3type into regex.3 наб @ 2023-04-20 13:02 ` наб 2023-04-20 14:07 ` Alejandro Colomar 2023-04-20 13:02 ` [PATCH v4 6/6] regex.3: Destandardeseify Match offsets наб 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 13:02 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 3327 bytes --] They're inextricably linked, not cross-referenced at all, and not used anywhere else. Now that they (realistically) exist to the reader, add a note on how big nmatch can be; POSIX even says "The application developer should note that there is probably no reason for using a value of nmatch that is larger than preg−>re_nsub+1.". Also remove the now-duplicate regmatch_t declaration. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 40 +++++++++++++++++++-------------------- man3type/regex_t.3type | 1 + man3type/regmatch_t.3type | 2 +- man3type/regoff_t.3type | 2 +- 4 files changed, 23 insertions(+), 22 deletions(-) create mode 100644 man3type/regex_t.3type diff --git a/man3/regex.3 b/man3/regex.3 index 615e065de..6d203fa22 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -15,7 +15,7 @@ regcomp, regexec, regerror, regfree \- POSIX regex functions Standard C library .RI ( libc ", " \-lc ) .SH SYNOPSIS -.nf +.EX .B #include <regex.h> .PP .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , @@ -43,7 +43,7 @@ size_t " errbuf_size ); .B } regmatch_t; .PP .BR typedef " /* ... */ " regoff_t; -.fi +.EE .SH DESCRIPTION .SS Compilation .BR regcomp () @@ -60,6 +60,21 @@ is a null-terminated string. The locale must be the same when running .BR regexec (). 
.PP +After +.BR regcomp () +succeeds, +.I preg->re_nsub +holds the number of subexpressions in +.IR regex . +Thus, a value of +.I preg->re_nsub ++ 1 +passed as +.I nmatch +to +.BR regexec () +is sufficient to capture all matches. +.PP .I cflags is the bitwise OR @@ -196,22 +211,6 @@ must be at least .IR N+1 .) Any unused structure elements will contain the value \-1. .PP -The -.I regmatch_t -structure which is the type of -.I pmatch -is defined in -.IR <regex.h> . -.PP -.in +4n -.EX -typedef struct { - regoff_t rm_so; - regoff_t rm_eo; -} regmatch_t; -.EE -.in -.PP Each .I rm_so element that is not \-1 indicates the start offset of the next largest @@ -222,7 +221,7 @@ element indicates the end offset of the match, which is the offset of the first character after the matching text. .PP .I regoff_t -It is a signed integer type +is a signed integer type capable of storing the largest value that can be stored in either an .I ptrdiff_t type or a @@ -346,7 +345,8 @@ POSIX.1-2008. POSIX.1-2001. .PP Prior to POSIX.1-2008, -the type was +.I regoff_t +was required to be capable of storing the largest value that can be stored in either an .I off_t type or a diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type new file mode 100644 index 000000000..c0daaf0ff --- /dev/null +++ b/man3type/regex_t.3type @@ -0,0 +1 @@ +.so man3/regex.3 diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regmatch_t.3type +++ b/man3type/regmatch_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type index dc78f2cf2..c0daaf0ff 100644 --- a/man3type/regoff_t.3type +++ b/man3type/regoff_t.3type @@ -1 +1 @@ -.so man3type/regex_t.3type +.so man3/regex.3 -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v4 5/6] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move in with regex.3 2023-04-20 13:02 ` [PATCH v4 5/6] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move in with regex.3 наб @ 2023-04-20 14:07 ` Alejandro Colomar 0 siblings, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 14:07 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 3937 bytes --] On 4/20/23 15:02, наб wrote: > They're inextricably linked, not cross-referenced at all, > and not used anywhere else. > > Now that they (realistically) exist to the reader, add a note > on how big nmatch can be; POSIX even says "The application developer > should note that there is probably no reason for using a value of > nmatch that is larger than preg−>re_nsub+1.". > > Also remove the now-duplicate regmatch_t declaration. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 40 +++++++++++++++++++-------------------- > man3type/regex_t.3type | 1 + > man3type/regmatch_t.3type | 2 +- > man3type/regoff_t.3type | 2 +- > 4 files changed, 23 insertions(+), 22 deletions(-) > create mode 100644 man3type/regex_t.3type > > diff --git a/man3/regex.3 b/man3/regex.3 > index 615e065de..6d203fa22 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -15,7 +15,7 @@ regcomp, regexec, regerror, regfree \- POSIX regex functions > Standard C library > .RI ( libc ", " \-lc ) > .SH SYNOPSIS > -.nf > +.EX > .B #include <regex.h> > .PP > .BI "int regcomp(regex_t *restrict " preg ", const char *restrict " regex , > @@ -43,7 +43,7 @@ size_t " errbuf_size ); > .B } regmatch_t; > .PP > .BR typedef " /* ... */ " regoff_t; > -.fi > +.EE > .SH DESCRIPTION > .SS Compilation > .BR regcomp () > @@ -60,6 +60,21 @@ is a null-terminated string. > The locale must be the same when running > .BR regexec (). 
> .PP > +After > +.BR regcomp () > +succeeds, > +.I preg->re_nsub > +holds the number of subexpressions in > +.IR regex . > +Thus, a value of > +.I preg->re_nsub > ++ 1 > +passed as > +.I nmatch > +to > +.BR regexec () > +is sufficient to capture all matches. > +.PP > .I cflags > is the > bitwise OR > @@ -196,22 +211,6 @@ must be at least > .IR N+1 .) > Any unused structure elements will contain the value \-1. > .PP > -The > -.I regmatch_t > -structure which is the type of > -.I pmatch > -is defined in > -.IR <regex.h> . > -.PP > -.in +4n > -.EX > -typedef struct { > - regoff_t rm_so; > - regoff_t rm_eo; > -} regmatch_t; > -.EE > -.in > -.PP > Each > .I rm_so > element that is not \-1 indicates the start offset of the next largest > @@ -222,7 +221,7 @@ element indicates the end offset of the match, > which is the offset of the first character after the matching text. > .PP > .I regoff_t > -It is a signed integer type > +is a signed integer type > capable of storing the largest value that can be stored in either an > .I ptrdiff_t > type or a > @@ -346,7 +345,8 @@ POSIX.1-2008. > POSIX.1-2001. > .PP > Prior to POSIX.1-2008, > -the type was > +.I regoff_t > +was required to be > capable of storing the largest value that can be stored in either an > .I off_t > type or a > diff --git a/man3type/regex_t.3type b/man3type/regex_t.3type > new file mode 100644 > index 000000000..c0daaf0ff > --- /dev/null > +++ b/man3type/regex_t.3type The link changes in the same patch that does the move are fine. git should be smart enough to follow that, and it will help humans too. This short removal of the file might be worse than the previous approach, I fear.
> @@ -0,0 +1 @@ > +.so man3/regex.3 > diff --git a/man3type/regmatch_t.3type b/man3type/regmatch_t.3type > index dc78f2cf2..c0daaf0ff 100644 > --- a/man3type/regmatch_t.3type > +++ b/man3type/regmatch_t.3type > @@ -1 +1 @@ > -.so man3type/regex_t.3type > +.so man3/regex.3 > diff --git a/man3type/regoff_t.3type b/man3type/regoff_t.3type > index dc78f2cf2..c0daaf0ff 100644 > --- a/man3type/regoff_t.3type > +++ b/man3type/regoff_t.3type > @@ -1 +1 @@ > -.so man3type/regex_t.3type > +.so man3/regex.3 -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v4 6/6] regex.3: Destandardeseify Match offsets 2023-04-20 11:31 ` Alejandro Colomar ` (4 preceding siblings ...) 2023-04-20 13:02 ` [PATCH v4 5/6] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: Move in with regex.3 наб @ 2023-04-20 13:02 ` наб 2023-04-20 14:10 ` Alejandro Colomar 5 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 13:02 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2231 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 6d203fa22..552763940 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX. .SS Match offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. -The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) -Any unused structure elements will contain the value \-1. 
+elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first expression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ string " + " rm_so ", " string " + " rm_eo ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets 2023-04-20 13:02 ` [PATCH v4 6/6] regex.3: Destandardeseify Match offsets наб @ 2023-04-20 14:10 ` Alejandro Colomar 2023-04-20 15:05 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 14:10 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 2670 bytes --] On 4/20/23 15:02, наб wrote: > This section reads like it were (and pretty much is) lifted from POSIX. > That's hard to read, because POSIX is horrendously verbose, as usual. > > Instead, synopsise it into something less formal but more reasonable, > and describe the resulting range with a range instead of a paragraph. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- > 1 file changed, 25 insertions(+), 28 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 6d203fa22..552763940 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX. > .SS Match offsets > Unless > .B REG_NOSUB > -was set for the compilation of the pattern buffer, it is possible to > -obtain match addressing information. > -.I pmatch > -must be dimensioned to have at least > -.I nmatch > -elements. > -These are filled in by > +was passed to > +.BR regcomp (), > +it is possible to > +obtain the locations of matches within > +.IR string : > .BR regexec () > -with substring match addresses. > -The offsets of the subexpression starting at the > -.IR i th > -open parenthesis are stored in > -.IR pmatch[i] . > -The entire regular expression's match addresses are stored in > -.IR pmatch[0] . > -(Note that to return the offsets of > -.I N > -subexpression matches, > +fills > .I nmatch > -must be at least > -.IR N+1 .) > -Any unused structure elements will contain the value \-1. 
> +elements of > +.I pmatch > +with results: > +.I pmatch[0] > +corresponds to the entire match, I still don't understand this. Does REG_NOSUB also affect pmatch[0]? I would have expected that it would only affect *sub*matches, that is, [>0]. > +.I pmatch[1] > +to the first expression, etc. > +If there were more matches than > +.IR nmatch , > +they are discarded; > +if fewer, > +unused elements of > +.I pmatch > +are filled with > +.BR \-1 s. > .PP > -Each > -.I rm_so > -element that is not \-1 indicates the start offset of the next largest > -substring match within the string. > -The relative > -.I rm_eo > -element indicates the end offset of the match, > -which is the offset of the first character after the matching text. > +Each returned valid > +.RB (non- \-1 ) > +match corresponds to the range > +.RI [ string " + " rm_so ", " string " + " rm_eo ). > .PP > .I regoff_t > is a signed integer type -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets 2023-04-20 14:10 ` Alejandro Colomar @ 2023-04-20 15:05 ` наб 2023-04-20 18:51 ` G. Branden Robinson 2023-04-21 11:34 ` Alejandro Colomar 0 siblings, 2 replies; 143+ messages in thread From: наб @ 2023-04-20 15:05 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 5704 bytes --] Hi! On Thu, Apr 20, 2023 at 04:10:04PM +0200, Alejandro Colomar wrote: > On 4/20/23 15:02, наб wrote: > > --- a/man3/regex.3 > > +++ b/man3/regex.3 > > @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX. > > .SS Match offsets > > Unless > > .B REG_NOSUB > > -was set for the compilation of the pattern buffer, it is possible to > > -obtain match addressing information. > > -.I pmatch > > -must be dimensioned to have at least > > -.I nmatch > > -elements. > > -These are filled in by > > +was passed to > > +.BR regcomp (), > > +it is possible to > > +obtain the locations of matches within > > +.IR string : > > .BR regexec () > > -with substring match addresses. > > -The offsets of the subexpression starting at the > > -.IR i th > > -open parenthesis are stored in > > -.IR pmatch[i] . > > -The entire regular expression's match addresses are stored in > > -.IR pmatch[0] . > > -(Note that to return the offsets of > > -.I N > > -subexpression matches, > > +fills > > .I nmatch > > -must be at least > > -.IR N+1 .) > > -Any unused structure elements will contain the value \-1. > > +elements of > > +.I pmatch > > +with results: > > +.I pmatch[0] > > +corresponds to the entire match, > I still don't understand this. Does REG_NOSUB also affect pmatch[0]? > I would have expected that it would only affect *sub*matches, that is, [>0]. Let's consult the manual: REG_NOSUB Do not report position of matches. [...] REG_NOSUB Compile for matching that need only report success or failure, not what was matched. (4.4BSD) and POSIX: REG_NOSUB Report only success or fail in regexec(). 
    REG_NOSUB  Report only success/fail in regexec( ).
(yes; the two times it describes it, it's written differently).

POSIX says it better I think.

And, indeed:
    $ cat a.c
    #include <regex.h>
    #include <stdio.h>
    int main(int c, char ** v) {
        regex_t r;
        regcomp(&r, v[1], 0);
        regmatch_t dt = {0, 3};
        printf("%d\n", regexec(&r, v[2], 1, &dt, REG_STARTEND));
        printf("%d, %d\n", (int)dt.rm_so, (int)dt.rm_eo);
    }

    $ cc a.c -oac
    $ ./ac 'c$' 'abcdef'
    0
    2, 3

    $ sed 's/0)/REG_NOSUB)/' a.c | cc -xc - -oac
    $ ./ac 'c$' 'abcdef'
    0
    0, 3

...and I've just realised why you're asking ‒ I think you're reading too
much (and ahistorically) into the "SUB" bit; heretofore I've assumed this
is for "substitution", which I think is fair.

Actually, let's consult POSIX.2 (Draft 11.2):
    591  Table B-8 − regcomp() cflags Argument
    596  REG_NOSUB  Report only success/fail in regexec().
B.5 C Binding for Regular Expression Matching, B.5.2 Description:
    609  If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to
    610  the number of parenthesized subexpressions [delimited by \( \) in basic regular
    611  expressions or ( ) in extended regular expressions] found in pattern.
both as present-day.

B.5.5 Rationale., History of Decisions Made:
    791  The working group has rejected, at least for now, the inclusion of a regsub() func-
    792  tion that would be used to do substitutions for a matched regular expression.
    793  While such a routine would be useful to some applications, its utility would be
    794  much more limited than the matching function described here. Both regular
    795  expression parsing and substitution are possible to implement without support
    796  other than that required by the C Standard {7}, but matching is much more com-
    797  plex than substituting. The only ‘‘difficult’’ part of substitution, given the infor-
    798  mation supplied by regexec(), is finding the next character in a string when there
    799  can be multibyte characters.
That is a much wider issue, and one that needs a 800 more general solution. 803 In Draft 9, the interface was modified so that the matched substrings rm_sp and 804 rm_ep are in a separate regmatch_t structure instead of in regex_t. This allows a 805 single compiled regular expression to be used simultaneously in several contexts; 806 in main() and a signal handler, perhaps, or in multiple threads of lightweight 807 processes. (The preg argument to regexec() is declared with type const, so the 808 implementation is not permitted to use the structure to store intermediate 809 results.) It also allows an application to request an arbitrary number of sub- 810 strings from a regular expression. (Previous versions reported only ten sub- 811 strings.) The number of subexpressions in the regular expression is reported in 812 re_nsub in preg. With this change to regexec(), consideration was given to drop- 813 ping the REG_NOSUB flag, since the user can now specify this with a zero nmatch 814 argument to regexec(). However, keeping REG_NOSUB allows an implementation 815 to use a different (perhaps more efficient) algorithm if it knows in regcomp() that 816 no subexpressions need be reported. The implementation is only required to fill 817 in pmatch if nmatch is not zero and if REG_NOSUB is not specified. Note that the 818 size_t type, as defined in the C Standard {7}, is unsigned, so the description of 819 regexec() does not need to address negative values of nmatch. So: yes, there was a substitution interface that got cut. The name is actually a hold-over from "don't allocate for ten subexpressions in regex_t". I think changing our description to REG_NOSUB Only report overall success. regexec() will only use pmatch for REG_STARTEND, and ignore nmatch. may make that more obvious. Best, наб [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets
  2023-04-20 15:05 ` наб
@ 2023-04-20 18:51 ` G. Branden Robinson
  2023-04-21 11:34 ` Alejandro Colomar
  1 sibling, 0 replies; 143+ messages in thread
From: G. Branden Robinson @ 2023-04-20 18:51 UTC
  To: наб; +Cc: Alejandro Colomar, linux-man

At 2023-04-20T17:05:53+0200, наб wrote:
> I think changing our description to
>   REG_NOSUB  Only report overall success. regexec() will only use pmatch
>              for REG_STARTEND, and ignore nmatch.
> may make that more obvious.

s/Only report/Report only/
s/only use/use only/

You might then further economize on space:

>   REG_NOSUB  Report only overall success. regexec() will use only pmatch
>              for REG_STARTEND, ignoring nmatch.

As a rule of thumb, get the adverb "only" as close to the word it
modifies as you can, because "only" can modify pretty much anything in
English.

Regards,
Branden
* Re: [PATCH v4 6/6] regex.3: Destandardeseify Match offsets 2023-04-20 15:05 ` наб 2023-04-20 18:51 ` G. Branden Robinson @ 2023-04-21 11:34 ` Alejandro Colomar 1 sibling, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-21 11:34 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4061 bytes --] Hi, On 4/20/23 17:05, наб wrote: > Hi! > > On Thu, Apr 20, 2023 at 04:10:04PM +0200, Alejandro Colomar wrote: >> On 4/20/23 15:02, наб wrote: >>> --- a/man3/regex.3 >>> +++ b/man3/regex.3 >>> @@ -188,37 +188,34 @@ This flag is a BSD extension, not present in POSIX. >>> .SS Match offsets >>> Unless >>> .B REG_NOSUB >>> -was set for the compilation of the pattern buffer, it is possible to >>> -obtain match addressing information. >>> -.I pmatch >>> -must be dimensioned to have at least >>> -.I nmatch >>> -elements. >>> -These are filled in by >>> +was passed to >>> +.BR regcomp (), >>> +it is possible to >>> +obtain the locations of matches within >>> +.IR string : >>> .BR regexec () >>> -with substring match addresses. >>> -The offsets of the subexpression starting at the >>> -.IR i th >>> -open parenthesis are stored in >>> -.IR pmatch[i] . >>> -The entire regular expression's match addresses are stored in >>> -.IR pmatch[0] . >>> -(Note that to return the offsets of >>> -.I N >>> -subexpression matches, >>> +fills >>> .I nmatch >>> -must be at least >>> -.IR N+1 .) >>> -Any unused structure elements will contain the value \-1. >>> +elements of >>> +.I pmatch >>> +with results: >>> +.I pmatch[0] >>> +corresponds to the entire match, >> I still don't understand this. Does REG_NOSUB also affect pmatch[0]? >> I would have expected that it would only affect *sub*matches, that is, [>0]. > > Let's consult the manual: > REG_NOSUB Do not report position of matches. [...] > REG_NOSUB Compile for matching that need only report success or > failure, not what was matched. 
(4.4BSD) > and POSIX: > REG_NOSUB Report only success or fail in regexec(). > REG_NOSUB Report only success/fail in regexec( ). > (yes; the two times it describes it, it's written differently). > > POSIX says it better I think. > > And, indeed: > $ cat a.c > #include <regex.h> > #include <stdio.h> > int main(int c, char ** v) { > regex_t r; > regcomp(&r, v[1], 0); > regmatch_t dt = {0, 3}; > printf("%d\n", regexec(&r, v[2], 1, &dt, REG_STARTEND)); > printf("%d, %d\n", (int)dt.rm_so, (int)dt.rm_eo); > } > > $ cc a.c -oac > $ ./ac 'c$' 'abcdef' > 0 > 2, 3 > > $ sed 's/0)/REG_NOSUB)/' a.c | cc -xc - -oac > $ ./ac 'c$' 'abcdef' > 0 > 0, 3 > I like this example, and the quotes from POSIX. I'll link to your message in the commit log. > > ...and I've just realised why you're asking ‒ I think you're reading too > much (and ahistorically) into the "SUB" bit; [...] > Actually, let's consult POSIX.2 (Draft 11.2): [...] > 609 If the REG_NOSUB flag was not set in cflags, then regcomp() shall set re_nsub to > 610 the number of parenthesized subexpressions [delimited by \( \) in basic regular > 611 expressions or ( ) in extended regular expressions] found in pattern. > both as present-day. [...] > It also allows an application to request an arbitrary number of sub- > 810 strings from a regular expression. (Previous versions reported only ten sub- > 811 strings.) The number of subexpressions in the regular expression is reported in > 812 re_nsub in preg. [...] > > So: yes, there was a substitution interface that got cut. > The name is actually a hold-over from > "don't allocate for ten subexpressions in regex_t". So, the name indeed seems to come from "subexpressions", which confirms that it's just confusing as hell. > > I think changing our description to > REG_NOSUB Only report overall success. regexec() will only use pmatch > for REG_STARTEND, and ignore nmatch. > may make that more obvious. 
Yeah, this, and further the version in v8, makes the behavior clear, even if the name is brain-damaged (but there's nothing we can do about it :/). Cheers, Alex > > Best, > наб -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v2 7/9] regex.3: destandardeseify Byte offsets 2023-04-19 21:20 ` наб ` (6 preceding siblings ...) 2023-04-19 23:25 ` [PATCH v2 6/9] regex.3, regex_t.3type, regmatch_t.3type, regoff_t.3type: move in with regex.3 наб @ 2023-04-19 23:25 ` наб 2023-04-19 23:26 ` [PATCH v2 8/9] regex.3: desoupify function descriptions наб 2023-04-19 23:26 ` [PATCH v2 9/9] regex.3: fix subsection headings наб 9 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-19 23:25 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2232 bytes --] This section reads like it were (and pretty much is) lifted from POSIX. That's hard to read, because POSIX is horrendously verbose, as usual. Instead, synopsise it into something less formal but more reasonable, and describe the resulting range with a range instead of a paragraph. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 53 +++++++++++++++++++++++++--------------------------- 1 file changed, 25 insertions(+), 28 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index b95b3c3b0..9f262f985 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -198,37 +198,34 @@ This flag is a BSD extension, not present in POSIX. .SS Byte offsets Unless .B REG_NOSUB -was set for the compilation of the pattern buffer, it is possible to -obtain match addressing information. -.I pmatch -must be dimensioned to have at least -.I nmatch -elements. -These are filled in by +was passed to +.BR regcomp (), +it is possible to +obtain the locations of matches within +.IR string : .BR regexec () -with substring match addresses. -The offsets of the subexpression starting at the -.IR i th -open parenthesis are stored in -.IR pmatch[i] . -The entire regular expression's match addresses are stored in -.IR pmatch[0] . -(Note that to return the offsets of -.I N -subexpression matches, +fills .I nmatch -must be at least -.IR N+1 .) 
-Any unused structure elements will contain the value \-1. +elements of +.I pmatch +with results: +.I pmatch[0] +corresponds to the entire match, +.I pmatch[1] +to the first expression, etc. +If there were more matches than +.IR nmatch , +they are discarded; +if fewer, +unused elements of +.I pmatch +are filled with +.BR \-1 s. .PP -Each -.I rm_so -element that is not \-1 indicates the start offset of the next largest -substring match within the string. -The relative -.I rm_eo -element indicates the end offset of the match, -which is the offset of the first character after the matching text. +Each returned valid +.RB (non- \-1 ) +match corresponds to the range +.RI [ string " + " rm_so ", " string " + " rm_eo ). .PP .I regoff_t is a signed integer type -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
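A small C sketch of the match-offset behaviour that the reworded section describes may help when reviewing the wording. The helper name match_offsets() is made up for illustration and is not part of regex.3; it assumes an extended regular expression and no eflags.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of regex.3): compile `pattern` as an ERE,
 * match it against `string`, and let regexec() store up to `nmatch`
 * offsets in `pmatch`.
 * Returns 0 on a match, REG_NOMATCH on no match, or the regcomp() error. */
int match_offsets(const char *pattern, const char *string,
                  regmatch_t *pmatch, size_t nmatch)
{
    regex_t re;
    int rc;

    /* pmatch may only be absent when no offsets are requested. */
    assert(pmatch != NULL || nmatch == 0);

    rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;

    /* pmatch[0] describes the whole match, pmatch[1] the first
     * parenthesised subexpression, and so on; elements with no
     * corresponding subexpression are set to -1. */
    rc = regexec(&re, string, nmatch, pmatch, 0);
    regfree(&re);
    return rc;
}
```

Called with the pattern "a(b+)", the string "xxabbby" and nmatch 3, this should leave pmatch[0] describing [string + 2, string + 6), pmatch[1] describing [string + 3, string + 6), and both fields of pmatch[2] set to -1.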
* [PATCH v2 8/9] regex.3: desoupify function descriptions 2023-04-19 21:20 ` наб ` (7 preceding siblings ...) 2023-04-19 23:25 ` [PATCH v2 7/9] regex.3: destandardeseify Byte offsets наб @ 2023-04-19 23:26 ` наб 2023-04-20 11:15 ` [PATCH v3 " наб 2023-04-19 23:26 ` [PATCH v2 9/9] regex.3: fix subsection headings наб 9 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-19 23:26 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 3798 bytes --] Behold: regerror() is passed the error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer, errbuf_size. Absolute soup. This reads to me like an ill-conceived copy from a very early standard version. It looks fine in source form but is horrific to read as running text. Instead, replace all of these with just the descriptions of what they do with their arguments. What the arguments are is very clearly noted in big bold in the prototypes. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 80 +++++++++++++++++++++------------------------------- 1 file changed, 32 insertions(+), 48 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 9f262f985..7d08d4042 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -25,8 +25,8 @@ Standard C library .BI " size_t " nmatch ", regmatch_t " pmatch "[restrict ." nmatch ], .BI " int " eflags ); .PP -.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg , -.BI " char " errbuf "[restrict ." errbuf_size "], \ +.BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , +.BI " char " errbuf "[restrict ." errbuf_size "], \ size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); .PP @@ -52,21 +52,13 @@ for subsequent .BR regexec () searches. 
.PP -.BR regcomp () -is supplied with -.IR preg , -a pointer to a pattern buffer storage area; -.IR regex , -a pointer to the null-terminated string and -.IR cflags , -flags used to determine the type of compilation. -.PP -All regular expression searching must be done via a compiled pattern -buffer, thus -.BR regexec () -must always be supplied with the address of a -.BR regcomp ()-initialized -pattern buffer. +The pattern buffer at +.I *preg +is initialized. +.I regex +is a null-terminated string. +The locale must be the same when running +.BR regexec (). .PP After .BR regcomp () @@ -142,12 +134,10 @@ contains .SS POSIX regex matching .BR regexec () is used to match a null-terminated string -against the precompiled pattern buffer, -.IR preg . -.I nmatch -and -.I pmatch -are used to provide information regarding the location of any matches. +against the precompiled pattern buffer in +.IR *preg , +which must have been initialised with +.BR regexec (). .I eflags is the bitwise OR @@ -242,34 +232,28 @@ and .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. -If both -.I errbuf -and +.I errcode +must be the latest error returned from an operation on +.IR preg . +If +.I preg +is a null pointer\(emthe latest error. +.PP +If +.I errbuf_size +is +.BR 0 , +the size of the required buffer is returned. +Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. 
.SS POSIX pattern buffer freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +invalidates the pattern buffer at +.IR *preg , +which must have been initialized via .BR regcomp (). .SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
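The buffer-sizing behaviour of regerror() described in this patch is usually exercised with two calls: one to learn the size, one to fetch the message. The following is only a sketch of that idiom; regerror_alloc() is a hypothetical name, and it relies on errbuf being ignored when errbuf_size is 0.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc()ed copy of the message for
 * `errcode`, or NULL if allocation fails. The caller frees the result. */
char *regerror_alloc(int errcode, const regex_t *preg)
{
    /* First call: errbuf_size is 0, so errbuf is ignored and only the
     * required size (including the terminating null byte) is returned. */
    size_t need = regerror(errcode, preg, NULL, 0);
    char *msg = malloc(need);

    if (msg == NULL)
        return NULL;

    /* Second call: the buffer is large enough, so nothing is truncated. */
    size_t again = regerror(errcode, preg, msg, need);
    assert(again == need);
    return msg;
}
```

With a fixed-size buffer instead, the message is truncated to fit but still null-terminated, so a short char array remains safe to print.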
* [PATCH v3 8/9] regex.3: desoupify function descriptions 2023-04-19 23:26 ` [PATCH v2 8/9] regex.3: desoupify function descriptions наб @ 2023-04-20 11:15 ` наб 2023-04-20 11:43 ` Alejandro Colomar 0 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-20 11:15 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 3827 bytes --] Behold: regerror() is passed the error code, errcode, the pattern buffer, preg, a pointer to a character string buffer, errbuf, and the size of the string buffer, errbuf_size. Absolute soup. This reads to me like an ill-conceived copy from a very early standard version. It looks fine in source form but is horrific to read as running text. Instead, replace all of these with just the descriptions of what they do with their arguments. What the arguments are is very clearly noted in big bold in the prototypes. Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Left one "pre"compiled buffer. man3/regex.3 | 80 +++++++++++++++++++++------------------------------- 1 file changed, 32 insertions(+), 48 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 9f262f985..9bb4a73ff 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -25,8 +25,8 @@ Standard C library .BI " size_t " nmatch ", regmatch_t " pmatch "[restrict ." nmatch ], .BI " int " eflags ); .PP -.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg , -.BI " char " errbuf "[restrict ." errbuf_size "], \ +.BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , +.BI " char " errbuf "[restrict ." errbuf_size "], \ size_t " errbuf_size ); .BI "void regfree(regex_t *" preg ); .PP @@ -52,21 +52,13 @@ for subsequent .BR regexec () searches. .PP -.BR regcomp () -is supplied with -.IR preg , -a pointer to a pattern buffer storage area; -.IR regex , -a pointer to the null-terminated string and -.IR cflags , -flags used to determine the type of compilation. 
-.PP -All regular expression searching must be done via a compiled pattern -buffer, thus -.BR regexec () -must always be supplied with the address of a -.BR regcomp ()-initialized -pattern buffer. +The pattern buffer at +.I *preg +is initialized. +.I regex +is a null-terminated string. +The locale must be the same when running +.BR regexec (). .PP After .BR regcomp () @@ -142,12 +134,10 @@ contains .SS POSIX regex matching .BR regexec () is used to match a null-terminated string -against the precompiled pattern buffer, -.IR preg . -.I nmatch -and -.I pmatch -are used to provide information regarding the location of any matches. +against the compiled pattern buffer in +.IR *preg , +which must have been initialised with +.BR regexec (). .I eflags is the bitwise OR @@ -242,34 +232,28 @@ and .BR regexec () into error message strings. .PP -.BR regerror () -is passed the error code, -.IR errcode , -the pattern buffer, -.IR preg , -a pointer to a character string buffer, -.IR errbuf , -and the size of the string buffer, -.IR errbuf_size . -It returns the size of the -.I errbuf -required to contain the null-terminated error message string. -If both -.I errbuf -and +.I errcode +must be the latest error returned from an operation on +.IR preg . +If +.I preg +is a null pointer\(emthe latest error. +.PP +If +.I errbuf_size +is +.BR 0 , +the size of the required buffer is returned. +Otherwise, up to .I errbuf_size -are nonzero, -.I errbuf -is filled in with the first -.I "errbuf_size \- 1" -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). +bytes are copied to +.IR errbuf ; +the error string is always null-terminated, and truncated to fit. .SS POSIX pattern buffer freeing -Supplying .BR regfree () -with a precompiled pattern buffer, -.IR preg , -will free the memory allocated to the pattern buffer by the compiling -process, +invalidates the pattern buffer at +.IR *preg , +which must have been initialized via .BR regcomp (). 
.SH RETURN VALUE .BR regcomp () -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH v3 8/9] regex.3: desoupify function descriptions 2023-04-20 11:15 ` [PATCH v3 " наб @ 2023-04-20 11:43 ` Alejandro Colomar 2023-04-20 11:50 ` наб 0 siblings, 1 reply; 143+ messages in thread From: Alejandro Colomar @ 2023-04-20 11:43 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 4682 bytes --] Hi наб! On 4/20/23 13:15, наб wrote: > Behold: > regerror() is passed the error code, errcode, the pattern buffer, > preg, a pointer to a character string buffer, errbuf, and the size > of the string buffer, errbuf_size. > > Absolute soup. This reads to me like an ill-conceived copy from a very > early standard version. It looks fine in source form but is horrific to > read as running text. > > Instead, replace all of these with just the descriptions of what they do > with their arguments. What the arguments are is very clearly noted in > big bold in the prototypes. > > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> It would be nice to see the --range-diff[1], to easily review changes to patches. I have a hard time running vdiff[2] on the raw patches. [1]: <https://git-scm.com/docs/git-format-patch#Documentation/git-format-patch.txt---range-diffltpreviousgt> See also: <https://git-scm.com/docs/git-range-diff> [2]: <http://catb.org/jargon/html/V/vdiff.html>, not <https://www.unix.com/man-page/linux/1/vdiff/> Cheers, Alex > --- > Left one "pre"compiled buffer. > > man3/regex.3 | 80 +++++++++++++++++++++------------------------------- > 1 file changed, 32 insertions(+), 48 deletions(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index 9f262f985..9bb4a73ff 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -25,8 +25,8 @@ Standard C library > .BI " size_t " nmatch ", regmatch_t " pmatch "[restrict ." nmatch ], > .BI " int " eflags ); > .PP > -.BI "size_t regerror(int " errcode ", const regex_t *restrict " preg , > -.BI " char " errbuf "[restrict ." 
errbuf_size "], \ > +.BI "size_t regerror(int " errcode ", const regex_t *_Nullable restrict " preg , > +.BI " char " errbuf "[restrict ." errbuf_size "], \ > size_t " errbuf_size ); > .BI "void regfree(regex_t *" preg ); > .PP > @@ -52,21 +52,13 @@ for subsequent > .BR regexec () > searches. > .PP > -.BR regcomp () > -is supplied with > -.IR preg , > -a pointer to a pattern buffer storage area; > -.IR regex , > -a pointer to the null-terminated string and > -.IR cflags , > -flags used to determine the type of compilation. > -.PP > -All regular expression searching must be done via a compiled pattern > -buffer, thus > -.BR regexec () > -must always be supplied with the address of a > -.BR regcomp ()-initialized > -pattern buffer. > +The pattern buffer at > +.I *preg > +is initialized. > +.I regex > +is a null-terminated string. > +The locale must be the same when running > +.BR regexec (). > .PP > After > .BR regcomp () > @@ -142,12 +134,10 @@ contains > .SS POSIX regex matching > .BR regexec () > is used to match a null-terminated string > -against the precompiled pattern buffer, > -.IR preg . > -.I nmatch > -and > -.I pmatch > -are used to provide information regarding the location of any matches. > +against the compiled pattern buffer in > +.IR *preg , > +which must have been initialised with > +.BR regexec (). > .I eflags > is the > bitwise OR > @@ -242,34 +232,28 @@ and > .BR regexec () > into error message strings. > .PP > -.BR regerror () > -is passed the error code, > -.IR errcode , > -the pattern buffer, > -.IR preg , > -a pointer to a character string buffer, > -.IR errbuf , > -and the size of the string buffer, > -.IR errbuf_size . > -It returns the size of the > -.I errbuf > -required to contain the null-terminated error message string. > -If both > -.I errbuf > -and > +.I errcode > +must be the latest error returned from an operation on > +.IR preg . > +If > +.I preg > +is a null pointer\(emthe latest error. 
> +.PP > +If > +.I errbuf_size > +is > +.BR 0 , > +the size of the required buffer is returned. > +Otherwise, up to > .I errbuf_size > -are nonzero, > -.I errbuf > -is filled in with the first > -.I "errbuf_size \- 1" > -characters of the error message and a terminating null byte (\[aq]\e0\[aq]). > +bytes are copied to > +.IR errbuf ; > +the error string is always null-terminated, and truncated to fit. > .SS POSIX pattern buffer freeing > -Supplying > .BR regfree () > -with a precompiled pattern buffer, > -.IR preg , > -will free the memory allocated to the pattern buffer by the compiling > -process, > +invalidates the pattern buffer at > +.IR *preg , > +which must have been initialized via > .BR regcomp (). > .SH RETURN VALUE > .BR regcomp () -- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* Re: [PATCH v3 8/9] regex.3: desoupify function descriptions 2023-04-20 11:43 ` Alejandro Colomar @ 2023-04-20 11:50 ` наб 0 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 11:50 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1253 bytes --] Hi! On Thu, Apr 20, 2023 at 01:43:56PM +0200, Alejandro Colomar wrote: > On 4/20/23 13:15, наб wrote: > > Behold: > > regerror() is passed the error code, errcode, the pattern buffer, > > preg, a pointer to a character string buffer, errbuf, and the size > > of the string buffer, errbuf_size. > > > > Absolute soup. This reads to me like an ill-conceived copy from a very > > early standard version. It looks fine in source form but is horrific to > > read as running text. > > > > Instead, replace all of these with just the descriptions of what they do > > with their arguments. What the arguments are is very clearly noted in > > big bold in the prototypes. > It would be nice to see the --range-diff[1], to easily review changes to > patches. I have a hard time running vdiff[2] on the raw patches. v2: > > -against the precompiled pattern buffer, > > +against the precompiled pattern buffer in v3: > > -against the precompiled pattern buffer, > > +against the compiled pattern buffer in And 9/9 grew this hunk in v3: @@ -179,13 +179,13 @@ the match succeeded, and > 0), they overwrite .I pmatch as usual, and the -.B Byte offsets +.B Match offsets remain relative to .IR string (not Best, [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
* [PATCH v2 9/9] regex.3: fix subsection headings 2023-04-19 21:20 ` наб ` (8 preceding siblings ...) 2023-04-19 23:26 ` [PATCH v2 8/9] regex.3: desoupify function descriptions наб @ 2023-04-19 23:26 ` наб 2023-04-20 11:17 ` [PATCH v3 " наб 9 siblings, 1 reply; 143+ messages in thread From: наб @ 2023-04-19 23:26 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1512 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- man3/regex.3 | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 7d08d4042..58eb81c8b 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -45,7 +45,7 @@ size_t " errbuf_size ); .BR typedef " /* ... */ " regoff_t; .EE .SH DESCRIPTION -.SS POSIX regex compiling +.SS Compilation .BR regcomp () is used to compile a regular expression into a form that is suitable for subsequent @@ -131,7 +131,7 @@ whether .I eflags contains .BR REG_NOTEOL . -.SS POSIX regex matching +.SS Matching .BR regexec () is used to match a null-terminated string against the precompiled pattern buffer in @@ -185,7 +185,7 @@ remain relative to (not .IR string " + " pmatch->rm_so ). This flag is a BSD extension, not present in POSIX. -.SS Byte offsets +.SS Match offsets Unless .B REG_NOSUB was passed to @@ -224,7 +224,7 @@ capable of storing the largest value that can be stored in either an type or a .I ssize_t type. -.SS POSIX error reporting +.SS Error reporting .BR regerror () is used to turn the error codes that can be returned by both .BR regcomp () @@ -249,7 +249,7 @@ Otherwise, up to bytes are copied to .IR errbuf ; the error string is always null-terminated, and truncated to fit. 
-.SS POSIX pattern buffer freeing +.SS Freeing .BR regfree () invalidates the pattern buffer at .IR *preg , -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* [PATCH v3 9/9] regex.3: fix subsection headings 2023-04-19 23:26 ` [PATCH v2 9/9] regex.3: fix subsection headings наб @ 2023-04-20 11:17 ` наб 0 siblings, 0 replies; 143+ messages in thread From: наб @ 2023-04-20 11:17 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1677 bytes --] Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> --- Missed the .Sx Byte offsets. man3/regex.3 | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/man3/regex.3 b/man3/regex.3 index 9bb4a73ff..552763940 100644 --- a/man3/regex.3 +++ b/man3/regex.3 @@ -45,7 +45,7 @@ size_t " errbuf_size ); .BR typedef " /* ... */ " regoff_t; .EE .SH DESCRIPTION -.SS POSIX regex compiling +.SS Compilation .BR regcomp () is used to compile a regular expression into a form that is suitable for subsequent @@ -131,7 +131,7 @@ whether .I eflags contains .BR REG_NOTEOL . -.SS POSIX regex matching +.SS Matching .BR regexec () is used to match a null-terminated string against the compiled pattern buffer in @@ -179,13 +179,13 @@ the match succeeded, and > 0), they overwrite .I pmatch as usual, and the -.B Byte offsets +.B Match offsets remain relative to .IR string (not .IR string " + " pmatch->rm_so ). This flag is a BSD extension, not present in POSIX. -.SS Byte offsets +.SS Match offsets Unless .B REG_NOSUB was passed to @@ -224,7 +224,7 @@ capable of storing the largest value that can be stored in either an type or a .I ssize_t type. -.SS POSIX error reporting +.SS Error reporting .BR regerror () is used to turn the error codes that can be returned by both .BR regcomp () @@ -249,7 +249,7 @@ Otherwise, up to bytes are copied to .IR errbuf ; the error string is always null-terminated, and truncated to fit. 
-.SS POSIX pattern buffer freeing +.SS Freeing .BR regfree () invalidates the pattern buffer at .IR *preg , -- 2.30.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 143+ messages in thread
* Re: [PATCH 1/2] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND 2023-04-19 17:47 [PATCH 1/2] regex.3: note that pmatch is still used if REG_NOSUB if REG_STARTEND наб 2023-04-19 17:48 ` [PATCH 2/2] regex.3: improve REG_STARTEND наб @ 2023-04-19 19:51 ` Alejandro Colomar 1 sibling, 0 replies; 143+ messages in thread From: Alejandro Colomar @ 2023-04-19 19:51 UTC (permalink / raw) To: наб; +Cc: linux-man [-- Attachment #1.1: Type: text/plain, Size: 1648 bytes --] Hi наб! On 4/19/23 19:47, наб wrote: > Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> > --- > Also note that in > int regexec(const regex_t *restrict preg, const char *restrict string, > size_t nmatch, regmatch_t pmatch[restrict .nmatch], > int eflags); > pmatch is [1] if nmatch is 0 if eflags&REG_STARTEND. > Or, more succinctly, > regmatch_t pmatch[restrict !!(.eflags & REG_STARTEND) ?: .nmatch], > > Doesn't really matter, and that's a much worse signature than what's > currently there, but. Please include this in the commit message :) > > man3/regex.3 | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/man3/regex.3 b/man3/regex.3 > index e8fed5147..d54d6024c 100644 > --- a/man3/regex.3 > +++ b/man3/regex.3 > @@ -82,7 +82,9 @@ and > .I pmatch > arguments to > .BR regexec () > -are ignored if the pattern buffer supplied was compiled with this flag set. > +are only used for > +.B REG_STARTEND > +if the pattern buffer supplied was compiled with this flag set. I think it would be clearer with a wording like: +are only used for +.B REG_STARTEND +and only if the pattern buffer supplied was compiled with this flag set. I'm still not convinced by my wording either; please revise. But with your wording, I think it's not clear what happens if REG_STARTEND is not set. Cheers, Alex > .TP > .B REG_NEWLINE > Match-any-character operators don't match a newline.
-- <http://www.alejandro-colomar.es/> GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 143+ messages in thread
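For reference, the case under discussion (a pattern compiled with REG_NOSUB, matched with REG_STARTEND) could look roughly like the C sketch below. range_matches() is a hypothetical helper, and the code assumes a libc that provides the BSD extension REG_STARTEND (glibc and the BSDs do; not every libc does).

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: report whether the compiled pattern matches
 * somewhere inside bytes [start, end) of `buf`.
 * Returns 1 on a match, 0 otherwise. */
int range_matches(const regex_t *preg, const char *buf,
                  size_t start, size_t end)
{
    regmatch_t range;

    assert(start <= end);
    range.rm_so = (regoff_t)start;
    range.rm_eo = (regoff_t)end;

    /* nmatch is 0 and the pattern may have been compiled with REG_NOSUB,
     * yet pmatch[0] is still read on input because of REG_STARTEND. */
    return regexec(preg, buf, 0, &range, REG_STARTEND) == 0;
}
```

This is the situation in which pmatch effectively has one element even though nmatch is 0, which is what the proposed wording tries to capture.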