linux-man.vger.kernel.org archive mirror
* strncpy clarify result may not be null terminated
@ 2023-11-04 11:27 Jonny Grant
  2023-11-04 19:33 ` Alejandro Colomar
                   ` (7 more replies)
  0 siblings, 8 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-04 11:27 UTC
  To: linux-man, Alejandro Colomar (man-pages)

Hello
I have a suggestion for strncpy.

C23 draft states this caveat for strncpy. 

"373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated."


https://man7.org/linux/man-pages/man3/strncpy.3.html

"If the destination buffer, limited by its size, isn't large
enough to hold the copy, the resulting character sequence is
truncated. "

How about clarifying this as:


"If the destination buffer, limited by its size, isn't large
enough to hold the copy, the resulting character sequence is
truncated; where there is no null terminating byte in the first n
characters the result will not be null terminated. "

Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
@ 2023-11-04 19:33 ` Alejandro Colomar
  2023-11-04 21:18   ` Jonny Grant
  2023-11-05 21:16   ` Jonny Grant
  2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-04 19:33 UTC
  To: Jonny Grant; +Cc: linux-man

Hi Jonny,

On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote:
> Hello
> I have a suggestion for strncpy.
> 
> C23 draft states this caveat for strncpy. 
> 
> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated."
> 
> 
> https://man7.org/linux/man-pages/man3/strncpy.3.html
> 
> "If the destination buffer, limited by its size, isn't large
> enough to hold the copy, the resulting character sequence is
> truncated. "

The use of the term "character sequence" instead of "string" isn't
casual.  A "string" is a sequence of zero or more non-zero characters,
followed by exactly one NUL.  A "character sequence" is a sequence of
zero or more non-zero characters, period.
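A minimal sketch of that distinction in C, assuming only ISO C <string.h>.  The helper names holds_string() and seq_len() are hypothetical, not part of libc or man-pages.  seq_len() behaves like strnlen(3): it bounds the scan by the array size, so it never reads past an array that holds a character sequence but no NUL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf[0..size) contains a NUL, i.e. it holds a string. */
bool holds_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Length of the character sequence in buf[0..size): the bytes before the
   first NUL, or all size bytes if there is no NUL.  Never reads buf[size]. */
size_t seq_len(const char *buf, size_t size)
{
    const char *nul = memchr(buf, '\0', size);

    return nul ? (size_t)(nul - buf) : size;
}
```

For example, after strncpy(a, "asdf", 4) on a char a[4], holds_string() is false and seq_len() is 4; with a char b[8] instead, the same copy leaves a string of length 4 followed by four bytes of null padding.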

To be clearer in that regard, the CAVEATS section of the same page says
this:

CAVEATS
     The name of these functions is confusing.  These  functions  pro‐
     duce   a  null‐padded  character  sequence,  not  a  string  (see
     string_copying(7)).

Saying that these functions don't produce a string should warn anyone
thinking it would.  The page string_copying(7) goes into more detail.

> 
> How about clarifying this as:
> 
> 
> "If the destination buffer, limited by its size, isn't large
> enough to hold the copy, the resulting character sequence is
> truncated; where there is no null terminating byte in the first n
> characters the result will not be null terminated. "

strncpy(3) should !*NEVER*! be used to produce a string.
I don't think that should be conditional.  Your suggested change could
lead readers to the mistaken belief that strncpy(3) is useful if the
buffer is large enough.  Do not ever use that function for producing
strings.  Use something else, like strlcpy(3), strcpy(3), or stpecpy(3).

Cheers,
Alex

> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-04 19:33 ` Alejandro Colomar
@ 2023-11-04 21:18   ` Jonny Grant
  2023-11-05  1:36     ` Alejandro Colomar
  2023-11-05 21:16   ` Jonny Grant
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-04 21:18 UTC
  To: Alejandro Colomar; +Cc: linux-man



On 04/11/2023 19:33, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote:
>> Hello
>> I have a suggestion for strncpy.
>>
>> C23 draft states this caveat for strncpy. 
>>
>> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated."
>>
>>
>> https://man7.org/linux/man-pages/man3/strncpy.3.html
>>
>> "If the destination buffer, limited by its size, isn't large
>> enough to hold the copy, the resulting character sequence is
>> truncated. "
> 
> The use of the term "character sequence" instead of "string" isn't
> casual.  A "string" is a sequence of zero or more non-zero characters,
> followed by exactly one NUL.  A "character sequence" is a sequence of
> zero or more non-zero characters, period.

OK, that's good to know. C23 calls those "arrays", and so does POSIX. POSIX explains that if the array is a string (i.e. null-terminated), the copy is padded with nulls; I'll paste it below:

https://pubs.opengroup.org/onlinepubs/009696899/functions/strncpy.html

"If the array pointed to by s2 is a string that is shorter than n bytes, null bytes shall be appended to the copy in the array pointed to by s1, until n bytes in all are written."
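The padding rule quoted above can be checked directly.  This is a sketch; padding_is_all_nul() is a hypothetical helper written for illustration, not part of any standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if every byte after the first NUL in buf[0..size) is also
   NUL, which is the padding POSIX requires from strncpy(3) when the source
   string is shorter than n.  Vacuously true when buf holds no NUL at all. */
bool padding_is_all_nul(const char *buf, size_t size)
{
    const char *nul = memchr(buf, '\0', size);

    if (nul == NULL)
        return true;
    for (const char *p = nul; p < buf + size; p++) {
        if (*p != '\0')
            return false;
    }
    return true;
}
```

Copying "ab" with n = 6 into a buffer pre-filled with 'x' leaves "ab" followed by four null bytes, so padding_is_all_nul() returns true; a hand-built array with a stray 'x' after its first NUL does not pass.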

> 
> To be clearer in that regard, the CAVEATS section of the same page says
> this:
> 
> CAVEATS
>      The name of these functions is confusing.  These  functions  pro‐
>      duce   a  null‐padded  character  sequence,  not  a  string  (see
>      string_copying(7)).
> 
> Saying that these functions don't produce a string should warn anyone
> thinking it would.  The page string_copying(7) goes into more detail.
> 
>>
>> How about clarifying this as:
>>
>>
>> "If the destination buffer, limited by its size, isn't large
>> enough to hold the copy, the resulting character sequence is
>> truncated; where there is no null terminating byte in the first n
>> characters the result will not be null terminated. "
> 
> strncpy(3) should !*NEVER*! be used to produce a string.
> I don't think that should be conditional.  Your suggested change could
> lead readers to the mistaken belief that strncpy(3) is useful if the
> buffer is large enough.  Do not ever use that function for producing
> strings.  Use something else, like strlcpy(3), strcpy(3), or stpecpy(3).

Just documentation feedback based on C23, not writing code today.

Perhaps you have seen Michael Kerrisk's article about the risks of strlcpy.
https://lwn.net/Articles/507319/

Regarding strcpy, doesn't that risk buffer overruns? Surely that's a cybersecurity risk?
strlcpy is also bad in certain ways, it breaks ISO TR24731 "Do not unexpectedly truncate strings", can cause overruns and crashes.

I guess if you feel strncpy should "never be used to produce a string", you could explain why somewhere, in an article. You didn't mention why you feel it is not useful even when the buffer is large enough (including room for a null terminator, I hope!).

strncpy_s is a better solution, though it is not widely available and not part of glibc. That's another debate.

Is stpecpy standardised? If you can send me an online manual for it, I'll take a look.

Regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-04 21:18   ` Jonny Grant
@ 2023-11-05  1:36     ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-05  1:36 UTC
  To: Jonny Grant; +Cc: linux-man

Hi Jonny,

On Sat, Nov 04, 2023 at 09:18:08PM +0000, Jonny Grant wrote:
> On 04/11/2023 19:33, Alejandro Colomar wrote:
> > Hi Jonny,
> > 
> > On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote:
> >> Hello
> >> I have a suggestion for strncpy.
> >>
> >> C23 draft states this caveat for strncpy. 
> >>
> >> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated."
> >>
> >>
> >> https://man7.org/linux/man-pages/man3/strncpy.3.html
> >>
> >> "If the destination buffer, limited by its size, isn't large
> >> enough to hold the copy, the resulting character sequence is
> >> truncated. "
> > 
> > The use of the term "character sequence" instead of "string" isn't
> > casual.  A "string" is a sequence of zero or more non-zero characters,
> > followed by exactly one NUL.  A "character sequence" is a sequence of
> > zero or more non-zero characters, period.
> 
> OK, that's good to know. C23 calls those "arrays", and so does POSIX. POSIX explains that if the array is a string (i.e. null-terminated), the copy is padded with nulls; I'll paste it below:
> 
> https://pubs.opengroup.org/onlinepubs/009696899/functions/strncpy.html
> 
> "If the array pointed to by s2 is a string that is shorter than n bytes, null bytes shall be appended to the copy in the array pointed to by s1, until n bytes in all are written."

By array, C23 and POSIX (AFAICS) refer to the array of char (so, a
`char []`) that holds the data, and not to the data itself.

By character sequence, I refer to the data, which consists of characters
in the range [1, 255] (zero or more of them).  Note that a character
sequence doesn't contain null characters.  The padding that strncpy(3)
writes after the character sequence is not part of the character
sequence, even though it is contained in the character array.

> > To be clearer in that regard, the CAVEATS section of the same page says
> > this:
> > 
> > CAVEATS
> >      The name of these functions is confusing.  These  functions  pro‐
> >      duce   a  null‐padded  character  sequence,  not  a  string  (see
> >      string_copying(7)).
> > 
> > Saying that these functions don't produce a string should warn anyone
> > thinking it would.  The page string_copying(7) goes into more detail.
> > 
> >>
> >> How about clarifying this as:
> >>
> >>
> >> "If the destination buffer, limited by its size, isn't large
> >> enough to hold the copy, the resulting character sequence is
> >> truncated; where there is no null terminating byte in the first n
> >> characters the result will not be null terminated. "
> > 
> > strncpy(3) should !*NEVER*! be used to produce a string.
> > I don't think that should be conditional.  Your suggested change could
> > lead readers to the mistaken belief that strncpy(3) is useful if the
> > buffer is large enough.  Do not ever use that function for producing
> > strings.  Use something else, like strlcpy(3), strcpy(3), or stpecpy(3).
> 
> Just documentation feedback based on C23, not writing code today.
> 
> Perhaps you have seen Michael Kerrisk's article about the risks of strlcpy.
> https://lwn.net/Articles/507319/

Yes.  I believe Michael's article and I agree on most points.  That
article, though, is a bit outdated, and recent versions of
_FORTIFY_SOURCE (see ftm(7)) have changed things significantly.

> 
> Regarding strcpy, doesn't that risk buffer overruns? Surely that's a cybersecurity risk?

Not so much if you use _FORTIFY_SOURCE.  The feature probably still has
a few corner cases that it cannot detect, but I'm going to guess that
they are few.

> strlcpy is also bad in certain ways, it breaks ISO TR24731 "Do not unexpectedly truncate strings", can cause overruns and crashes.

And does strncpy(3) do any better?  It also truncates, so it necessarily
shares the same problems that strlcpy(3) has.  And then it has its own
ones.

-  strlcpy(3) truncates the resulting string, which most of the time is
   bad, and a bug if the return value is ignored.  However, the return
   value tells you whether there was truncation.

-  strncpy(3) truncates the resulting character sequence (it's not
   null-terminated, so it's not a string), _and_ it can't report
   truncation via the return value.  See for yourself:

	char a[4];  strncpy(a, "asdf", sizeof(a));

   There was no truncation, since the entire data is available in the
   resulting character sequence.  However, there's still the bug if you
   try to read that as a string.
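The contrast in the list above can be made concrete: strncpy(3) returns dst whether or not the source fit, and a source one character too long leaves exactly the same bytes as one that fits.  A sketch, with strncpy_fits() as a hypothetical helper that does the length comparison strncpy(3) itself cannot report:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy src into the fixed-width array dst[0..size) with strncpy(3) and
   report whether the whole character sequence fit.  strncpy(3) returns dst
   in every case, so the caller has to compare lengths itself.  A source of
   exactly size characters still fits, but leaves dst unterminated. */
bool strncpy_fits(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size);
    return strlen(src) <= size;
}
```

strncpy(a, "asdf", 4) and strncpy(b, "asdfg", 4) both return their first argument and leave identical four-byte arrays; only comparing strlen(src) with the size tells them apart.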

> 
> I guess if you feel strncpy should "never be used to produce a string", you could explain why somewhere, in an article. You didn't mention why you feel it is not useful even when the buffer is large enough (including room for a null terminator, I hope!).

Yes.  The article, or explanation, you can find it in string_copying(7),
a manual page that I wrote recently to address precisely this.

Regarding why:

-  In case you don't want truncation, and prefer to abort, it is usually
   preferable to call strcpy(3) and rely on _FORTIFY_SOURCE.  Only if
   you have doubts about the ability of _FORTIFY_SOURCE to know the
   buffer size, you should use a different function (continue reading
   for that).  Such a case would be if you do very obscure operations to
   get a buffer and the compiler will be blind to it.

-  In case you want truncation, which is seldom, you need to use
   strlcpy(3), which is the only standard function that creates a
   truncated string.

-  In case you don't want truncation, and don't have _FORTIFY_SOURCE
   available (or you know it won't be able to handle a specific case),
   or you don't want to crash your program and would rather simply report
   an error, you also need to use strlcpy(3), which detects truncation
   easily, so you can check for that and report an error.
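For the last case above (report an error instead of crashing), a sketch assuming BSD strlcpy(3) semantics: the return value is strlen(src), and truncation happened if and only if it is >= size.  glibc only gained strlcpy(3) in version 2.38, so the sketch carries its own stand-in; my_strlcpy() and copy_or_error() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the BSD strlcpy(3) contract: copy at most
   size - 1 bytes of src into dst, always null-terminate when size != 0,
   and return strlen(src) so the caller can detect truncation. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size != 0) {
        size_t n = (len < size) ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Copy src into dst[0..size) as a string.  Return 0 on success, or -1 if
   the result had to be truncated, so the caller can report an error. */
int copy_or_error(char *dst, const char *src, size_t size)
{
    return my_strlcpy(dst, src, size) >= size ? -1 : 0;
}
```

With a char buf[5], copying "abcd" succeeds, while copying "abcde" returns -1 and leaves the truncated string "abcd", which the caller should treat as an error rather than use.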

But there's no case where you want a string and the most suitable call
would be strncpy(3); it is never the best function.  Except when you
don't want a string, of course.  If you're working with utmp(5), then go
ahead and use that function.  But for new interfaces, you should not
design them so that they use this function.  utmp(5) and strncpy(3)
should be a mistake of the past, not to be repeated.

> 
> strncpy_s is a better solution, though it is not widely available and not part of glibc. That's another debate.

No, it's not.  strncpy_s(3)'s interface is rather bad.  It is a function
to catch programmer errors, by adding another parameter that the
programmer has to write.  What if the programmer makes an error while
writing the new argument of these _s functions?  Kaboom.

_FORTIFY_SOURCE accomplishes the same task, but the size is calculated
internally by the implementation, which means the programmer can't write
a bug in the code that is trying to prevent bugs.

Here's an article on these Annex K interfaces:
<https://open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm>

> 
> Is stpecpy standardised? If you can send me an online manual for it, I'll take a look.

No it's not.  It's similar to strlcpy(3), but designed to chain better.
So, if you just need to call strlcpy(3), it's probably simpler to do it.
But if you need to call strlcat(3), then you may consider stpecpy(3) a
better alternative.  The main difference is that with strlcpy(3) +
strlcat(3), you need to check for truncation after every call, while
with stpecpy(3) you only need to check once after the last call.  Also,
it's simpler (less tricky) to implement (although now that strlcpy(3) is
standard, it's less of a problem).

You can find stpecpy(3) documented, with an implementation, in the
string_copying(7) manual page.
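A sketch of the chaining pattern described above.  string_copying(7) has its own stpecpy() implementation; the version below is an independent approximation using strlen(3) and memcpy(3), assuming this contract: each call returns a pointer to the new terminating NUL, or end on truncation, and a call whose dst is already end does nothing, so a single check after the last call is enough.

```c
#include <stddef.h>
#include <string.h>

/* Copy the string src to dst without writing at or past end.  Return a
   pointer to the terminating NUL written, or end if src did not fit (the
   result is then truncated but still null-terminated).  If dst == end
   (an earlier call already truncated), do nothing and return end. */
char *stpecpy(char *dst, char *end, const char *src)
{
    if (dst == end)
        return end;

    size_t len = strlen(src);
    size_t room = (size_t)(end - dst);   /* at least 1 here */

    if (len < room) {
        memcpy(dst, src, len + 1);       /* include the NUL */
        return dst + len;
    }
    memcpy(dst, src, room - 1);          /* truncate, keep it a string */
    end[-1] = '\0';
    return end;
}
```

Chaining p = stpecpy(p, end, "Hello, ") and then p = stpecpy(p, end, "world") into a 12-byte buffer leaves "Hello, worl" and p == end, so one comparison after the last call detects the truncation; with a 16-byte buffer, p ends up pointing at the NUL after "Hello, world".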

Cheers,
Alex

> 
> Regards, Jonny

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-04 19:33 ` Alejandro Colomar
  2023-11-04 21:18   ` Jonny Grant
@ 2023-11-05 21:16   ` Jonny Grant
  2023-11-05 23:31     ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-05 21:16 UTC
  To: Michael Kerrisk; +Cc: linux-man, Alejandro Colomar



On 04/11/2023 19:33, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote:
>> Hello
>> I have a suggestion for strncpy.
>>
>> C23 draft states this caveat for strncpy. 
>>
>> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-
>> terminated."
>>
>>
>> https://man7.org/linux/man-pages/man3/strncpy.3.html
>>
>> "If the destination buffer, limited by its size, isn't large
>> enough to hold the copy, the resulting character sequence is
>> truncated. "
> 
> The use of the term "character sequence" instead of "string" isn't
> casual.  A "string" is a sequence of zero or more non-zero characters,
> followed by exactly one NUL.  A "character sequence" is a sequence of
> zero or more non-zero characters, period.
> 
> To be clearer in that regard, the CAVEATS section of the same page says
> this:
> 
> CAVEATS
>      The name of these functions is confusing.  These  functions  pro‐
>      duce   a  null‐padded  character  sequence,  not  a  string  (see
>      string_copying(7)).
> 
> Saying that these functions don't produce a string should warn anyone
> thinking it would.  The page string_copying(7) goes into more detail.
> 
>>
>> How about clarifying this as:
>>
>>
>> "If the destination buffer, limited by its size, isn't large
>> enough to hold the copy, the resulting character sequence is
>> truncated; where there is no null terminating byte in the first n
>> characters the result will not be null terminated. "
> 
> strncpy(3) should !*NEVER*! be used to produce a string.
> I don't think that should be conditional.  Your suggested change could
> lead readers to the mistaken belief that strncpy(3) is useful if the
> buffer is large enough.  Do not ever use that function for producing
> strings.  Use something else, like strlcpy(3), strcpy(3), or stpecpy(3).
> 
> Cheers,
> Alex
> 
>>
>> Kind regards, Jonny


Michael, what do you think about this documentation suggestion I have made. Interested to hear your opinion.

Should the man page follow the C standard's description of strncpy, which notes that when it copies the array, the resulting array of characters may not be null-terminated, and warn about this pitfall?

C99 had this, and it is still there in latest C23 draft - worth clarifying on strncpy(3)?

"7.21.2.4 The strncpy function"

"269) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will
not be null-terminated."

Note, I'm not using strncpy myself, it's a documentation clarification proposal.

Kind regards
Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-05 21:16   ` Jonny Grant
@ 2023-11-05 23:31     ` Alejandro Colomar
  2023-11-07 11:52       ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-05 23:31 UTC
  To: Jonny Grant; +Cc: Michael Kerrisk, linux-man

Hi Jonny,

On Sun, Nov 05, 2023 at 09:16:25PM +0000, Jonny Grant wrote:
> Michael, what do you think about this documentation suggestion I have made. Interested to hear your opinion.
> 
> Should the man page follow the C standard's description of strncpy, which notes that when it copies the array, the resulting array of characters may not be null-terminated, and warn about this pitfall?
> 
> C99 had this, and it is still there in latest C23 draft - worth clarifying on strncpy(3)?
> 
> "7.21.2.4 The strncpy function"
> 
> "269) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will
> not be null-terminated."

What ISO C has said and continues to say about strncpy(3) is the actual
harmful stuff, which has led many programmers to believe strncpy(3) was
useful at all for producing strings.

The problem I see with what ISO C says about strncpy(3) is that it
treats it as a string-copying function.  If you treat strncpy(3) as a
string-copying function, then it is really broken and should be removed
from libc.

However, its functionality is still useful for those cases where you
don't want a string, which is the only reason I didn't mark the function
as [[deprecated]].

> 
> Note, I'm not using strncpy myself, it's a documentation clarification proposal.

I think it could be useful to add a note that one should first read the
CAVEATS section and string_copying(7) and only then read this page.


diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 239a2eb7e..c7bb79028 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -37,6 +37,12 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
+.IR Note :
+These functions are probably not what you want.
+Read CAVEATS below,
+and
+.BR string_copying (7).
+.PP
 These functions copy the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by


Is this scary enough?  Do you think this would tell readers to never use
this function unless they know what they're doing (and even when they
think they do, they probably don't)?

Cheers,
Alex

> 
> Kind regards
> Jonny

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-05 23:31     ` Alejandro Colomar
@ 2023-11-07 11:52       ` Jonny Grant
  2023-11-07 13:23         ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-07 11:52 UTC
  To: Alejandro Colomar; +Cc: Michael Kerrisk, linux-man



On 05/11/2023 23:31, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Sun, Nov 05, 2023 at 09:16:25PM +0000, Jonny Grant wrote:
>> Michael, what do you think about this documentation suggestion I have made. Interested to hear your opinion.
>>
>> Should the man page follow the C standard's description of strncpy, which notes that when it copies the array, the resulting array of characters may not be null-terminated, and warn about this pitfall?
>>
>> C99 had this, and it is still there in latest C23 draft - worth clarifying on strncpy(3)?
>>
>> "7.21.2.4 The strncpy function"
>>
>> "269) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will
>> not be null-terminated."
> 
> What ISO C has said and continues to say about strncpy(3) is the actual
> harmful stuff, which has led many programmers to believe strncpy(3) was
> useful at all for producing strings.
> 
> The problem I see with what ISO C says about strncpy(3) is that it
> treats it as a string-copying function.  If you treat strncpy(3) as a
> string-copying function, then it is really broken and should be removed
> from libc.
> 
> However, its functionality is still useful for those cases where you
> don't want a string, which is the only reason I didn't mark the function
> as [[deprecated]].
> 
>>
>> Note, I'm not using strncpy myself, it's a documentation clarification proposal.
> 
> I think it could be useful to add a note that one should first read the
> CAVEATS section and string_copying(7) and only then read this page.
> 
> 
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index 239a2eb7e..c7bb79028 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -37,6 +37,12 @@ .SH SYNOPSIS
>          _GNU_SOURCE
>  .fi
>  .SH DESCRIPTION
> +.IR Note :
> +These functions are probably not what you want.
> +Read CAVEATS below,
> +and
> +.BR string_copying (7).
> +.PP
>  These functions copy the string pointed to by
>  .I src
>  into a null-padded character sequence at the fixed-width buffer pointed to by
> 
> 
> Is this scary enough?  Do you think this would tell readers to never use
> this function unless they know what they're doing (and even when they
> think they do, they probably don't)?
> 
> Cheers,
> Alex
> 
>>
>> Kind regards
>> Jonny
> 

Alejandro,

We see things differently, I'm on the C standard side on this one. Would any information change your mind?

With kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-07 11:52       ` Jonny Grant
@ 2023-11-07 13:23         ` Alejandro Colomar
  2023-11-07 14:19           ` Jonny Grant
  2023-11-08  2:12           ` strncpy clarify result may not be null terminated Matthew House
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-07 13:23 UTC
  To: Jonny Grant; +Cc: linux-man

On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
> We see things differently, I'm on the C standard side on this one. Would any information change your mind?

It's difficult to say, but I doubt it.  But let me ask you something:
In what cases would you find strncpy(3) appropriate to use, and why?
Maybe if I understand that it helps.

Kind regards,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-07 13:23         ` Alejandro Colomar
@ 2023-11-07 14:19           ` Jonny Grant
  2023-11-07 16:17             ` Alejandro Colomar
  2023-11-08  2:12           ` strncpy clarify result may not be null terminated Matthew House
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-07 14:19 UTC
  To: Alejandro Colomar; +Cc: linux-man



On 07/11/2023 13:23, Alejandro Colomar wrote:
> On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
>> We see things differently, I'm on the C standard side on this one. Would any information change your mind?
> 
> It's difficult to say, but I doubt it.  But let me ask you something:
> In what cases would you find strncpy(3) appropriate to use, and why?
> Maybe if I understand that it helps.
> 
> Kind regards,
> Alex

I don't find strncpy appropriate - that's why I proposed a change to clarify the known defect in the man page of strncpy that C99 describes. Worth reading my first email if you're unclear.

If you doubt the esteemed C standards, I won't add anything further.
Kind regards Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-07 14:19           ` Jonny Grant
@ 2023-11-07 16:17             ` Alejandro Colomar
  2023-11-07 17:00               ` Jonny Grant
  2023-11-08  6:18               ` Oskari Pirhonen
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-07 16:17 UTC
  To: Jonny Grant; +Cc: linux-man

Hi Jonny,

On Tue, Nov 07, 2023 at 02:19:56PM +0000, Jonny Grant wrote:
> 
> 
> On 07/11/2023 13:23, Alejandro Colomar wrote:
> > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
> >> We see things differently, I'm on the C standard side on this one. Would any information change your mind?
> > 
> > It's difficult to say, but I doubt it.  But let me ask you something:
> > In what cases would you find strncpy(3) appropriate to use, and why?
> > Maybe if I understand that it helps.
> > 
> > Kind regards,
> > Alex
> 
> I don't find strncpy appropriate 

Would any information change your mind in this regard?

Let me show you some structure to which you should write using strncpy(3):

$ man utmp | sed 's/^           //' | grepc -h utmp
struct utmp {
    short   ut_type;              /* Type of record */
    pid_t   ut_pid;               /* PID of login process */
    char    ut_line[UT_LINESIZE]; /* Device name of tty - "/dev/" */
    char    ut_id[4];             /* Terminal name suffix,
                                     or inittab(5) ID */
    char    ut_user[UT_NAMESIZE]; /* Username */
    char    ut_host[UT_HOSTSIZE]; /* Hostname for remote login, or
                                     kernel version for run-level
                                     messages */
    struct  exit_status ut_exit;  /* Exit status of a process
                                     marked as DEAD_PROCESS; not
                                     used by Linux init(1) */
    /* The ut_session and ut_tv fields must be the same size when
       compiled 32- and 64-bit.  This allows data files and shared
       memory to be shared between 32- and 64-bit applications. */
#if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32
    int32_t ut_session;           /* Session ID (getsid(2)),
                                     used for windowing */
    struct {
        int32_t tv_sec;           /* Seconds */
        int32_t tv_usec;          /* Microseconds */
    } ut_tv;                      /* Time entry was made */
#else
    long    ut_session;           /* Session ID */
    struct timeval ut_tv;         /* Time entry was made */
#endif

    int32_t ut_addr_v6[4];        /* Internet address of remote
                                     host; IPv4 address uses
                                     just ut_addr_v6[0] */
    char __unused[20];            /* Reserved for future use */
};


The fields 'ut_line', 'ut_user', and 'ut_host' are fixed-width character
arrays without a terminating NUL.  I wish this API hadn't been designed
this way, and thus that strncpy(3) wouldn't be useful for writing to
these structures, but we got what we got.  strcpy(3) and strlcpy(3) will
both try to write a NUL byte, and thus can't use the last byte for
data.  I would happily waste that last byte, but then if you write
portable shadow utils that are compatible with other software that may
have written those fields previously, you need to be able to support
that last character, and so you need strncpy(3).
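A sketch of the pattern described above, using a hypothetical struct record with an 8-byte user field as a stand-in for struct utmp's ut_user (real code would use the UT_NAMESIZE width from <utmp.h>).  Writing uses strncpy(3) so all eight bytes can hold data; reading bounds the scan by the field width and copies into a buffer one byte larger to obtain a proper string.

```c
#include <stddef.h>
#include <string.h>

#define FIELD_SIZE 8               /* hypothetical width, for illustration */

struct record {                    /* hypothetical stand-in for struct utmp */
    char user[FIELD_SIZE];         /* null-padded, not necessarily terminated */
};

/* Write: strncpy(3) fills the whole field, using every byte for data and
   null-padding any remainder.  A longer name is silently cut to 8 bytes. */
void record_set_user(struct record *r, const char *name)
{
    strncpy(r->user, name, sizeof r->user);
}

/* Read: the field may lack a NUL, so bound the scan by the field width and
   copy into a buffer one byte larger to produce a null-terminated string. */
void record_get_user(const struct record *r, char out[FIELD_SIZE + 1])
{
    const char *nul = memchr(r->user, '\0', sizeof r->user);
    size_t len = nul ? (size_t)(nul - r->user) : sizeof r->user;

    memcpy(out, r->user, len);
    out[len] = '\0';
}
```

A name of exactly eight characters is stored with no NUL at all and still reads back intact, which is the case that strcpy(3) or strlcpy(3) could not produce in the field.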


>- that's why I proposed a change to clarify the known defect in the man page of strncpy that C99 describes. Worth reading my first email if you're unclear.

I would love to find this API useless, and in that case, I'd go further
and add [[deprecated]] in the synopsis, and write a heavy statement in a
BUGS section.  But I can't do that while it's still a good function in
some cases (even if those cases are bad design, such as utmp(5)).

On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's
being deprecated, so maybe we could consider deprecating strncpy(3).

If I see enough proof that all APIs that require this function are
deprecated, I'll happily declare the function deprecated as well.
(In fact, I already did so some time ago, but then found this use with
utmp(5), which is why I removed the deprecation; see
<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>).

Cheers,
Alex

> 
> If you doubt the esteemed C standards, I won't add anything further.
> Kind regards Jonny

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-07 16:17             ` Alejandro Colomar
@ 2023-11-07 17:00               ` Jonny Grant
  2023-11-07 17:20                 ` Alejandro Colomar
  2023-11-08  6:18               ` Oskari Pirhonen
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-07 17:00 UTC
  To: Alejandro Colomar; +Cc: linux-man

Your comments don't relate to aligning the man page to C99 spec.
Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-07 17:00               ` Jonny Grant
@ 2023-11-07 17:20                 ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-07 17:20 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

On Tue, Nov 07, 2023 at 05:00:19PM +0000, Jonny Grant wrote:
> Your comments don't relate to aligning the man page to C99 spec.

No, and blindly repeating what the spec says isn't positive in itself.
My comments align with recommending safe use of libc functions, and
recommending against using bogus functions.  For reading the spec, we
already have the spec.  I only want to add information if it is useful.
I welcome you to convince me that it's useful.

Thanks,
Alex

> Jonny

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-07 13:23         ` Alejandro Colomar
  2023-11-07 14:19           ` Jonny Grant
@ 2023-11-08  2:12           ` Matthew House
  2023-11-08 19:33             ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Matthew House @ 2023-11-08  2:12 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, linux-man

On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@kernel.org> wrote:
> On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
> > We see things differently, I'm on the C standard side on this one. Would any information change your mind?
>
> It's difficult to say, but I doubt it.  But let me ask you something:
> In what cases would you find strncpy(3) appropriate to use, and why?
> Maybe if I understand that it helps.
>
> Kind regards,
> Alex

Man pages aren't read only by people writing new code, but also by people
reading and modifying existing code. And despite your preferences regarding
which functions ought to be used to produce strings, it's a widespread (and
correct) practice to produce a string from the character sequence created
by strncpy(3). There are two ways of doing this, either by setting the last
character of the destination buffer to null if you want to produce a
truncated string, or by testing the last character against zero if you want
to detect truncation and raise an error.
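A minimal sketch of the two strncpy(3)-based snippets described above, assuming a destination of at least one byte; the names copy_truncate() and copy_or_fail() are hypothetical, not libc APIs:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: produce a possibly truncated string.
   strncpy() always writes exactly size bytes; forcing the last one
   to '\0' turns the resulting character sequence into a string.
   Assumes size > 0. */
void copy_truncate(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size);
    dst[size - 1] = '\0';
}

/* Hypothetical helper: reject sources that do not fit as a string.
   If the last byte is not '\0', src has at least size non-null
   characters, so dst holds a character sequence, not a string.
   Returns true on success.  Assumes size > 0. */
bool copy_or_fail(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size);
    return dst[size - 1] == '\0';
}
```

On failure, copy_or_fail() leaves dst unterminated, so the caller must not treat it as a string in that case.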

I'm not aware of any alternative to a strncpy(3)-based snippet for
producing a possibly-truncated copy of a string, except for your preferred
strlcpy(3) or stpecpy(3), which aren't available to anyone without a
brand-new glibc (nor, by extension, any applications or libraries that want
to support people without a brand-new glibc, nor any libraries that want to
support other platforms like Windows with only ISO C and POSIX-ish
functions); snprintf(3), which has the insidious flaw of not supporting
more than INT_MAX characters on pain of UB, and also produces a warning if
the compiler notices the possible truncation; or strlen(3) + min() +
memcpy(3) + manually adding a null terminator, which is certainly more
explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
that poses a performance problem, but similarly opens up room for
off-by-one errors.
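For comparison, a sketch of the strlen(3) + min() + memcpy(3) approach, assuming the helper should mirror strlcpy(3)'s convention of returning strlen(src) so the caller can test for truncation; copy_str_trunc() is a hypothetical name. The `size - 1` is exactly where the off-by-one risk lives:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libc): copy src into dst[size],
   truncating if needed, and always null-terminate.  Returns
   strlen(src), so truncation happened iff the result is >= size. */
size_t copy_str_trunc(char *dst, const char *src, size_t size)
{
    assert(size > 0);                          /* room for the '\0' */

    size_t len = strlen(src);
    size_t n = (len < size) ? len : size - 1;  /* min(len, size - 1) */

    memcpy(dst, src, n);
    dst[n] = '\0';                             /* manually added terminator */
    return len;
}
```

Unlike strncpy(3), this does not zero-fill the rest of dst.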

For the sake of reference, I looked into a few big C and C++ projects to
see how often a strncpy(3)-based snippet was used to produce a truncated
copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
that I haven't filtered out vendored dependencies, so there's a little bit
of double-counting.) It seems like most codebases that don't ban strncpy(3)
use a derived snippet somewhere or another. Also, I found 3 instances in
glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
checking the last character.

So these two snippets really are widespread, especially among the long tail
of smaller C and C++ applications and libraries that don't perform enough
string manipulation that it warrants creating a custom set of more-
foolproof wrapper functions (at least, in the opinion of their authors).
Thus, since they're not going away, it would be useful for anyone reading
the code to understand the concept behind how these two snippets work, that
the only difference between the strncpy(3)'s special "character sequence"
and an ordinary C string is an additional null terminator at the end of the
destination buffer.

In other words, strncpy(3) doesn't create a truncated string, but it
creates something which can be easily turned into a truncated string,
and that's its most relevant quality for most of its uses in existing code.
Further, apart from snprintf(3), there's no other portable way to produce a
truncated string without manual arithmetic. Thus, I'd also find it
reasonable to highlight precisely why strncpy(3)'s output isn't a string
(viz., the lack of a null terminator), instead of trying to insist that its
output is worlds apart from anything string-related, especially given the
volume of existing correct code that belies that notion.

Or, to answer your question, "It's appropriate to keep using strncpy(3) in
existing code where it's currently used as part of creating a truncated
string, and it's not especially inappropriate to use strncpy(3) in new code
as part of creating a truncated string, if the code must support platforms
without strlcpy(3) or similar, and if the resulting snippets are few enough
and well-commented enough that they create less mental load than creating
and maintaining a custom helper function."

(As an aside, I find the remark in the man page that "It's impossible to
distinguish truncation by the result of the call" extremely misleading at
best, since truncation can easily be distinguished by inspecting the last
output character.)

Thank you,
Matthew House

* Re: strncpy clarify result may not be null terminated
  2023-11-07 16:17             ` Alejandro Colomar
  2023-11-07 17:00               ` Jonny Grant
@ 2023-11-08  6:18               ` Oskari Pirhonen
  2023-11-08  9:51                 ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Oskari Pirhonen @ 2023-11-08  6:18 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, linux-man

On Tue, Nov 07, 2023 at 17:17:29 +0100, Alejandro Colomar wrote:
> 
> I would love to find this API useless, and in that case, I'd go further
> and add [[deprecated]] in the synopsis, and write a heavy statement in a
> BUGS section.  But I can't do that while it's still a good function in
> some cases (even if those cases are bad design, such as utmp(5)).
> 
> On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's
> being deprecated, so maybe we could consider deprecating strncpy(3).
> 
> If I see enough proof that all APIs that require this function are
> deprecated, I'll happily declare the function deprecated as well.
> (in fact I already did some time ago, but then found this use with
> utmp(5), which is why I removed the deprecation; see
> <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>).
> 

If you ask me, I'd not mark libc functions as deprecated without some
kind of consensus from the libc maintainers too. They may not go so far
as to add the `deprecated` attribute in their own headers, at least not
yet at that point in time, but some kind of written "Yes, please don't
use this function" would be nice to have before marking them in the man
pages.

- Oskari

* Re: strncpy clarify result may not be null terminated
  2023-11-08  6:18               ` Oskari Pirhonen
@ 2023-11-08  9:51                 ` Alejandro Colomar
  2023-11-08  9:59                   ` Thorsten Kukuk
                                     ` (2 more replies)
  0 siblings, 3 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08  9:51 UTC (permalink / raw)
  To: libc-alpha, Jonny Grant, linux-man

On Wed, Nov 08, 2023 at 12:18:09AM -0600, Oskari Pirhonen wrote:
> On Tue, Nov 07, 2023 at 17:17:29 +0100, Alejandro Colomar wrote:
> > 
> > I would love to find this API useless, and in that case, I'd go further
> > and add [[deprecated]] in the synopsis, and write a heavy statement in a
> > BUGS section.  But I can't do that while it's still a good function in
> > some cases (even if those cases are bad design, such as utmp(5)).
> > 
> > On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's
> > being deprecated, so maybe we could consider deprecating strncpy(3).
> > 
> > If I see enough proof that all APIs that require this function are
> > deprecated, I'll happily declare the function deprecated as well.
> > (in fact I already did some time ago, but then found this use with
> > utmp(5), which is why I removed the deprecation; see
> > <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>).
> > 
> 
> If you ask me, I'd not mark libc functions as deprecated without some
> kind of consensus from the libc maintainers too. They may not go so far
> as to add the `deprecated` attribute in their own headers, at least not
> yet at that point in time, but some kind of written "Yes, please don't
> use this function" would be nice to have before marking them in the man
> pages.

Okay, let's ask them.

Hi glibc developers,

strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
are those deprecated, or is any such API still good for new code?

If all APIs that need strncpy(3) are deprecated, I propose recommending
against its use in new code.
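A minimal sketch of that use, assuming a system whose <utmpx.h> provides struct utmpx with a fixed-width char array ut_user, as POSIX describes; set_ut_user() and ut_user_len() are hypothetical helper names. A name may fill the field completely, leaving no terminator, which is exactly the null-padded character sequence strncpy(3) produces:

```c
#define _XOPEN_SOURCE 700   /* expose <utmpx.h> and strnlen() */
#include <string.h>
#include <utmpx.h>

/* Hypothetical helper: store a login name in the fixed-width field.
   strncpy() null-pads the unused tail; a name exactly as long as the
   field fills it completely, with no terminating '\0'.  Longer names
   are silently cut, so check strlen(name) first if that matters. */
void set_ut_user(struct utmpx *ut, const char *name)
{
    strncpy(ut->ut_user, name, sizeof(ut->ut_user));
}

/* Hypothetical helper: length of the field's contents, never reading
   past the end of the array. */
size_t ut_user_len(const struct utmpx *ut)
{
    return strnlen(ut->ut_user, sizeof(ut->ut_user));
}
```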

Thanks,
Alex

> 
> - Oskari



-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:51                 ` Alejandro Colomar
@ 2023-11-08  9:59                   ` Thorsten Kukuk
  2023-11-08 15:09                     ` Alejandro Colomar
       [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
  2023-11-08 14:06                   ` Zack Weinberg
  2023-11-08 19:04                   ` DJ Delorie
  2 siblings, 2 replies; 138+ messages in thread
From: Thorsten Kukuk @ 2023-11-08  9:59 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man

On Wed, Nov 08, Alejandro Colomar wrote:

> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> are those deprecated, or is any such API still good for new code?

Everything around utmp/utmpx/wtmp/lastlog is deprecated.

openSUSE Tumbleweed and MicroOS no longer use or support them,
and fresh installations don't have those files anymore.
So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
are e.g. systemd-logind/wtmpdb/lastlog2.

  Thorsten

-- 
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:51                 ` Alejandro Colomar
  2023-11-08  9:59                   ` Thorsten Kukuk
@ 2023-11-08 14:06                   ` Zack Weinberg
  2023-11-08 15:07                     ` Alejandro Colomar
  2023-11-08 19:04                   ` DJ Delorie
  2 siblings, 1 reply; 138+ messages in thread
From: Zack Weinberg @ 2023-11-08 14:06 UTC (permalink / raw)
  To: Alejandro Colomar, GNU libc development, Jonny Grant,
	'linux-man'

>> If you ask me, I'd not mark libc functions as deprecated without some
>> kind of consensus from the libc maintainers too.
...
> Okay, let's ask them.
...
> Hi glibc developers,
>
> strncpy(3)
...

Speaking only for myself, I would be very reluctant to declare any standardized function "deprecated" by glibc unless the relevant standards have also made that declaration. This goes double for anything that was in C89.

Also speaking only for myself, the Linux manpages are welcome to discourage the use of any function that you feel is not a wise choice for new programs, but the word "deprecated" should be reserved for cases where there really has been a declaration of deprecation by us and/or the standards. The word "obsolete" should also be used very cautiously; it's broader, but I personally would only use it in situations where there is a direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r).

In the specific cases we're discussing: I would definitely like to see a BUGS or NOTES section in the strncpy(3) manpage, warning people that it's probably not what they want and recommending use of strlen+memcpy instead. I don't know enough about the utmp(x) situation to have a strong opinion, but I do think the manpages need to be very clear that this particular proposed replacement for utmp(x) is Linux-specific and still somewhat experimental.
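A sketch of the strlen+memcpy pattern, assuming the caller prefers an error to silent truncation; copy_str_exact() is a hypothetical name, not a libc function:

```c
#include <string.h>

/* Hypothetical helper: copy src into dst[size] only if the whole
   string, including its terminating '\0', fits.  Returns 0 on
   success, or -1 (leaving dst untouched) if src is too long. */
int copy_str_exact(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (len >= size)
        return -1;
    memcpy(dst, src, len + 1);  /* len + 1 also copies the '\0' */
    return 0;
}
```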

zw

* Re: strncpy clarify result may not be null terminated
  2023-11-08 14:06                   ` Zack Weinberg
@ 2023-11-08 15:07                     ` Alejandro Colomar
  2023-11-08 19:45                       ` G. Branden Robinson
  2023-11-08 21:35                       ` Carlos O'Donell
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 15:07 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: GNU libc development, Jonny Grant, 'linux-man'

Hi Zack!

On Wed, Nov 08, 2023 at 09:06:48AM -0500, Zack Weinberg wrote:
> >> If you ask me, I'd not mark libc functions as deprecated without some
> >> kind of consensus from the libc maintainers too.
> ...
> > Okay, let's ask them.
> ...
> > Hi glibc developers,
> >
> > strncpy(3)
> ...
> 
> Speaking only for myself, I would be very reluctant to declare any
> standardized function "deprecated" by glibc unless the relevant
> standards have also made that declaration.  This goes double for
> anything that was in C89.

I understand your point of view, but disagree with it.  Deprecation by
ISO C or POSIX takes very very long.  We had gets(3) for decades until
they realized it should be removed from the standards.

	STANDARDS
	     POSIX.1‐2008.

	HISTORY
	     C89, POSIX.1‐2001.

	     LSB deprecates gets().  POSIX.1‐2008 marks gets() obsolescent.
	     ISO C11 removes the specification of gets() from the C
	     language, and since glibc 2.16, glibc header files don’t
	     expose the function declaration if the _ISOC11_SOURCE
	     feature test macro is defined.

So we had it in ISO C in C89 and C99, and only in C11 they realized it
had to be removed.  POSIX hasn't even removed it yet!  I won't let
bureaucracy stop me from killing a function.

The standard, especially C89, was just a reflection of the commonalities
of most implementations.  It was up to the implementations to add new
stuff or to remove existing stuff.  Later revisions of the standards
invented more, though.

In this case, since ISO C has no APIs that use strncpy(3), it could (and
should) already deprecate strncpy(3) from ISO C.  POSIX still needs it
while it keeps utmpx(5), because there's no other way to correctly write
to the fixed-width buffers within struct utmpx.

> 
> Also speaking only for myself, the Linux manpages are welcome to
> discourage the use of any function that you feel is not a wise choice
> for new programs, but the word "deprecated" should be reserved for
> cases where there really has been a declaration of deprecation by us
> and/or the standards.

If a function is deprecated by a standard or other entity, that will be
reflected in the STANDARDS or HISTORY section.  For deprecation by the
manual itself, the SYNOPSIS (and BUGS) sections are fine.  In the end,
the word 'deprecate' isn't magic.

	From WordNet (r) 3.0 (2006) [wn]:

	  deprecate
	      v 1: express strong disapproval of; deplore

That term applies to strncpy(3).

> The word "obsolete" should also be used very cautiously; it's broader,
> but I personally would only use it in situations where there is a
> direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r).
> 
> In the specific cases we're discussing: I would definitely like to see
> a BUGS or NOTES section in the strncpy(3) manpage, warning people that
> it's probably not what they want and recommending use of strlen+memcpy
> instead. I don't know enough about the utmp(x) situation to have a
> strong opinion, but I do think the manpages need to be very clear that
> this particular proposed replacement for utmp(x) is Linux-specific and
> still somewhat experimental.

But yes, we need to make sure that the APIs that need strncpy(3) are
all deprecated.  If other Unix systems still need utmpx or similar
stuff, strncpy(3) will still be necessary.

Cheers,
Alex

> 
> zw

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:59                   ` Thorsten Kukuk
@ 2023-11-08 15:09                     ` Alejandro Colomar
       [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 15:09 UTC (permalink / raw)
  To: Thorsten Kukuk; +Cc: libc-alpha, Jonny Grant, linux-man

On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
> On Wed, Nov 08, Alejandro Colomar wrote:
> 
> > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> > Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> > are those deprecated, or is any such API still good for new code?
> 

Hi Thorsten!

> Everything around utmp/utmpx/wtmp/lastlog is deprecated.

Is this a Linux-specific thing?  Do you know if the BSDs also deprecated
utmpx?

Thanks,
Alex

> 
> openSUSE Tumbleweed and MicroOS no longer use or support them,
> and fresh installations don't have those files anymore.
> So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
> are e.g. systemd-logind/wtmpdb/lastlog2.

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
       [not found]                     ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com>
@ 2023-11-08 15:44                       ` Thorsten Kukuk
  2023-11-08 17:26                         ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 138+ messages in thread
From: Thorsten Kukuk @ 2023-11-08 15:44 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man

On Wed, Nov 08, Alejandro Colomar wrote:

> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
> > On Wed, Nov 08, Alejandro Colomar wrote:
> > 
> > > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > > and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> > > Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
> > > are those deprecated, or is any such API still good for new code?
> > 
> 
> Hi Thorsten!
> 
> > Everything around utmp/utmpx/wtmp/lastlog is deprecated.
> 
> Is this a Linux-specific thing?  Do you know if the BSDs also deprecated
> utmpx?

Besides the design issues of the interface, which are generic, the Y2038
issue is more or less glibc specific and a result of supporting 32bit
and 64bit userland at the same time.
For most other implementations I'm aware of there is no Y2038 problem,
either because they don't support utmp/utmpx/... like musl libc, or they
were able to switch to a 64bit time variable or used that already.
So no need to change anything.
For BSD I don't really know the situation, but as far as I know, they
don't have the problem and thus no need to change anything.

  Thorsten

> Thanks,
> Alex
> 
> > 
> > openSUSE Tumbleweed and MicroOS no longer use or support them,
> > and fresh installations don't have those files anymore.
> > So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
> > are e.g. systemd-logind/wtmpdb/lastlog2.
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies
SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany
Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich
(HRB 36809, AG Nürnberg)

* Re: strncpy clarify result may not be null terminated
  2023-11-08 15:44                       ` Thorsten Kukuk
@ 2023-11-08 17:26                         ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 138+ messages in thread
From: Adhemerval Zanella Netto @ 2023-11-08 17:26 UTC (permalink / raw)
  To: Thorsten Kukuk, Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man



On 08/11/23 12:44, Thorsten Kukuk wrote:
> On Wed, Nov 08, Alejandro Colomar wrote:
> 
>> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote:
>>> On Wed, Nov 08, Alejandro Colomar wrote:
>>>
>>>> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
>>>> and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
>>>> Of those two APIs (utmp and utmpx) and any other that need strncpy(3),
>>>> are those deprecated, or is any such API still good for new code?
>>>
>>
>> Hi Thorsten!
>>
>>> Everything around utmp/utmpx/wtmp/lastlog is deprecated.
>>
>> Is this a Linux-specific thing?  Do you know if the BSDs also deprecated
>> utmpx?
> 
> Besides the design issues of the interface, which are generic, the Y2038
> issue is more or less glibc specific and a result of supporting 32bit
> and 64bit userland at the same time.
> For most other implementations I'm aware of there is no Y2038 problem,
> either because they don't support utmp/utmpx/... like musl libc, or they
> were able to switch to a 64bit time variable or used that already.
> So no need to change anything.

In fact the glibc utmp y2038 support depends on the ABI: some 64-bit ABIs
decided to be compatible with 32 bits so the utmp files could be read/parsed
by both ABIs (defined by __WORDSIZE_TIME64_COMPAT32).  This required the
ut_tv field to be defined not as a 'struct timeval', but rather as a similar
struct with a 32-bit tv_sec (yes, it is a mess and not sure why it was
considered a good idea back then).

It means that for 64-bit ABIs that define __WORDSIZE_TIME64_COMPAT32 (mips,
riscv, s390, sparc, powerpc, and x86) the utmp ABI is broken regarding
y2038 support. The ut_tv field is also defined depending on the time_t at
build time (_TIME_BITS), so if you have programs with different time_t
support, they won't correctly access the utmp (gnulib seems to have some
overrides to fix it).

Fixing those issues would require a lot of work that I don't think is worth
it for an API with some inherent implementation flaws [1] (most likely it
would require a complete rewrite, which logind basically did).  That's why
I am leaning toward completely removing the glibc implementation and
mimicking what musl did (a no-op implementation that returns -1/ENOTSUP
where applicable).

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=24492

> For BSD I don't really know the situation, but as far as I know, they
> don't have the problem and thus no need to change anything.
> 
>   Thorsten
> 
>> Thanks,
>> Alex
>>
>>>
>>> openSUSE Tumbleweed and MicroOS no longer use or support them,
>>> and fresh installations don't have those files anymore.
>>> So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives
>>> are e.g. systemd-logind/wtmpdb/lastlog2.
>>
>> -- 
>> <https://www.alejandro-colomar.es/>
> 
> 
> 

* Re: strncpy clarify result may not be null terminated
  2023-11-08  9:51                 ` Alejandro Colomar
  2023-11-08  9:59                   ` Thorsten Kukuk
  2023-11-08 14:06                   ` Zack Weinberg
@ 2023-11-08 19:04                   ` DJ Delorie
  2023-11-08 19:40                     ` Alejandro Colomar
  2 siblings, 1 reply; 138+ messages in thread
From: DJ Delorie @ 2023-11-08 19:04 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man

Alejandro Colomar <alx@kernel.org> writes:
> strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?

Let's not limit ourselves to glibc APIs.  Tar format, for example, uses
fixed length fields (and my bet is that strncpy was created for it) yet
tar is not part of glibc.

IMHO the solution here is to document strncpy with sufficiently obvious
intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should
ONLY be used for its intended purpose (filling a null-padded, not
necessarily null-terminated, fixed-width field).

It is not documentation's purpose to limit programmers' creativity, just
to give them an accurate representation of what the functions do.
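To illustrate that intended purpose, a sketch with a hypothetical record whose 8-byte name field is null-padded and may lack a terminator (loosely in the spirit of fixed-width archive headers such as tar's, not an exact layout). strncpy(3) fills the field, and strndup(3) turns it back into a string:

```c
#define _POSIX_C_SOURCE 200809L  /* for strndup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical on-disk record with a fixed-width name field that is
   null-padded but NOT necessarily null-terminated. */
struct record {
    char name[8];
};

/* Fill the field.  strncpy() copies up to sizeof(r->name) bytes and
   null-pads the rest; an 8-character name leaves no terminator.
   Longer names are cut off, so check strlen(name) first if needed. */
void record_set_name(struct record *r, const char *name)
{
    strncpy(r->name, name, sizeof(r->name));
}

/* Read the field back as a proper string; the caller must free() it.
   strndup() reads at most sizeof(r->name) bytes and always appends
   a '\0'.  Returns NULL if allocation fails. */
char *record_get_name(const struct record *r)
{
    return strndup(r->name, sizeof(r->name));
}
```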


* Re: strncpy clarify result may not be null terminated
  2023-11-08  2:12           ` strncpy clarify result may not be null terminated Matthew House
@ 2023-11-08 19:33             ` Alejandro Colomar
  2023-11-08 19:40               ` Alejandro Colomar
                                 ` (2 more replies)
  0 siblings, 3 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 19:33 UTC (permalink / raw)
  To: Matthew House; +Cc: Jonny Grant, linux-man

Hi Matthew,

On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@kernel.org> wrote:
> > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
> > > We see things differently, I'm on the C standard side on this one. Would any information change your mind?
> >
> > It's difficult to say, but I doubt it.  But let me ask you something:
> > In what cases would you find strncpy(3) appropriate to use, and why?
> > Maybe if I understand that it helps.
> >
> > Kind regards,
> > Alex
> 
> Man pages aren't read only by people writing new code, but also by people
> reading and modifying existing code. And despite your preferences regarding
> which functions ought to be used to produce strings, it's a widespread (and
> correct) practice to produce a string from the character sequence created
> by strncpy(3). There are two ways of doing this, either by setting the last
> character of the destination buffer to null if you want to produce a
> truncated string, or by testing the last character against zero if you want
> to detect truncation and raise an error.

It is not strncpy(3) that truncated, but the programmer, by writing a
null byte to buff[BUFSIZ - 1].  In the following snippet, strncpy(3)
will not truncate:

	char cs[3];

	strncpy(cs, "foo", 3);

And yet your code doing if (cs[2] != '\0') { goto error; } would think
it did.  That's because you bent strncpy(3) into implementing a poor
man's strlcpy(3).


	char cs[3];

	strncpy(cs, "foo", 3);
	cs[2] = '\0';  // The truncation is here, not in strncpy(3).
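A self-contained version of that example, assuming the same 3-byte buffer; the helper names are hypothetical. It separates two questions that the last-byte check conflates: did strncpy(3) drop characters of the source, and is the result a string?

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: did strncpy(dst, src, size) drop characters of
   src?  Only when src has more than size non-null characters. */
bool strncpy_drops_chars(const char *src, size_t size)
{
    return strlen(src) > size;
}

/* Hypothetical helper: the "last byte" check.  It fires whenever the
   copy left no room for a terminator, i.e. strlen(src) >= size. */
bool result_is_not_a_string(const char *dst, size_t size)
{
    return dst[size - 1] != '\0';
}
```

For "foo" copied into cs[3], the first helper returns false and the second returns true: nothing was lost, yet the result is not a string.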

> I'm not aware of any alternative to a strncpy(3)-based snippet for
> producing a possibly-truncated copy of a string, except for your preferred
> strlcpy(3) or stpecpy(3), which aren't available to anyone without a

The Linux kernel has strscpy(3), which is also good, but is not
available to user space.

> brand-new glibc (nor, by extension, any applications or libraries that want

libbsd has provided strlcpy(3) since basically forever.  It is a very
portable library.  You don't need a brand-new glibc for having
strlcpy(3).

<https://libbsd.freedesktop.org/wiki/>

> to support people without a brand-new glibc, nor any libraries that want to
> support other platforms like Windows with only ISO C and POSIX-ish

If you program for Windows, it depends.  If you have POSIX available,
you may be able to port libbsd; I don't know.  In any case, I don't
care about Windows enough.  You could always write your own
string-copying function for Windows.

> functions); snprintf(3), which has the insidious flaw of not supporting
> more than INT_MAX characters on pain of UB, and also produces a warning if
> the compiler notices the possible truncation; or strlen(3) + min() +
> memcpy(3) + manually adding a null terminator, which is certainly more
> explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> that poses a performance problem, but similarly opens up room for
> off-by-one errors.

More than the performance problem, I'm worried about the
maintainability of strncpy(3).  When, 20 years from now, a programmer
reading a piece of code full of strncpy(3) wants to migrate to a sane
function like strlcpy(3) or strcpy(3), the programmer needs to
understand whether the zeroing was purposeful or just accidental,
because switching to strlcpy(3) may start leaking trailing data if
the tail of the buffer is meaningful to some program.
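The zero-filling difference can be shown with a small sketch. Here strcpy(3) stands in for strlcpy(3), an assumption made because both stop writing after the terminator and strlcpy(3) may be unavailable; compare_tails() is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf[from] .. buf[size - 1] are all null bytes. */
static bool tail_is_zeroed(const char *buf, size_t from, size_t size)
{
    for (size_t i = from; i < size; i++) {
        if (buf[i] != '\0')
            return false;
    }
    return true;
}

/* Reuse two buffers that previously held "SECRET!" and report whether
   any of the old contents survives after the new "hi\0". */
void compare_tails(bool *strncpy_clean, bool *strcpy_clean)
{
    char a[8] = "SECRET!";
    char b[8] = "SECRET!";

    strncpy(a, "hi", sizeof(a));  /* "hi" plus six null bytes */
    strcpy(b, "hi");              /* "hi\0"; b[3..7] still hold "RET!\0" */

    *strncpy_clean = tail_is_zeroed(a, 3, sizeof(a));
    *strcpy_clean = tail_is_zeroed(b, 3, sizeof(b));
}
```

If another program reads all eight bytes of such a buffer, the strcpy(3)-style copy exposes the stale "RET!".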

> 
> For the sake of reference, I looked into a few big C and C++ projects to
> see how often a strncpy(3)-based snippet was used to produce a truncated
> copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> that I haven't filtered out vendored dependencies, so there's a little bit
> of double-counting.) It seems like most codebases that don't ban strncpy(3)
> use a derived snippet somewhere or another. Also, I found 3 instances in
> glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> checking the last character.

I know.  I've been rewriting the code handling strings in shadow-utils
for the last year, and there was a lot of it.  I fixed several small bugs
in the process, so I recommend avoiding it.

> 
> So these two snippets really are widespread, especially among the long tail
> of smaller C and C++ applications and libraries that don't perform enough
> string manipulation that it warrants creating a custom set of more-
> foolproof wrapper functions (at least, in the opinion of their authors).



> Thus, since they're not going away, it would be useful for anyone reading
> the code to understand the concept behind how these two snippets work, that
> the only difference between the strncpy(3)'s special "character sequence"
> and an ordinary C string is an additional null terminator at the end of the
> destination buffer.

This is part of string_copying(7):

DESCRIPTION
   Terms (and abbreviations)
     string (str)
            is  a  sequence  of zero or more non‐null characters followed by a
            null byte.

     character sequence
            is a sequence of zero or  more  non‐null  characters.   A  program
            should  never use a character sequence where a string is required.
            However, with appropriate care, a string can be used in the  place
            of a character sequence.

I think that makes the difference very explicit.  strncpy(3) refers to
that page for the details, so I consider it documented.

strncpy(3):
CAVEATS
     The  name  of  these  functions  is confusing.  These functions produce a
     null‐padded character sequence, not a string (see string_copying(7)).

> 
> In other words, strncpy(3) doesn't create a truncated string, but it
> creates something which can be easily turned into a truncated string,
> and that's its most relevant quality for most of its uses in existing code.
> Further, apart from snprintf(3), there's no other portable way to produce a
> truncated string without manual arithmetic. Thus, I'd also find it

Portable is relative.  With libbsd, you can port to most POSIX systems.
Windows is another story.

> reasonable to highlight precisely why strncpy(3)'s output isn't a string

How about this?:

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index d4c2ce83d..c80c8b640 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -108,7 +108,10 @@ .SH HISTORY
 .SH CAVEATS
 The name of these functions is confusing.
 These functions produce a null-padded character sequence,
-not a string (see
+not a string.
+While strings have a terminating NUL byte,
+character sequences do not have any terminating byte
+(see
 .BR string_copying (7)).
 .P
 It's impossible to distinguish truncation by the result of the call,


> (viz., the lack of a null terminator), instead of trying to insist that its
> output is worlds apart from anything string-related, especially given the
> volume of existing correct code that belies that notion.

It is not correct code.  That code does extra work, which is a lot like
writing dead code: you're writing zeros that nobody reads, and that
confuses maintainers.

Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
it's not correct code.  It's rather dangerous code that just happens to
not be vulnerable most of the time.

> 
> Or, to answer your question, "It's appropriate to keep using strncpy(3) in
> existing code where it's currently used as part of creating a truncated
> string, and it's not especially inappropriate to use strncpy(3) in new code
> as part of creating a truncated string, if the code must support platforms
> without strlcpy(3) or similar, and if the resulting snippets are few enough
> and well-commented enough that they create less mental load than creating
> and maintaining a custom helper function."

strncpy(3) calls are never well documented.  Do you add a comment in
each such call saying "this zeroing is superfluous"?  Probably not.

> 
> (As an aside, I find the remark in the man page that "It's impossible to
> distinguish truncation by the result of the call" extremely misleading at
> best, since truncation can easily be distinguished by inspecting the last
> output character.)

Again, truncation by strncpy(3) itself is impossible to detect from its
result.  What you can detect is that your strlcpy(3)-like construct
truncated, which is a different thing.

Thanks,
Alex

> 
> Thank you,
> Matthew House

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:04                   ` DJ Delorie
@ 2023-11-08 19:40                     ` Alejandro Colomar
  2023-11-08 19:58                       ` DJ Delorie
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 19:40 UTC
  To: DJ Delorie; +Cc: libc-alpha, jg, linux-man

Hi DJ,

On Wed, Nov 08, 2023 at 02:04:45PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > strncpy(3) is useful to write to fixed-width buffers like `struct utmp`
> > and `struct utmpx`.  Is there any other libc API that needs strncpy(3)?
> 
> Let's not limit ourselves to glibc APIs.  Tar format, for example, uses
> fixed length fields (and my bet is that strncpy was created for it) yet
> tar is not part of glibc.
> 
> IMHO the solution here is to document strncpy with sufficiently obvious
> intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should
> ONLY be used for its intended purpose (filling a space-padded but not
> null-terminated field)

Indeed.  That's what I did (I think).

DESCRIPTION
     These  functions  copy  the string pointed to by src into a null‐
     padded character sequence at the fixed‐width buffer pointed to by
     dst.  If the destination buffer, limited by its size, isn’t large
     enough to hold the copy,  the  resulting  character  sequence  is
     truncated.

...

CAVEATS
     The name of these functions is confusing.  These  functions  pro‐
     duce   a  null‐padded  character  sequence,  not  a  string  (see
     string_copying(7)).

     It’s impossible to distinguish truncation by the  result  of  the
     call,  from  a  character sequence that just fits the destination
     buffer; truncation should be detected by comparing the length  of
     the input string with the size of the destination buffer.


I refuse to add any hints that strncpy(3) is good for copying strings.

> 
> It is not documentation's purpose to limit programmer's creativity, just
> to give them an accurate representation of what the functions do.

Thanks!

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:33             ` Alejandro Colomar
@ 2023-11-08 19:40               ` Alejandro Colomar
  2023-11-09  3:13               ` Matthew House
  2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
  2 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 19:40 UTC
  To: Matthew House; +Cc: Jonny Grant, linux-man

On Wed, Nov 08, 2023 at 08:33:34PM +0100, Alejandro Colomar wrote:
> Hi Matthew,
> 
> On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> > On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@kernel.org> wrote:
> > > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote:
> > > > We see things differently, I'm on the C standard side on this one. Would any information change your mind?
> > >
> > > It's difficult to say, but I doubt it.  But let me ask you something:
> > > In what cases would you find strncpy(3) appropriate to use, and why?
> > > Maybe if I understand that it helps.
> > >
> > > Kind regards,
> > > Alex
> > 
> > Man pages aren't read only by people writing new code, but also by people
> > reading and modifying existing code. And despite your preferences regarding
> > which functions ought to be used to produce strings, it's a widespread (and
> > correct) practice to produce a string from the character sequence created
> > by strncpy(3). There are two ways of doing this, either by setting the last
> > character of the destination buffer to null if you want to produce a
> > truncated string, or by testing the last character against zero if you want
> > to detect truncation and raise an error.
> 
> It is not strncpy(3) who truncated, but the programmer by adding a NULL

Oops.  s/NULL/NUL/

> in buff[BUFSIZ - 1].  In the following snippet, strncpy(3) will not
> truncate:
> 
> 	char cs[3];
> 
> 	strncpy(cs, "foo", 3);
> 
> And yet your code doing if (cs[2] != '\0') { goto error; } would think
> it did.  That's because you deformed strncpy(3) to implement a poor
> man's strlcpy(3).  
> 
> 
> 	char cs[3];
> 
> 	strncpy(cs, "foo", 3);
> 	cs[2] = '\0';  // The truncation is here, not in strncpy(3).
> 
> [...]



-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-08 15:07                     ` Alejandro Colomar
@ 2023-11-08 19:45                       ` G. Branden Robinson
  2023-11-08 21:35                       ` Carlos O'Donell
  1 sibling, 0 replies; 138+ messages in thread
From: G. Branden Robinson @ 2023-11-08 19:45 UTC
  To: 'linux-man'

[bouncing a copy to linux-man with PDF attachment stripped]

Hi Alex,

At 2023-11-08T16:07:42+0100, Alejandro Colomar wrote:
> I understand your point of view, but disagree with it.  Deprecation by
> ISO C or POSIX takes very very long.  We had gets(3) for decades until
> they realized it should be removed from the standards.

I think it likely that the humans involved in the decision-making
processes realized that gets(3) _should_ be removed a long time before
it actually was.  It is often difficult to get to the truth of why there
is so much inertia, particularly when large commercial vendors are
involved; such entities have long traditions of opacity.

Sometimes it is because they send relatively clueless people as
representatives to the standards body, because they don't value
standards development as "real work" (how does it generate profit?),
because it's a handy place to dump someone who's been awarded a
sinecure--or who annoys many colleagues but isn't worth the effort to
fire, or because that person is on an unstated mission to frustrate a
market rival and doesn't care what the collateral damage is.

My favorite example of the last is when Groupe Bull sent a fool[1] to
the ISO 8859 standardization group.  DEC's MCS (multinational character
set) was a sound candidate to become ISO 8859-1 as-was, but it must have
been thought that this would be "handing a victory" to DEC, so the Bull
representative--one source says it was a Belgian--endorsed disruptive
changes that made the encoding objectively worse for representation of
standard French script.

Sometimes Gallic chauvinism has to take a back seat to giving Maynard,
Massachusetts a poke in the eye.

Source attached.  It's in French.

I therefore think it's beneficial for you to pursue your campaign
against strncpy().  Vested interests cling to interfaces for reasons
they won't disclose, and cargo-cult programmers will employ them for
reasons they don't understand.  One of the fruits of discussions like
these is that we can get the actual technical merits and demerits of
such interfaces on the record.

> So we had it in ISO C in C89 and C99, and only in C11 they realized it
> had to be removed.  POSIX hasn't even removed it yet!  I won't
> hesitate to kill a function just because of bureaucracy.

You can't kill it; implementations will retain it practically forever to
keep old code compiling.  But you can sometimes scare away the cargo
cultists by lighting yourself on fire and waving your arms.

> The standard, especially C89, was just a reflection of the
> commonalities of most implementation.  It was a burden of
> implementations to add new stuff or to remove existing stuff.  Later
> revisions of the standards invented more, though.

And for what it's worth, Dennis Ritchie thought they lost the plot by
doing so.[2]

I admire a great deal of what Ritchie achieved, but I'm not confident he
made the right call there.  One elitist explanation I've seen ventured
is that Bell Labs simply had inherently smarter people than most other
software development shops could gather.  _Maybe_ there is some truth to
that, but I would venture a hypothesis less grounded on individual
characteristics.  The CSRC was a _research_ environment.  It was
emphatically not about measuring productivity by counting lines of code,
or "moving fast and breaking stuff", or how many "Ship It" boxes you've
ticked on your projects in the last year.

Google was pretty explicit that suitability for production-line
code output was a design objective for the Go language.[3]  They had
hired tons upon tons of smart people but found that it was hard to get
their "ship it" metrics satisfactorily high when driving all their newly
hired sheep through the mine fields of C (and especially C++[4])
programming.  An old adage says, "it's a poor workman who blames his
tools".  But when nearly every worker to whom you give a set of tools
struggles with high failure rates, it's time to question the fitness of
those tools for the objective you have in mind.  So Google did, and
attempted to recreate for software engineers what Frederick Winslow
Taylor achieved for factory laborers a hundred years ago.  If there's
less room for individual initiative, creativity, or insight, too
bad--those don't keep the share price up.[5]  You're a grunt.  GBTW.

> In this case, since ISO C has no APIs that use strncpy(3), it could
> (and should) already deprecate strncpy(3) from ISO C.  POSIX still
> needs it while it keeps utmpx(5), because there's no other way to
> correctly write to the fixed-width buffers within struct utmpx.

I would like to emphasize that a fixed-width buffer is inherently an
uneasy fit with C-style strings in the first place.  The major selling
point of null-terminated strings is their length flexibility.  They are
the entire reason we don't use Pascal-style strings, upon which C coders
eagerly spit (too easily, when they embarrass themselves with
strncpy()).  And yet fixed-width buffers are traditionally ubiquitous in
C, especially in the days before the GNU Coding Standards (and
programmers' frequent desires for generality and adaptability) spurred C
codes to use dynamic allocation much more aggressively.

Why were these practices in tension in a language as purportedly shot
through with genius as C was?  Because, in my opinion, it was a bit of
unfinished business in the language.  This is why malloc(3) and free(3)
are managed by the runtime rather than defined in the language proper.
Back in the 1970s and 1980s, "everybody knew" that you couldn't have safe
dynamic memory allocation without a garbage collector, and there was no
way to have a garbage collector run deterministically in general, a
fatal flaw in real-time applications.

(Even then, there were alternatives to throwing everything explicitly
onto the heap.[6])

Thanks to particular improvements in compiler development (originally
intended for code optimization), static analysis tools, an influential
(if under-recognized) research programming language called Cyclone,[7]
and a new language--Rust--that is making the fruits of these
improvements available to a wide audience, we're learning to be better
programmers.

...against the resistance of C grognards, who of course vociferously
oppose deprecation of strncpy(3), because (they claim) it never caused
_them_ any problems.

> > Also speaking only for myself, the Linux manpages are welcome to
> > discourage the use of any function that you feel is not a wise
> > choice for new programs, but the word "deprecated" should be
> > reserved for cases where there really has been a declaration of
> > deprecation by us and/or the standards.
> 
> If a function is deprecated by a standard or other entity, that will be
> reflected in the STANDARDS or HISTORY section.  For deprecation by the
> manual itself, the SYNOPSIS (and BUGS) sections are fine.  In the end,
> the word 'deprecate' isn't any magic.
> 
> 	From WordNet (r) 3.0 (2006) [wn]:
> 
> 	  deprecate
> 	      v 1: express strong disapproval of; deplore
> 
> That term applies to strncpy(3).

Yes, but Zack raises a good point.  Deprecation by ISO, by POSIX, by the
glibc developers, and by the Linux man-pages project are all different
things, and they all have different implications for portability.  It is
helpful for the everyday C programmer to know which of those
implications to infer.

Were I in your shoes, I would use the term "discourage".

"The Linux man-pages project discourages use of strncpy() {for the
reasons listed above, because ...}."

> But yes, we need to make sure that the APIs that need strncpy(3) are
> all deprecated.  If other Unix systems still need utmpx or similar
> stuff, strncpy(3) will still be necessary.

You might also say this: "The deprecated strncpy(3) is mainly used
in conjunction with other deprecated interfaces, like utmpx(5)."

Regards,
Branden

[1] The term "moron" also comes to mind.  Too strong a term?  Just
    applying Hanlon's Razor here.

[2] https://www.computerworld.com/article/2826125/the-future-according-to-dennis-ritchie--a-2000-interview-.html?page=2

    This, followed by his death, is why there's never been a third
    edition of _The C Programming Language_, which I guess continues to
    be a best-seller for its publisher, even though it's not a good idea
    for newcomers to C to learn from it, any more than Kernighan &
    Pike's _The Unix Programming Environment_ is.  (Once you've acquired
    a little historical perspective, they're _excellent_ resources!)

[3] https://go.dev/talks/2012/splash.article

    Just read every sentence containing the word "productive".

[4] https://google.github.io/styleguide/cppguide.html

[5] That has to await your elevation to the C-suite, where more
    marketing dollars will be spent burnishing your reputation as a
    "genius" than any level of personal productivity could conceivably
    justify.  See, e.g., Steve Jobs.  Silicon Valley's thought leaders
    are on a work slowdown, you see--their compensation ratio needs to
    be higher[8] or they won't turn their massive brains to the trivial
    problems of cold fusion or room-temperature superconductors.  Atlas
    ain't shrugging yet, but he's leaning over really far, shooting you
    a meaningful look, and clucking about the dire precedent set by this
    year's UAW strike.  Where are the Pinkertons when you need them?
    And what's Erik Prince up to these days?

[6] https://docs.adacore.com/gnat_ugx-docs/html/gnat_ugx/gnat_ugx/the_stacks.html
[7] https://en.wikipedia.org/wiki/Cyclone_(programming_language)
[8] https://www.epi.org/publication/ceo-pay-in-2021/


* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:40                     ` Alejandro Colomar
@ 2023-11-08 19:58                       ` DJ Delorie
  2023-11-08 20:13                         ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: DJ Delorie @ 2023-11-08 19:58 UTC
  To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man


Perhaps an example that shows the problem?

EXAMPLES

    strncpy (buf, "1", 5);
    { '1', 0, 0, 0, 0 }

    strncpy (buf, "1234", 5);
    { '1', '2', '3', '4', 0 }

    strncpy (buf, "12345", 5);
    { '1', '2', '3', '4', '5' }

    strncpy (buf, "123456", 5);
    { '1', '2', '3', '4', '5' }

Maybe strcpy and strncpy shouldn't even share man pages, since they're
not as related as we once thought?



* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:58                       ` DJ Delorie
@ 2023-11-08 20:13                         ` Alejandro Colomar
  2023-11-08 21:07                           ` DJ Delorie
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 20:13 UTC
  To: DJ Delorie; +Cc: libc-alpha, jg, linux-man

Hi DJ,

On Wed, Nov 08, 2023 at 02:58:24PM -0500, DJ Delorie wrote:
> 
> Perhaps an example that shows the problem?

Maybe.

> 
> EXAMPLES
> 
>     strncpy (buf, "1", 5);
>     { '1', 0, 0, 0, 0 }
> 
>     strncpy (buf, "1234", 5);
>     { '1', '2', '3', '4', 0 }
> 
>     strncpy (buf, "12345", 5);
>     { '1', '2', '3', '4', '5' }
> 
>     strncpy (buf, "123456", 5);
>     { '1', '2', '3', '4', '5' }

Would you mind reading the latest versions of strcpy(3), strncpy(3), and
string_copying(7), as in the git repository, and sharing your thoughts?

You don't even need to install the pages from git.  You can read them
with this:

$ git clone https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/
$ cd man-pages/
$ man ./man3/strcpy.3
$ man ./man3/strncpy.3
$ man ./man7/string_copying.7

Also check the examples and suggest if anything could be clearer.

Thanks!

> 
> Maybe strcpy and strncpy shouldn't even share man pages, since they're
> not as related as we once thought?

They don't (anymore):

	$ pwd
	/home/alx/src/linux/man-pages/man-pages/master
	$ git log --oneline -1
	b8584be14 (HEAD -> master, korg/master, alx/main, main) bcmp.3: wfix

	$ grep -e '\.TH ' -e '\.so ' man3/strcpy.3 
	.TH strcpy 3 (date) "Linux man-pages (unreleased)"
	$ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3 
	.so man3/strcpy.3

	$ grep -e '\.TH ' -e '\.so ' man3/strncpy.3 
	.so man3/stpncpy.3
	$ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3 
	.TH stpncpy 3 (date) "Linux man-pages (unreleased)"

The only shared page is string_copying(7), which attempts to clarify all
of this.  It was only in old versions of the Linux man-pages that they
shared a page.

	$ pwd
	/home/alx/src/linux/man-pages/man-pages/5/5.13
	$ git log --oneline -1
	091fbf1fe (HEAD, tag: man-pages-5.13) Ready for 5.13

	$ grep -e '\.TH ' -e '\.so ' man3/strcpy.3 
	.TH STRCPY 3  2021-03-22 "GNU" "Linux Programmer's Manual"
	$ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3 
	.TH STPCPY 3  2021-03-22 "GNU" "Linux Programmer's Manual"

	$ grep -e '\.TH ' -e '\.so ' man3/strncpy.3 
	.so man3/strcpy.3
	$ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3 
	.TH STPNCPY 3  2021-03-22 "GNU" "Linux Programmer's Manual"

I've spent the last year working on shadow-utils' string handling code,
and at the same time I wrote string_copying(7) as a complete guide to the
*cpy() functions, detailing what they do and what they don't, and also
rewrote all the pages for these functions as shorter reference guides
that refer to string_copying(7) for more details.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-08 20:13                         ` Alejandro Colomar
@ 2023-11-08 21:07                           ` DJ Delorie
  2023-11-08 21:50                             ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: DJ Delorie @ 2023-11-08 21:07 UTC
  To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man

Alejandro Colomar <alx@kernel.org> writes:
> Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> string_copying(7), as in the git repository, and comment your thoughts?

I think my examples would work well after the first CAVEATS paragraph:

       The name of these functions is confusing.  These functions
       produce a null-padded character sequence, not a string (see
       string_copying(7)), like this:

     strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
     strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
     strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
     strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }

>       These functions copy the string pointed to by src  into  a  null-padded
>       character sequence at the fixed-width buffer pointed to by dst.  If the
>       destination buffer, limited by its size, isn't large enough to hold the
>       copy,  the  resulting character sequence is truncated.

hmmm... perhaps

  These functions copy at most SZ bytes from SRC into a fixed-length
  buffer DST, padding any unwritten bytes in DST with NUL bytes.
  Specifically, if SRC has a NUL byte in the first SZ bytes, copying
  stops there and any remaining bytes in DST are filled with NUL bytes.
  If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
  copied to DST.

This avoids the term "string" completely and emphasises the not-string
nature of the destination.
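
To make the byte-level wording above concrete, here is a hypothetical
reference sketch of the behaviour that description specifies.
ref_strncpy() is an illustrative name, and this is not how glibc
implements strncpy(3).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch; not glibc's implementation. */
char *ref_strncpy(char *dst, const char *src, size_t sz)
{
    size_t i = 0;

    /* Copy bytes until SZ bytes are written or a NUL is found in SRC.
       SRC is never read past its first NUL or past SZ bytes. */
    for (; i < sz && src[i] != '\0'; i++)
        dst[i] = src[i];

    /* Fill the rest of DST with NUL bytes.  If SRC had no NUL in its
       first SZ bytes, nothing is padded and DST is unterminated. */
    for (; i < sz; i++)
        dst[i] = '\0';

    return dst;
}
```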

 stpncpy,  strncpy  - zero a fixed-width buffer and copy a string into a
       character sequence with truncation and zero the rest of it

Or "fill a fixed-width zero-padded buffer with bytes from a string"

That avoids saying "copy a string"

string_copying.7:

> For historic reasons, some standard APIs, such as utmpx(5),

Perhaps "some standard APIs and file formats, such as utmpx(5) or
tar(1)," ?

> however, those padding null bytes are not part of the character
> sequence.

add ", and may not be present if not needed." ?


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 15:07                     ` Alejandro Colomar
  2023-11-08 19:45                       ` G. Branden Robinson
@ 2023-11-08 21:35                       ` Carlos O'Donell
  2023-11-08 22:11                         ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Carlos O'Donell @ 2023-11-08 21:35 UTC (permalink / raw)
  To: Alejandro Colomar, Zack Weinberg
  Cc: GNU libc development, Jonny Grant, 'linux-man'

On 11/8/23 10:07, Alejandro Colomar wrote:
> So we had it in ISO C in C89 and C99, and only in C11 they realized it
> had to be removed.  POSIX hasn't even removed it yet!  I won't hesitate
> to kill a function just because of bureaucracy.

Attempting to get consensus at an international level, across cultural boundaries,
use cases, workloads, and developer workflows is difficult and not intended to be
bureaucracy for the sake of bureaucracy.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 21:07                           ` DJ Delorie
@ 2023-11-08 21:50                             ` Alejandro Colomar
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 21:50 UTC (permalink / raw)
  To: DJ Delorie; +Cc: libc-alpha, jg, linux-man

[-- Attachment #1: Type: text/plain, Size: 2856 bytes --]

Hi DJ,

On Wed, Nov 08, 2023 at 04:07:07PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> > string_copying(7), as in the git repository, and comment your thoughts?
> 
> I think my examples would work well after the first CAVEATS paragraph:
> 
>        The name of these functions is confusing.  These functions
>        produce a null-padded character sequence, not a string (see
>        string_copying(7)), like this:
> 
>      strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
>      strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
>      strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
>      strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }

It fits perfectly there.  And it also merges nicely with the paragraph
below.

> 
> >       These functions copy the string pointed to by src  into  a  null-padded
> >       character sequence at the fixed-width buffer pointed to by dst.  If the
> >       destination buffer, limited by its size, isn't large enough to hold the
> >       copy,  the  resulting character sequence is truncated.
> 
> hmmm... perhaps
> 
>   These functions copy at most SZ bytes from SRC into a fixed-length
>   buffer DST, padding any unwritten bytes in DST with NUL bytes.
>   Specifically, if SRC has a NUL byte in the first SZ bytes, copying
>   stops there and any remaining bytes in DST are filled with NUL bytes.
>   If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
>   copied to DST.
> 
> This avoids the term "string" completely and emphasises the not-string
> nature of the destination.

I don't like that, because it talks a lot about what the function does
in terms of low-level copies of bytes.  That may induce programmers to
try to find an abstraction in terms of strings.

> 
>  stpncpy,  strncpy  - zero a fixed-width buffer and copy a string into a
>        character sequence with truncation and zero the rest of it
> 
> Or "fill a fixed-width zero-padded buffer with bytes from a string"

But this wording is perfect!  I also used a similar wording for the
description.  I'll send a patch in a moment.

> 
> That avoids saying "copy a string"

Yep!

> 
> string_copying.7:
> 
> > For historic reasons, some standard APIs, such as utmpx(5),
> 
> Perhaps "some standard APIs and file formats, such as utmpx(5) or
> tar(1)," ?

Yes; thanks!

> 
> > however, those padding null bytes are not part of the character
> > sequence.
> 
> add ", and may not be present if not needed." ?

I'm not convinced about this one.  "needed" is not the right word I
think.  For now, I'll add the other suggestions to a patch.  Expect it
in a moment.

Cheers,
Alex 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 21:35                       ` Carlos O'Donell
@ 2023-11-08 22:11                         ` Alejandro Colomar
  2023-11-08 23:31                           ` Paul Eggert
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 22:11 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 968 bytes --]

On Wed, Nov 08, 2023 at 04:35:12PM -0500, Carlos O'Donell wrote:
> On 11/8/23 10:07, Alejandro Colomar wrote:
> > So we had it in ISO C in C89 and C99, and only in C11 they realized it
> > had to be removed.  POSIX hasn't even removed it yet!  I won't hesitate
> > to kill a function just because of bureaucracy.
> 
> Attempting to get consensus at an international level, across cultural boundaries,
> use cases, workloads, and developer workflows is difficult and not intended to be
> bureaucracy for the sake of bureaucracy.

Hi Carlos!

I understand that, and respect ISO's work.  I just don't think we need,
as GNU or Linux projects, to be restricted to the decisions of ISO.  We
can realize that certain functions are bad, and mark them as deprecated
in our scope.  If others want to imitate (ISO might even take it as
"prior art"), then great.

Cheers,
Alex

> 
> -- 
> Cheers,
> Carlos.
> 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 21:50                             ` Alejandro Colomar
@ 2023-11-08 22:17                               ` Alejandro Colomar
  2023-11-08 23:06                                 ` Paul Eggert
                                                   ` (3 more replies)
  0 siblings, 4 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-08 22:17 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Jonny Grant,
	Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 3837 bytes --]

These copy *from* a string.  But the destination is a simple character
sequence within an array; not a string.

Suggested-by: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Resending, including the mailing lists, which I forgot.

 man3/stpncpy.3        | 17 +++++++++++++----
 man7/string_copying.7 | 20 ++++++++++----------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
 stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by
 .IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
 These functions produce a null-padded character sequence,
 not a string (see
 .BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
+strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
+strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
 .P
 It's impossible to distinguish truncation by the result of the call,
 from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .P
@@ -240,14 +236,18 @@ .SS Truncate or not?
 .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
 .SS Null-padded character sequences
 For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
 such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
 use null-padded character sequences in fixed-width buffers.
 To interface with them,
 specialized functions need to be used.
 .P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
 .BR stpncpy (3).
 .P
 To copy from an unterminated string within a fixed-width buffer into a string,
-- 
2.42.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
@ 2023-11-08 23:06                                 ` Paul Eggert
  2023-11-08 23:28                                   ` DJ Delorie
                                                     ` (2 more replies)
  2023-11-09  7:23                                 ` Oskari Pirhonen
                                                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 138+ messages in thread
From: Paul Eggert @ 2023-11-08 23:06 UTC (permalink / raw)
  To: Alejandro Colomar, linux-man
  Cc: libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

On 11/8/23 14:17, Alejandro Colomar wrote:
> These copy *from* a string

Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be 
a string.

By the way, have you looked at the recent (i.e., this-year) changes to 
the glibc manual's string section? They're relevant.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 23:06                                 ` Paul Eggert
@ 2023-11-08 23:28                                   ` DJ Delorie
  2023-11-09  0:24                                   ` Alejandro Colomar
  2023-11-09 14:11                                   ` Jonny Grant
  2 siblings, 0 replies; 138+ messages in thread
From: DJ Delorie @ 2023-11-08 23:28 UTC (permalink / raw)
  To: Paul Eggert
  Cc: alx, linux-man, libc-alpha, jg, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos

Paul Eggert <eggert@cs.ucla.edu> writes:
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be 
> a string.

But it will be treated as one, for the purposes of this function.


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 22:11                         ` Alejandro Colomar
@ 2023-11-08 23:31                           ` Paul Eggert
  2023-11-09  0:29                             ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Paul Eggert @ 2023-11-08 23:31 UTC (permalink / raw)
  To: Alejandro Colomar, Carlos O'Donell
  Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man'

On 11/8/23 14:11, Alejandro Colomar wrote:
> I just don't think we need,
> as GNU or Linux projects, to be restricted to the decisions of ISO.  We
> can realize that certain functions are bad, and mark them as deprecated
> in our scope.

There's enough use of strncpy for the intended use (smallish fixed size 
character arrays that are null padded, not null terminated) that saying 
it's deprecated would likely cause more trouble than it's worth. It's 
not just utmp and tar; it's also socket programming (sun_path) and I'm 
sure other stuff.
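
As an illustration of the sun_path case, here is a sketch assuming a
Linux-style struct sockaddr_un; fill_sun_path() is a hypothetical
helper. strncpy(3) writes every byte of sun_path (the path, then null
padding), and to stay on the portable side the sketch refuses paths
that would leave no room for a terminating null byte.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: prepare a Unix-domain socket address for PATH.
   Returns -1 if PATH would not fit with a null byte after it. */
int fill_sun_path(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return -1;              /* would be truncated or unterminated */

    addr->sun_family = AF_UNIX;
    /* Copies PATH and zero-fills the remainder of sun_path. */
    strncpy(addr->sun_path, path, sizeof(addr->sun_path));
    return 0;
}
```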

Were we designing the C library from scratch I'd agree with you: in that 
context, strncpy would clearly be more trouble than it's worth. But now 
that we're stuck with strncpy we have better things to do than try to 
deprecate it.

Instead of saying "deprecate" I suggest we say something like "This 
function is generally a poor choice for processing strings" and point to 
the longer man page about strings in general. That's what the glibc 
manual does and it works reasonably well.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 23:06                                 ` Paul Eggert
  2023-11-08 23:28                                   ` DJ Delorie
@ 2023-11-09  0:24                                   ` Alejandro Colomar
  2023-11-09 14:11                                   ` Jonny Grant
  2 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09  0:24 UTC (permalink / raw)
  To: Paul Eggert
  Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

Hi Paul,

On Wed, Nov 08, 2023 at 03:06:40PM -0800, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
> > These copy *from* a string
> 
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a
> string.

Pedantically, true.  But since it's quite rare to copy from a
fixed-width null-padded array into another, I didn't want to waste
space on that and possibly confuse readers.  In such a case, the source
buffer must be at least as large as the destination buffer, and will
likely be the same size (because having fixed-width stuff, why make it
different), so memcpy(3) will probably be simpler.

> 
> By the way, have you looked at the recent (i.e., this-year) changes to the
> glibc manual's string section? They're relevant.

I hadn't; after your message, I have.
<https://sourceware.org/glibc/manual/2.38/html_mono/libc.html#String-and-Array-Utilities>

I like how it connects all the functions, and it explains the concepts
and gives advice (e.g., avoid truncation as it's usually evil), and
compares the different functions.

However, I think it misses a few things:

-  strncpy(3) and strncat(3) are not related at all.  They don't have
   the same relation that strcpy(3) and strcat(3) have.  You can't
   write the following code in any case:

	strncpy(dst, foo, sizeof(dst));
	strncat(dst, bar, sizeof(dst));

   as you would with strcpy(3) or strlcpy(3).

   strncpy(3) and strncat(3) are opposite functions: the former reads
   from a string and writes to a fixed-width null-padded buffer, and the
   latter reads from a fixed-width buffer and writes to a string.  (You
   can use them in other cases, pedantically, as you said above, but
   those cases are rather unreal.)

-  strncpy(3) is in a section that starts by saying:

   > The functions described in this section copy or concatenate the
   > possibly-truncated contents of a string or array to another

   This may mislead programmers to believe it is useful for producing
   strings, when it's not.
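
The first point above (strncpy(3) and strncat(3) as opposite
operations) can be sketched with a hypothetical utmpx(5)-style record;
struct record and its 8-byte name field are assumptions for
illustration. strncpy(3) writes a string into the null-padded field,
and strncat(3) reads the field back into a string, stopping at the
first null byte or after sizeof(r->name) bytes, and always terminating
the result.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-width field: null-padded, but not null-terminated
   when the name fills all 8 bytes. */
struct record {
    char name[8];
};

/* String -> fixed-width null-padded buffer: strncpy(3). */
void record_set_name(struct record *r, const char *name)
{
    strncpy(r->name, name, sizeof(r->name));
}

/* Fixed-width buffer -> string: strncat(3).
   DST must have room for sizeof(r->name) + 1 bytes. */
char *record_get_name(char *dst, const struct record *r)
{
    dst[0] = '\0';
    return strncat(dst, r->name, sizeof(r->name));
}
```

With a name of exactly 8 characters the field holds no null byte, yet
record_get_name() still produces a proper string.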

In general, I would like the manual to put some more distance between
these functions and the term "string".  As DJ mentioned, it might be
useful to mention utmp(5) and tar(1) as niche use cases for
st[rp]ncpy(3).

And now for a typo:

-  In the following sentence under "5.2 String and Array Conventions":

   > The array arguments and return values for these functions have type
   > void * or wchar_t.

   I believe it meant `void *` or `wchar_t *`.


Cheers,

Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 23:31                           ` Paul Eggert
@ 2023-11-09  0:29                             ` Alejandro Colomar
  2023-11-09 10:13                               ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09  0:29 UTC (permalink / raw)
  To: Paul Eggert
  Cc: Carlos O'Donell, Zack Weinberg, GNU libc development,
	Jonny Grant, 'linux-man'

[-- Attachment #1: Type: text/plain, Size: 1811 bytes --]

Hi Paul,

On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote:
> On 11/8/23 14:11, Alejandro Colomar wrote:
> > I just don't think we need,
> > as GNU or Linux projects, to be restricted to the decisions of ISO.  We
> > can realize that certain functions are bad, and mark them as deprecated
> > in our scope.
> 
> There's enough use of strncpy for the intended use (smallish fixed size
> character arrays that are null padded, not null terminated) that saying it's
> deprecated would likely cause more trouble than it's worth. It's not just
> utmp and tar; it's also socket programming (sun_path) and I'm sure other
> stuff.
> 
> Were we designing the C library from scratch I'd agree with you: in that
> context, strncpy would clearly be more trouble than it's worth. But now that
> we're stuck with strncpy we have better things to do than try to deprecate
> it.

No, no, I'm not trying to deprecate it.  I was just saying that *iff*
all of its uses were dead, I'd deprecate it.  But they're clearly not
dead, so it's a perfect function for those cases.

> 
> Instead of saying "deprecate" I suggest we say something like "This function
> is generally a poor choice for processing strings" and point to the longer
> man page about strings in general. That's what the glibc manual does and it
> works reasonably well.

Yes, I've done something like this.  string_copying(7) recommends
avoiding fixed-width null-padded buffers in APIs.  But for those use
cases that already exist, this is the function to use.

I'm also refusing to document how to (mis)use this function for
truncating strings.  If one wants to truncate strings, they'll need
functions that were designed to do that (e.g., strlcpy(3)).

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:33             ` Alejandro Colomar
  2023-11-08 19:40               ` Alejandro Colomar
@ 2023-11-09  3:13               ` Matthew House
  2023-11-09 10:26                 ` Jonny Grant
                                   ` (2 more replies)
  2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
  2 siblings, 3 replies; 138+ messages in thread
From: Matthew House @ 2023-11-09  3:13 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, linux-man

On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> > Man pages aren't read only by people writing new code, but also by people
> > reading and modifying existing code. And despite your preferences regarding
> > which functions ought to be used to produce strings, it's a widespread (and
> > correct) practice to produce a string from the character sequence created
> > by strncpy(3). There are two ways of doing this, either by setting the last
> > character of the destination buffer to null if you want to produce a
> > truncated string, or by testing the last character against zero if you want
> > to detect truncation and raise an error.
>
> It is not strncpy(3) that truncated, but the programmer, by adding a NUL
> in buff[BUFSIZ - 1].  In the following snippet, strncpy(3) will not
> truncate:
>
>         char cs[3];
>
>         strncpy(cs, "foo", 3);
>
> And yet your code doing if (cs[2] != '\0') { goto error; } would think
> it did.  That's because you deformed strncpy(3) to implement a poor
> man's strlcpy(3).
>
>         char cs[3];
>
>         strncpy(cs, "foo", 3);
>         cs[2] = '\0';  // The truncation is here, not in strncpy(3).

That's indeed a self-consistent interpretation of strncpy(3)'s function,
but I don't think it's borne out by its formal definition, which I was
basing my reasoning on. The current Linux man page for strncpy(3) says,

  These functions copy the string pointed to by src into a null-padded
  character sequence at the fixed-width buffer pointed to by dst. If the
  destination buffer, limited by its size, isn't large enough to hold the
  copy, the resulting character sequence is truncated.

Notice how it "copies the string": as your string_copying(7) says, a string
includes both a character sequence and a final null byte. So I'd ordinarily
read this definition as saying that strncpy(3) tries to copy src up to and
including the null byte, but produces a truncated copy of the whole string
if the destination buffer is too small. Thus, even if the destination
buffer contains all non-null characters in the original string, then the
copy has still been "truncated" in this sense.

The ISO C definition, and by extension, the POSIX definition, make this
interpretation even more explicit:

  The strncpy function copies not more than n characters (characters that
  follow a null character are not copied) from the array pointed to by s2
  to the array pointed to by s1.

That is, the terminating null byte is part of the copy, but not anything
after the terminating null byte.

So one can interpret strncpy(3) as copying a prefix of a character sequence
into a buffer (and zero-filling the remainder), in which case you're
correct that truncation cannot be detected. But the function is formally
defined as copying a prefix of a string into a buffer (and zero-filling the
remainder), in which case the string has been truncated if the buffer
doesn't end in a null byte afterward. It's just that one may not care about
the terminating null byte being truncated if the user of the result just
wants the initial character sequence.
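
For reference, the two strncpy(3)-based idioms discussed in this
exchange can be sketched as below; copy_truncate() and copy_or_fail()
are hypothetical names. As Alex's cs[3] example points out,
copy_or_fail() also reports failure when the non-null characters of
src exactly fill the buffer, because it is the string's null byte that
does not fit.

```c
#include <assert.h>
#include <string.h>

/* Idiom 1 (hypothetical helper): produce a possibly-truncated string.
   The explicit store into dst[sz - 1] is what truncates and terminates. */
char *copy_truncate(char *dst, const char *src, size_t sz)
{
    assert(sz > 0);
    strncpy(dst, src, sz);
    dst[sz - 1] = '\0';
    return dst;
}

/* Idiom 2 (hypothetical helper): fail instead of truncating.
   Returns -1 if SRC, including its null byte, does not fit in SZ bytes;
   DST then holds an unterminated prefix and must not be used as a string. */
int copy_or_fail(char *dst, const char *src, size_t sz)
{
    assert(sz > 0);
    strncpy(dst, src, sz);
    if (dst[sz - 1] != '\0')
        return -1;
    return 0;
}
```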

> > I'm not aware of any alternative to a strncpy(3)-based snippet for
> > producing a possibly-truncated copy of a string, except for your preferred
> > strlcpy(3) or stpecpy(3), which aren't available to anyone without a
>
> The Linux kernel has strscpy(3), which is also good, but is not
> available to user space.
>
> > brand-new glibc (nor, by extension, any applications or libraries that want
>
> libbsd has provided strlcpy(3) since basically forever.  It is a very
> portable library.  You don't need a brand-new glibc for having
> strlcpy(3).
>
> <https://libbsd.freedesktop.org/wiki/>

That's a nice library that I didn't know about! Unfortunately, I don't
think it's a very viable option for the long tail of small libraries I've
referred to, which generally don't have any sub-dependencies of their own,
apart from those provided by the platform.

Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
configure scripts from whatever build system you're using (in such a way
that libbsd can locate libmd), ensuring they're safe for cross-compilation
if that's a goal, ensuring you bundle them in a way that respects their
license terms, and ensuring that any user of your library links to the two
dependencies and doesn't duplicate them. At that point, rolling your own
strlcpy(3) equivalent definitely sounds like less mental load, at least to
me.
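
A hand-rolled strlcpy(3) equivalent of the kind mentioned above, built
from strlen(3) + min() + memcpy(3) + an explicit terminator, might look
like the sketch below. my_strlcpy() is a hypothetical name; it follows
the strlcpy(3) convention of returning strlen(src), so the copy was
truncated if and only if the result is >= sz.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical portable stand-in with strlcpy(3)-like semantics:
   copy as much of SRC as fits, always null-terminate when SZ > 0,
   and return strlen(SRC) so the caller can detect truncation. */
size_t my_strlcpy(char *dst, const char *src, size_t sz)
{
    size_t len = strlen(src);

    if (sz > 0) {
        size_t n = len < sz - 1 ? len : sz - 1;   /* min(len, sz - 1) */
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```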

> > functions); snprintf(3), which has the insidious flaw of not supporting
> > more than INT_MAX characters on pain of UB, and also produces a warning if
> > the compiler notices the possible truncation; or strlen(3) + min() +
> > memcpy(3) + manually adding a null terminator, which is certainly more
> > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> > that poses a performance problem, but similarly opens up room for
> > off-by-one errors.
>
> More than the performance problem, I'm more worried about the
> maintainability of strncpy(3).  When 20 years from now, a programmer
> reading a piece of code full of strncpy(3) wants to migrate to a sane
> function like strlcpy(3) or strcpy(3), the programmer needs to
> understand if the zeroing was purposeful or just accidental.  Because
> by using strlcpy(3), it may start leaking some trailing data if the
> trailing of the buffer is meaningful to some program.

I didn't see this as an issue in practice when I was reviewing all those
existing usages of strncpy(3). The vast majority were used in the midst of
simple string manipulation, where the destination buffer starts as
uninitialized or zeroed out, and ultimately gets passed into a user
expecting an ordinary null-terminated string.

(One exception was a few functions that used strncpy(dst, "", len) to zero
out the buffer, which is thankfully pretty obvious. Another exception was
the functions that actually used strncpy(3) to produce a null-padded
character sequence, e.g., when writing a value into a section of a binary.
But in general, I found that it's usually not difficult to tell when a
usage is being clever enough that the null padding might be significant.)

In fact, the greater confusion came from the surprisingly common practice
of using strncpy(3) like it's memcpy(3), by giving it the known length of
the source string, or of some prefix computed through strchr(3) or similar.
This is often then followed up by strncat(3) or similar, indicating that
the writer clearly expects the full length to have non-null characters. But
if the length computation is separated far enough from the actual call to
strncpy(3), then it can become unclear whether the source is actually
expected to have any interior null bytes before the computed length. (So if
a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest
that ordinary memcpy(3) be one of them.)
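
When the length is already known, as in the strncpy(3)-as-memcpy(3)
pattern described above, memcpy(3) plus an explicit terminator states
the intent directly. A sketch, using a hypothetical copy_key() that
extracts the key of a "key=value" string and refuses to truncate:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the part of KV before '=' (or all of KV if
   there is no '=') into DST as a string.  The prefix is known to contain
   no null bytes, so memcpy(3) says exactly what happens. */
int copy_key(char *dst, size_t dstsz, const char *kv)
{
    const char *eq = strchr(kv, '=');
    size_t n = eq ? (size_t)(eq - kv) : strlen(kv);

    if (n >= dstsz)
        return -1;          /* does not fit; refuse rather than truncate */
    memcpy(dst, kv, n);
    dst[n] = '\0';
    return 0;
}
```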

> > For the sake of reference, I looked into a few big C and C++ projects to
> > see how often a strncpy(3)-based snippet was used to produce a truncated
> > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> > that I haven't filtered out vendored dependencies, so there's a little bit
> > of double-counting.) It seems like most codebases that don't ban strncpy(3)
> > use a derived snippet somewhere or another. Also, I found 3 instances in
> > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> > checking the last character.
>
> I know.  I've been rewriting the code handling strings in shadow-utils
> for the last year, and there was a lot of it.  I fixed several small bugs
> in the process, so I recommend avoiding it.

I can't tell you about your own experience, but in mine, the root cause of
most string-handling bugs has been excessive cleverness in using the
standard string functions, rather than the behavior of the functions
themselves. So one worry of mine is that if strncpy(3) ends up being
deprecated or whatever, then authors of portable libraries will start
writing lots of custom memcpy(3)-based replacements to their strncpy(3)-
based snippets, and more lines of code will introduce more opportunities
for cleverness.

(This is also why I was confused by your support for strcpy(3) on the
grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
protect us from overruns, not from all the "small bugs" that might ensue
from people becoming more clever with sizing the destination buffer with
strcpy(3). Also, if it were truly a panacea, then we'd hardly have to worry
about the problems of strncpy(3) at all, since it would detect any misuse
of the function.)

Probably the only way to solve the cleverness issue for good is to have an
immediately-available, foolproof, performant set of string functions that
are extremely straightforward to understand and use, flexible enough for
any use case, and generally agreed to be the first choice for string
manipulation.

Unfortunately, probably the closest match to those criteria, especially the
availability criterion, is snprintf(3), which has the flaws of using int
instead of size_t for most sizes, not being very performant, and not being
async-signal-safe. Alas, it will likely remain a dream, given all the wars
over which safer string functions have the best API. But at least
strlcpy(3) has a pretty sound interface, if other platforms ever get around
to including it by default.

> > the code to understand the concept behind how these two snippets work, that
> > the only difference between the strncpy(3)'s special "character sequence"
> > and an ordinary C string is an additional null terminator at the end of the
> > destination buffer.
>
> This is part of string_copying(7):
>
> DESCRIPTION
>    Terms (and abbreviations)
>      string (str)
>             is  a  sequence  of zero or more non‐null characters followed by a
>             null byte.
>
>      character sequence
>             is a sequence of zero or  more  non‐null  characters.   A  program
>             should  never use a character sequence where a string is required.
>             However, with appropriate care, a string can be used in the  place
>             of a character sequence.
>
> I think that is very explicit in the difference.  strncpy(3) refers to
> that page for understanding the differences, so I think it is
> documented.
>
> strncpy(3):
> CAVEATS
>      The  name  of  these  functions  is confusing.  These functions produce a
>      null‐padded character sequence, not a string (see string_copying(7)).

My point isn't that the difference is undocumented, but that the typical
man page reader isn't reading the man pages for their own sake, but because
they're looking at some code, and they want to Know What It's Doing as soon
as possible. If they're getting directed around elsewhere with weird
warnings about "not a string" ("what's it going on about, I thought it was
null-padded?"), then I worry there's a good chance that they'll instead
bounce off the man page and try figuring it out some other way. And even if
they do follow the reference, then they might have difficulty understanding
the implications, since many people don't think of things in terms of
formal definitions.

> > reasonable to highlight precisely why strncpy(3)'s output isn't a string
>
> How about this?:
>
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index d4c2ce83d..c80c8b640 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -108,7 +108,10 @@ .SH HISTORY
>  .SH CAVEATS
>  The name of these functions is confusing.
>  These functions produce a null-padded character sequence,
> -not a string (see
> +not a string.
> +While strings have a terminating NUL byte,
> +character sequences do not have any terminating byte
> +(see
>  .BR string_copying (7)).
>  .P
>  It's impossible to distinguish truncation by the result of the call,

Yes, I'd be perfectly happy with something like that. That way, the
scariness is far more immediate ("the output might not be terminated!?"),
and thus more accessible to the typical reader.

> > (viz., the lack of a null terminator), instead of trying to insist that its
> > output is worlds apart from anything string-related, especially given the
> > volume of existing correct code that belies that notion.
>
> It is not correct code.  That code is doing extra work which confuses
> maintainers.  It is a lot like writing dead code, since you're writing
> zeros that nobody is reading, which confuses maintainers.

I am really not a fan of conflating the notions of "code that is difficult
to maintain" with "code that doesn't perform the task it is intended to
perform". When I think about incorrect code, I think about things like
setenv(3) that are just waiting to cause trouble in popular libraries built
and deployed today.

Meanwhile, "confusing maintainers" is a very subjective notion specific to
both the code and the maintainers: if someone sees some code allocating
a fresh buffer, strncpy(3)ing a string into it, slapping a terminator on
the end, and finally passing the result into something clearly expecting a
string, then why would they be guaranteed to be sweating bullets over
whatever happened to the rest of the fresh buffer? Especially given how
widespread the strncpy(3) + extra null terminator pattern already is.
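A minimal sketch of the pattern just described, assuming the caller wants a freshly allocated, possibly truncated copy; `dup_truncated` is a hypothetical name, not a libc function. The zero padding that strncpy(3) writes is never read here; the explicit terminator is what turns the character sequence into a string.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd string holding at most
 * size - 1 bytes of src. Returns NULL if size is 0 or malloc fails.
 */
static char *dup_truncated(const char *src, size_t size)
{
    char *buf;

    if (size == 0)
        return NULL;                 /* no room even for the terminator */
    buf = malloc(size);
    if (buf == NULL)
        return NULL;
    strncpy(buf, src, size - 1);     /* writes exactly size - 1 bytes (copy + padding) */
    buf[size - 1] = '\0';            /* the explicit terminator makes it a string */
    return buf;
}
```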

Instead, it's code making use of strncpy(3) in a particularly clever way
that I'd find confusing, and in those cases, I lay the blame squarely on
the cleverness rather than the function itself.

> Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
> it's not correct code.  It's rather dangerous code that just happens to
> not be vulnerable most of the time.

So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
immune to off-by-one bugs? Or will the vast majority of current strncpy(3)
users be willing to either restrict their platform support or add two extra
dependencies to their build process just to have strlcpy(3)? I'd hardly be
inclined to think that off-by-one bugs are a particular specialty of
strncpy(3).

> > Or, to answer your question, "It's appropriate to keep using strncpy(3) in
> > existing code where it's currently used as part of creating a truncated
> > string, and it's not especially inappropriate to use strncpy(3) in new code
> > as part of creating a truncated string, if the code must support platforms
> > without strlcpy(3) or similar, and if the resulting snippets are few enough
> > and well-commented enough that they create less mental load than creating
> > and maintaining a custom helper function."
>
> strncpy(3) calls are never well documented.  Do you add a comment in
> each such call saying "this zeroing is superfluous"?  Probably not.

By that standard, every call to a function that takes an output pointer and
returns the number of elements written (say, readlink(2)) would need a
comment saying "the remaining elements in this array now have undefined
values". I don't think it's controversial that in many situations, we
tacitly understand that we simply don't care about the remainder of a
buffer after a certain point. In the case of producing a string, that point
is going to be the null terminator, in the absence of on-site documentation
to the contrary; I'd label anything else as overly clever.

Meanwhile, "never" would be a strong word to describe the rate that
strncpy(3)'s lack of null termination is documented at the call site; 30 of
the 339 call sites I mentioned have an associated comment regarding null
termination. (ICU seems to be the best library comment-wise, but even it
doesn't place them consistently.) It's obviously far from routine in
existing code, but it's not something that never happens.

Thank you,
Matthew House


* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
  2023-11-08 23:06                                 ` Paul Eggert
@ 2023-11-09  7:23                                 ` Oskari Pirhonen
  2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
  2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
  3 siblings, 0 replies; 138+ messages in thread
From: Oskari Pirhonen @ 2023-11-09  7:23 UTC
  To: Alejandro Colomar
  Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell


On Wed, Nov 08, 2023 at 23:17:07 +0100, Alejandro Colomar wrote:
> These copy *from* a string.  But the destination is a simple character
> sequence within an array; not a string.
> 
> Suggested-by: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---

I like the "with bytes from a string" wording. Good call.

- Oskari

> 
> Resending, including the mailing lists, which I forgot.
> 
>  man3/stpncpy.3        | 17 +++++++++++++----
>  man7/string_copying.7 | 20 ++++++++++----------
>  2 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index b6bbfd0a3..f86ff8c29 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -6,9 +6,8 @@
>  .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
>  .SH NAME
>  stpncpy, strncpy
> -\- zero a fixed-width buffer and
> -copy a string into a character sequence with truncation
> -and zero the rest of it
> +\-
> +fill a fixed-width null-padded buffer with bytes from a string
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ", " \-lc )
> @@ -37,7 +36,7 @@ .SH SYNOPSIS
>          _GNU_SOURCE
>  .fi
>  .SH DESCRIPTION
> -These functions copy the string pointed to by
> +These functions copy bytes from the string pointed to by
>  .I src
>  into a null-padded character sequence at the fixed-width buffer pointed to by
>  .IR dst .
> @@ -110,6 +109,16 @@ .SH CAVEATS
>  These functions produce a null-padded character sequence,
>  not a string (see
>  .BR string_copying (7)).
> +For example:
> +.P
> +.in +4n
> +.EX
> +strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
> +strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
> +strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
> +.EE
> +.in
>  .P
>  It's impossible to distinguish truncation by the result of the call,
>  from a character sequence that just fits the destination buffer;
> diff --git a/man7/string_copying.7 b/man7/string_copying.7
> index cadf1c539..0e179ba34 100644
> --- a/man7/string_copying.7
> +++ b/man7/string_copying.7
> @@ -41,15 +41,11 @@ .SS Strings
>  .\" ----- SYNOPSIS :: Null-padded character sequences --------/
>  .SS Null-padded character sequences
>  .nf
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *stpncpy(char " dst "[restrict ." sz "], \
> +// Fill a fixed-width null-padded buffer with bytes from a string.
> +.BI "char *strncpy(char " dst "[restrict ." sz "], \
>  const char *restrict " src ,
>  .BI "               size_t " sz );
> -.P
> -// Zero a fixed-width buffer, and
> -// copy a string into a character sequence with truncation.
> -.BI "char *strncpy(char " dst "[restrict ." sz "], \
> +.BI "char *stpncpy(char " dst "[restrict ." sz "], \
>  const char *restrict " src ,
>  .BI "               size_t " sz );
>  .P
> @@ -240,14 +236,18 @@ .SS Truncate or not?
>  .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
>  .SS Null-padded character sequences
>  For historic reasons,
> -some standard APIs,
> +some standard APIs and file formats,
>  such as
> -.BR utmpx (5),
> +.BR utmpx (5)
> +and
> +.BR tar (1),
>  use null-padded character sequences in fixed-width buffers.
>  To interface with them,
>  specialized functions need to be used.
>  .P
> -To copy strings into them, use
> +To copy bytes from strings into these buffers, use
> +.BR strncpy (3)
> +or
>  .BR stpncpy (3).
>  .P
>  To copy from an unterminated string within a fixed-width buffer into a string,
> -- 
> 2.42.0
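The examples in the patch can be checked with a small sketch, assuming a hypothetical 5-byte field (`FIELD_SIZE`) standing in for a fixed-width field such as those in utmpx(5) or tar(1) headers; `field_set` and `field_get` are illustrative names. Writing uses strncpy(3); reading back into a string uses strnlen(3) and memcpy(3), which is one way to avoid reading past an unterminated field.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

#define FIELD_SIZE 5   /* hypothetical fixed-width field, as in the patch */

/* Fill a fixed-width null-padded field with bytes from a string. */
static void field_set(char field[FIELD_SIZE], const char *src)
{
    strncpy(field, src, FIELD_SIZE);   /* no terminator if strlen(src) >= 5 */
}

/*
 * Read the field back into a string. out needs FIELD_SIZE + 1 bytes,
 * because the field itself may be completely full.
 */
static void field_get(char out[FIELD_SIZE + 1], const char field[FIELD_SIZE])
{
    size_t len = strnlen(field, FIELD_SIZE);   /* never reads past the field */

    memcpy(out, field, len);
    out[len] = '\0';
}
```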



* Re: strncpy clarify result may not be null terminated
  2023-11-09  0:29                             ` Alejandro Colomar
@ 2023-11-09 10:13                               ` Jonny Grant
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
  2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
  0 siblings, 2 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 10:13 UTC
  To: Alejandro Colomar, Paul Eggert
  Cc: Carlos O'Donell, Zack Weinberg, GNU libc development,
	'linux-man'



On 09/11/2023 00:29, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote:
>> On 11/8/23 14:11, Alejandro Colomar wrote:
>>> I just don't think we need,
>>> as GNU or Linux projects, to be restricted to the decisions of ISO.  We
>>> can realize that certain functions are bad, and mark them as deprecated
>>> in our scope.
>>
>> There's enough use of strncpy for the intended use (smallish fixed size
>> character arrays that are null padded, not null terminated) that saying it's
>> deprecated would likely cause more trouble than it's worth. It's not just
>> utmp and tar; it's also socket programming (sun_path) and I'm sure other
>> stuff.
>>
>> Were we designing the C library from scratch I'd agree with you: in that
>> context, strncpy would clearly be more trouble than it's worth. But now that
>> we're stuck with strncpy we have better things to do than try to deprecate
>> it.
> 
> No, no, I'm not trying to deprecate it.  I was just saying that *iff*
> all of its uses were dead, I'd deprecate it.  But they're clearly not
> dead, so it's a perfect function for those cases.
> 
>>
>> Instead of saying "deprecate" I suggest we say something like "This function
>> is generally a poor choice for processing strings" and point to the longer
>> man page about strings in general. That's what the glibc manual does and it
>> works reasonably well.
> 
> Yes, I've done something like this.  string_copying(7) recommends
> avoiding fixed-width null-padded buffers in APIs.  But for those use
> cases that already exist, this is the function to use.
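Paul's sun_path case can be sketched as follows, assuming a POSIX system with <sys/un.h>; `make_unix_addr` is a hypothetical helper. The fixed-width sun_path array is filled with strncpy(3), whose zero padding suits the field. Rejecting paths that leave no room for a null byte is a conservative choice of this sketch, not a requirement of every platform.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Hypothetical helper: fill *addr for a Unix-domain socket at path.
 * Returns 0 on success, -1 if path does not fit with a null byte.
 */
static int make_unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof addr->sun_path)
        return -1;                       /* conservative length check */
    memset(addr, 0, sizeof *addr);       /* clear any other members too */
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof addr->sun_path);  /* null-pads the rest */
    return 0;
}
```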

https://man7.org/linux/man-pages/man7/string_copying.7.html
Rather than "catenation", in my experience "concatenation" is the common term for what these functions do. The page uses "catenation" in quite a few places, and other man pages probably do too.

How about following the style of the other man pages that put the notes about each function below them? (rather than above)
https://man7.org/linux/man-pages/man3/string.3.html

size_t strlen(const char *s);
Return the length of the string s.


At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:

// Copy/catenate a string.
char *strcpy(char *restrict dst, const char *restrict src);
char *strcat(char *restrict dst, const char *restrict src);


Kind regards
Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-09  3:13               ` Matthew House
@ 2023-11-09 10:26                 ` Jonny Grant
  2023-11-09 10:31                 ` Jonny Grant
  2023-11-09 12:23                 ` Alejandro Colomar
  2 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 10:26 UTC
  To: Matthew House; +Cc: Alejandro Colomar, linux-man

On Thu, 9 Nov 2023 at 03:13, Matthew House <mattlloydhouse@gmail.com> wrote:
>
> On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> > > Man pages aren't read only by people writing new code, but also by people
> > > reading and modifying existing code. And despite your preferences regarding
> > > which functions ought to be used to produce strings, it's a widespread (and
> > > correct) practice to produce a string from the character sequence created
> > > by strncpy(3). There are two ways of doing this, either by setting the last
> > > character of the destination buffer to null if you want to produce a
> > > truncated string, or by testing the last character against zero if you want
> > > to detect truncation and raise an error.
> >
> > It is not strncpy(3) that truncated, but the programmer, by adding a null
> > byte at buff[BUFSIZ - 1].  In the following snippet, strncpy(3) will not
> > truncate:
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >
> > And yet your code doing if (cs[2] != '\0') { goto error; } would think
> > it did.  That's because you deformed strncpy(3) to implement a poor
> > man's strlcpy(3).
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >         cs[2] = '\0';  // The truncation is here, not in strncpy(3).
>
> That's indeed a self-consistent interpretation of strncpy(3)'s function,
> but I don't think it's borne out by its formal definition, which I was
> basing my reasoning on. The current Linux man page for strncpy(3) says,
>
>   These functions copy the string pointed to by src into a null-padded
>   character sequence at the fixed-width buffer pointed to by dst. If the
>   destination buffer, limited by its size, isn't large enough to hold the
>   copy, the resulting character sequence is truncated.
>
> Notice how it "copies the string": as your string_copying(7) says, a string
> includes both a character sequence and a final null byte. So I'd ordinarily
> read this definition as saying that strncpy(3) tries to copy src up to and
> including the null byte, but produces a truncated copy of the whole string
> if the destination buffer is too small. Thus, even if the destination
> buffer contains all non-null characters in the original string, then the
> copy has still been "truncated" in this sense.
>
> The ISO C definition, and by extension, the POSIX definition, make this
> interpretation even more explicit:
>
>   The strncpy function copies not more than n characters (characters that
>   follow a null character are not copied) from the array pointed to by s2
>   to the array pointed to by s1.
>
> That is, the terminating null byte is part of the copy, but not anything
> after the terminating null byte.
>
> So one can interpret strncpy(3) as copying a prefix of a character sequence
> into a buffer (and zero-filling the remainder), in which case you're
> correct that truncation cannot be detected. But the function is formally
> defined as copying a prefix of a string into a buffer (and zero-filling the
> remainder), in which case the string has been truncated if the buffer
> doesn't end in a null byte afterward. It's just that one may not care about
> the terminating null byte being truncated if the user of the result just
> wants the initial character sequence.
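The truncation check under discussion can be made concrete. In this sketch, `copy_fits_as_string` (a hypothetical name) reports success only if the last byte of the buffer is null, which follows the interpretation argued above. Under that reading, copying "foo" into `char cs[3]` counts as truncation, which is exactly the case Alejandro's example raises.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: strncpy(3) src into dst (size bytes) and report
 * whether the terminating null byte made it into the buffer.
 */
static bool copy_fits_as_string(char *dst, const char *src, size_t size)
{
    strncpy(dst, src, size);
    return size > 0 && dst[size - 1] == '\0';   /* false: terminator was cut off */
}
```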
>
> > > I'm not aware of any alternative to a strncpy(3)-based snippet for
> > > producing a possibly-truncated copy of a string, except for your preferred
> > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a
> >
> > The Linux kernel has strscpy(3), which is also good, but is not
> > available to user space.
> >
> > > brand-new glibc (nor, by extension, any applications or libraries that want
> >
> > libbsd has provided strlcpy(3) since basically forever.  It is a very
> > portable library.  You don't need a brand-new glibc for having
> > strlcpy(3).
> >
> > <https://libbsd.freedesktop.org/wiki/>
>
> That's a nice library that I didn't know about! Unfortunately, I don't
> think it's a very viable option for the long tail of small libraries I've
> referred to, which generally don't have any sub-dependencies of their own,
> apart from those provided by the platform.
>
> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
> configure scripts from whatever build system you're using (in such a way
> that libbsd can locate libmd), ensuring they're safe for cross-compilation
> if that's a goal, ensuring you bundle them in a way that respects their
> license terms, and ensuring that any user of your library links to the two
> dependencies and doesn't duplicate them. At that point, rolling your own
> strlcpy(3) equivalent definitely sounds like less mental load, at least to
> me.
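Rolling your own equivalent might look like the sketch below: strlen(3) plus a min() plus memcpy(3) plus a manual terminator. `my_strlcpy` is a hypothetical name; it imitates the documented strlcpy(3) contract (always terminate when size > 0, return strlen(src) so that `ret >= size` signals truncation) but is not the libbsd or glibc implementation. The `size - 1` arithmetic is where off-by-one errors tend to creep in.

```c
#include <string.h>

/*
 * Hypothetical strlcpy(3)-style helper: copy as much of src as fits,
 * null-terminate if size > 0, and return strlen(src).
 */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;   /* leave room for '\0' */

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```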
>
> > > functions); snprintf(3), which has the insidious flaw of not supporting
> > > more than INT_MAX characters on pain of UB, and also produces a warning if
> > > the compiler notices the possible truncation; or strlen(3) + min() +
> > > memcpy(3) + manually adding a null terminator, which is certainly more
> > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> > > that poses a performance problem, but similarly opens up room for
> > > off-by-one errors.
> >
> > Rather than the performance problem, I'm more worried about the
> > maintainability of strncpy(3).  When, 20 years from now, a programmer
> > reading a piece of code full of strncpy(3) wants to migrate to a sane
> > function like strlcpy(3) or strcpy(3), the programmer needs to
> > understand whether the zeroing was purposeful or just accidental,
> > because after switching to strlcpy(3), the code may start leaking
> > trailing data if the tail of the buffer is meaningful to some program.
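The leaking concern above can be illustrated with a hypothetical 16-byte record field (`NAME_SIZE`) that previously held other data. Both helpers below, which are illustrative names, produce the same string; they differ only in what remains after the terminator, which matters if the whole field is later written to a file or sent over a socket.

```c
#include <string.h>

#define NAME_SIZE 16   /* hypothetical fixed-width record field */

/* strncpy(3)-based: every byte after the copied ones ends up zero. */
static void set_name_padded(char name[NAME_SIZE], const char *src)
{
    strncpy(name, src, NAME_SIZE - 1);   /* copies, then zero-fills to byte 14 */
    name[NAME_SIZE - 1] = '\0';
}

/* Terminator-only copy: bytes after the terminator keep old contents. */
static void set_name_terminated(char name[NAME_SIZE], const char *src)
{
    size_t len = strlen(src);

    if (len > NAME_SIZE - 1)
        len = NAME_SIZE - 1;
    memcpy(name, src, len);
    name[len] = '\0';
}
```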
>
> I didn't see this as an issue in practice when I was reviewing all those
> existing usages of strncpy(3). The vast majority were used in the midst of
> simple string manipulation, where the destination buffer starts as
> uninitialized or zeroed out, and ultimately gets passed into a user
> expecting an ordinary null-terminated string.
>
> (One exception was a few functions that used strncpy(dst, "", len) to zero
> out the buffer, which is thankfully pretty obvious. Another exception was
> the functions that actually used strncpy(3) to produce a null-padded
> character sequence, e.g., when writing a value into a section of a binary.
> But in general, I found that it's usually not difficult to tell when a
> usage is being clever enough that the null padding might be significant.)
>
> In fact, the greater confusion came from the surprisingly common practice
> of using strncpy(3) like it's memcpy(3), by giving it the known length of
> the source string, or of some prefix computed through strchr(3) or similar.
> This is often then followed up by strncat(3) or similar, indicating that
> the writer clearly expects the full length to have non-null characters. But
> if the length computation is separated far enough from the actual call to
> strncpy(3), then it can become unclear whether the source is actually
> expected to have any interior null bytes before the computed length. (So if
> a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest
> that ordinary memcpy(3) be one of them.)
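A sketch of the memcpy(3) alternative suggested here, assuming the prefix ends at a delimiter found with strchr(3); `dup_prefix` is a hypothetical helper. Computing the length right next to the copy, and using memcpy(3) instead of strncpy(3), makes it explicit that exactly `len` non-null bytes are copied.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd string holding the part of src
 * before the first delim (or all of src if delim is absent).
 * Returns NULL on allocation failure.
 */
static char *dup_prefix(const char *src, int delim)
{
    const char *end = strchr(src, delim);
    size_t len = end ? (size_t)(end - src) : strlen(src);
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, src, len);   /* exactly len non-null bytes */
    out[len] = '\0';
    return out;
}
```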
>
> > > For the sake of reference, I looked into a few big C and C++ projects to
> > > see how often a strncpy(3)-based snippet was used to produce a truncated
> > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> > > that I haven't filtered out vendored dependencies, so there's a little bit
> > > of double-counting.) It seems like most codebases that don't ban strncpy(3)
> > > use a derived snippet somewhere or another. Also, I found 3 instances in
> > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> > > checking the last character.
> >
> > I know.  I've been rewriting the code handling strings in shadow-utils
> > for the last year, and there was a lot of it.  I fixed several small bugs
> > in the process, so I recommend avoiding it.
>
> I can't tell you about your own experience, but in mine, the root cause of
> most string-handling bugs has been excessive cleverness in using the
> standard string functions, rather than the behavior of the functions
> themselves. So one worry of mine is that if strncpy(3) ends up being
> deprecated or whatever, then authors of portable libraries will start
> writing lots of custom memcpy(3)-based replacements to their strncpy(3)-
> based snippets, and more lines of code will introduce more opportunities
> for cleverness.
>
> (This is also why I was confused by your support for strcpy(3) on the
> grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
> that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
> protect us from overruns, not from all the "small bugs" that might ensue
> from people becoming more clever with sizing the destination buffer with
> strcpy(3). Also, if it were truly a panacea, then we'd hardly have to worry
> about the problems of strncpy(3) at all, since it would detect any misuse
> of the function.)

Matthew, thank you for sharing your information.

https://www.gnu.org/software/libc/manual/html_node/Source-Fortification.html

I do find _FORTIFY_SOURCE useful in a developer build for testing: it
raises SIGABRT, and we get a useful core dump. Without that macro, the
program would likely still crash or corrupt memory. However, in my
experience with safety-critical applications, we really need to avoid
the crashes, so we'd write user-space functions that do the same sanity
checks (in the same way that fortify does) and then propagate the error
back to the application to report the failure and log it.
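The approach described here, doing the checks in user space and returning an error instead of aborting, might look like this minimal sketch. The name `copy_or_fail` and the choice of E2BIG as the error code are assumptions. It rejects the copy outright rather than truncating, and leaves dst unchanged on failure.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst only if the whole string,
 * including its null byte, fits in size bytes.
 * Returns 0 on success, E2BIG (dst untouched) otherwise.
 */
static int copy_or_fail(char *dst, size_t size, const char *src)
{
    size_t len = strlen(src);

    if (len >= size)
        return E2BIG;              /* caller can log and propagate this */
    memcpy(dst, src, len + 1);     /* includes the terminator */
    return 0;
}
```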

>
> Probably the only way to solve the cleverness issue for good is to have an
> immediately-available, foolproof, performant set of string functions that
> are extremely straightforward to understand and use, flexible enough for
> any use case, and generally agreed to be the first choice for string
> manipulation.

What's the best standardized function for C string copying, in your
opinion?  They all seem to have drawbacks: strlcpy truncates (I'd
rather it rejected the copy if the buffer were too small, since
truncation could change the meaning of the string, e.g., if it was a
file path). Other alternative functions aren't widely used.

Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-09  3:13               ` Matthew House
  2023-11-09 10:26                 ` Jonny Grant
@ 2023-11-09 10:31                 ` Jonny Grant
  2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:23                 ` Alejandro Colomar
  2 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 10:31 UTC
  To: Matthew House; +Cc: Alejandro Colomar, linux-man, GNU C Library

With glibc added

On Thu, 9 Nov 2023 at 03:13, Matthew House <mattlloydhouse@gmail.com> wrote:
>
> On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> > > Man pages aren't read only by people writing new code, but also by people
> > > reading and modifying existing code. And despite your preferences regarding
> > > which functions ought to be used to produce strings, it's a widespread (and
> > > correct) practice to produce a string from the character sequence created
> > > by strncpy(3). There are two ways of doing this, either by setting the last
> > > character of the destination buffer to null if you want to produce a
> > > truncated string, or by testing the last character against zero if you want
> > > to detect truncation and raise an error.
> >
> > It is not strncpy(3) that truncated, but the programmer, by adding a null
> > byte at buff[BUFSIZ - 1].  In the following snippet, strncpy(3) will not
> > truncate:
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >
> > And yet your code doing if (cs[2] != '\0') { goto error; } would think
> > it did.  That's because you deformed strncpy(3) to implement a poor
> > man's strlcpy(3).
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >         cs[2] = '\0';  // The truncation is here, not in strncpy(3).
>
> That's indeed a self-consistent interpretation of strncpy(3)'s function,
> but I don't think it's borne out by its formal definition, which I was
> basing my reasoning on. The current Linux man page for strncpy(3) says,
>
>   These functions copy the string pointed to by src into a null-padded
>   character sequence at the fixed-width buffer pointed to by dst. If the
>   destination buffer, limited by its size, isn't large enough to hold the
>   copy, the resulting character sequence is truncated.
>
> Notice how it "copies the string": as your string_copying(7) says, a string
> includes both a character sequence and a final null byte. So I'd ordinarily
> read this definition as saying that strncpy(3) tries to copy src up to and
> including the null byte, but produces a truncated copy of the whole string
> if the destination buffer is too small. Thus, even if the destination
> buffer contains all non-null characters in the original string, then the
> copy has still been "truncated" in this sense.
>
> The ISO C definition, and by extension, the POSIX definition, make this
> interpretation even more explicit:
>
>   The strncpy function copies not more than n characters (characters that
>   follow a null character are not copied) from the array pointed to by s2
>   to the array pointed to by s1.
>
> That is, the terminating null byte is part of the copy, but not anything
> after the terminating null byte.
>
> So one can interpret strncpy(3) as copying a prefix of a character sequence
> into a buffer (and zero-filling the remainder), in which case you're
> correct that truncation cannot be detected. But the function is formally
> defined as copying a prefix of a string into a buffer (and zero-filling the
> remainder), in which case the string has been truncated if the buffer
> doesn't end in a null byte afterward. It's just that one may not care about
> the terminating null byte being truncated if the user of the result just
> wants the initial character sequence.
>
> > > I'm not aware of any alternative to a strncpy(3)-based snippet for
> > > producing a possibly-truncated copy of a string, except for your preferred
> > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a
> >
> > The Linux kernel has strscpy(3), which is also good, but is not
> > available to user space.
> >
> > > brand-new glibc (nor, by extension, any applications or libraries that want
> >
> > libbsd has provided strlcpy(3) since basically forever.  It is a very
> > portable library.  You don't need a brand-new glibc for having
> > strlcpy(3).
> >
> > <https://libbsd.freedesktop.org/wiki/>
>
> That's a nice library that I didn't know about! Unfortunately, I don't
> think it's a very viable option for the long tail of small libraries I've
> referred to, which generally don't have any sub-dependencies of their own,
> apart from those provided by the platform.
>
> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
> configure scripts from whatever build system you're using (in such a way
> that libbsd can locate libmd), ensuring they're safe for cross-compilation
> if that's a goal, ensuring you bundle them in a way that respects their
> license terms, and ensuring that any user of your library links to the two
> dependencies and doesn't duplicate them. At that point, rolling your own
> strlcpy(3) equivalent definitely sounds like less mental load, at least to
> me.
>
> > > functions); snprintf(3), which has the insidious flaw of not supporting
> > > more than INT_MAX characters on pain of UB, and also produces a warning if
> > > the compiler notices the possible truncation; or strlen(3) + min() +
> > > memcpy(3) + manually adding a null terminator, which is certainly more
> > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> > > that poses a performance problem, but similarly opens up room for
> > > off-by-one errors.
> >
> > Rather than the performance problem, I'm more worried about the
> > maintainability of strncpy(3).  When, 20 years from now, a programmer
> > reading a piece of code full of strncpy(3) wants to migrate to a sane
> > function like strlcpy(3) or strcpy(3), the programmer needs to
> > understand whether the zeroing was purposeful or just accidental,
> > because after switching to strlcpy(3), the code may start leaking
> > trailing data if the tail of the buffer is meaningful to some program.
>
> I didn't see this as an issue in practice when I was reviewing all those
> existing usages of strncpy(3). The vast majority were used in the midst of
> simple string manipulation, where the destination buffer starts as
> uninitialized or zeroed out, and ultimately gets passed into a user
> expecting an ordinary null-terminated string.
>
> (One exception was a few functions that used strncpy(dst, "", len) to zero
> out the buffer, which is thankfully pretty obvious. Another exception was
> the functions that actually used strncpy(3) to produce a null-padded
> character sequence, e.g., when writing a value into a section of a binary.
> But in general, I found that it's usually not difficult to tell when a
> usage is being clever enough that the null padding might be significant.)
>
> In fact, the greater confusion came from the surprisingly common practice
> of using strncpy(3) like it's memcpy(3), by giving it the known length of
> the source string, or of some prefix computed through strchr(3) or similar.
> This is often then followed up by strncat(3) or similar, indicating that
> the writer clearly expects the full length to have non-null characters. But
> if the length computation is separated far enough from the actual call to
> strncpy(3), then it can become unclear whether the source is actually
> expected to have any interior null bytes before the computed length. (So if
> a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest
> that ordinary memcpy(3) be one of them.)
>
> > > For the sake of reference, I looked into a few big C and C++ projects to
> > > see how often a strncpy(3)-based snippet was used to produce a truncated
> > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> > > that I haven't filtered out vendored dependencies, so there's a little bit
> > > of double-counting.) It seems like most codebases that don't ban strncpy(3)
> > > use a derived snippet somewhere or another. Also, I found 3 instances in
> > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> > > checking the last character.
> >
> > I know.  I've been rewriting the code handling strings in shadow-utils
> > for the last year, and there was a lot of it.  I fixed several small bugs
> > in the process, so I recommend avoiding it.
>
> I can't tell you about your own experience, but in mine, the root cause of
> most string-handling bugs has been excessive cleverness in using the
> standard string functions, rather than the behavior of the functions
> themselves. So one worry of mine is that if strncpy(3) ends up being
> deprecated or whatever, then authors of portable libraries will start
> writing lots of custom memcpy(3)-based replacements to their strncpy(3)-
> based snippets, and more lines of code will introduce more opportunities
> for cleverness.
>
> (This is also why I was confused by your support for strcpy(3) on the
> grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
> that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
> protect us from overruns, not from all the "small bugs" that might ensue
> from people becoming more clever with sizing the destination buffer with
> strcpy(3). Also, if it were truly a panacea, then we'd hardly have to worry
> about the problems of strncpy(3) at all, since it would detect any misuse
> of the function.)

Matthew, thank you for sharing your information.

https://www.gnu.org/software/libc/manual/html_node/Source-Fortification.html

I do find _FORTIFY_SOURCE useful in a developer build for testing: it
raises SIGABRT and we get a useful core dump. Without that macro, the
program would likely still crash or corrupt memory. However, in my
experience with safety-critical applications, we really need to avoid
crashes, so we'd write user-space functions that perform the same sanity
checks (in the same way that fortify does) and then propagate the error
back to the application to report the failure and log it.
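A minimal sketch of the kind of error-propagating check described above, assuming only ISO C11 (which guarantees that memchr(3) stops at the first match, so it never reads past the end of a shorter src). The name checked_strcpy() and the choice of ERANGE as the error code are hypothetical; neither is an existing libc or _FORTIFY_SOURCE interface:

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst only if the whole string,
 * including its null terminator, fits in dsize bytes.  Instead of
 * aborting, it reports the failure so the caller can log it and recover.
 * Returns 0 on success, or ERANGE if src would not fit (dst untouched).
 */
int
checked_strcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	/* Look for the terminator only within the first dsize bytes. */
	const char  *nul = memchr(src, '\0', dsize);

	if (nul == NULL)
		return ERANGE;  /* src plus its '\0' doesn't fit */

	memcpy(dst, src, (size_t) (nul - src) + 1);  /* characters + '\0' */
	return 0;
}
```

The caller can then turn an ERANGE result into a logged error and a recovery path, rather than receiving SIGABRT.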

>
> Probably the only way to solve the cleverness issue for good is to have an
> immediately-available, foolproof, performant set of string functions that
> are extremely straightforward to understand and use, flexible enough for
> any use case, and generally agreed to be the first choice for string
> manipulation.

What's the best standardized function for C string copying in your
opinion?  They all seem to have drawbacks, strlcpy truncates (I'd
rather it rejected if it didn't have enough buffer - could cause
issues if the meaning of the string changed due to truncation, eg if
it was a file path). Other alternative functions aren't widely in use.

Kind regards, Jonny

* catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-09 10:13                               ` Jonny Grant
@ 2023-11-09 11:08                                 ` Alejandro Colomar
  2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
  2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
  2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
  1 sibling, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:08 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'

Hi Jonny,

On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> https://man7.org/linux/man-pages/man7/string_copying.7.html
> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.

Here's why:
<https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/>

Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500):
>> concatenate
> 
> We began fighting this pomposity before v7. There has only been
> backsliding since..
> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
> I invite you to join the battle for simplicity.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 10:13                               ` Jonny Grant
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-09 11:13                                 ` Alejandro Colomar
  2023-11-09 14:05                                   ` Jonny Grant
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:13 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'

Hi Jonny,

On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> On 09/11/2023 00:29, Alejandro Colomar wrote:
> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
> https://man7.org/linux/man-pages/man3/string.3.html
> 
> size_t strlen(const char *s);
> Return the length of the string s.
> 
> 
> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
> 
> // Copy/catenate a string.
> char *strcpy(char *restrict dst, const char *restrict src);
> char *strcat(char *restrict dst, const char *restrict src);

The reason for this presentation is that I want readers to first look at
what they need to do, and only then at the function that does it.

So, if you want to copy from a character sequence into a string, you
search for that, and it will tell you what functions you can use for
that (strncat(3) is the only standard one).

If you want to search for a specific function, you can always search
with '/strncpy'.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 10:31                 ` Jonny Grant
@ 2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:43                     ` Alejandro Colomar
                                       ` (3 more replies)
  0 siblings, 4 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 11:38 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

Hi Jonny,

On Thu, Nov 09, 2023 at 10:31:49AM +0000, Jonny Grant wrote:
> > Probably the only way to solve the cleverness issue for good is to have an
> > immediately-available, foolproof, performant set of string functions that
> > are extremely straightforward to understand and use, flexible enough for
> > any use case, and generally agreed to be the first choice for string
> > manipulation.
> 
> What's the best standardized function for C string copying in your

strlcpy(3) will soon be standard.  POSIX.1-202x (Issue 8) will add it,
which is why it's been added recently to glibc.  Hopefully, ISO C3x will
follow (yeah, it's not like tomorrow).

> opinion?  They all seem to have drawbacks, strlcpy truncates (I'd
> rather it rejected if it didn't have enough buffer - could cause
> issues if the meaning of the string changed due to truncation, eg if
> it was a file path). Other alternative functions aren't widely in use.

If you are consistent in checking the return value of strlcpy(3) and
reporting an error, it's the best standard alternative nowadays.
snprintf(3), except for using int instead of size_t, has an equivalent
API, and is in C99, in case that means something.
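For illustration, a sketch of the snprintf(3) variant, assuming only C99. copy_with_snprintf() is a hypothetical helper; it relies on snprintf(3) returning the length the complete output would have had, so the truncation test mirrors the usual strlcpy(3) check (return value >= dsize):

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy src into dst with snprintf(3) and treat
 * truncation as an error.  A return value >= dsize means the output
 * didn't fit, just as with strlcpy(3).
 * Returns 0 on success, -1 on truncation or on an encoding error.
 */
int
copy_with_snprintf(char *dst, size_t dsize, const char *src)
{
	int  ret = snprintf(dst, dsize, "%s", src);

	if (ret < 0 || (size_t) ret >= dsize)
		return -1;
	return 0;
}
```

On a -1 return, dst still holds a truncated string (when dsize > 0), so a caller that wants to reject truncation must not use it. The int return value is also why sources longer than INT_MAX are out of reach for this approach.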

If you want to write something based on Michael Kerrisk's article,
you could do this:

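	#include <string.h>
	#include <sys/types.h>  // ssize_t

	// Copy src into dst only if it fits; reject (don't truncate) otherwise.
	ssize_t
	strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
	{
		size_t  slen = strlen(src);

		if (slen >= dsize)
			return -1;

		strcpy(dst, src);
		return slen;
	}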

You may also want to calculate 'dsize' automagically, to avoid human
error, in case it's an array, so you could write a macro on top of it:

	#define ARRAY_SIZE(a)      (sizeof(a) / sizeof((a)[0]))
	#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))

These are just small wrappers over standard functions, so you shouldn't
have problems adding them to your project.

This is my long-term plan for shadow-utils, indeed.  I'm first
transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
padding, and later I will use this strxcpy() to reject truncated
strings, to avoid misinterpretation.

Cheers,
Alex

> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09  3:13               ` Matthew House
  2023-11-09 10:26                 ` Jonny Grant
  2023-11-09 10:31                 ` Jonny Grant
@ 2023-11-09 12:23                 ` Alejandro Colomar
  2023-11-09 12:35                   ` Alejandro Colomar
                                     ` (2 more replies)
  2 siblings, 3 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 12:23 UTC (permalink / raw)
  To: Matthew House; +Cc: Jonny Grant, linux-man

Hi Matthew,

On Wed, Nov 08, 2023 at 10:13:39PM -0500, Matthew House wrote:
> On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote:
> > > Man pages aren't read only by people writing new code, but also by people
> > > reading and modifying existing code. And despite your preferences regarding
> > > which functions ought to be used to produce strings, it's a widespread (and
> > > correct) practice to produce a string from the character sequence created
> > > by strncpy(3). There are two ways of doing this, either by setting the last
> > > character of the destination buffer to null if you want to produce a
> > > truncated string, or by testing the last character against zero if you want
> > > to detect truncation and raise an error.
> >
> > It is not strncpy(3) that truncated, but the programmer, by adding a
> > null byte in buff[BUFSIZ - 1].  In the following snippet, strncpy(3)
> > will not truncate:
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >
> > And yet your code doing if (cs[2] != '\0') { goto error; } would think
> > it did.  That's because you deformed strncpy(3) to implement a poor
> > man's strlcpy(3).
> >
> >         char cs[3];
> >
> >         strncpy(cs, "foo", 3);
> >         cs[2] = '\0';  // The truncation is here, not in strncpy(3).
> 
> That's indeed a self-consistent interpretation of strncpy(3)'s function,
> but I don't think it's borne out by its formal definition, which I was
> basing my reasoning on. The current Linux man page for strncpy(3) says,
> 
>   These functions copy the string pointed to by src into a null-padded
>   character sequence at the fixed-width buffer pointed to by dst. If the
>   destination buffer, limited by its size, isn't large enough to hold the
>   copy, the resulting character sequence is truncated.
> 
> Notice how it "copies the string": as your string_copying(7) says, a string
> includes both a character sequence and a final null byte. So I'd ordinarily
> read this definition as saying that strncpy(3) tries to copy src up to and
> including the null byte, but produces a truncated copy of the whole string
> if the destination buffer is too small. Thus, even if the destination
> buffer contains all non-null characters in the original string, then the
> copy has still been "truncated" in this sense.

Yes, that was an inconsistency in my definition.  Thanks to DJ's
suggestion ("copies bytes from the string"), that has been fixed.  Maybe
it would be even better to say "copies characters from the string".

> 
> The ISO C definition, and by extension, the POSIX definition, make this
> interpretation even more explicit:
> 
>   The strncpy function copies not more than n characters (characters that
>   follow a null character are not copied) from the array pointed to by s2
>   to the array pointed to by s1.
> 
> That is, the terminating null byte is part of the copy, but not anything
> after the terminating null byte.
> 
> So one can interpret strncpy(3) as copying a prefix of a character sequence
> into a buffer (and zero-filling the remainder), in which case you're
> correct that truncation cannot be detected. But the function is formally
> defined as copying a prefix of a string into a buffer (and zero-filling the
> remainder), in which case the string has been truncated if the buffer
> doesn't end in a null byte afterward. It's just that one may not care about
> the terminating null byte being truncated if the user of the result just
> wants the initial character sequence.

Yes, with the ISO C definition of strncpy(3), you can detect truncation.
The problem is that while my definition of it is complete, the
definition by ISO C makes it an incomplete function (to complete its
functionality in copying strings, you need to add an explicit '\0'
after the call).  So I prefer mine, and for self-consistency, it can't
report truncation.
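A small sketch of why the result of the call alone can't distinguish the two cases: with a 3-byte buffer, copying "foo" (which fits exactly as a character sequence) and "foobar" (which doesn't) leave the same three bytes behind, and neither result is null-terminated. The function name is hypothetical, and compilers may warn about the deliberate truncation:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical demo: returns true, because both buffers end up
 * holding 'f', 'o', 'o' with no terminating null byte. */
bool
strncpy_results_identical(void)
{
	char  a[3], b[3];

	strncpy(a, "foo", sizeof a);     /* all 3 characters copied */
	strncpy(b, "foobar", sizeof b);  /* "bar" dropped */

	return memcmp(a, b, sizeof a) == 0;
}
```

Under the ISO C reading, both calls truncated the string (its terminator didn't fit), which is why a last-byte check reports truncation for "foo" as well.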

> 
> > > I'm not aware of any alternative to a strncpy(3)-based snippet for
> > > producing a possibly-truncated copy of a string, except for your preferred
> > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a
> >
> > The Linux kernel has strscpy(3), which is also good, but is not
> > available to user space.
> >
> > > brand-new glibc (nor, by extension, any applications or libraries that want
> >
> > libbsd has provided strlcpy(3) since basically forever.  It is a very
> > portable library.  You don't need a brand-new glibc for having
> > strlcpy(3).
> >
> > <https://libbsd.freedesktop.org/wiki/>
> 
> That's a nice library that I didn't know about! Unfortunately, I don't
> think it's a very viable option for the long tail of small libraries I've
> referred to, which generally don't have any sub-dependencies of their own,
> apart from those provided by the platform.
> 
> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
> configure scripts from whatever build system you're using (in such a way
> that libbsd can locate libmd), ensuring they're safe for cross-compilation
> if that's a goal, ensuring you bundle them in a way that respects their
> license terms, and ensuring that any user of your library links to the two
> dependencies and doesn't duplicate them. At that point, rolling your own
> strlcpy(3) equivalent definitely sounds like less mental load, at least to
> me.

Yes, if you have 0 deps, it might be simpler to add your own
implementation.  However, it's a tricky function to implement, so I'd be
careful.  If you need to roll your own, I would go for a simpler
function, maybe a wrapper over strlen(3)+strcpy(3).

> 
> > > functions); snprintf(3), which has the insidious flaw of not supporting
> > > more than INT_MAX characters on pain of UB, and also produces a warning if
> > > the compiler notices the possible truncation; or strlen(3) + min() +
> > > memcpy(3) + manually adding a null terminator, which is certainly more
> > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> > > that poses a performance problem, but similarly opens up room for
> > > off-by-one errors.
> >
> > More than the performance problem, I'm worried about the
> > maintainability of strncpy(3).  When, 20 years from now, a programmer
> > reading a piece of code full of strncpy(3) calls wants to migrate to a
> > sane function like strlcpy(3) or strcpy(3), the programmer needs to
> > understand whether the zeroing was purposeful or just accidental,
> > because after switching to strlcpy(3), the code may start leaking
> > trailing data if the tail of the buffer is meaningful to some program.
> 
> I didn't see this as an issue in practice when I was reviewing all those
> existing usages of strncpy(3). The vast majority were used in the midst of
> simple string manipulation, where the destination buffer starts as
> uninitialized or zeroed out, and ultimately gets passed into a user
> expecting an ordinary null-terminated string.
> 
> (One exception was a few functions that used strncpy(dst, "", len) to zero

Holy crap!  Didn't these programmers know bzero(3) or memset(3)?  :D

> out the buffer, which is thankfully pretty obvious. Another exception was
> the functions that actually used strncpy(3) to produce a null-padded
> character sequence, e.g., when writing a value into a section of a binary.
> But in general, I found that it's usually not difficult to tell when a
> usage is being clever enough that the null padding might be significant.)
> 
> In fact, the greater confusion came from the surprisingly common practice
> of using strncpy(3) like it's memcpy(3), by giving it the known length of

It gets better!  :D

> the source string, or of some prefix computed through strchr(3) or similar.
> This is often then followed up by strncat(3) or similar, indicating that
> the writer clearly expects the full length to have non-null characters. But
> if the length computation is separated far enough from the actual call to
> strncpy(3), then it can become unclear whether the source is actually
> expected to have any interior null bytes before the computed length. (So if
> a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest
> that ordinary memcpy(3) be one of them.)

string_copying(7) was initially devised as a page indicating
alternatives to strncpy(3), depending on the purpose of the code.
memcpy(3) is not mentioned (except in SEE ALSO), but mempcpy(3) is,
which is essentially the same (but with a more useful return value).
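As a sketch of using memcpy(3) where strncpy(3) is often used with a precomputed length, here is a hypothetical helper, copy_field(), that copies the part of src before the first ':' into a string. The ':' separator and the reject-on-overflow policy are assumptions made for the example. With glibc, the last two steps could instead use the GNU extension mempcpy(3): *(char *) mempcpy(dst, src, len) = '\0';

```c
#include <string.h>

/*
 * Hypothetical helper: copy the prefix of src up to (not including) the
 * first ':' (or all of src, if there is none) into dst as a string.
 * Returns 0 on success, -1 if the prefix plus '\0' doesn't fit in dsize.
 */
int
copy_field(char *dst, size_t dsize, const char *src)
{
	const char  *colon = strchr(src, ':');
	size_t      len = colon ? (size_t) (colon - src) : strlen(src);

	if (len >= dsize)
		return -1;

	memcpy(dst, src, len);  /* known length, no interior null bytes */
	dst[len] = '\0';        /* terminate explicitly */
	return 0;
}
```

Because the length is computed right next to the copy, a reader can see that no null byte is expected within it, which is the ambiguity Matthew describes with strncpy(3).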

> > > For the sake of reference, I looked into a few big C and C++ projects to
> > > see how often a strncpy(3)-based snippet was used to produce a truncated
> > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> > > that I haven't filtered out vendored dependencies, so there's a little bit
> > > of double-counting.) It seems like most codebases that don't ban strncpy(3)
> > > use a derived snippet somewhere or another. Also, I found 3 instances in
> > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> > > checking the last character.
> >
> > I know.  I've been rewriting the code handling strings in shadow-utils
> > for the last year, and there was a lot of it.  I fixed several small bugs
> > in the process, so I recommend avoiding it.
> 
> I can't tell you about your own experience, but in mine, the root cause of
> most string-handling bugs has been excessive cleverness in using the
> standard string functions, rather than the behavior of the functions
> themselves. So one worry of mine is that if strncpy(3) ends up being
> deprecated or whatever, then authors of portable libraries will start
> writing lots of custom memcpy(3)-based replacements to their strncpy(3)-
> based snippets, and more lines of code will introduce more opportunities
> for cleverness.

Don't worry.  strncpy(3) won't be deprecated, thanks to tar(1).  ;)

> 
> (This is also why I was confused by your support for strcpy(3) on the
> grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
> that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
> protect us from overruns, not from all the "small bugs" that might ensue
> from people becoming more clever with sizing the destination buffer with
> strcpy(3).

I don't think strcpy(3) invites programmers to be clever as much as
strncpy(3) does.  In the case of strncpy(3), that's because it is an
incomplete string-copying function.  strcpy(3) is complete.

> Also, if it were truly a panacea, then we'd hardly have to worry
> about the problems of strncpy(3) at all, since it would detect any misuse
> of the function.)

Fortification detects overruns in writes, which is how it protects
strcpy(3).  However, fortification can't protect against overruns in
reads, which is what strncpy(3) causes due to missing null terminators.
strncpy(3) also causes off-by-one bugs (I'll detail below), which
strcpy(3) doesn't (and strlcpy(3) doesn't either).
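Since fortification won't catch such over-reads, code that consumes a null-padded character sequence has to bound its reads itself. A minimal sketch using only ISO C; field_len() does what POSIX strnlen(3) does, and both helper names are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Length of a null-padded field: stops at the first '\0', or returns
 * size if the field fills the whole buffer with no terminator. */
size_t
field_len(const char *field, size_t size)
{
	const char  *nul = memchr(field, '\0', size);

	return nul ? (size_t) (nul - field) : size;
}

/* Print the field without ever reading past size bytes: the "%.*s"
 * precision tells printf(3) not to look for a terminator beyond it. */
void
print_field(const char *field, size_t size)
{
	printf("%.*s\n", (int) field_len(field, size), field);
}
```

Passing such a buffer to strlen(3) or a plain "%s" instead would read past its end whenever the field is full. The (int) cast limits print_field() to fields shorter than INT_MAX bytes.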

> 
> Probably the only way to solve the cleverness issue for good is to have an
> immediately-available, foolproof, performant set of string functions that
> are extremely straightforward to understand and use, flexible enough for
> any use case, and generally agreed to be the first choice for string
> manipulation.
> 
> Unfortunately, probably the closest match to those criteria, especially the
> availability criterion, is snprintf(3), which has the flaws of using int
> instead of size_t for most sizes, not being very performant, and not being
> async-signal-safe. Alas, it will likely remain a dream, given all the wars
> over which safer string functions have the best API. But at least
> strlcpy(3) has a pretty sound interface, if other platforms ever get around
> to including it by default.

strlcpy(3) will be in POSIX.1-202x (Issue 8), so it's only a matter of
time before it's widespread.

> 
> > > the code to understand the concept behind how these two snippets work, that
> > > the only difference between the strncpy(3)'s special "character sequence"
> > > and an ordinary C string is an additional null terminator at the end of the
> > > destination buffer.
> >
> > This is part of string_copying(7):
> >
> > DESCRIPTION
> >    Terms (and abbreviations)
> >      string (str)
> >             is  a  sequence  of zero or more non‐null characters followed by a
> >             null byte.
> >
> >      character sequence
> >             is a sequence of zero or  more  non‐null  characters.   A  program
> >             should  never use a character sequence where a string is required.
> >             However, with appropriate care, a string can be used in the  place
> >             of a character sequence.
> >
> > I think that is very explicit in the difference.  strncpy(3) refers to
> > that page for understanding the differences, so I think it is
> > documented.
> >
> > strncpy(3):
> > CAVEATS
> >      The  name  of  these  functions  is confusing.  These functions produce a
> >      null‐padded character sequence, not a string (see string_copying(7)).
> 
> My point isn't that the difference is undocumented, but that the typical
> man page reader isn't reading the man pages for their own sake, but because
> they're looking at some code, and they want to Know What It's Doing as soon
> as possible.

We could maybe add a list of ways people have tried to be clever with
strncpy(3) in the past and failed, and then explain why those uses are
broken.  This could be in a BUGS section.

> If they're getting directed around elsewhere with weird
> warnings about "not a string" ("what's it going on about, I thought it was
> null-padded?"), then I worry there's a good chance that they'll instead
> bounce off the man page and try figuring it out some other way. And even if
> they do follow the reference, then they might have difficulty understanding
> the implications, since many people don't think of things in terms of
> formal definitions.
> 
> > > reasonable to highlight precisely why strncpy(3)'s output isn't a string
> >
> > How about this?:
> >
> > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> > index d4c2ce83d..c80c8b640 100644
> > --- a/man3/stpncpy.3
> > +++ b/man3/stpncpy.3
> > @@ -108,7 +108,10 @@ .SH HISTORY
> >  .SH CAVEATS
> >  The name of these functions is confusing.
> >  These functions produce a null-padded character sequence,
> > -not a string (see
> > +not a string.
> > +While strings have a terminating NUL byte,
> > +character sequences do not have any terminating byte
> > +(see
> >  .BR string_copying (7)).
> >  .P
> >  It's impossible to distinguish truncation by the result of the call,
> 
> Yes, I'd be perfectly happy with something like that. That way, the
> scariness is far more immediate ("the output might not be terminated!?"),
> and thus more accessible to the typical reader.

Ok; I'll add that.

> 
> > > (viz., the lack of a null terminator), instead of trying to insist that its
> > > output is worlds apart from anything string-related, especially given the
> > > volume of existing correct code that belies that notion.
> >
> > It is not correct code.  That code is doing extra work which confuses
> > maintainers.  It is a lot like writing dead code, since you're writing
> > zeros that nobody is reading, which confuses maintainers.
> 
> I am really not a fan of conflating the notions of "code that is difficult
> to maintain" with "code that doesn't perform the task it is intended to
> perform". When I think about incorrect code, I think about things like
> setenv(3) that are just waiting to cause trouble in popular libraries built
> and deployed today.
> 
> Meanwhile, "confusing maintainers" is a very subjective notion specific to
> both the code and the maintainers: if someone sees some code allocating
> a fresh buffer, strncpy(3)ing a string into it, slapping a terminator on
> the end, and finally passing the result into something clearly expecting a
> string, then why would they be guaranteed to be sweating bullets over
> whatever happened to rest of the fresh buffer? Especially given how
> widespread the strncpy(3) + extra null terminator pattern already is.
> 
> Instead, it's code making use of strncpy(3) in a particularly clever way
> that I'd find confusing, and in those cases, I lay the blame squarely on
> the cleverness rather than the function itself.

I blame ISO C's definition of the function.  Why?  Because, by being
an incomplete string-copying function, it forces the programmer to be
clever about it.  You can't just call strncpy(3) and be done; you need
to do something else, and then you do clever stuff, which ends badly.

> 
> > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
> > it's not correct code.  It's rather dangerous code that just happens to
> > not be vulnerable most of the time.
> 
> So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
> immune to off-by-one bugs?

Slightly.  Here's the typical use of strlen(3)+strcpy(3):

if (strlen(src) >= dsize)
	goto error;
strcpy(dst, src);

There's no +1 or -1 in that code, so it's hard to make an off-by-one
mistake.  Okay, you may have noticed that it has a '>=', which one could
accidentally replace with a '>', causing an off-by-one.  I'd wrap that
check in a strxcpy() wrapper to avoid repeating it.

> Or will the vast majority of current strncpy(3)
> users be willing to either restrict their platform support or add two extra
> dependencies to their build process just to have strlcpy(3)? I'd hardly be
> inclined to think that off-by-one bugs are a particular specialty of
> strncpy(3).

They are.  Here's the typical use of strncpy(3) as a replacement:

strncpy(dst, src, dsize);
if (dst[dsize - 1] != '\0')
	goto error;
dst[dsize - 1] = '\0';

There are many more moving parts, so more chances to make mistakes.
And you see it forces the programmer to write explicitly -1 twice.  I've
seen code that forgets to do the -1, and also code that uses -1 in the
strncpy(3) call (which makes it impossible to detect truncation).
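To make the second mistake concrete, here is a sketch contrasting the pattern above with the variant that passes dsize - 1 to strncpy(3). Both helpers are hypothetical, assume dsize >= 1, and return true on truncation; the first truncates and reports instead of jumping to an error label, and the broken one zeroes the buffer first (as much existing code does) so that its failure is deterministic:

```c
#include <stdbool.h>
#include <string.h>

/* Works: copy up to dsize bytes, then inspect the last one.  A nonzero
 * last byte means src's terminator didn't fit. */
bool
copy_detects(char *dst, size_t dsize, const char *src)
{
	strncpy(dst, src, dsize);
	if (dst[dsize - 1] != '\0') {
		dst[dsize - 1] = '\0';  /* keep a valid, truncated string */
		return true;
	}
	return false;
}

/* Broken: with "- 1" in the call, strncpy(3) never writes
 * dst[dsize - 1], so after the memset(3) the check can't fire and
 * truncation goes unnoticed. */
bool
copy_misses(char *dst, size_t dsize, const char *src)
{
	memset(dst, 0, dsize);
	strncpy(dst, src, dsize - 1);
	return dst[dsize - 1] != '\0';  /* always false */
}
```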

> 
> > > Or, to answer your question, "It's appropriate to keep using strncpy(3) in
> > > existing code where it's currently used as part of creating a truncated
> > > string, and it's not especially inappropriate to use strncpy(3) in new code
> > > as part of creating a truncated string, if the code must support platforms
> > > without strlcpy(3) or similar, and if the resulting snippets are few enough
> > > and well-commented enough that they create less mental load than creating
> > > and maintaining a custom helper function."
> >
> > strncpy(3) calls are never well documented.  Do you add a comment in
> > each such call saying "this zeroing is superfluous"?  Probably not.
> 
> By that standard, every call to a function that takes an output pointer and
> returns the number of elements written (say, readlink(2)) would need a
> comment saying "the remaining elements in this array now have undefined
> values".

No, because it does precisely what is intended.  It is when you add dead
code that you need to justify it.

> I don't think it's controversial that in many situations, we
> tacitly understand that we simply don't care about the remainder of a

While the analysis isn't very hard, it takes some time, examining all
surrounding code to make sure nothing cares about the trailing bytes.
When you have a hundred such calls, you need to make sure nobody was too
clever around any of them.

> buffer after a certain point. In the case of producing a string, that point
> is going to be the null terminator, in the absence of on-site documentation
> to the contrary; I'd label anything else as overly clever.

But again, strncpy(3) forces you to be clever.

> 
> Meanwhile, "never" would be a strong word to describe the rate that
> strncpy(3)'s lack of null termination is documented at the call site; 30 of
> the 339 call sites I mentioned have an associated comment regarding null

Hmm, I should have said rarely.

Cheers,
Alex

> termination. (ICU seems to be the best library comment-wise, but even it
> doesn't place them consistently.) It's obviously far from routine in
> existing code, but it's not something that never happens.
> 
> Thank you,
> Matthew House

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 12:23                 ` Alejandro Colomar
@ 2023-11-09 12:35                   ` Alejandro Colomar
  2023-11-10  7:06                   ` Oskari Pirhonen
  2023-11-10 16:06                   ` Matthew House
  2 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 12:35 UTC (permalink / raw)
  To: Matthew House; +Cc: Jonny Grant, linux-man

Hi Matthew,

On Thu, Nov 09, 2023 at 01:23:14PM +0100, Alejandro Colomar wrote:
> > > > reasonable to highlight precisely why strncpy(3)'s output isn't a string
> > >
> > > How about this?:
> > >
> > > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> > > index d4c2ce83d..c80c8b640 100644
> > > --- a/man3/stpncpy.3
> > > +++ b/man3/stpncpy.3
> > > @@ -108,7 +108,10 @@ .SH HISTORY
> > >  .SH CAVEATS
> > >  The name of these functions is confusing.
> > >  These functions produce a null-padded character sequence,
> > > -not a string (see
> > > +not a string.
> > > +While strings have a terminating NUL byte,
> > > +character sequences do not have any terminating byte
> > > +(see
> > >  .BR string_copying (7)).
> > >  .P
> > >  It's impossible to distinguish truncation by the result of the call,
> > 
> > Yes, I'd be perfectly happy with something like that. That way, the
> > scariness is far more immediate ("the output might not be terminated!?"),
> > and thus more accessible to the typical reader.
> 
> Ok; I'll add that.

I think DJ's suggestion of providing an example shows this without
needing a wordy explanation:

<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=f502d3290c9f6f13f870cc041f553073af434949>

Here's the page now:

CAVEATS
     The name of these functions is confusing.  These  functions  pro‐
     duce   a  null‐padded  character  sequence,  not  a  string  (see
     string_copying(7)).  For example:

         strncpy(buf, "1", 5);       // { '1',   0,   0,   0,   0 }
         strncpy(buf, "1234", 5);    // { '1', '2', '3', '4',   0 }
         strncpy(buf, "12345", 5);   // { '1', '2', '3', '4', '5' }
         strncpy(buf, "123456", 5);  // { '1', '2', '3', '4', '5' }

     It’s impossible to distinguish truncation by the  result  of  the
     call,  from  a  character sequence that just fits the destination
     buffer; truncation should be detected by comparing the length  of
     the input string with the size of the destination buffer.

I think this is quite clear regarding what these functions do and
don't do.

I'll leave it like that, I think.
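The CAVEATS examples can be checked with a small sketch. fill_field() is a hypothetical helper and is not part of any library. It detects truncation the way the page suggests, by comparing the length of the input string with the buffer size. An input of exactly sizeof(buf) characters still fits, because a null-padded field needs no terminator; only a longer input loses characters. Because the result may be unterminated, it has to be compared with memcmp(3), not strcmp(3).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: fill the fixed-width field dst[0..size) from the
 * string src with strncpy(3), and report whether characters of src were
 * lost.  A source of exactly `size` characters fits, because the field
 * needs no terminator. */
static bool fill_field(char *dst, size_t size, const char *src)
{
    assert(size > 0);
    strncpy(dst, src, size);
    return strlen(src) > size;
}
```

With char buf[5], fill_field() returns false for "1", "1234", and "12345", and true for "123456"; the last two calls leave the same five bytes in buf.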

Cheers,
Alex

> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
@ 2023-11-09 12:43                     ` Alejandro Colomar
  2023-11-09 12:51                     ` Xi Ruoyao
                                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 12:43 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On Thu, Nov 09, 2023 at 12:38:37PM +0100, Alejandro Colomar wrote:
> If you would want to write something based on Michael Kerrisk's article,
> you could do this:
> 
> 	ssize_t
> 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> 	{
> 		if (strlen(src) < dsize)

Heh, here's my off-by-one bug of the day.  Good thing is I can fix it in
a single place; unlike calling strncpy(3) all the time.

This should have been >=.

Cheers,
Alex

> 			return -1;
> 
> 		strcpy(dst, src);
> 	}
> 
> You may also want to calculate 'dsize' automagically, to avoid human
> error, in case it's an array, so you could write a macro on top of it:
> 
> 	#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
> 
> These are just small wrappers over standard functions, so you shouldn't
> have problems adding them to your project.
> 
> This is my long term plan for shadow-utils, indeed.  I'm first
> transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
> padding, and later will use this strxcpy() to remove the truncated
> strings to avoid misinterpretation.

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:43                     ` Alejandro Colomar
@ 2023-11-09 12:51                     ` Xi Ruoyao
  2023-11-09 14:01                       ` Alejandro Colomar
  2023-11-09 18:11                     ` Paul Eggert
  2023-11-10 11:23                     ` Jonny Grant
  3 siblings, 1 reply; 138+ messages in thread
From: Xi Ruoyao @ 2023-11-09 12:51 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On Thu, 2023-11-09 at 12:38 +0100, Alejandro Colomar wrote:
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.
> snprintf(3), except for using int instead of size_t, has an equivalent
> API, and is in C99, in case that means something.

Yes, you can always create your own wrapper instead of demanding a
standard function which must be implemented by every libc.
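Here is a sketch of the snprintf(3) approach mentioned above; copy_with_snprintf() is a hypothetical name. snprintf(3) always null-terminates when size is nonzero. It returns the length the complete output would have had, so a return value of at least size signals truncation. strlcpy(3) reports truncation the same way, but with a size_t result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: copy the string src into dst[0..size) with
 * snprintf(3).  Returns true if all of src was copied, false if the
 * result was truncated or an output error occurred. */
static bool copy_with_snprintf(char *restrict dst, size_t size,
                               const char *restrict src)
{
    assert(size > 0);

    int len = snprintf(dst, size, "%s", src);   /* strlen(src) on success */

    return len >= 0 && (size_t) len < size;
}
```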

> If you would want to write something based on Michael Kerrisk's article,
> you could do this:

> 	ssize_t
> 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> 	{
> 		if (strlen(src) < dsize)
> 			return -1;
> 
> 		strcpy(dst, src);
> 	}

I'd like to add __attribute__ ((warn_unused_result)) for this wrapper as
well.
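Here is a sketch of the strxcpy() wrapper with the fixes discussed in this thread. It is an illustration, not the shadow-utils code, and it makes four changes:

- A string of length len needs len + 1 bytes, so the error check must be strlen(src) >= dsize.
- src is const.
- The success path returns the copied length instead of falling off the end of a non-void function.
- The function is marked with GCC's warn_unused_result attribute.

ARRAY_SIZE() is assumed to be the usual sizeof-based macro, and STRXCPY() only works when dst is an array, not a pointer.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Assumed helper: number of elements in an array (not a pointer). */
#define ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))

/* Copy the string src into dst, which is dsize bytes long.  Returns the
 * length of the copied string, or -1 (leaving dst untouched) if src does
 * not fit, that is, if strlen(src) >= dsize. */
static ssize_t strxcpy(char *restrict dst, const char *restrict src,
                       size_t dsize) __attribute__((warn_unused_result));

static ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);

    if (len >= dsize)
        return -1;

    strcpy(dst, src);
    return (ssize_t) len;
}

/* Only valid when dst is an array, so that ARRAY_SIZE(dst) is its size. */
#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
```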

-- 
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University

* Re: strncpy clarify result may not be null terminated
  2023-11-09 12:51                     ` Xi Ruoyao
@ 2023-11-09 14:01                       ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 14:01 UTC (permalink / raw)
  To: Xi Ruoyao; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On Thu, Nov 09, 2023 at 08:51:34PM +0800, Xi Ruoyao wrote:
> On Thu, 2023-11-09 at 12:38 +0100, Alejandro Colomar wrote:
> > If you are consistent in checking the return value of strlcpy(3) and
> > reporting an error, it's the best standard alternative nowadays.
> > snprintf(3), except for using int instead of size_t, has an equivalent
> > API, and is in C99, in case that means something.
> 
> Yes, you can always create your own wrapper instead of demanding a
> standard function which must be implemented by every libc.
> 
> > If you would want to write something based on Michael Kerrisk's article,
> > you could do this:
> 
> > 	ssize_t
> > 	strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> > 	{
> > 		if (strlen(src) < dsize)
> > 			return -1;
> > 
> > 		strcpy(dst, src);
> > 	}
> 
> I'd like to add __attribute__ ((warn_unused_result)) for this wrapper as
> well.

Indeed.  Thanks!

> 
> -- 
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:13                                 ` strncpy clarify result may not be null terminated Alejandro Colomar
@ 2023-11-09 14:05                                   ` Jonny Grant
  2023-11-09 15:04                                     ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 14:05 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'



On 09/11/2023 11:13, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
>> On 09/11/2023 00:29, Alejandro Colomar wrote:
>> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>> size_t strlen(const char *s);
>> Return the length of the string s.
>>
>>
>> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
>>
>> // Copy/catenate a string.
>> char *strcpy(char *restrict dst, const char *restrict src);
>> char *strcat(char *restrict dst, const char *restrict src);
> 
> The reason for this presentation is that I want to first look at what
> they do, and only then look at the function you need to do that.

That appears different from the man page convention. It looks odd, especially with the extra //, which I don't recall other pages having in the description; usually that would be for examples. Consistency is best, but I'll leave it with you.
Kind regards
Jonny

> 
> So, if you want to copy from a character sequence into a string, you
> search for that, and it will tell you what functions you can use for
> that (strncat(3) is the only standard one).
> 
> If you want to search for a specific function, you can always search
> with '/strncpy'.
> 
> Cheers,
> Alex
> 
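Here is a sketch of using strncat(3) to turn a null-padded field into a string, for example a field taken from a utmpx(5) record. field_to_string() is a hypothetical helper. strncat(3) reads at most n characters from src and stops at the first null byte, so src need not be terminated. It always appends a terminator, so dst must have room for n + 1 bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the null-padded character sequence in the
 * fixed-width field src[0..n) into the string dst, which must have room
 * for n + 1 bytes.  strncat(3) appends at most n characters, stops early
 * at a null byte, and always writes a terminator. */
static char *field_to_string(char *restrict dst, const char *restrict src,
                             size_t n)
{
    assert(dst != NULL && src != NULL);
    dst[0] = '\0';                 /* start from an empty string */
    return strncat(dst, src, n);
}
```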

* Re: catenate vs concatenate
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-09 14:06                                   ` Jonny Grant
  2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
  1 sibling, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 14:06 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'



On 09/11/2023 11:08, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
>> https://man7.org/linux/man-pages/man7/string_copying.7.html
>> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too.
> 
> Here's why:
> <https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/>
> 
> Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500):
>>> concatenate
>>
>> We began fighting this pomposity before v7. There has only been
>> backsliding since..
>> "Catenate" is crisper, means the same thing, and concurs with the "cat" command.
>> I invite you to join the battle for simplicity.
> 
> Cheers,
> Alex
> 

Looks like it's already been discussed. Where a term is already in use, it's a question of whether to change the commonly used term. Technical documents seem to mostly use 'concatenate'. Looks like people have already decided on going with 'catenate'.
Kind regards
Jonny

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 23:06                                 ` Paul Eggert
  2023-11-08 23:28                                   ` DJ Delorie
  2023-11-09  0:24                                   ` Alejandro Colomar
@ 2023-11-09 14:11                                   ` Jonny Grant
  2023-11-09 14:35                                     ` Alejandro Colomar
  2 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 14:11 UTC (permalink / raw)
  To: Paul Eggert, Alejandro Colomar, linux-man
  Cc: libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell



On 08/11/2023 23:06, Paul Eggert wrote:
> On 11/8/23 14:17, Alejandro Colomar wrote:
>> These copy*from*  a string
> 
> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> 
> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.

That's a great reference page, Paul; there's lots of useful information in the manual.
https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
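The point above that SRC need not be a string can be illustrated with a sketch. strncpy(3) reads at most n bytes from src, so copying one full fixed-width field into another is well defined even when the source contains no null byte. FIELD_WIDTH and copy_field() are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define FIELD_WIDTH  4   /* hypothetical width of a fixed-width field */

/* Copy one null-padded field into another.  src may be completely full
 * and therefore contain no null byte: strncpy(3) reads at most
 * FIELD_WIDTH bytes from it and never looks for a terminator beyond that. */
static void copy_field(char dst[FIELD_WIDTH], const char src[FIELD_WIDTH])
{
    assert(dst != src);   /* strncpy(3) requires non-overlapping buffers */
    strncpy(dst, src, FIELD_WIDTH);
}
```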

Re this man page:

https://man7.org/linux/man-pages/man3/string.3.html

 Obsolete functions
       char *strncpy(char dest[restrict .n], const char src[restrict .n],
                     size_t n);
              Copy at most n bytes from string src to dest, returning a
              pointer to the start of dest.


It could clarify
"Copy at most n bytes from string src to ARRAY dest, returning a
pointer to the start of ARRAY dest."

(caps for my emphasis in this email)
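On return values, here is a sketch that assumes POSIX.1-2008 for stpncpy(3) and strnlen(3). strncpy(3) always returns dst. stpncpy(3) returns a pointer to the first null byte it wrote, or dst + n if it wrote none, which equals dst + strnlen(src, n). fill_and_count() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L   /* stpncpy(3), strnlen(3) */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: fill dst[0..n) from src and return how many
 * characters of src ended up in dst, using the pointer that stpncpy(3)
 * returns.  strncpy(3) would only have returned dst itself. */
static size_t fill_and_count(char *restrict dst, const char *restrict src,
                             size_t n)
{
    char *end = stpncpy(dst, src, n);

    assert(end == dst + strnlen(src, n));
    return (size_t) (end - dst);
}
```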

Kind regards
Jonny

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 14:11                                   ` Jonny Grant
@ 2023-11-09 14:35                                     ` Alejandro Colomar
  2023-11-09 14:47                                       ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 14:35 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

Hi Jonny,

On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
> On 08/11/2023 23:06, Paul Eggert wrote:
> > On 11/8/23 14:17, Alejandro Colomar wrote:
> >> These copy*from*  a string
> > 
> > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
> > 
> > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
> 
> That's a great reference page Paul, lots of useful information in the manual.
> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
> 
> Re this man page:
> 
> https://man7.org/linux/man-pages/man3/string.3.html
> 
>  Obsolete functions
>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>                      size_t n);
>               Copy at most n bytes from string src to dest, returning a
>               pointer to the start of dest.

Uh, I forgot about that page.  I'll have a look at it and update it.  At
least, I need to remove that "Obsolete functions".

> 
> 
> It could clarify
> "Copy at most n bytes from string src to ARRAY dest, returning a
> pointer to the start of ARRAY dest."

I think I prefer DJ's suggestion:

"Fill a fixed‐width null‐padded buffer with bytes from a string."

Thanks!
Alex

> 
> (caps for my emphasis in this email)
> 
> Kind regards
> Jonny

-- 
<https://www.alejandro-colomar.es/>

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 14:35                                     ` Alejandro Colomar
@ 2023-11-09 14:47                                       ` Jonny Grant
  2023-11-09 15:02                                         ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 14:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell



On 09/11/2023 14:35, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote:
>> On 08/11/2023 23:06, Paul Eggert wrote:
>>> On 11/8/23 14:17, Alejandro Colomar wrote:
>>>> These copy*from*  a string
>>>
>>> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string.
>>>
>>> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant.
>>
>> That's a great reference page Paul, lots of useful information in the manual.
>> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html
>>
>> Re this man page:
>>
>> https://man7.org/linux/man-pages/man3/string.3.html
>>
>>  Obsolete functions
>>        char *strncpy(char dest[restrict .n], const char src[restrict .n],
>>                      size_t n);
>>               Copy at most n bytes from string src to dest, returning a
>>               pointer to the start of dest.
> 
> Uh, I forgot about that page.  I'll have a look at it and update it.  At
> least, I need to remove that "Obsolete functions".
> 
>>
>>
>> It could clarify
>> "Copy at most n bytes from string src to ARRAY dest, returning a
>> pointer to the start of ARRAY dest."
> 
> I think I prefer DJ's suggestion:
> 
> "Fill a fixed‐width null‐padded buffer with bytes from a string."

Better to make it clear it's null-padded after?

"Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

I'll leave it with you.

Kind regards
Jonny

* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 14:47                                       ` Jonny Grant
@ 2023-11-09 15:02                                         ` Alejandro Colomar
  2023-11-09 17:30                                           ` DJ Delorie
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:02 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House,
	Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto,
	Zack Weinberg, G. Branden Robinson, Carlos O'Donell

On Thu, Nov 09, 2023 at 02:47:05PM +0000, Jonny Grant wrote:
> >> It could clarify
> >> "Copy at most n bytes from string src to ARRAY dest, returning a
> >> pointer to the start of ARRAY dest."
> > 
> > I think I prefer DJ's suggestion:
> > 
> > "Fill a fixed‐width null‐padded buffer with bytes from a string."
> 
> Better to make it clear it's null-padded after?
> 
> "Fill a fixed‐width buffer with bytes from a string and pad with null bytes."

Yes, that looks even better.  And I wasn't very happy with "bytes".
Maybe:

"Fill a fixed-width buffer with characters from a string and pad with
null bytes."
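Here is a sketch showing that the null bytes are written by the call itself, not assumed to be already present. The buffer is first filled with 'X', and after the call every byte following the copied characters is a null byte. trailing_nulls() and padding_written_by_strncpy() are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: count the null bytes at the end of buf[0..n). */
static size_t trailing_nulls(const char *buf, size_t n)
{
    size_t count = 0;

    while (count < n && buf[n - 1 - count] == '\0')
        count++;
    return count;
}

/* Hypothetical helper: fill buf with 'X' so that it holds no null bytes,
 * let strncpy(3) overwrite it, and return how many null bytes the call
 * wrote after the copied characters. */
static size_t padding_written_by_strncpy(char *buf, size_t n, const char *src)
{
    assert(n > 0);
    memset(buf, 'X', n);
    strncpy(buf, src, n);
    return trailing_nulls(buf, n);
}
```

For an 8-byte buffer, copying "ab" leaves 6 trailing null bytes, and copying "abcdefgh" leaves none.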

Thanks,
Alex

> 
> I'll leave it with you.
> 
> Kind regards
> Jonny

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 14:05                                   ` Jonny Grant
@ 2023-11-09 15:04                                     ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:04 UTC (permalink / raw)
  To: Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg,
	GNU libc development, 'linux-man'

On Thu, Nov 09, 2023 at 02:05:38PM +0000, Jonny Grant wrote:
> 
> 
> On 09/11/2023 11:13, Alejandro Colomar wrote:
> > Hi Jonny,
> > 
> > On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote:
> >> On 09/11/2023 00:29, Alejandro Colomar wrote:
> >> How about following the style of the other man pages that put the notes about each function below them? (rather than above)
> >> https://man7.org/linux/man-pages/man3/string.3.html
> >>
> >> size_t strlen(const char *s);
> >> Return the length of the string s.
> >>
> >>
> >> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different:
> >>
> >> // Copy/catenate a string.
> >> char *strcpy(char *restrict dst, const char *restrict src);
> >> char *strcat(char *restrict dst, const char *restrict src);
> > 
> > The reason for this presentation is that I want to first look at what
> > they do, and only then look at the function you need to do that.
> 
> That appears different to the man page convention. It looks odd especially with the extra // that I don't recall other pages having in the description, usually that would be for examples. Consistency is best, but I'll leave it with you.

The difference is that you're comparing to man3 pages, which document
specific functions.  string_copying(7) instead documents how to copy
strings, and specific functions are only means to that end.  I'll keep
it this way.

Thanks,
Alex

-- 
<https://www.alejandro-colomar.es/>

* [PATCH v2 1/2] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
  2023-11-08 23:06                                 ` Paul Eggert
  2023-11-09  7:23                                 ` Oskari Pirhonen
@ 2023-11-09 15:20                                 ` Alejandro Colomar
  2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
  3 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Oskari Pirhonen,
	Jonny Grant, Matthew House, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Paul Eggert, Xi Ruoyao

These copy *from* a string.  But the destination is a simple character
sequence within an array; not a string.

Suggested-by: DJ Delorie <dj@redhat.com>
Acked-by: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---

Patch 1/2 is just a resend, with more CCs.
Patch 2/2 is a new one further clarifying the wording, after Jonny's
suggestions.

 man3/stpncpy.3        | 17 +++++++++++++----
 man7/string_copying.7 | 20 ++++++++++----------
 2 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index b6bbfd0a3..f86ff8c29 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -6,9 +6,8 @@
 .TH stpncpy 3 (date) "Linux man-pages (unreleased)"
 .SH NAME
 stpncpy, strncpy
-\- zero a fixed-width buffer and
-copy a string into a character sequence with truncation
-and zero the rest of it
+\-
+fill a fixed-width null-padded buffer with bytes from a string
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -37,7 +36,7 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy the string pointed to by
+These functions copy bytes from the string pointed to by
 .I src
 into a null-padded character sequence at the fixed-width buffer pointed to by
 .IR dst .
@@ -110,6 +109,16 @@ .SH CAVEATS
 These functions produce a null-padded character sequence,
 not a string (see
 .BR string_copying (7)).
+For example:
+.P
+.in +4n
+.EX
+strncpy(buf, "1", 5);       // { \[aq]1\[aq],   0,   0,   0,   0 }
+strncpy(buf, "1234", 5);    // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq],   0 }
+strncpy(buf, "12345", 5);   // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+strncpy(buf, "123456", 5);  // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] }
+.EE
+.in
 .P
 It's impossible to distinguish truncation by the result of the call,
 from a character sequence that just fits the destination buffer;
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cadf1c539..0e179ba34 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,15 +41,11 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *stpncpy(char " dst "[restrict ." sz "], \
+// Fill a fixed-width null-padded buffer with bytes from a string.
+.BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-.P
-// Zero a fixed-width buffer, and
-// copy a string into a character sequence with truncation.
-.BI "char *strncpy(char " dst "[restrict ." sz "], \
+.BI "char *stpncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .P
@@ -240,14 +236,18 @@ .SS Truncate or not?
 .\" ----- DESCRIPTION :: Null-padded character sequences --------------/
 .SS Null-padded character sequences
 For historic reasons,
-some standard APIs,
+some standard APIs and file formats,
 such as
-.BR utmpx (5),
+.BR utmpx (5)
+and
+.BR tar (1),
 use null-padded character sequences in fixed-width buffers.
 To interface with them,
 specialized functions need to be used.
 .P
-To copy strings into them, use
+To copy bytes from strings into these buffers, use
+.BR strncpy (3)
+or
 .BR stpncpy (3).
 .P
 To copy from an unterminated string within a fixed-width buffer into a string,
-- 
2.42.0


* [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
  2023-11-08 22:17                               ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar
                                                   ` (2 preceding siblings ...)
  2023-11-09 15:20                                 ` [PATCH v2 1/2] " Alejandro Colomar
@ 2023-11-09 15:20                                 ` Alejandro Colomar
  2023-11-10  5:47                                   ` Oskari Pirhonen
  3 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Jonny Grant, DJ Delorie,
	Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Paul Eggert, Xi Ruoyao

The previous wording could be interpreted as if the nulls were already
in place.  Clarify that it's this function which pads with null bytes.

Also, it copies "characters" from the src string.  That's a bit more
specific than copying "bytes", and makes it clearer that the terminating
null byte in src is not part of the copy.

Suggested-by: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Jonny Grant <jg@jguk.org>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/stpncpy.3        | 10 ++++++----
 man3/string.3         | 11 ++---------
 man7/string_copying.7 |  3 ++-
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index f86ff8c29..3cf4eb371 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -7,7 +7,8 @@
 .SH NAME
 stpncpy, strncpy
 \-
-fill a fixed-width null-padded buffer with bytes from a string
+fill a fixed-width buffer with characters from a string
+and pad with null bytes
 .SH LIBRARY
 Standard C library
 .RI ( libc ", " \-lc )
@@ -36,10 +37,11 @@ .SH SYNOPSIS
         _GNU_SOURCE
 .fi
 .SH DESCRIPTION
-These functions copy bytes from the string pointed to by
+These functions copy characters from the string pointed to by
 .I src
-into a null-padded character sequence at the fixed-width buffer pointed to by
-.IR dst .
+into a character sequence at the fixed-width buffer pointed to by
+.IR dst ,
+and pad with null bytes.
 If the destination buffer,
 limited by its size,
 isn't large enough to hold the copy,
diff --git a/man3/string.3 b/man3/string.3
index aba5efd2b..bd8b342a6 100644
--- a/man3/string.3
+++ b/man3/string.3
@@ -179,21 +179,14 @@ .SH SYNOPSIS
 .I n
 bytes to
 .IR dest .
-.SS Obsolete functions
 .TP
 .nf
 .BI "char *strncpy(char " dest "[restrict ." n "], \
 const char " src "[restrict ." n ],
 .BI "       size_t " n );
 .fi
-Copy at most
-.I n
-bytes from string
-.I src
-to
-.IR dest ,
-returning a pointer to the start of
-.IR dest .
+Fill a fixed-width buffer with characters from a string
+and pad with null bytes.
 .SH DESCRIPTION
 The string functions perform operations on null-terminated
 strings.
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0e179ba34..865271c6f 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -41,7 +41,8 @@ .SS Strings
 .\" ----- SYNOPSIS :: Null-padded character sequences --------/
 .SS Null-padded character sequences
 .nf
-// Fill a fixed-width null-padded buffer with bytes from a string.
+// Fill a fixed-width buffer with characters from a string
+// and pad with null bytes.
 .BI "char *strncpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
-- 
2.42.0


* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 15:02                                         ` Alejandro Colomar
@ 2023-11-09 17:30                                           ` DJ Delorie
  2023-11-09 17:54                                             ` Andreas Schwab
                                                               ` (2 more replies)
  0 siblings, 3 replies; 138+ messages in thread
From: DJ Delorie @ 2023-11-09 17:30 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos

Alejandro Colomar <alx@kernel.org> writes:
> "Fill a fixed-width buffer with characters from a string and pad with
> null bytes."

The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
nul/NUL is a character, null/NULL is a pointer.



* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 17:30                                           ` DJ Delorie
@ 2023-11-09 17:54                                             ` Andreas Schwab
  2023-11-09 18:00                                             ` Alejandro Colomar
  2023-11-09 19:42                                             ` Jonny Grant
  2 siblings, 0 replies; 138+ messages in thread
From: Andreas Schwab @ 2023-11-09 17:54 UTC (permalink / raw)
  To: DJ Delorie
  Cc: Alejandro Colomar, jg, eggert, linux-man, libc-alpha,
	mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack,
	g.branden.robinson, carlos

On Nov 09 2023, DJ Delorie wrote:

> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

NUL is the ASCII abbreviation for Null (see RFC 20).

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 17:30                                           ` DJ Delorie
  2023-11-09 17:54                                             ` Andreas Schwab
@ 2023-11-09 18:00                                             ` Alejandro Colomar
  2023-11-09 19:42                                             ` Jonny Grant
  2 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 18:00 UTC (permalink / raw)
  To: DJ Delorie
  Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos

Hi DJ,

On Thu, Nov 09, 2023 at 12:30:17PM -0500, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
> > "Fill a fixed-width buffer with characters from a string and pad with
> > null bytes."
> 
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.

Here's what man-pages(7) (written by Michael Kerrisk) says:

   NULL, NUL, null pointer, and null byte
     A null pointer is a pointer that points to nothing, and is
     normally indicated by the constant NULL.  On the other hand, NUL is
     the null byte, a byte with the value 0, represented in C via the
     character constant '\0'.

     The preferred term for the pointer is "null pointer" or simply
     "NULL"; avoid writing "NULL pointer".

     The preferred term for the byte is "null byte".  Avoid writing
     "NUL", since it is too easily confused with "NULL".  Avoid also
     the terms "zero byte" and "null character".  The byte that
     terminates a C string should be described as "the terminating null
     byte"; strings may be described as "null‐terminated", but avoid
     the use of "NUL‐terminated".


I don't necessarily agree with all of that, but mostly.  I don't agree
with avoiding "null character": since we also have the null wide
character (L'\0'), using "null character" for '\0' keeps the terms
symmetric.

Other than that, I mostly agree with Michael.  Here's what I think of
these terms:

-  NULL is a null pointer constant (as well as 0 is another null pointer
   constant).

-  A null pointer is a more generic term that includes a run-time null
   pointer as well. 

-  The null byte is 0.

-  The null character, '\0', is composed of a null byte.

-  The null wide character, L'\0', is composed of several null bytes.

-  NUL is the ASCII name of the null byte, or is it the null character
   here?  It's a bit muddy.

I use null byte for padding, and null character for the string
terminator, to make a stronger difference between strings and
null-padded fixed-width arrays.  I need to review string_copying(7) to
make sure I was consistent in this regard.
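The distinctions in the list above can be observed from C.  A minimal
sketch, with zero_bytes() as a hypothetical helper that counts the zero
bytes in an object's representation.  It deliberately does not inspect
the bytes of a null pointer: C does not require a null pointer to be
all-zero bits, so only comparing a pointer with a null pointer constant
is portable.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count how many bytes of the n-byte object at p are 0. */
static size_t
zero_bytes(const void *p, size_t n)
{
	const unsigned char  *b = p;
	size_t               count = 0;

	for (size_t i = 0; i < n; i++)
		if (b[i] == 0)
			count++;
	return count;
}

/*
 * The null character: a char object made of one null byte.
 * (The constant '\0' itself has type int in C; the char holding it
 * is what occupies a single byte.)
 */
static const char     nul_char = '\0';

/* The null wide character: sizeof(wchar_t) bytes, all of them null. */
static const wchar_t  nul_wide = L'\0';
```

On glibc, wchar_t is 4 bytes wide, so nul_wide is four null bytes.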

Colloquially, I find it fine to write NULL instead of null pointer (even
for non-constant cases), and NUL instead of any of "null character",
"null byte", or "null wide character", but for being precise, I prefer
"null something".

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
  2023-11-09 12:43                     ` Alejandro Colomar
  2023-11-09 12:51                     ` Xi Ruoyao
@ 2023-11-09 18:11                     ` Paul Eggert
  2023-11-09 23:48                       ` Alejandro Colomar
  2023-11-10 11:23                     ` Jonny Grant
  3 siblings, 1 reply; 138+ messages in thread
From: Paul Eggert @ 2023-11-09 18:11 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On 2023-11-09 03:38, Alejandro Colomar wrote:
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.

Not necessarily. strlcpy is subject to denial-of-service attacks if the 
attacker has control of the source string and can attack by using long 
source strings. strncpy, as bad as it is, does not have this problem.

Instead of this:

    if (strlcpy (dst, src, dstsize) >= dstsize)
      return failure;

applications that want to copy a string into a small nonempty 
fixed-size buffer, failing if the string doesn't fit, should do 
something like this:

    if (strncpy (dst, src, dstsize)[dstsize - 1])
      return failure;

This avoids the denial-of-service attack and is portable all the way 
back to K&R C.

It's unfortunate that strlcpy was misdesigned but here we are.
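The strncpy(3) check above can be wrapped in a function so that it is
written only once; this is a sketch, and copy_or_fail() is a
hypothetical name.  It relies on strncpy(3) returning dst and always
writing exactly dstsize bytes, so the last byte is null only when src
fit together with its terminator.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around the strncpy(3) truncation check.
 * Copies src into the dstsize-byte buffer dst and returns 0, or
 * returns -1 if src and its terminating null byte do not fit.
 * dstsize must be nonzero.
 */
static int
copy_or_fail(char *restrict dst, const char *restrict src, size_t dstsize)
{
	/*
	 * strncpy(3) returns dst and writes dstsize bytes.  If the last
	 * one is not null, src had at least dstsize non-null characters:
	 * the copy was truncated and dst does not hold a string.
	 */
	if (strncpy(dst, src, dstsize)[dstsize - 1] != '\0')
		return -1;
	return 0;
}
```

On failure dst holds an unterminated character sequence, so a caller
that keeps using dst afterwards should overwrite it first.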



* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string
  2023-11-09 17:30                                           ` DJ Delorie
  2023-11-09 17:54                                             ` Andreas Schwab
  2023-11-09 18:00                                             ` Alejandro Colomar
@ 2023-11-09 19:42                                             ` Jonny Grant
  2 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-09 19:42 UTC (permalink / raw)
  To: DJ Delorie, Alejandro Colomar
  Cc: eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx,
	kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos



On 09/11/2023 17:30, DJ Delorie wrote:
> Alejandro Colomar <alx@kernel.org> writes:
>> "Fill a fixed-width buffer with characters from a string and pad with
>> null bytes."
> 
> The pedant in me says it should be NUL bytes (or NUL's), not null bytes.
> nul/NUL is a character, null/NULL is a pointer.
> 

NUL would be a big improvement.

Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-09 18:11                     ` Paul Eggert
@ 2023-11-09 23:48                       ` Alejandro Colomar
  2023-11-10  5:36                         ` Paul Eggert
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 23:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On Thu, Nov 09, 2023 at 10:11:10AM -0800, Paul Eggert wrote:
> On 2023-11-09 03:38, Alejandro Colomar wrote:
> > If you are consistent in checking the return value of strlcpy(3) and
> > reporting an error, it's the best standard alternative nowadays.
> 
> Not necessarily. strlcpy is subject to denial-of-service attacks if the
> attacker has control of the source string and can attack by using long
> source strings. strncpy, as bad as it is, does not have this problem.

Interesting thing.  I'd then just use strlen(3)+strcpy(3), avoiding
strncpy(3).

> 
> Instead of this:
> 
>    if (strlcpy (dst, src, dstsize) >= dstsize)
>      return failure;
> 
> applications that want to copy a string into a small nonempty
> fixed-size buffer, failing if the string doesn't fit, should do something
> like this:
> 
>    if (strncpy (dst, src, dstsize)[dstsize - 1])
>      return failure;
> 
> This avoids the denial-of-service attack and is portable all the way back to
> K&R C.
> 
> It's unfortunate that strlcpy was misdesigned but here we are.
> 

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-09 23:48                       ` Alejandro Colomar
@ 2023-11-10  5:36                         ` Paul Eggert
  2023-11-10 11:05                           ` Alejandro Colomar
  2023-11-10 11:36                           ` Jonny Grant
  0 siblings, 2 replies; 138+ messages in thread
From: Paul Eggert @ 2023-11-10  5:36 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-09 15:48, Alejandro Colomar wrote:
> I'd then just use strlen(3)+strcpy(3), avoiding
> strncpy(3).

But that is vulnerable to the same denial-of-service attack that strlcpy 
is vulnerable to. You'd need strnlen+strcpy instead.

The strncpy approach I suggested is simpler, and (though this doesn't 
matter much in practice) is typically significantly faster than 
strnlen+strcpy in the typical case where the destination is a small 
fixed-size buffer.

Although strncpy is not a good design, it's often simpler or faster or 
safer than later "improvements".
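A minimal sketch of the strnlen+strcpy approach, wrapped in a
hypothetical copy_bounded() that returns 0 on success and -1 if src
does not fit.  strnlen(3) is POSIX (the _POSIX_C_SOURCE define is only
there so that strict compiler modes declare it), and it never examines
more than dsize bytes of src, so the work is bounded by the destination
size rather than by an attacker-controlled source length.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) in strict modes */
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the string src into the dsize-byte buffer
 * dst, or fail without touching dst if it does not fit.
 */
static int
copy_bounded(char *restrict dst, const char *restrict src, size_t dsize)
{
	/* == dsize means no null byte among the first dsize bytes of src. */
	if (strnlen(src, dsize) == dsize)
		return -1;

	strcpy(dst, src);  /* safe: strlen(src) < dsize */
	return 0;
}
```

Unlike the strncpy(3) check, this writes only strlen(src) + 1 bytes and
does not pad the rest of dst.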



* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
  2023-11-09 15:20                                 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar
@ 2023-11-10  5:47                                   ` Oskari Pirhonen
  2023-11-10 10:47                                     ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Oskari Pirhonen @ 2023-11-10  5:47 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao

On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote:
> The previous wording could be interpreted as if the nulls were already
> in place.  Clarify that it's this function which pads with null bytes.
> 
> Also, it copies "characters" from the src string.  That's a bit more
> specific than copying "bytes", and makes it clearer that the terminating
> null byte in src is not part of the copy.
> 
> Suggested-by: Jonny Grant <jg@jguk.org>
> Cc: DJ Delorie <dj@redhat.com>
> Cc: Jonny Grant <jg@jguk.org>
> Cc: Matthew House <mattlloydhouse@gmail.com>
> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> Cc: Thorsten Kukuk <kukuk@suse.com>
> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> Cc: Zack Weinberg <zack@owlfolio.org>
> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> Cc: Carlos O'Donell <carlos@redhat.com>
> Cc: Paul Eggert <eggert@cs.ucla.edu>
> Cc: Xi Ruoyao <xry111@xry111.site>
> Signed-off-by: Alejandro Colomar <alx@kernel.org>
> ---
>  man3/stpncpy.3        | 10 ++++++----
>  man3/string.3         | 11 ++---------
>  man7/string_copying.7 |  3 ++-
>  3 files changed, 10 insertions(+), 14 deletions(-)
> 

... snip ...

> diff --git a/man3/string.3 b/man3/string.3
> index aba5efd2b..bd8b342a6 100644
> --- a/man3/string.3
> +++ b/man3/string.3
> @@ -179,21 +179,14 @@ .SH SYNOPSIS
>  .I n
>  bytes to
>  .IR dest .
> -.SS Obsolete functions

If you're removing this section ...

>  .TP
>  .nf
>  .BI "char *strncpy(char " dest "[restrict ." n "], \
>  const char " src "[restrict ." n ],
>  .BI "       size_t " n );
>  .fi
> -Copy at most
> -.I n
> -bytes from string
> -.I src
> -to
> -.IR dest ,
> -returning a pointer to the start of
> -.IR dest .
> +Fill a fixed‐width buffer with characters from a string
> +and pad with null bytes.

... shouldn't you also move the rest of this up to keep it alphabetized?

- Oskari


* Re: strncpy clarify result may not be null terminated
  2023-11-09 12:23                 ` Alejandro Colomar
  2023-11-09 12:35                   ` Alejandro Colomar
@ 2023-11-10  7:06                   ` Oskari Pirhonen
  2023-11-10 11:18                     ` Alejandro Colomar
  2023-11-10 16:06                   ` Matthew House
  2 siblings, 1 reply; 138+ messages in thread
From: Oskari Pirhonen @ 2023-11-10  7:06 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Matthew House, Jonny Grant, linux-man

On Thu, Nov 09, 2023 at 13:23:14 +0100, Alejandro Colomar wrote:

... snip ...

> > > > For the sake of reference, I looked into a few big C and C++ projects to
> > > > see how often a strncpy(3)-based snippet was used to produce a truncated
> > > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> > > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> > > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> > > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> > > > that I haven't filtered out vendored dependencies, so there's a little bit
> > > > of double-counting.) It seems like most codebases that don't ban strncpy(3)
> > > > use a derived snippet somewhere or another. Also, I found 3 instances in
> > > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> > > > checking the last character.
> > >
> > > I know.  I've been rewriting the code handling strings in shadow-utils
> > > for the last year, and there was a lot of it.  I fixed several small bugs
> > > in the process, so I recommend avoiding it.
> > 
> > I can't tell you about your own experience, but in mine, the root cause of
> > most string-handling bugs has been excessive cleverness in using the
> > standard string functions, rather than the behavior of the functions
> > themselves. So one worry of mine is that if strncpy(3) ends up being
> > deprecated or whatever, then authors of portable libraries will start
> > writing lots of custom memcpy(3)-based replacements to their strncpy(3)-
> > based snippets, and more lines of code will introduce more opportunities
> > for cleverness.
> 
> Don't worry.  strncpy(3) won't be deprecated, thanks to tar(1).  ;)
> 

Just please don't tar and feather [1] the people who use it ;)
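Fields like the ones in tar(1) headers are what strncpy(3) fits well:
the POSIX ustar name field is 100 bytes, null-padded, and
null-terminated only when the name is shorter than the field.  A
simplified sketch, assuming a hypothetical 8-byte field instead of the
real ustar layout; struct record, record_set_name() and
record_name_is() are illustrative names only.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) in strict modes */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified record with one fixed-width name field. */
struct record {
	char  name[8];  /* null-padded; unterminated when the name fills it */
};

/* Store name in r->name.  Returns 0, or -1 if name does not fit. */
static int
record_set_name(struct record *r, const char *name)
{
	/* Look at no more than sizeof(r->name) + 1 bytes of name. */
	if (strnlen(name, sizeof(r->name) + 1) > sizeof(r->name))
		return -1;

	/* Copy the name, then fill the rest of the field with null
	   bytes, so no stale data ends up in the record. */
	strncpy(r->name, name, sizeof(r->name));
	return 0;
}

/* Compare the field with a string without assuming a terminator. */
static bool
record_name_is(const struct record *r, const char *name)
{
	size_t  len = strnlen(r->name, sizeof(r->name));

	return strlen(name) == len && memcmp(r->name, name, len) == 0;
}
```

Note that a name of exactly 8 characters is accepted and stored without
a terminator, which is why readers of the field must use strnlen(3) or
memcmp(3) rather than strlen(3) or strcmp(3).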

... snip ...

> > > > the code to understand the concept behind how these two snippets work, that
> > > > the only difference between the strncpy(3)'s special "character sequence"
> > > > and an ordinary C string is an additional null terminator at the end of the
> > > > destination buffer.
> > >
> > > This is part of string_copying(7):
> > >
> > > DESCRIPTION
> > >    Terms (and abbreviations)
> > >      string (str)
> > >             is  a  sequence  of zero or more non‐null characters followed by a
> > >             null byte.
> > >
> > >      character sequence
> > >             is a sequence of zero or  more  non‐null  characters.   A  program
> > >             should  never use a character sequence where a string is required.
> > >             However, with appropriate care, a string can be used in the  place
> > >             of a character sequence.
> > >
> > > I think that is very explicit in the difference.  strncpy(3) refers to
> > > that page for understanding the differences, so I think it is
> > > documented.
> > >
> > > strncpy(3):
> > > CAVEATS
> > >      The  name  of  these  functions  is confusing.  These functions produce a
> > >      null‐padded character sequence, not a string (see string_copying(7)).
> > 
> > My point is isn't that the difference is undocumented, but that the typical
> > man page reader isn't reading the man pages for their own sake, but because
> > they're looking at some code, and they want to Know What It's Doing as soon
> > as possible.
> 
> We could maybe add a list of ways people have tried to be clever with
> strncpy(3) in the past and failed, and then explain why those uses are
> broken.  This could be in a BUGS section.
> 

This would be a very fun read.
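One pattern such a list would probably have to cover is the common
truncating idiom, which terminates the buffer by hand after strncpy(3).
A sketch, with copy_truncating() as a hypothetical name: it always
leaves a string in dst, but the truncation is silent, so the caller
cannot tell that "/etc/passwd" became "/etc/pa", which matters when the
string is a file path.  Dropping the "- 1" in either line reintroduces
the missing terminator.

```c
#include <stddef.h>
#include <string.h>

/*
 * The "strncpy(3), then terminate" idiom, wrapped in a hypothetical
 * function.  dsize must be nonzero.  Truncation is not reported.
 */
static void
copy_truncating(char *restrict dst, const char *restrict src, size_t dsize)
{
	strncpy(dst, src, dsize - 1);  /* may leave dst unterminated... */
	dst[dsize - 1] = '\0';         /* ...so terminate it by hand */
}
```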

... snip ...

> > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
> > > it's not correct code.  It's rather dangerous code that just happens to
> > > not be vulnerable most of the time.
> > 
> > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
> > immune to off-by-one bugs?
> 
> Slightly.  Here's the typical use of strlen(3)+strcpy(3):
> 
> if (strlen(src) >= dsize)
> 	goto error;
> strcpy(dst, src);
> 
> There's no +1 or -1 in that code, so it's hard to make an off-by-one
> mistake.  Okay, you may have seen that it has a '>=', which one could
> accidentally replace by a '>', causing an off-by-one.  I'd wrap that
> thing in a strxcpy() wrapper so you avoid repetition. 
> 

Might I go so far as to recommend strnlen(3) instead of strlen(3)? That
way, instead of blindly looking for a null terminator, you stop after a
predetermined max length. Especially nice for untrusted input where you
can't make assumptions on the "fitness for a purpose" of what's being
fed in.

    if (src == NULL || strnlen(src, dsize) == dsize)
        goto error;
    strcpy(dst, src);

This, of course, assumes you have POSIX at your disposal.

I'm writing this before going to bed. I did briefly sanity check it with
a simple test prog, but it would be quite ironic if I missed something
wouldn't it...

- Oskari

[1]: https://en.wikipedia.org/wiki/Tarring_and_feathering


* Re: strncpy clarify result may not be null terminated
  2023-11-08 19:33             ` Alejandro Colomar
  2023-11-08 19:40               ` Alejandro Colomar
  2023-11-09  3:13               ` Matthew House
@ 2023-11-10 10:40               ` Stefan Puiu
  2023-11-10 11:06                 ` Jonny Grant
  2023-11-10 11:20                 ` Alejandro Colomar
  2 siblings, 2 replies; 138+ messages in thread
From: Stefan Puiu @ 2023-11-10 10:40 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Matthew House, Jonny Grant, linux-man

Hi Alex,

On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote:
[.....]
> strncpy(3):
> CAVEATS
>      The  name  of  these  functions  is confusing.  These functions produce a
>      null‐padded character sequence, not a string (see string_copying(7)).

I'm a bit confused by this distinction. Isn't a null-padded sequence
technically also null-terminated? If there's a '\0' at the end, then
it's a string, in my understanding. Or was the intention to say "a
character sequence that may be null-padded", with the case where
there's no padding at all being the reason for the distinction?

Thanks,
Stefan.
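As the follow-ups in this thread explain, the output of strncpy(3) is a
string only when at least one null byte fits in the buffer.  A minimal
sketch of telling the two cases apart with memchr(3), which, unlike
strlen(3), never looks past the n bytes it is given; holds_string() is
a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the n-byte buffer buf contains a null byte, i.e. holds a string. */
static bool
holds_string(const char *buf, size_t n)
{
	return memchr(buf, '\0', n) != NULL;
}
```

For example, after strncpy(buf, "a", 4) the buffer holds a string, while
after strncpy(buf, "abcd", 4) it does not, and calling strlen(3) on it
would read past the end of the array.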



* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes
  2023-11-10  5:47                                   ` Oskari Pirhonen
@ 2023-11-10 10:47                                     ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 10:47 UTC (permalink / raw)
  To: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao

On Thu, Nov 09, 2023 at 11:47:34PM -0600, Oskari Pirhonen wrote:
> On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote:
> > The previous wording could be interpreted as if the nulls were already
> > in place.  Clarify that it's this function which pads with null bytes.
> > 
> > Also, it copies "characters" from the src string.  That's a bit more
> > specific than copying "bytes", and makes it clearer that the terminating
> > null byte in src is not part of the copy.
> > 
> > Suggested-by: Jonny Grant <jg@jguk.org>
> > Cc: DJ Delorie <dj@redhat.com>
> > Cc: Jonny Grant <jg@jguk.org>
> > Cc: Matthew House <mattlloydhouse@gmail.com>
> > Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
> > Cc: Thorsten Kukuk <kukuk@suse.com>
> > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
> > Cc: Zack Weinberg <zack@owlfolio.org>
> > Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
> > Cc: Carlos O'Donell <carlos@redhat.com>
> > Cc: Paul Eggert <eggert@cs.ucla.edu>
> > Cc: Xi Ruoyao <xry111@xry111.site>
> > Signed-off-by: Alejandro Colomar <alx@kernel.org>
> > ---
> >  man3/stpncpy.3        | 10 ++++++----
> >  man3/string.3         | 11 ++---------
> >  man7/string_copying.7 |  3 ++-
> >  3 files changed, 10 insertions(+), 14 deletions(-)
> > 
> 
> ... snip ...
> 
> > diff --git a/man3/string.3 b/man3/string.3
> > index aba5efd2b..bd8b342a6 100644
> > --- a/man3/string.3
> > +++ b/man3/string.3
> > @@ -179,21 +179,14 @@ .SH SYNOPSIS
> >  .I n
> >  bytes to
> >  .IR dest .
> > -.SS Obsolete functions
> 
> If you're removing this section ...
> 
> >  .TP
> >  .nf
> >  .BI "char *strncpy(char " dest "[restrict ." n "], \
> >  const char " src "[restrict ." n ],
> >  .BI "       size_t " n );
> >  .fi
> > -Copy at most
> > -.I n
> > -bytes from string
> > -.I src
> > -to
> > -.IR dest ,
> > -returning a pointer to the start of
> > -.IR dest .
> > +Fill a fixed‐width buffer with characters from a string
> > +and pad with null bytes.
> 
> ... shouldn't you also move the rest of this up to keep it alphabetized?

Hi Oskari,

Sure!  I was trying to find a pattern in the order, but didn't see it
yesterday.  Thanks!  :)

Cheers,
Alex

> 
> - Oskari



-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-10  5:36                         ` Paul Eggert
@ 2023-11-10 11:05                           ` Alejandro Colomar
  2023-11-10 11:47                             ` Alejandro Colomar
  2023-11-10 17:58                             ` Paul Eggert
  2023-11-10 11:36                           ` Jonny Grant
  1 sibling, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Thu, Nov 09, 2023 at 09:36:43PM -0800, Paul Eggert wrote:
> On 2023-11-09 15:48, Alejandro Colomar wrote:
> > I'd then just use strlen(3)+strcpy(3), avoiding
> > strncpy(3).

Heh, brain fart on my side.

> 
> But that is vulnerable to the same denial-of-service attack that strlcpy is
> vulnerable to. You'd need strnlen+strcpy instead.
> 
> The strncpy approach I suggested is simpler, and (though this doesn't matter

Yeah, although you can always wrap strnlen(3)+memcpy(3) in a strxcpy()
inline function and have it even simpler.

Rewriting the strxcpy() wrapper I wrote the other day to not be
vulnerable to DoS, and hoping I get it right today.

[[nodiscard]]
inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen;

	slen = strnlen(src, dsize);
	if (slen >= dsize)
		return -1;

	memcpy(dst, src, slen + 1);

	return slen;
}

Hopefully, it won't be so bad in terms of performance.  And it is still
protected by fortification of memcpy(3).  And thanks to [[nodiscard]],
it should be hard to misuse.

> much in practice) is typically significantly faster than strnlen+strcpy in
> the typical case where the destination is a small fixed-size buffer.
> 
> Although strncpy is not a good design, it's often simpler or faster or safer
> than later "improvements".

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
@ 2023-11-10 11:06                 ` Jonny Grant
  2023-11-10 11:20                 ` Alejandro Colomar
  1 sibling, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-10 11:06 UTC (permalink / raw)
  To: Stefan Puiu, Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library



On 10/11/2023 10:40, Stefan Puiu wrote:
> Hi Alex,
> 
> On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> [.....]
>> strncpy(3):
>> CAVEATS
>>      The  name  of  these  functions  is confusing.  These functions produce a
>>      null‐padded character sequence, not a string (see string_copying(7)).
> 
> I'm a bit confused by this distinction. Isn't a null-padded sequence
> technically also null-terminated? If there's a '\0' at the end, then
> it's a string, in my understanding. Or was the intention to say "a
> character sequence that may be null-padded", with the case where
> there's no padding at all being the reason for the distinction?

This is a null padded sequence of characters in an array:

char buf[4] = {'a', '\0', '\0', '\0'};

I'm sure we are all well aware from this long email thread that strncpy is designed to fill fixed-size arrays, padding with NUL bytes '\0' if any space is left. Otherwise, the array is left unpadded, and therein lies the trouble: a possibly unterminated sequence of characters. Someone thought saving the extra byte was a good idea. It would have been better if that programmer had written their own local function, rather than putting out a strncpy function named so similarly to strcpy(); they could have called it copy_to_array_nul_pad().

// An unterminated array: printf("%s") or strlen() will carry on reading past the end until they find a NUL byte '\0', perhaps reading outside the addressable space of the process and causing a SIGSEGV.
char buf[4] = {'a', 'b', 'c', 'd'};
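Neither array above needs a terminator to be read safely, as long as
every reader is bounded by the array size.  A sketch, assuming a
hypothetical helper field_to_string(): when a precision is given, the
"%.*s" conversion of snprintf(3) reads at most that many bytes, and ISO
C states that the array then need not contain a null character.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy the n-byte, possibly unterminated field
 * into out as a proper string.  n must fit in an int.  Returns the
 * length of the field's text (snprintf's return value).
 */
static int
field_to_string(char *out, size_t outsize, const char *field, size_t n)
{
	return snprintf(out, outsize, "%.*s", (int)n, field);
}
```

The same precision works for printing directly, for example
printf("%.*s\n", (int)sizeof buf, buf).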

Hope that helps.

Kind regards, Jonny



* Re: strncpy clarify result may not be null terminated
  2023-11-10  7:06                   ` Oskari Pirhonen
@ 2023-11-10 11:18                     ` Alejandro Colomar
  2023-11-11  7:55                       ` Oskari Pirhonen
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:18 UTC (permalink / raw)
  To: Matthew House, Jonny Grant, linux-man

Hi Oskari,

On Fri, Nov 10, 2023 at 01:06:44AM -0600, Oskari Pirhonen wrote:
> On Thu, Nov 09, 2023 at 13:23:14 +0100, Alejandro Colomar wrote:
> > Don't worry.  strncpy(3) won't be deprecated, thanks to tar(1).  ;)
> > 
> 
> Just please don't tar and feather [1] the people who use it ;)

Hmmm, it just caught me after a year of fixing broken strncpy(3) calls.
I was a bit unfair.  I'm sorry if I wasn't very nice.  Hopefully, we've
all learnt something about string-copying functions.  :)

> > We could maybe add a list of ways people have tried to be clever with
> > strncpy(3) in the past and failed, and then explain why those uses are
> > broken.  This could be in a BUGS section.
> > 
> 
> This would be a very fun read.

I'll write it then!  :D

> 
> ... snip ...
> 
> > > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
> > > > it's not correct code.  It's rather dangerous code that just happens to
> > > > not be vulnerable most of the time.
> > > 
> > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
> > > immune to off-by-one bugs?
> > 
> > Slightly.  Here's the typical use of strlen(3)+strcpy(3):
> > 
> > if (strlen(src) >= dsize)
> > 	goto error;
> > strcpy(dst, src);
> > 
> > There's no +1 or -1 in that code, so it's hard to make an off-by-one
> > mistake.  Okay, you may have seen that it has a '>=', which one could
> > accidentally replace by a '>', causing an off-by-one.  I'd wrap that
> > thing in a strxcpy() wrapper so you avoid repetition. 
> > 
> 
> Might I go so far as to recommend strnlen(3) instead of strlen(3)? That
> way, instead of blindly looking for a null terminator, you stop after a
> predetermined max length. Especially nice for untrusted input where you
> can't make assumptions on the "fitness for a purpose" of what's being
> fed in.
> 
>     if (src == NULL || strnlen(src, dsize) == dsize)
>         goto error;
>     strcpy(dst, src);

A NULL check shouldn't be necessary (no other copying function has one,
and that's not a big deal for them, although I have mixed feelings
about things like memcpy(dst, NULL, 0)).

About strnlen(3), you're right, and Paul also pointed that out.  See the
other mail I sent to the list with an inline implementation of strxcpy()
using strnlen(3).

> 
> This, of course, assumes you have POSIX at your disposal.

I always assume this.  If not, please ask your vendor to provide a POSIX
layer.  Or at least the parts of POSIX that can be implemented in a
free-standing implementation.  Or stop using that vendor.

> 
> I'm writing this before going to bed. I did briefly sanity check it with
> a simple test prog, but it would be quite ironic if I missed something
> wouldn't it...

Looks good at first glance.  :)

Cheers,
Alex

> 
> - Oskari
> 
> [1]: https://en.wikipedia.org/wiki/Tarring_and_feathering



-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
  2023-11-10 11:06                 ` Jonny Grant
@ 2023-11-10 11:20                 ` Alejandro Colomar
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:20 UTC (permalink / raw)
  To: Stefan Puiu; +Cc: Matthew House, Jonny Grant, linux-man

Hi Stefan,

On Fri, Nov 10, 2023 at 12:40:48PM +0200, Stefan Puiu wrote:
> Hi Alex,
> 
> On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote:
> [.....]
> > strncpy(3):
> > CAVEATS
> >      The  name  of  these  functions  is confusing.  These functions produce a
> >      null‐padded character sequence, not a string (see string_copying(7)).
> 
> I'm a bit confused by this distinction. Isn't a null-padded sequence
> technically also null-terminated? If there's a '\0' at the end, then
> it's a string, in my understanding. Or was the intention to say "a
> character sequence that may be null-padded", with the case where
> there's no padding at all being the reason for the distinction?

The latter.  I'll check the wording.

Thanks!
Alex

> 
> Thanks,
> Stefan.

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-09 11:38                   ` Alejandro Colomar
                                       ` (2 preceding siblings ...)
  2023-11-09 18:11                     ` Paul Eggert
@ 2023-11-10 11:23                     ` Jonny Grant
  3 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-10 11:23 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library



On 09/11/2023 11:38, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Thu, Nov 09, 2023 at 10:31:49AM +0000, Jonny Grant wrote:
>>> Probably the only way to solve the cleverness issue for good is to have an
>>> immediately-available, foolproof, performant set of string functions that
>>> are extremely straightforward to understand and use, flexible enough for
>>> any use case, and generally agreed to be the first choice for string
>>> manipulation.
>>
>> What's the best standardized function for C string copying in your
> 
> strlcpy(3) will soon be standard.  POSIX.1-202x (Issue 8) will add it,
> which is why it's been added recently to glibc.  Hopefully, ISO C3x will
> follow (yeah, it's not like tomorrow).
> 
>> opinion?  They all seem to have drawbacks: strlcpy truncates (I'd
>> rather it rejected the copy if the buffer were too small; truncation
>> could cause issues if it changed the meaning of the string, e.g. if
>> it was a file path). Other alternative functions aren't widely used.
> 
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.
> snprintf(3), except for using int instead of size_t, has an equivalent
> API, and is in C99, in case that means something.
> 
> If you would want to write something based on Michael Kerrisk's article,
> you could do this:
> 
> 	ssize_t
> 	strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
> 	{
> 		size_t  slen;
> 
> 		slen = strlen(src);
> 		if (slen >= dsize)
> 			return -1;  /* would truncate: copy nothing */
> 
> 		strcpy(dst, src);
> 		return slen;
> 	}
> 
> You may also want to calculate 'dsize' automagically, to avoid human
> error, in case it's an array, so you could write a macro on top of it:
> 
> 	#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
> 
> These are just small wrappers over standard functions, so you shouldn't
> have problems adding them to your project.
> 
> This is my long term plan for shadow-utils, indeed.  I'm first
> transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
> padding, and later will use this strxcpy() to remove the truncated
> strings to avoid misinterpretation.
> 
> Cheers,
> Alex
> 
>>
>> Kind regards, Jonny
> 

Yes, I like to look for a libc library function before writing my own wrapper, but I would consider something like strxcpy.

snprintf will truncate if there is not enough space, but it returns the number of bytes (excluding the terminating NUL) that would have been written had there been no truncation. So one could snprintf into an array buffer on the stack and, on truncation, discard the buffer and return an error; otherwise carry on using the string (which wasn't truncated).
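A minimal sketch of that snprintf(3) approach, using a hypothetical helper named copy_or_fail(): because snprintf(3) returns the length the complete result would have had, comparing that with the buffer size detects truncation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst with snprintf(3) and report
 * failure instead of handing back a truncated string.
 * Returns 0 on success, -1 if src does not fit (or on an encoding error).
 */
static int
copy_or_fail(char *dst, size_t dsize, const char *src)
{
	int  len;

	len = snprintf(dst, dsize, "%s", src);
	if (len < 0 || (size_t) len >= dsize)
		return -1;  /* truncated: the caller must discard dst */

	return 0;
}
```

Note that on failure snprintf(3) has already written a truncated string into dst, so the caller has to treat the buffer as garbage in that case.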

Re strlcpy, I see the BSD man page gives some examples of how to check for truncation by strlcpy. Perhaps similar examples could be added to the Linux man page.
https://man.freebsd.org/cgi/man.cgi?query=strlcat&sektion=3
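For reference, a sketch of the kind of check that page describes: compare the return value with the buffer size. It uses my_strlcpy(), a hypothetical stand-in with the BSD strlcpy(3) semantics, since glibc only gained strlcpy(3) in version 2.38; with a libc that has it, call strlcpy(3) directly. set_path() is likewise a hypothetical name.

```c
#include <string.h>

/*
 * Hypothetical stand-in with BSD strlcpy(3) semantics: copies at most
 * dsize - 1 bytes, always terminates when dsize > 0, and returns
 * strlen(src).  Truncation happened iff the result is >= dsize.
 */
static size_t
my_strlcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen = strlen(src);

	if (dsize != 0) {
		size_t  n = (slen < dsize) ? slen : dsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return slen;
}

/* The truncation check: reject the result rather than use it. */
static int
set_path(char *dst, size_t dsize, const char *src)
{
	if (my_strlcpy(dst, src, dsize) >= dsize)
		return -1;  /* truncated: don't trust dst */
	return 0;
}
```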

Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-10  5:36                         ` Paul Eggert
  2023-11-10 11:05                           ` Alejandro Colomar
@ 2023-11-10 11:36                           ` Jonny Grant
  2023-11-10 13:15                             ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-10 11:36 UTC (permalink / raw)
  To: Paul Eggert, Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library



On 10/11/2023 05:36, Paul Eggert wrote:
> On 2023-11-09 15:48, Alejandro Colomar wrote:
>> I'd then just use strlen(3)+strcpy(3), avoiding
>> strncpy(3).
> 
> But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead.
> 
> The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer.
> 
> Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements".

As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls:

1) cost of any initial strnlen() reading memory to determine input src size
2) accepts a src_max_size to actually try to copy from src
3) does not truncate, i.e. writes nothing to the buffer, if there isn't enough space in dest_max_size to fit src_max_size
4) checks for NULL pointers
5) probably other things I've overlooked

Something like this API:
int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t *dest_written);
These sizes include any NUL terminating byte.

It returns 0 on success, or an error code like EINVAL, or ERANGE if it would truncate.
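One possible sketch of the proposed my_str_copy() API, under assumptions the proposal leaves open: all sizes count the NUL byte, a src with no NUL within its first src_max_size bytes is rejected with EINVAL, and *dest_written receives the bytes written including the NUL. Unlike point 3 read literally, this sketch decides ERANGE from the actual length of src rather than from src_max_size; either way it writes nothing to dest on failure.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen(3) */
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical implementation of the proposed API (not a standard
 * function).  Returns 0 on success, EINVAL or ERANGE on failure;
 * dest is left untouched on failure.
 */
int
my_str_copy(char *dest, const char *src, size_t dest_max_size,
            size_t src_max_size, size_t *dest_written)
{
	size_t  slen;

	if (dest == NULL || src == NULL || dest_written == NULL)
		return EINVAL;

	/* Never read more than src_max_size bytes of src. */
	slen = strnlen(src, src_max_size);
	if (slen == src_max_size)
		return EINVAL;  /* no terminator within the stated bound */

	if (slen + 1 > dest_max_size)
		return ERANGE;  /* would truncate: write nothing */

	memcpy(dest, src, slen + 1);
	*dest_written = slen + 1;
	return 0;
}
```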

All comments welcome.

Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:05                           ` Alejandro Colomar
@ 2023-11-10 11:47                             ` Alejandro Colomar
  2023-11-10 17:58                             ` Paul Eggert
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:47 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On Fri, Nov 10, 2023 at 12:05:31PM +0100, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Thu, Nov 09, 2023 at 09:36:43PM -0800, Paul Eggert wrote:
> > On 2023-11-09 15:48, Alejandro Colomar wrote:
> > > I'd then just use strlen(3)+strcpy(3), avoiding
> > > strncpy(3).
> 
> Heh, brain fart on my side.
> 
> > 
> > But that is vulnerable to the same denial-of-service attack that strlcpy is
> > vulnerable to. You'd need strnlen+strcpy instead.
> > 
> > The strncpy approach I suggested is simpler, and (though this doesn't matter
> 
> Yeah, although you can always wrap strnlen(3)+memcpy(3) in a strxcpy()
> inline function and have it even simpler.
> 
> Rewriting the strxcpy() wrapper I wrote the other day to not be
> vulnerable to DoS, and hoping I get it right today.
> 
> [[nodiscard]]
> inline ssize_t
> strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
> {
> 	size_t  slen;
> 
> 	slen = strnlen(src, dsize);
> 	if (slen >= dsize)

Oops:  s/>=/==/

> 		return -1;
> 
> 	memcpy(dst, src, slen + 1);
> 
> 	return slen;
> }
> 
> Hopefully, it won't be so bad in terms of performance.  And it is still
> protected by fortification of memcpy(3).  And thanks to [[nodiscard]],
> it should be hard to misuse.
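A small usage sketch of the strnlen(3)+memcpy(3) strxcpy() above together with the STRXCPY() macro suggested earlier in the thread. ARRAY_SIZE() is defined here with the usual sizeof idiom, which is an assumption since projects define it themselves. [[nodiscard]] is left out because it needs C23, and the function is made static inline because a bare inline definition in C also needs an external definition in some translation unit.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen(3) */
#include <string.h>
#include <sys/types.h>           /* ssize_t */

#define ARRAY_SIZE(a)      (sizeof(a) / sizeof((a)[0]))
#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))

/* strxcpy() as in the message above, reproduced so this compiles alone. */
static inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen;

	slen = strnlen(src, dsize);  /* reads at most dsize bytes of src */
	if (slen == dsize)
		return -1;               /* no room for the terminator */

	memcpy(dst, src, slen + 1);
	return slen;
}
```

STRXCPY() only works on real arrays: given a pointer, ARRAY_SIZE() silently yields sizeof(char *) instead of the buffer size.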
> 
> > much in practice) is typically significantly faster than strnlen+strcpy in
> > the typical case where the destination is a small fixed-size buffer.
> > 
> > Although strncpy is not a good design, it's often simpler or faster or safer
> > than later "improvements".
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:36                           ` Jonny Grant
@ 2023-11-10 13:15                             ` Alejandro Colomar
  2023-11-18 23:40                               ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 13:15 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man, GNU C Library

Hi Jonny,

On Fri, Nov 10, 2023 at 11:36:20AM +0000, Jonny Grant wrote:
> 
> 
> On 10/11/2023 05:36, Paul Eggert wrote:
> > On 2023-11-09 15:48, Alejandro Colomar wrote:
> >> I'd then just use strlen(3)+strcpy(3), avoiding
> >> strncpy(3).
> > 
> > But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead.
> > 
> > The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer.
> > 
> > Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements".
> 
> As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls:
> 
> 1) cost of any initial strnlen() reading memory to determine input src size
> 2) accepts a src_max_size to actually try to copy from src
> 3) does not truncate, i.e. writes nothing to the buffer, if there isn't enough space in dest_max_size to fit src_max_size
> 4) checks for NULL pointers
> 5) probably other things I've overlooked
> 
> Something like this API:
> int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t *dest_written);
> These sizes include any NUL terminating byte.
> 
> It returns 0 on success, or an error code like EINVAL, or ERANGE if it would truncate.

-  Linux kernel's strscpy() returns -E2BIG if it would truncate.  You
   may want to follow suit if you want such an errno(3) code.

   However, I think it's simpler to return the "standard" user-space
   error return value: -1

   If you'd need to distinguish error reasons, you could distinguish
   error codes, but for a string-copying function I think it's not so
   useful.

-  Why specify the src buffer size?  If you're copying strings, then you
   know it'll be null-terminated, so strnlen(3) will not overrun.  If
   you're not copying strings, then you'll need a different function
   that reads from a non-string.  The only standard such function is
   strncat(3), which reads from a fixed-width null-padded buffer, and
   writes to a string.  You may want to write a function similar to
   strncat(3) that doesn't catenate, if you want to just copy; I call
   that function zustr2stp(), and you can find an implementation in
   string_copying(7).

-  You can reuse the return value for the dest_written value with
   ssize_t.  Just return -1 on error and the string length on success.
   That's how most libc functions behave.

-  Regarding NULL checks, it depends on how you program.  I wouldn't add
   them, but if you want to avoid crashes at all costs, it may be
   necessary for you.  You could do a wrapper over strxcpy():


	inline ssize_t
	strxcpy0(char *restrict dst, const char *restrict src, size_t dsize)
	{
		if (dst == NULL || src == NULL)
			return -1;

		return strxcpy(dst, src, dsize);
	}

   I used 0 in the name to mark that this function checks for null
   pointers.
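A sketch of the read-side counterpart described above: copying a fixed-width, null-padded field (for example a utmp-style name field, which may contain no '\0' at all) into a string. The name zustr_to_str() is hypothetical, and this is not the string_copying(7) implementation of zustr2stp(); it only shares the idea that strnlen(3) bounds the read to the field width.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen(3) */
#include <string.h>

/*
 * Copy a null-padded field of sz bytes into a string.  dst must have
 * room for sz + 1 bytes.  Returns a pointer to the terminating '\0'
 * in dst, in the style of stpcpy(3).
 */
static char *
zustr_to_str(char *restrict dst, const char *restrict src, size_t sz)
{
	size_t  len;

	len = strnlen(src, sz);  /* never reads past the field */
	memcpy(dst, src, len);
	dst[len] = '\0';
	return dst + len;
}
```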

Cheers,
Alex

> 
> All comments welcome.
> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-09 12:23                 ` Alejandro Colomar
  2023-11-09 12:35                   ` Alejandro Colomar
  2023-11-10  7:06                   ` Oskari Pirhonen
@ 2023-11-10 16:06                   ` Matthew House
  2023-11-10 17:48                     ` Alejandro Colomar
  2023-11-11 20:55                     ` Jonny Grant
  2 siblings, 2 replies; 138+ messages in thread
From: Matthew House @ 2023-11-10 16:06 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, linux-man

On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@kernel.org> wrote:
> > So one can interpret strncpy(3) as copying a prefix of a character sequence
> > into a buffer (and zero-filling the remainder), in which case you're
> > correct that truncation cannot be detected. But the function is formally
> > defined as copying a prefix of a string into a buffer (and zero-filling the
> > remainder), in which case the string has been truncated if the buffer
> > doesn't end in a null byte afterward. It's just that one may not care about
> > the terminating null byte being truncated if the user of the result just
> > wants the initial character sequence.
>
> Yes, with the ISO C definition of strncpy(3), you can detect truncation.
> The problem is that while my definition of it is complete, the
> definition by ISO C makes it an incomplete function (to complete its
> functionality in copying strings, you need to add an explicit '\0'
> after the call).  So I prefer mine, and for self-consistency, it can't
> report truncation.

Personally, I'm a pragmatist, and I like to see it as kind of a duality: it
can be used as part of a routine that copies part of a string and reports
truncation, and it can also be used as a complete routine that copies part
of a character sequence but can't report truncation. That reflects how it's
used in practice. And it would hardly be the first such duality in C,
either, given things like the fundamental practice of manipulating
arbitrary objects as if they're character arrays.

(Some of these other dualities are similarly infamous in their room for
error, e.g., forgetting to multiply by the element size when calling
malloc(3), which I have often been guilty of myself. And still, a worrying
amount of code neglects to test for multiplication overflow when doing
this, even when the length comes from an untrusted source. Yet somehow I
haven't seen any calls for a mallocarray(3) function to replace it. Ditto
with memset(3), which can cause, and has caused, hard-to-notice bugs due to
the first few elements looking correct even if the provided length is too
short.)
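As an aside on the multiplication-overflow point, a minimal sketch of an overflow-checked array allocation. malloc_array() is a hypothetical name; glibc (since 2.26) and the BSDs already provide reallocarray(3), and reallocarray(NULL, n, size) performs the same check.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical overflow-checked allocation of n elements of size bytes.
 * Fails with errno = ENOMEM instead of silently allocating a buffer
 * that is too small when n * size would wrap around.
 */
static void *
malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return malloc(n * size);
}
```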

But you're entitled to your opinion on how it ought to be best represented
in the man page, as long as the immediate shortcoming of the function w.r.t
producing strings is made very clear, even to readers who aren't in the
habit of contemplating formal definitions. I'm satisfied by your patch in
that regard.

> > That's a nice library that I didn't know about! Unfortunately, I don't
> > think it's a very viable option for the long tail of small libraries I've
> > referred to, which generally don't have any sub-dependencies of their own,
> > apart from those provided by the platform.
> >
> > Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
> > configure scripts from whatever build system you're using (in such a way
> > that libbsd can locate libmd), ensuring they're safe for cross-compilation
> > if that's a goal, ensuring you bundle them in a way that respects their
> > license terms, and ensuring that any user of your library links to the two
> > dependencies and doesn't duplicate them. At that point, rolling your own
> > strlcpy(3) equivalent definitely sounds like less mental load, at least to
> > me.
>
> Yes, if you had 0 deps, it might be simpler to add your implementation.
> Although it's a tricky function to implement, so I'd be careful.  If you
> need to roll your own, I would go for a simpler function; maybe a
> wrapper over strlen(3)+strcpy(3).

Such a wrapper would indeed be useful for detecting truncation, but a full
strlcpy(3) equivalent would be necessary for permitting the truncation and
continuing, which is the behavior of the majority of existing strncpy(3)-
based code.

I don't deny that this truncation behavior is often done dubiously and
rarely receives enough scrutiny, but a significant chunk of the uses really
are just building an informative string which won't cause any harm if
truncated, and installing additional control flow to handle truncation
errors in places where there currently isn't any can introduce its own
bugs.

> > I didn't see this as an issue in practice when I was reviewing all those
> > existing usages of strncpy(3). The vast majority were used in the midst of
> > simple string manipulation, where the destination buffer starts as
> > uninitialized or zeroed out, and ultimately gets passed into a user
> > expecting an ordinary null-terminated string.
> >
> > (One exception was a few functions that used strncpy(dst, "", len) to zero
>
> Holy crap!  Didn't these programmers know bzero(3) or memset(3)?  :D
>
> > out the buffer, which is thankfully pretty obvious. Another exception was
> > the functions that actually used strncpy(3) to produce a null-padded
> > character sequence, e.g., when writing a value into a section of a binary.
> > But in general, I found that it's usually not difficult to tell when a
> > usage is being clever enough that the null padding might be significant.)
> >
> > In fact, the greater confusion came from the surprisingly common practice
> > of using strncpy(3) like it's memcpy(3), by giving it the known length of
>
> It gets better!  :D

In all these cases, I think the function naming really is having somewhat
of a psychological effect: the authors are wrangling with strthis(3) and
strthat(3) for dozens of lines, so they'd find it scary to start mixing it
up with mem*(3) functions ("I'm working with C strings, not with byte
arrays!"), or perhaps they don't even consider it. They'd rather remain
with strncpy(3), even when it means they have to manually append it with a
null terminator or another string. But I'm no psychoanalyst, so take that
with a big grain of salt.

(Meanwhile, in my own code, I try to work with pointer-and-length arrays
whenever possible instead of fooling around with null terminators and all
their off-by-one fun, so I've become leery of using any str*(3) functions
apart from strlen(3) and strnlen(3).)

> > (This is also why I was confused by your support for strcpy(3) on the
> > grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
> > that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
> > protect us from overruns, not from all the "small bugs" that might ensue
> > from people becoming more clever with sizing the destination buffer with
> > strcpy(3).
>
> I don't think strcpy(3) tempts programmers to be clever with it as much
> as strncpy(3) does.  In the case of strncpy(3) it's due to it being
> an incomplete string-copying function.  strcpy(3) is complete.
>
> > Also, if it were truly a panacea, then we'd hardly have to worry
> > about the problems of strncpy(3) at all, since it would detect any misuse
> > of the function.)
>
> Fortification detects overruns in writes, which is how it protects
> strcpy(3).  However, fortification can't protect against overruns in
> reads, which is what strncpy(3) causes due to missing null terminators.
> strncpy(3) also causes off-by-one bugs (I'll detail below), which
> strcpy(3) doesn't (and strlcpy(3) doesn't either).

Ah, thank you, I wasn't aware of that limitation in _FORTIFY_SOURCE.
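To illustrate the read-side problem: when the source is at least as long as the buffer, strncpy(3) fills every byte and writes no '\0'. No write goes out of bounds, so write-side fortification has nothing to flag, but any later strlen(3) or "%s" on the buffer reads past its end. The helper strncpy_terminates() is hypothetical and only reports whether a terminator ended up in the buffer.

```c
#include <string.h>

/*
 * Copy with strncpy(3), then report whether dst now holds a
 * terminated string (nonzero) or an unterminated sequence (0).
 */
static int
strncpy_terminates(char *dst, size_t dsize, const char *src)
{
	strncpy(dst, src, dsize);
	return memchr(dst, '\0', dsize) != NULL;
}
```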

But I think my notion of problematic cleverness is somewhat different than
yours. When I think of code being excessively clever, I specifically think
of places where it relies on a certain property of the program state, but
it's unclear how that property is upheld at that point in the program.

This cleverness primarily appears in two different forms, in my experience.
In one form, snippet A is immediately followed by snippet B, but B depends
on some non-obvious property set up by A, and the code has no comments or
other documentation to this effect. In the other (more common) form,
snippet A sets up an obvious property that snippet B depends on, but the
two snippets are miles apart in the code, and it's difficult to see the
connection between the two. (The latter can be exacerbated by intervening
control flow.)

In this sense, cleverness is mostly orthogonal to the 'completeness' of a
particular function interface. A non-clever use of strncpy(3) would be
calling it and then immediately appending or testing for a null terminator;
then, we have two lines forming a functionally complete whole. A clever
use of strncpy(3) (of the second form) would be setting or testing the null
terminator way earlier or way later in the code, both of which were
unfortunately frequent in my review, though still a minority of uses.

Another clever use, of the first form, would be appending a null
terminator, using the output in a way that looks like we just want a
string, but then secretly depending on the buffer being null-padded to the
full length. This seems to be a particular concern of yours, but in
practice, I haven't been able to find a single instance of this, except
possibly in GNU binutils which already clearly exudes evil from every line.

On the other hand, I also see strcpy(3) as no less prone to overly clever
usage, despite being 'complete' in its own definition. The problem is that
it's generally not a complete operation in the context of its typical use
cases, which only have a finite destination buffer and need to ensure that
the entire source string will fit. The author has a choice to make in
deciding how to make this guarantee, and some of these choices can be
arbitrarily clever. In particular, since the author doesn't strictly need
to know the exact size of the source string or destination buffer at the
time they call the function, they can make those sizes as nebulous and
indirect as possible.

For example, a non-clever use of strcpy(3) would be immediately preceding
it by either an "if (strlen(src) >= dsize)" check, or an allocation of
strlen(src) + 1 bytes, which I think we both agree is the ideal scenario;
the code makes the guarantee and then immediately acts on it. But a clever
use would be exporting this length check to all the function's callers, or
only calling strlen(3) on some precursor(s) of the source string and then
deriving its full length with a tricky and error-prone formula, or simply
not testing the length of the source string at all, but sizing the
destination buffer based on the general vibes of the interface.
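The two non-clever strcpy(3) patterns just described, sketched with hypothetical helper names: in both, the guarantee is established on the line immediately before the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Pattern 1: check the bound right before copying into an existing buffer. */
static int
copy_checked(char *dst, size_t dsize, const char *src)
{
	if (strlen(src) >= dsize)
		return -1;  /* would not fit, counting the '\0' */
	strcpy(dst, src);
	return 0;
}

/* Pattern 2: size a new buffer from src right before copying
 * (a hand-rolled strdup(3); the + 1 is for the terminator). */
static char *
copy_alloc(const char *src)
{
	char  *p;

	p = malloc(strlen(src) + 1);
	if (p == NULL)
		return NULL;
	strcpy(p, src);
	return p;
}
```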

In fact, we can once again look at how code abuses strcpy(3) in practice:
- Of sizing the destination buffer in some far-off corner of the file, I
  found 4 instances in GNU binutils. Similarly, of sizing the source string
  in a far-off corner and not checking it, I found 6 instances in llvm-nm.
- Of sizing the destination buffer with an involved calculation and then
  trusting the result, I found 15 instances in GNU binutils, 1 in GDB, 1 in
  CPython, 3 in Firefox, and 4 in .NET Runtime.
- Of accepting an arbitrary destination buffer size without clearly
  bounding it below by the source string's length, I found 24 instances in
  GNU binutils; I believe at least 2 can cause UB with certain
  configurations and inputs. (I gave up trying to enumerate these in the
  other codebases, since it's generally not clear at all whether a minimum
  size is understood to be implied by the interface.)
- Of not checking the source string's length nor otherwise clearly bounding
  it above, I found 37 instances in GNU binutils, 3 in CPython, 14 in
  Firefox, 3 in .NET Runtime, and 6 in OpenJDK; I believe at least 19 can
  cause UB.
- Of obvious off-by-one errors that will trivially result in UB, I found 2
  instances in GNU binutils, 6 in CPython, 3 in Firefox, and 1 in OpenJDK.
- Finally, of a non-obvious but critical side effect (i.e., unintentionally
  clever code of the first form), I found just 1 instance in Firefox, where
  a certain error branch just happens to be reachable only when the buffer
  is large enough for the error message to fit.
And these aren't even counting its cousins strcat(3) and sprintf(3)!

So I hope you'll forgive me if I have a hard time believing that authors
are less likely to be overly clever with strcpy(3) than with strncpy(3),
purely on account of the former's interface being more 'complete'.

> > Probably the only way to solve the cleverness issue for good is to have an
> > immediately-available, foolproof, performant set of string functions that
> > are extremely straightforward to understand and use, flexible enough for
> > any use case, and generally agreed to be the first choice for string
> > manipulation.
> >
> > Unfortunately, probably the closest match to those criteria, especially the
> > availability criterion, is snprintf(3), which has the flaws of using int
> > instead of size_t for most sizes, not being very performant, and not being
> > async-signal-safe. Alas, it will likely remain a dream, given all the wars
> > over which safer string functions have the best API. But at least
> > strlcpy(3) has a pretty sound interface, if other platforms ever get around
> > to including it by default.
>
> strlcpy(3) will be in POSIX.1-202x (Issue 8), so it's a matter of time
> that it'll be widespread.

I noticed that, but I've always been a pessimist regarding the timelines of
cool new things being rolled out. It will take some months to years before
Issue 8 is released, months to years for all the relevant platforms to get
the memo and implement it, many years for the knowledge to trickle down to
the everyday library authors, and many more years for old versions of
platforms to reach the end of their support periods. And I don't want to
be one of those people advertising stuff that's perpetually 'just around
the corner'. (For that matter, I wonder how many decades it will be before
I see widespread use of posix_close(2) in a serious codebase, if ever.)

> > My point isn't that the difference is undocumented, but that the typical
> > man page reader isn't reading the man pages for their own sake, but because
> > they're looking at some code, and they want to Know What It's Doing as soon
> > as possible.
>
> We could maybe add a list of ways people have tried to be clever with
> strncpy(3) in the past and failed, and then explain why those uses are
> broken.  This could be in a BUGS section.

I'd be interested in your experiences of people "trying to be clever" per
your perspective; as I mentioned, in my earlier review of actual strncpy(3)
usage, the only cleverness that occurs in non-negligible amounts has been
either in the midst of using it in its 'intended' role for producing a
null-padded character sequence (I'm referring to binutils here), or messing
around with which part of the code is responsible for appending the
terminator.

> > Instead, it's code making use of strncpy(3) in a particularly clever way
> > that I'd find confusing, and in those cases, I lay the blame squarely on
> > the cleverness rather than the function itself.
>
> I blame the definition of the function of ISO C.  Why?  Because by being
> an incomplete string-copying function, it forces the programmer to be
> clever about it.  You can't just use strncpy(3) and that's all; you need
> to do something else, and then you do clever stuff, which ends up badly.

It forces the programmer to perform an extra step, but it doesn't force the
programmer to be clever in performing that extra step. As I have described
above, strcpy(3) also needs an extra step that the programmer can be
inordinately clever with, regardless of being a complete string-copying
function. So I don't see strncpy(3) as being uniquely evil here.

> > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
> > immune to off-by-one bugs?
>
> Slightly.  Here's the typical use of strlen(3)+strcpy(3):
>
> if (strlen(src) >= dsize)
>         goto error;
> strcpy(dst, src);
>
> There's no +1 or -1 in that code, so it's hard to make an off-by-one
> mistake.  Okay, you may have seen that it has a '>=', which one could
> accidentally replace by a '>', causing an off-by-one.  I'd wrap that
> thing in a strxcpy() wrapper so you avoid repetition.

As I learned, the typical use of strcpy(3) (at least 80% of uses in my
estimation) is actually copying a string into a new buffer, not an existing
buffer. And that does need a +1 to calculate a size to pass to the
allocation function, and usually a lot more +s if it's going to be
concatenating further strings. (Did you know that it's not an uncommon
practice to use "char value[1];" for a variable-length string at the end of
a struct, then depend on that 1 byte being included in the size of the
struct when allocating it?) Meanwhile, code does manage to make that off-by-
one error between >= and > in practice regardless.
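For comparison with the "char value[1];" idiom, a sketch using a C99 flexible array member, where the terminator's byte has to be counted explicitly instead of hiding inside sizeof(struct). The struct and function names are hypothetical. Indexing past value[0] in the one-element version is formally undefined behavior, and the flexible array member is the standard replacement for it.

```c
#include <stdlib.h>
#include <string.h>

/* The old idiom: the 1 byte counted in sizeof(struct) silently
 * doubles as room for the terminator. */
struct old_entry {
	size_t  len;
	char    value[1];
};

/* C99 flexible array member: sizeof(struct) excludes value, so the
 * + 1 for the terminator must be written out. */
struct entry {
	size_t  len;
	char    value[];
};

static struct entry *
entry_new(const char *s)
{
	size_t        len = strlen(s);
	struct entry  *e;

	e = malloc(sizeof(*e) + len + 1);
	if (e == NULL)
		return NULL;
	e->len = len;
	memcpy(e->value, s, len + 1);  /* copies the '\0' too */
	return e;
}
```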

Relatedly, as I also learned from all the manual strdup(3)-like snippets
that use a custom allocator, the typical library author is deathly allergic
to writing a custom wrapper over anything that isn't an allocation
function; they'll repeat the entirety of the logic inline as many times as
it takes. So I don't buy that most people would be replacing numerous calls
to strncpy(3) with calls to a unified wrapper function that can be
inspected and fixed all in one place, as you seem to suggest in your later
email.

> > Or will the vast majority of current strncpy(3)
> > users be willing to either restrict their platform support or add two extra
> > dependencies to their build process just to have strlcpy(3)? I'd hardly be
> > inclined to think that off-by-one bugs are a particular specialty of
> > strncpy(3).
>
> They are.  Here's the typical use of strncpy(3) as a replacement:
>
> strncpy(dst, src, dsize);
> if (dst[dsize - 1] != '\0')
>         goto error;
> dst[dsize - 1] = '\0';
>
> There are many more moving parts, so more chances to make mistakes.
> And you see it forces the programmer to write explicitly -1 twice.  I've
> seen code that forgets to do the -1, and also code that uses -1 in the
> strncpy(3) call (which makes it impossible to detect truncation).

That "dst[dsize - 1] = '\0';" line is extraneous, and none of the existing
truncation-detecting uses of strncpy(3) I saw have its equivalent; after
all, we just checked that character with the if statement, there's no need
to set it again. Without that line, there are only two lines of logic, and
a single -1, matching the single +1 needed by the typical use of strcpy(3).

Also, the typical use of strncpy(3) by far is to allow a truncated string
rather than raising an error on truncation, and in that use case, it makes
no difference whether or not the size inside the strncpy(3) call has a -1.
The memcpy(3) replacement for truncation needs an additional min() ternary
or macro, and it still needs a manual null terminator that can have the
exact same off-by-one error.
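The strncpy(3) patterns discussed above, sketched with hypothetical helper names; all of them assume dsize >= 1. In the memcpy(3) variant, strnlen(src, dsize - 1) plays the role of the min() and also avoids reading more of src than needed.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen(3) */
#include <string.h>

/* Truncation-detecting use: two lines of logic, a single - 1. */
static int
copy_detect(char *dst, size_t dsize, const char *src)
{
	strncpy(dst, src, dsize);
	if (dst[dsize - 1] != '\0')
		return -1;  /* src did not fit; dst is not a string */
	return 0;
}

/* Truncation-allowing use: copy, then terminate unconditionally. */
static void
copy_truncate(char *dst, size_t dsize, const char *src)
{
	strncpy(dst, src, dsize - 1);
	dst[dsize - 1] = '\0';
}

/* memcpy(3) equivalent of copy_truncate(): still needs the bound and
 * the manual terminator, but skips strncpy's null padding. */
static void
copy_truncate_memcpy(char *dst, size_t dsize, const char *src)
{
	size_t  len = strnlen(src, dsize - 1);

	memcpy(dst, src, len);
	dst[len] = '\0';
}
```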

> > By that standard, every call to a function that takes an output pointer and
> > returns the number of elements written (say, readlink(2)) would need a
> > comment saying "the remaining elements in this array now have undefined
> > values".
>
> No, because it does precisely what is intended.  It is when you add dead
> code when you need to justify it.

Again, that seems like an odd standard to apply only to strncpy(3)'s
destination buffer. For instance, suppose that an API accepts an input
struct with optional fields. It's a common practice to zero out every field
with memset(3) or = {0}, then fill in the input fields that are actually
used, regardless of whether the API is specified as actively ignoring the
remaining fields.

Certainly, it can be quite a task to figure out whether the fields are
actually read, if the API is poorly specified; without going through its
entire implementation, any of those "unused" fields could be copied around
or compared before being discarded, making it dangerous to leave them
uninitialized. But need we add a comment to every one of those memset(3)
calls, "I'm unsure whether this zeroing is significant at all"? Perhaps
such a comment might be helpful, if there really is reason to suspect that
the API is nefarious, but I've hardly ever seen stuff like that in
practice.

(Or, for a silly reductio ad absurdum, if some code calls malloc(3), then
continues with some cleanup functions if it returns NULL, then would that
code have to justify why malloc(3) set an errno value that seemingly
never gets read? Those cleanup functions could be doing something clever by
reading errno on entry, after all!)

> > I don't think it's controversial that in many situations, we
> > tacitly understand that we simply don't care about the remainder of a
>
> While the analysis isn't very hard, it takes some time, examining all
> surrounding code to make sure nothing cares about the trailing bytes.
> When you have a hundred such calls, you need to make sure nobody was too
> clever around any of them.

Sure, there's a hypothetical concern that some later consumer might notice
the zeroing and act on it. But strncpy(3) is hardly the only thing in the
typical codebase that produces an unnecessarily-zeroed buffer. Authors
often use calloc(3) or memset(3) for peace of mind and no other purpose,
or, especially in C++, zero out any local buffers in a class constructor to
avoid the specter of uninitialized memory.

And of course, lots of code repeatedly reuses the same buffer for different
strings, handing out pointers to it, and callers could just as easily leak
the left-over data after the null terminator. Verifying that an alleged
string buffer truly is only used as a string is just a fact of life when
refactoring unfamiliar code in C.

> > buffer after a certain point. In the case of producing a string, that point
> > is going to be the null terminator, in the absence of on-site documentation
> > to the contrary; I'd label anything else as overly clever.
>
> But again, strncpy(3) forces you to be clever.

It forces you to do extra work, the same way strcpy(3) forces you to do
extra work. And it allows you to be clever, the same way strcpy(3) allows
you to be clever. But at least it bounds the extent of your cleverness in
that it forces you to remember the size of your destination buffer. I'd
much rather review a hundred typical calls to strncpy(3) than a hundred
typical calls to strcpy(3) any day of the week.

Thank you,
Matthew House



* Re: strncpy clarify result may not be null terminated
  2023-11-10 16:06                   ` Matthew House
@ 2023-11-10 17:48                     ` Alejandro Colomar
  2023-11-13 15:01                       ` Matthew House
  2023-11-11 20:55                     ` Jonny Grant
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 17:48 UTC (permalink / raw)
  To: Matthew House; +Cc: Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 27929 bytes --]

Hi Matthew,

On Fri, Nov 10, 2023 at 11:06:00AM -0500, Matthew House wrote:
> On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@kernel.org> wrote:
> > > So one can interpret strncpy(3) as copying a prefix of a character sequence
> > > into a buffer (and zero-filling the remainder), in which case you're
> > > correct that truncation cannot be detected. But the function is formally
> > > defined as copying a prefix of a string into a buffer (and zero-filling the
> > > remainder), in which case the string has been truncated if the buffer
> > > doesn't end in a null byte afterward. It's just that one may not care about
> > > the terminating null byte being truncated if the user of the result just
> > > wants the initial character sequence.
> >
> > Yes, with the ISO C definition of strncpy(3), you can detect truncation.
> > The problem is that while my definition of it is complete, the
> > definition by ISO C makes it an incomplete function (to complete its
> > functionality in copying strings, you need to add an explicit '\0'
> > after the call).  So I prefer mine, and for self-consistency, it can't
> > report truncation.
> 
> Personally, I'm a pragmatist, and I like to see it as kind of a duality: it
> can be used as part of a routine that copies part of a string and reports
> truncation, and it can also be used as a complete routine that copies part
> of a character sequence but can't report truncation. That reflects how it's
> used in practice. And it would hardly be the first such duality in C,
> either, given things like the fundamental practice of manipulating
> arbitrary objects as if they're character arrays.
> 
> (Some of these other dualities are similarly infamous in their room for
> error, e.g., forgetting to multiply by the element size when calling
> malloc(3), which I have often been guilty of myself. And still, a worrying
> amount of code neglects to test for multiplication overflow when doing
> this, even when the length comes from an untrusted source. Yet somehow I
> haven't seen any calls for a mallocarray(3) function to replace it. Ditto

Funnily enough, I have, often.

Here's something I wrote about the malloc(3) family recently:
<https://software.codidact.com/posts/285898/288023#answer-288023>
Pretty early in that text I recommend writing your own mallocarray(3),
even if libc doesn't provide it.
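Below is a minimal sketch of such a wrapper, using only ISO C. The name mallocarray is hypothetical here and is not a standard function. The wrapper fails with ENOMEM instead of letting nmemb * size silently wrap around size_t.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical mallocarray(): allocate nmemb elements of size bytes
 * each, refusing requests whose total byte count would overflow size_t.
 */
void *
mallocarray(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }
    return malloc(nmemb * size);
}
```

A call then reads `int *v = mallocarray(n, sizeof *v);`, so the caller never writes the multiplication. Where reallocarray(3) is available (glibc 2.26 and later, the BSDs), `reallocarray(NULL, nmemb, size)` performs the same overflow check.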

In shadow-utils, I replaced all of the allocation calls by safer
wrappers: macros that make it really hard to make mistakes, which
themselves wrap *array() functions, that wrap malloc(3) basic functions.
<https://github.com/shadow-maint/shadow/blob/master/lib/alloc.h>

I'll fight that battle when I'm done with str*() ones.  ;)

> with memset(3), which can and has caused actual hard-to-notice bugs due to
> the first few elements looking correct even if the provided length is too
> short.)

Heh, and there's my other battle: standardizing bzero(3) again.  You're
perfectly right in that memset(3) is dangerous (well, compilers have
improved in their warnings, and nowadays it isn't so bad, but it's still
an unnecessary risk).  I am of the opinion that you should use bzero(3)
unless you really want to set the bytes to something other than zero.
That something else is usually UINT8_MAX (and that's already rare), and
only seldom anything else.
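The sketch below illustrates the memset(3) pitfalls mentioned here. The function names are hypothetical. The two mistakes are passing an element count where a byte count is expected, and swapping the value and length arguments. bzero(3) has no value argument, so only the first mistake is still possible with it. The sketch assumes a libc that still declares bzero(3) in <strings.h>, as glibc and musl do, since it is no longer in POSIX.

```c
#define _DEFAULT_SOURCE  /* Expose bzero(3) in <strings.h> on glibc and musl. */
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Clear an array of n ints with memset(3). */
void
clear_ints_memset(int *a, size_t n)
{
    /*
     * The length is in bytes.  memset(a, 0, n) would clear only the
     * first n bytes, and memset(a, n * sizeof *a, 0), with the last two
     * arguments swapped, would clear nothing at all.
     */
    memset(a, 0, n * sizeof *a);
}

/* Clear an array of n ints with bzero(3). */
void
clear_ints_bzero(int *a, size_t n)
{
    /* No value argument, so there is nothing to swap. */
    bzero(a, n * sizeof *a);
}
```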

glibc developers reading this might recall my suggestions to reinstate
bzero(3) to its rightful place.  Such is my preference for this function
that I removed some deprecation messages about it from the manual,
keeping only the minimum necessary: a note in HISTORY that POSIX did
remove it.

> 
> But you're entitled to your opinion on how it ought to be best represented
> in the man page, as long as the immediate shortcoming of the function w.r.t
> producing strings is made very clear, even to readers who aren't in the
> habit of contemplating formal definitions. I'm satisfied by your patch in
> that regard.

Thanks.  :)

> 
> > > That's a nice library that I didn't know about! Unfortunately, I don't
> > > think it's a very viable option for the long tail of small libraries I've
> > > referred to, which generally don't have any sub-dependencies of their own,
> > > apart from those provided by the platform.
> > >
> > > Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
> > > configure scripts from whatever build system you're using (in such a way
> > > that libbsd can locate libmd), ensuring they're safe for cross-compilation
> > > if that's a goal, ensuring you bundle them in a way that respects their
> > > license terms, and ensuring that any user of your library links to the two
> > > dependencies and doesn't duplicate them. At that point, rolling your own
> > > strlcpy(3) equivalent definitely sounds like less mental load, at least to
> > > me.
> >
> > Yes, if you had 0 deps, it might be simpler to add your implementation.
> > Although it's a tricky function to implement, so I'd be careful.  If you
> > need to roll your own, I would go for a simpler function; maybe a
> > wrapper over strlen(3)+strcpy(3).
> 
> Such a wrapper would indeed be useful for detecting truncation, but a full
> strlcpy(3) equivalent would be necessary for permitting the truncation and
> continuing, which is the behavior of the majority of existing strncpy(3)-
> based code.

Yes, in string_copying(7) I document strlcpy(3) as the function you
should use for such a use case.  Still, I need to revise that page after
this discussion; I think we clarified many things, and that page should
reflect them.
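On platforms without strlcpy(3), an equivalent with the same contract might look like the minimal sketch below. The contract is: null-terminate whenever dsize > 0, and return strlen(src), so that a return value >= dsize signals truncation. The name copy_strl is hypothetical. The sketch is written with strlen(3) and memcpy(3) for clarity, not speed.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strlcpy(3) look-alike.  Copies at most dsize - 1 bytes
 * of src into dst and null-terminates the result if dsize > 0.
 * Returns strlen(src); a return value >= dsize means truncation.
 */
size_t
copy_strl(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  slen = strlen(src);

    if (dsize != 0) {
        /* Copy everything if it fits with its terminator, else dsize - 1. */
        size_t  n = slen < dsize ? slen : dsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return slen;
}
```

Callers that accept truncation ignore the result. Callers that must not truncate test `copy_strl(dst, src, dsize) >= dsize`. Like strlcpy(3), it always traverses all of src, which costs time when src is much longer than dst.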

> 
> I don't deny that this truncation behavior is often done dubiously and
> rarely receives enough scrutiny, but a significant chunk of the uses really
> are just building an informative string which won't cause any harm if
> truncated, and installing additional control flow to handle truncation
> errors in places where there currently isn't any can introduce its own
> bugs.

Yes.  And in fact, in shadow-utils I'm going so slowly because I want to
avoid a big-bang change that could introduce more errors than it fixes.
So I'm first removing the superfluous zeroing of strncpy(3) by using
strlcpy(3), while keeping truncation, and only when I'm done with that
I'll check if truncation poses any risks and should be fixed; but fixing
too much can break stuff.  Granted.

> 
> > > I didn't see this as an issue in practice when I was reviewing all those
> > > existing usages of strncpy(3). The vast majority were used in the midst of
> > > simple string manipulation, where the destination buffer starts as
> > > uninitialized or zeroed out, and ultimately gets passed into a user
> > > expecting an ordinary null-terminated string.
> > >
> > > (One exception was a few functions that used strncpy(dst, "", len) to zero
> >
> > Holy crap!  Didn't these programmers know bzero(3) or memset(3)?  :D
> >
> > > out the buffer, which is thankfully pretty obvious. Another exception was
> > > the functions that actually used strncpy(3) to produce a null-padded
> > > character sequence, e.g., when writing a value into a section of a binary.
> > > But in general, I found that it's usually not difficult to tell when a
> > > usage is being clever enough that the null padding might be significant.)
> > >
> > > In fact, the greater confusion came from the surprisingly common practice
> > > of using strncpy(3) like it's memcpy(3), by giving it the known length of
> >
> > It gets better!  :D
> 
> In all these cases, I think the function naming really is having somewhat
> of a psychological effect: the authors are wrangling with strthis(3) and
> strthat(3) for dozens of lines, so they'd find it scary to start mixing it
> up with mem*(3) functions ("I'm working with C strings, not with byte
> arrays!"), or perhaps they don't even consider it. They'd rather remain
> with strncpy(3), even when it means they have to manually append it with a
> null terminator or another string. But I'm no psychoanalyst, so take that
> with a big grain of salt.
> 
> (Meanwhile, in my own code, I try to work with pointer-and-length arrays
> whenever possible instead of fooling around with null terminators and all
> their off-by-one fun, so I've become leery of using any str*(3) functions
> apart from strlen(3) and strnlen(3).)
> 
> > > (This is also why I was confused by your support for strcpy(3) on the
> > > grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in
> > > that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only
> > > protect us from overruns, not from all the "small bugs" that might ensue
> > > from people becoming more clever with sizing the destination buffer with
> > > strcpy(3).
> >
> > I don't think strcpy(3) is as likely as strncpy(3) to tempt programmers
> > to be clever about it.  In the case of strncpy(3) it's due to it being
> > an incomplete string-copying function.  strcpy(3) is complete.
> >
> > > Also, if it were truly a panacea, then we'd hardly have to worry
> > > about the problems of strncpy(3) at all, since it would detect any misuse
> > > of the function.)
> >
> > Fortification detects overruns in writes, which is how it protects
> > strcpy(3).  However, fortification can't protect against overruns in
> > reads, which is what strncpy(3) causes due to missing null terminators.
> > strncpy(3) also causes off-by-one bugs (I'll detail below), which
> > strcpy(3) doesn't (and strlcpy(3) doesn't either).
> 
> Ah, thank you, I wasn't aware of that limitation in _FORTIFY_SOURCE.
> 
> But I think my notion of problematic cleverness is somewhat different than
> yours. When I think of code being excessively clever, I specifically think
> of places where it relies on a certain property of the program state, but
> it's unclear how that property is upheld at that point in the program.
> 
> This cleverness primarily appears in two different forms, in my experience.
> In one form, snippet A is immediately followed by snippet B, but B depends
> on some non-obvious property set up by A, and the code has no comments or
> other documentation to this effect. In the other (more common) form,
> snippet A sets up an obvious property that snippet B depends on, but the
> two snippets are miles apart in the code, and it's difficult to see the
> connection between the two. (The latter can be exacerbated by intervening
> control flow.)
> 
> In this sense, cleverness is mostly orthogonal to the 'completeness' of a
> particular function interface. A non-clever use of strncpy(3) would be
> calling it and then immediately appending or testing for a null terminator;
> then, we have two lines forming a functionally complete whole. A clever
> use of strncpy(3) (of the second form) would be setting or testing the null
> terminator way earlier or way later in the code, both of which were
> unfortunately frequent in my review, though still a minority of uses.
> 
> Another clever use, of the first form, would be appending a null
> terminator, using the output in a way that looks like we just want a
> string, but then secretly depending on the buffer being null-padded to the
> full length. This seems to be a particular concern of yours, but in
> practice, I haven't been able to find a single instance of this, except
> possibly in GNU binutils which already clearly exudes evil from every line.
> 
> On the other hand, I also see strcpy(3) as no less prone to overly clever
> usage, despite being 'complete' in its own definition. The problem is that
> it's generally not a complete operation in the context of its typical use
> cases, which only have a finite destination buffer and need to ensure that
> the entire source string will fit. The author has a choice to make in
> deciding how to make this guarantee, and some of these choices can be
> arbitrarily clever. In particular, since the author doesn't strictly need
> to know the exact size of the source string or destination buffer at the
> time they call the function, they can make those sizes as nebulous and
> indirect as possible.
> 
> For example, a non-clever use of strcpy(3) would be immediately preceding
> it by either an "if (strlen(src) >= dsize)" check, or an allocation of
> strlen(src) + 1 bytes, which I think we both agree is the ideal scenario;
> the code makes the guarantee and then immediately acts on it. But a clever
> use would be exporting this length check to all the function's callers, or
> only calling strlen(3) on some precursor(s) of the source string and then
> deriving its full length with a tricky and error-prone formula, or simply
> not testing the length of the source string at all, but sizing the
> destination buffer based on the general vibes of the interface.
> 
> In fact, we can once again look at how code abuses strcpy(3) in practice:
> - Of sizing the destination buffer in some far-off corner of the file, I
>   found 4 instances in GNU binutils. Similarly, of sizing the source string
>   in a far-off corner and not checking it, I found 6 instances in llvm-nm.
> - Of sizing the destination buffer with an involved calculation and then
>   trusting the result, I found 15 instances in GNU binutils, 1 in GDB, 1 in
>   CPython, 3 in Firefox, and 4 in .NET Runtime.
> - Of accepting an arbitrary destination buffer size without clearly
>   bounding it below by the source string's length, I found 24 instances in
>   GNU binutils; I believe at least 2 can cause UB with certain
>   configurations and inputs. (I gave up trying to enumerate these in the
>   other codebases, since it's generally not clear at all whether a minimum
>   size is understood to be implied by the interface.)
> - Of not checking the source string's length nor otherwise clearly bounding
>   it above, I found 37 instances in GNU binutils, 3 in CPython, 14 in
>   Firefox, 3 in .NET Runtime, and 6 in OpenJDK; I believe at least 19 can
>   cause UB.
> - Of obvious off-by-one errors that will trivially result in UB, I found 2
>   instances in GNU binutils, 6 in CPython, 3 in Firefox, and 1 in OpenJDK.
> - Finally, of a non-obvious but critical side effect (i.e., unintentionally
>   clever code of the first form), I found just 1 instance in Firefox, where
>   a certain error branch just happens to be reachable only when the buffer
>   is large enough for the error message to fit.
> And these aren't even counting its cousins strcat(3) and sprintf(3)!
> 
> So I hope you'll forgive me if I have a hard time believing that authors
> are less likely to be overly clever with strcpy(3) than with strncpy(3),
> purely on account of the former's interface being more 'complete'.
> 
> > > Probably the only way to solve the cleverness issue for good is to have an
> > > immediately-available, foolproof, performant set of string functions that
> > > are extremely straightforward to understand and use, flexible enough for
> > > any use case, and generally agreed to be the first choice for string
> > > manipulation.
> > >
> > > Unfortunately, probably the closest match to those criteria, especially the
> > > availability criterion, is snprintf(3), which has the flaws of using int
> > > instead of size_t for most sizes, not being very performant, and not being
> > > async-signal-safe. Alas, it will likely remain a dream, given all the wars
> > > over which safer string functions have the best API. But at least
> > > strlcpy(3) has a pretty sound interface, if other platforms ever get around
> > > to including it by default.
> >
> > strlcpy(3) will be in POSIX.1-202x (Issue 8), so it's a matter of time
> > that it'll be widespread.
> 
> I noticed that, but I've always been a pessimist regarding the timelines of
> cool new things being rolled out. It will take some months to years before
> Issue 8 is released, months to years for all the relevant platforms to get
> the memo and implement it, many years for the knowledge to trickle down to
> the everyday library authors, and many more years for old versions of
> platforms to reach the end of their support periods. And I don't want to
> be one of those people advertising stuff that's perpetually 'just around
> the corner'. (For that matter, I wonder how many decades it will be before
> I see widespread use of posix_close(2) in a serious codebase, if ever.)
> 
> > > My point isn't that the difference is undocumented, but that the typical
> > > man page reader isn't reading the man pages for their own sake, but because
> > > they're looking at some code, and they want to Know What It's Doing as soon
> > > as possible.
> >
> > We could maybe add a list of ways people have tried to be clever with
> > strncpy(3) in the past and failed, and then explain why those uses are
> > broken.  This could be in a BUGS section.
> 
> I'd be interested in your experiences of people "trying to be clever" per
> your perspective; as I mentioned, in my earlier review of actual strncpy(3)
> usage, the only cleverness that occurs in non-negligible amounts has been
> either in the midst of using it in its 'intended' role for producing a
> null-padded character sequence (I'm referring to binutils here), or messing
> around with which part of the code is responsible for appending the
> terminator.
> 
> > > Instead, it's code making use of strncpy(3) in a particularly clever way
> > > that I'd find confusing, and in those cases, I lay the blame squarely on
> > > the cleverness rather than the function itself.
> >
> > I blame the definition of the function of ISO C.  Why?  Because by being
> > an incomplete string-copying function, it forces the programmer to be
> > clever about it.  You can't just use strncpy(3) and that's all; you need
> > to do something else, and then you do clever stuff, which ends up badly.
> 
> It forces the programmer to perform an extra step, but it doesn't force the
> programmer to be clever in performing that extra step. As I have described
> above, strcpy(3) also needs an extra step that the programmer can be
> inordinately clever with, regardless of being a complete string-copying
> function. So I don't see strncpy(3) as being uniquely evil here.
> 
> > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
> > > immune to off-by-one bugs?
> >
> > Slightly.  Here's the typical use of strlen(3)+strcpy(3):
> >
> > if (strlen(src) >= dsize)
> >         goto error;
> > strcpy(dst, src);
> >
> > There's no +1 or -1 in that code, so it's hard to make an off-by-one
> > mistake.  Okay, you may have seen that it has a '>=', which one could
> > accidentally replace by a '>', causing an off-by-one.  I'd wrap that
> > thing in a strxcpy() wrapper so you avoid repetition.
> 
> As I learned, the typical use of strcpy(3) (at least 80% of uses in my
> estimation) is actually copying a string into a new buffer, not an existing
> buffer. And that does need a +1 to calculate a size to pass to the
> allocation function, and usually a lot more +s if it's going to be

If you strcpy(3) to a new buffer, you'd usually strdup(3), no?  Unless
it's part of a larger object.

> concatenating further strings. (Did you know that it's not an uncommon
> practice to use "char value[1];" for a variable-length string at the end of
> a struct, then depend on that 1 byte being included in the size of the
> struct when allocating it?)

Not exactly that, but I've seen things like that, yeah.  I wish I didn't.
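For comparison, a C99 flexible array member makes the terminator's byte explicit in the allocation size, instead of hiding it inside sizeof of a `char value[1]` member. Here is a minimal sketch with a hypothetical struct and constructor:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record carrying a variable-length name. */
struct entry {
    int   id;
    char  name[];  /* C99 flexible array member; adds no bytes to sizeof. */
};

struct entry *
entry_new(int id, const char *name)
{
    size_t         len = strlen(name);
    struct entry  *e;

    /* The + 1 for the terminator is spelled out, not implied by sizeof. */
    e = malloc(offsetof(struct entry, name) + len + 1);
    if (e == NULL)
        return NULL;

    e->id = id;
    memcpy(e->name, name, len + 1);  /* Copies the terminator too. */
    return e;
}
```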

> Meanwhile, code does manage to make that off-by-
> one error between >= and > in practice regardless.

I made that error yesterday, so yes.  :)
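The strxcpy() wrapper proposed earlier in the thread keeps that >= comparison in a single place. Below is one possible shape in terms of strlen(3) and strcpy(3). The return convention is an assumption for illustration, not an existing API: the length on success, or -1 on truncation with dst left untouched.

```c
#include <string.h>
#include <sys/types.h>

/*
 * Sketch of a strxcpy() built on strlen(3) + strcpy(3).
 * Returns the length of the copied string, or -1 (leaving dst
 * untouched) if src plus its terminator does not fit in dst[dsize].
 */
ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  slen = strlen(src);

    if (slen >= dsize)  /* The one comparison that must be >=, not >. */
        return -1;

    strcpy(dst, src);
    return (ssize_t) slen;
}
```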

> 
> Relatedly, as I also learned from all the manual strdup(3)-like snippets
> that use a custom allocator, the typical library author is deathly allergic
> to writing a custom wrapper over anything that isn't an allocation
> function; they'll repeat the entirety of the logic inline as many times as
> it takes. So I don't buy that most people would be replacing numerous calls
> to strncpy(3) with calls to a unified wrapper function that can be
> inspected and fixed all in one place, as you seem to suggest in your later
> email.

I try to avoid cowboy programmers, but we know it's impossible.  I just
do what I can.  But cowboy programmers will nevertheless continue to
exist and negate reality.

<https://github.com/nginx/unit/issues/795>
<https://github.com/nginx/unit/issues/804>
<https://github.com/nginx/unit/issues/923>

The responses from a programmer from nginx are gems, doubting that UB is
a problem, or even suggesting implementing a cosmetic patch instead of
fixing an API.  You can read those links if you want some fun.

> 
> > > Or will the vast majority of current strncpy(3)
> > > users be willing to either restrict their platform support or add two extra
> > > dependencies to their build process just to have strlcpy(3)? I'd hardly be
> > > inclined to think that off-by-one bugs are a particular specialty of
> > > strncpy(3).
> >
> > They are.  Here's the typical use of strncpy(3) as a replacement:
> >
> > strncpy(dst, src, dsize);
> > if (dst[dsize - 1] != '\0')
> >         goto error;
> > dst[dsize - 1] = '\0';
> >
> > There are many more moving parts, so more chances to make mistakes.
> > And you see it forces the programmer to write explicitly -1 twice.  I've
> > seen code that forgets to do the -1, and also code that uses -1 in the
> > strncpy(3) call (which makes it impossible to detect truncation).
> 
> That "dst[dsize - 1] = '\0';" line is extraneous, and none of the existing
> truncation-detecting uses of strncpy(3) I saw have its equivalent; after
> all, we just checked that character with the if statement, there's no need
> to set it again. Without that line, there are only two lines of logic, and
> a single -1, matching the single +1 needed by the typical use of strcpy(3).

Hmm, you're right.  I took an actual typical use of strncpy(3) as you
could find it in shadow-utils, that is, without the truncation check,
and added the truncation check myself without removing the zeroing.
You can remove that line.  And yes, that leaves a single chance for an
off-by-one, the same as with strlen(3).

So, as long as you wrap this in an inline function, it should be just as
safe, except that it still does the superfluous zeroing that I find
confusing.  But if you write a decent wrapper around strncpy(3), I
would consider it decent code.
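Here is the two-line check discussed above, with the redundant store dropped, wrapped as an inline function. strncpy_checked is a hypothetical name. The sketch assumes dsize > 0 and, like strncpy(3) itself, still zero-fills the unused tail of dst.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper: copy src into dst[dsize] with strncpy(3) and
 * report whether the whole string, including its terminator, fit.
 * On a false return, dst is NOT null-terminated.  Requires dsize > 0.
 */
static inline bool
strncpy_checked(char *restrict dst, const char *restrict src, size_t dsize)
{
    assert(dsize > 0);

    strncpy(dst, src, dsize);       /* Full dsize: no -1 here. */
    return dst[dsize - 1] == '\0';  /* The single -1. */
}
```

A caller that prefers a truncated string to an error can follow a false return with `dst[dsize - 1] = '\0';`.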

> 
> Also, the typical use of strncpy(3) by far is to allow a truncated string
> rather than raising an error on truncation, and in that use case, it makes
> no difference whether or not the size inside the strncpy(3) call has a -1.

True; that's a benign off-by-one cancer.  But still a cancer.

> The memcpy(3) replacement for truncation needs an additional min() ternary
> or macro, and it still needs a manual null terminator that can have the
> exact same off-by-one error.
> 
> > > By that standard, every call to a function that takes an output pointer and
> > > returns the number of elements written (say, readlink(2)) would need a
> > > comment saying "the remaining elements in this array now have undefined
> > > values".
> >
> > No, because it does precisely what is intended.  It is when you add dead
> > code when you need to justify it.
> 
> Again, that seems like an odd standard to apply only to strncpy(3)'s
> destination buffer. For instance, suppose that an API accepts an input
> struct with optional fields. It's a common practice to zero out every field
> with memset(3) or = {0}, then fill in the input fields that are actually
> used, regardless of whether the API is specified as actively ignoring the
> remaining fields.
> 
> Certainly, it can be quite a task to figure out whether the fields are
> actually read, if the API is poorly specified; without going through its
> entire implementation, any of those "unused" fields could be copied around
> or compared before being discarded, making it dangerous to leave them
> uninitialized. But need we add a comment to every one of those memset(3)
> calls, "I'm unsure whether this zeroing is significant at all"? Perhaps
> such a comment might be helpful, if there really is reason to suspect that
> the API is nefarious, but I've hardly ever seen stuff like that in
> practice.

Maybe it's because in the code I've worked with, there were actual calls
to strncpy(3) where the zeroing matters, and they're disguised between
other strncpy(3) calls, which make it all a funny amusement park.

If you _only_ use strings, and wrap strncpy(3) in a wrapper that
protects against off-by-ones, it would be acceptable, I must say.  It's
just that I don't find that code when I see strncpy(3) calls.  Maybe I
don't look at the right code bases.

> (Or, for a silly reductio ad absurdum, if some code calls malloc(3), then
> continues with some cleanup functions if it returns NULL, then would that
> code have to justify why malloc(3) set an errno value that seemingly
> never gets read? Those cleanup functions could be doing something clever by
> reading errno on entry, after all!)
> 
> > > I don't think it's controversial that in many situations, we
> > > tacitly understand that we simply don't care about the remainder of a
> >
> > While the analysis isn't very hard, it takes some time, examining all
> > surrounding code to make sure nothing cares about the trailing bytes.
> > When you have a hundred such calls, you need to make sure nobody was too
> > clever around any of them.
> 
> Sure, there's a hypothetical concern that some later consumer might notice
> the zeroing and act on it. But strncpy(3) is hardly the only thing in the
> typical codebase that produces an unnecessarily-zeroed buffer. Authors
> often use calloc(3) or memset(3) for peace of mind and no other purpose,

Those are just as nefarious, IMO.  They take away a static analyzer's
ability to detect uninitialized uses.  I.e., if you zero-initialize all
of your code, -Wuninitialized and -Wmaybe-uninitialized (and -fanalyzer
also plays a role there) become completely useless, and your program
will still behave wrongly if you miss one of those cases; it's just that
the compiler won't help you fix them.

> or, especially in C++, zero out any local buffers in a class constructor to
> avoid the specter of uninitialized memory.
> 
> And of course, lots of code repeatedly reuses the same buffer for different
> strings, handing out pointers to it, and callers could just as easily leak
> the left-over data after the null terminator. Verifying that an alleged
> string buffer truly is only used as a string is just a fact of life when
> refactoring unfamiliar code in C.
> 
> > > buffer after a certain point. In the case of producing a string, that point
> > > is going to be the null terminator, in the absence of on-site documentation
> > > to the contrary; I'd label anything else as overly clever.
> >
> > But again, strncpy(3) forces you to be clever.
> 
> It forces you to do extra work, the same way strcpy(3) forces you to do
> extra work.

strncpy(3) still requires you to know your buffer sizes.  So any dangers
of strcpy(3) in that regard should be shared by strncpy(3).  No?

Cheers,
Alex

> And it allows you to be clever, the same way strcpy(3) allows
> you to be clever. But at least it bounds the extent of your cleverness in
> that it forces you to remember the size of your destination buffer. I'd
> much rather review a hundred typical calls to strncpy(3) than a hundred
> typical calls to strcpy(3) any day of the week.
> 
> Thank you,
> Matthew House

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:05                           ` Alejandro Colomar
  2023-11-10 11:47                             ` Alejandro Colomar
@ 2023-11-10 17:58                             ` Paul Eggert
  2023-11-10 18:36                               ` Alejandro Colomar
  2023-11-10 19:52                               ` Alejandro Colomar
  1 sibling, 2 replies; 138+ messages in thread
From: Paul Eggert @ 2023-11-10 17:58 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-10 03:05, Alejandro Colomar wrote:
> Hopefully, it won't be so bad in terms of performance.

It's significantly slower than strncpy for typical use (smallish 
fixed-size destination buffers). So just use strncpy for that. It may be 
bad, but it's better than the alternatives you've mentioned. You can 
package strncpy inside a [[nodiscard]] inline wrapper if you like.

More importantly, the manual should not push strlcpy as being superior 
or being in any way a "fix" for strncpy's problems. strlcpy is worse 
than strncpy in important ways and besides - as mentioned in the glibc 
manual - neither function is a good choice for string processing.



* Re: strncpy clarify result may not be null terminated
  2023-11-10 17:58                             ` Paul Eggert
@ 2023-11-10 18:36                               ` Alejandro Colomar
  2023-11-10 20:19                                 ` Alejandro Colomar
  2023-11-10 19:52                               ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 18:36 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,


On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> On 2023-11-10 03:05, Alejandro Colomar wrote:
> > Hopefully, it won't be so bad in terms of performance.
> 
> It's significantly slower than strncpy for typical use (smallish fixed-size
> destination buffers). So just use strncpy for that. It may be bad, but it's
> better than the alternatives you've mentioned. You can package strncpy
> inside a [[nodiscard]] inline wrapper if you like.
> 
> More importantly, the manual should not push strlcpy as being superior or
> being in any way a "fix" for strncpy's problems. strlcpy is worse than
> strncpy in important ways and besides - as mentioned in the glibc manual -
> neither function is a good choice for string processing.

Hmmmm, that sounds convincing.  How about this as a starting point?

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 3cf4eb371..3aff18106 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -67,6 +67,38 @@ .SH DESCRIPTION
 }
 .EE
 .in
+.\"
+.SS Copying a string with truncation
+Although this function wasn't designed to copy a string with truncation,
+it can be used with appropriate care for that purpose.
+Such use is prone to off-by-one bugs,
+so it is recommended that you write a wrapper function
+that encloses all the danger.
+.P
+.in +4n
+.EX
+[[nodiscard]]
+inline ssize_t
+strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
+{
+    char  *p;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0')
+        return -1;
+
+    return p - dst;
+}
+.EE
+.in
+You could implement a similar function in terms of
+.BR strlen (3)
+and
+.BR memcpy (3),
+or in terms of
+.BR strlcpy (3),
+and it would be simpler,
+but this implementation is faster.
 .SH RETURN VALUE
 .TP
 .BR strncpy ()


I used stpncpy(3) instead of strncpy(3) because its return value can be
used to get the length; I assume it performs the same as strncpy(3).
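For reference, a minimal self-contained sketch of the same idea, assuming POSIX.1-2008 for stpncpy(3). It takes dsize as size_t, rejects dsize == 0 before dst[dsize - 1] is read, and returns p - dst, which is the length of the copied string because stpncpy(3) returns a pointer to the first null byte it wrote. It is declared static inline because a plain C inline definition may leave no external definition to link against; [[nodiscard]] is omitted here since it needs C23.

```c
#define _POSIX_C_SOURCE 200809L  /* for stpncpy(3) */
#include <string.h>
#include <sys/types.h>           /* ssize_t */

/*
 * Copy the string src into the buffer dst of dsize bytes.
 * Returns the length of the copied string, or -1 if src (plus its
 * null byte) doesn't fit or dsize is 0.  On -1, dst is not
 * guaranteed to contain a null-terminated string.
 */
static inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    char  *p;

    if (dsize == 0)
        return -1;

    /* stpncpy() returns a pointer to the first null byte it wrote,
       or dst + dsize if no null byte fit in dsize bytes. */
    p = stpncpy(dst, src, dsize);
    if (dst[dsize - 1] != '\0')
        return -1;               /* truncated: no room for the null byte */

    return p - dst;              /* == strlen(dst) */
}
```

A -1 return means dst holds the first dsize bytes of src with no terminator, so the caller must treat it as an error.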

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-10 17:58                             ` Paul Eggert
  2023-11-10 18:36                               ` Alejandro Colomar
@ 2023-11-10 19:52                               ` Alejandro Colomar
  2023-11-10 22:14                                 ` Paul Eggert
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 19:52 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> On 2023-11-10 03:05, Alejandro Colomar wrote:
> > Hopefully, it won't be so bad in terms of performance.
> 
> It's significantly slower than strncpy for typical use (smallish fixed-size
> destination buffers). So just use strncpy for that. It may be bad, but it's

Do you have any numbers?  I'm curious to see strnlen+memcpy vs stpncpy
for buffers of some typical sizes (say 80 and BUFSIZ) under amd64 and
arm64 (two typical archs).  Are we talking about 1%, 10%, or 100%?

> better than the alternatives you've mentioned. You can package strncpy
> inside a [[nodiscard]] inline wrapper if you like.
> 
> More importantly, the manual should not push strlcpy as being superior or
> being in any way a "fix" for strncpy's problems. strlcpy is worse than
> strncpy in important ways and besides - as mentioned in the glibc manual -
> neither function is a good choice for string processing.

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-10 18:36                               ` Alejandro Colomar
@ 2023-11-10 20:19                                 ` Alejandro Colomar
  2023-11-10 23:44                                   ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 20:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Fri, Nov 10, 2023 at 07:36:33PM +0100, Alejandro Colomar wrote:
> Hi Paul,
> 
> 
> On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> > On 2023-11-10 03:05, Alejandro Colomar wrote:
> > > Hopefully, it won't be so bad in terms of performance.
> > 
> > It's significantly slower than strncpy for typical use (smallish fixed-size
> > destination buffers). So just use strncpy for that. It may be bad, but it's
> > better than the alternatives you've mentioned. You can package strncpy
> > inside a [[nodiscard]] inline wrapper if you like.
> > 
> > More importantly, the manual should not push strlcpy as being superior or
> > being in any way a "fix" for strncpy's problems. strlcpy is worse than
> > strncpy in important ways and besides - as mentioned in the glibc manual -
> > neither function is a good choice for string processing.
> 
> Hmmmm, that sounds convincing.  How about this as a starting point?

Something slightly better:

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 3cf4eb371..8ffedae01 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -67,6 +67,88 @@ .SH DESCRIPTION
 }
 .EE
 .in
+.\"
+.SS Producing a string in a fixed-width buffer
+Programs should normally avoid arbitrary string limitations.
+However, some programs may need to write strings into fixed-width buffers.
+.P
+Although this function wasn't designed to produce a string,
+it can be used with appropriate care for that purpose.
+There are two main cases where it can be useful:
+.IP \[bu] 3
+Copying a string into a new string in a fixed-width buffer,
+preventing buffer overflow.
+.IP \[bu]
+Copying a string into a new string in a fixed-width buffer,
+with truncation.
+.P
+Using
+.BR strncpy (3)
+in any of those cases is prone to several classes of bugs,
+so it is recommended that you write a wrapper function
+that encloses all the dangers.
+.TP
+Copying a string preventing buffer overflow
+.in +4n
+.EX
+[[nodiscard]]
+inline ssize_t
+strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
+{
+    char  *p;
+
+    if (dsize == 0)
+        return -1;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0')
+        return -1;
+
+    return p - dst;
+}
+.EE
+.in
+.P
+If it returns -1,
+the contents of
+.I dst
+are undefined,
+and the program should handle the error.
+.P
+You could implement a similar function in terms of
+.BR strlen (3)
+and
+.BR memcpy (3),
+or in terms of
+.BR strlcpy (3),
+and it would be simpler,
+but this implementation is faster.
+.\"
+.TP
+Copying a string with truncation
+Truncation is almost always a bug.
+However, in the few cases where it is not a bug,
+you can use the following function.
+.in +4n
+.EX
+inline ssize_t
+strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
+{
+    char  *p;
+
+    if (dsize == 0)
+        return -1;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0') {
+        dst[dsize - 1] = '\0';
+        p--;
+    }
+
+    return p - dst;
+}
+.EE
+.in
 .SH RETURN VALUE
 .TP
 .BR strncpy ()


However, note how many branches we need to make a function that handles
all corner cases.  Is it still faster than strnlen+memcpy?  stpncpy must
be heavily optimized for that.  Also, strnlen(3) might be optimized out
by the compiler in many cases, so maybe in real code it would be better
to use memcpy.  I'd very much like to see some numbers.
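To make the comparison concrete, here is a sketch of both wrappers rewritten with strnlen(3)+memcpy(3). The names strxcpy_memcpy() and strtcpy_memcpy() are hypothetical, and POSIX.1-2008 is assumed for strnlen(3). Neither writes any padding, so their cost follows the length of src rather than dsize.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <string.h>
#include <sys/types.h>           /* ssize_t */

/* Hypothetical: like strxcpy(), returns the copied length,
   or -1 if src (plus its null byte) doesn't fit in dsize bytes. */
static inline ssize_t
strxcpy_memcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  slen;

    /* Read at most dsize bytes of src; slen == dsize means there is
       no room for the null byte (this also covers dsize == 0). */
    slen = strnlen(src, dsize);
    if (slen == dsize)
        return -1;

    memcpy(dst, src, slen + 1);  /* copy the string and its null byte */
    return slen;
}

/* Hypothetical: like strtcpy(), copies as much of src as fits,
   always null-terminates, and returns the length written,
   or -1 if dsize is 0.  Truncation is not reported. */
static inline ssize_t
strtcpy_memcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  len;

    if (dsize == 0)
        return -1;

    len = strnlen(src, dsize - 1);  /* at most dsize - 1 bytes fit */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

As with the patch's strtcpy(), a caller of strtcpy_memcpy() cannot tell an exact fit (length dsize - 1) from a truncation.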

Thanks,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-10 19:52                               ` Alejandro Colomar
@ 2023-11-10 22:14                                 ` Paul Eggert
  2023-11-11 21:13                                   ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Paul Eggert @ 2023-11-10 22:14 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-10 11:52, Alejandro Colomar wrote:

> Do you have any numbers?

It depends on size of course. With programs like 'tar' (one of the few 
programs that actually needs something like strncpy) the destination 
buffer is usually fairly small (32 bytes or less) though some of them 
are 100 bytes. I used 16 bytes in the following shell transcript:

$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do 
echo; echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done

strnlen+strcpy:

real	0m0.411s
user	0m0.411s
sys	0m0.000s

strnlen+memcpy:

real	0m0.392s
user	0m0.388s
sys	0m0.004s

strncpy:

real	0m0.300s
user	0m0.300s
sys	0m0.000s

stpncpy:

real	0m0.326s
user	0m0.326s
sys	0m0.000s

strlcpy:

real	0m0.623s
user	0m0.623s
sys	0m0.000s


... where a.out was generated by compiling the attached program with gcc 
-O2 on Ubuntu 23.10 64-bit on a Xeon W-1350.

I wouldn't take these numbers all that seriously, as microbenchmarks 
like these are not that informative these days. Still, for a typical 
case one should not assume strncpy must be slower merely because it has 
more work to do; quite the contrary.

[-- Attachment #2: strncpy-bench.c --]
[-- Type: text/x-csrc, Size: 1090 bytes --]

#include <stdlib.h>
#include <string.h>


int
main (int argc, char **argv)
{
  if (argc != 5)
    return 2;
  long bufsize = atol (argv[1]);
  char *buf = malloc (bufsize);
  long n = atol (argv[2]);
  char const *a = argv[3];
  if (strcmp (argv[4], "strnlen+strcpy") == 0)
    {
      for (long i = 0; i < n; i++)
	{
	  if (strnlen (a, bufsize) == bufsize)
	    return 1;
	  strcpy (buf, a);
	}
    }
  else if (strcmp (argv[4], "strnlen+memcpy") == 0)
    {
      for (long i = 0; i < n; i++)
	{
	  size_t alen = strnlen (a, bufsize);
	  if (alen == bufsize)
	    return 1;
	  memcpy (buf, a, alen + 1);
	}
    }
  else if (strcmp (argv[4], "strncpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (strncpy (buf, a, bufsize)[bufsize - 1])
	  return 1;
    }
  else if (strcmp (argv[4], "stpncpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (stpncpy (buf, a, bufsize) == buf + bufsize)
	  return 1;
    }
  else if (strcmp (argv[4], "strlcpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (strlcpy (buf, a, bufsize) == bufsize)
	  return 1;
    }
  else
    return 2;
}

* Re: strncpy clarify result may not be null terminated
  2023-11-10 20:19                                 ` Alejandro Colomar
@ 2023-11-10 23:44                                   ` Jonny Grant
  0 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-10 23:44 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert; +Cc: Matthew House, linux-man, GNU C Library



On 10/11/2023 20:19, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Fri, Nov 10, 2023 at 07:36:33PM +0100, Alejandro Colomar wrote:
>> Hi Paul,
>>
>>
>> On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
>>> On 2023-11-10 03:05, Alejandro Colomar wrote:
>>>> Hopefully, it won't be so bad in terms of performance.
>>>
>>> It's significantly slower than strncpy for typical use (smallish fixed-size
>>> destination buffers). So just use strncpy for that. It may be bad, but it's
>>> better than the alternatives you've mentioned. You can package strncpy
>>> inside a [[nodiscard]] inline wrapper if you like.
>>>
>>> More importantly, the manual should not push strlcpy as being superior or
>>> being in any way a "fix" for strncpy's problems. strlcpy is worse than
>>> strncpy in important ways and besides - as mentioned in the glibc manual -
>>> neither function is a good choice for string processing.
>>
>> Hmmmm, that sounds convincing.  How about this as a starting point?
> 
> Something slightly better:
> 
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index 3cf4eb371..8ffedae01 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -67,6 +67,88 @@ .SH DESCRIPTION
>  }
>  .EE
>  .in
> +.\"
> +.SS Producing a string in a fixed-width buffer
> +Programs should normally avoid arbitrary string limitations.
> +However, some programs may need to write strings into fixed-width buffers.
> +.P
> +Although this function wasn't designed to produce a string,
> +it can be used with appropriate care for that purpose.
> +There are two main cases where it can be useful:
> +.IP \[bu] 3
> +Copying a string into a new string in a fixed-width buffer,
> +preventing buffer overflow.
> +.IP \[bu]
> +Copying a string into a new string in a fixed-width buffer,
> +with truncation.
> +.P
> +Using
> +.BR strncpy (3)
> +in any of those cases is prone to several classes of bugs,
> +so it is recommended that you write a wrapper function
> +that encloses all the dangers.

Some feedback about last line: "that covers all the risks" is clearer.

> +.TP
> +Copying a string preventing buffer overflow
> +.in +4n
> +.EX
> +[[nodiscard]]
> +inline ssize_t
> +strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
> +{
> +    char  *p;
> +
> +    if (dsize == 0)
> +        return -1;
> +
> +    p = stpncpy(dst, src, dsize);
> +    if (dst[dsize - 1] != '\0')
> +        return -1;
> +
> +    return p - dst;
> +}
> +.EE
> +.in
> +.P
> +If it returns -1,
> +the contents of
> +.I dst
> +are undefined,
> +and the program should handle the error.
> +.P
> +You could implement a similar function in terms of
> +.BR strlen (3)
> +and
> +.BR memcpy (3),
> +or in terms of
> +.BR strlcpy (3),
> +and it would be simpler,
> +but this implementation is faster.

I suggest adding a little more information: you could append "because it accesses less memory".

> +.\"
> +.TP
> +Copying a string with truncation
> +Truncation is almost always a bug.
> +However, in the few cases where it is not a bug,
> +you can use the following function.
> +.in +4n
> +.EX
> +inline ssize_t
> +strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
> +{
> +    char  *p;
> +
> +    if (dsize == 0)
> +        return -1;
> +
> +    p = stpncpy(dst, src, dsize);
> +    if (dst[dsize - 1] != '\0') {
> +        dst[dsize - 1] = '\0';
> +        p--;
> +    }
> +
> +    return p - dst;
> +}
> +.EE
> +.in
>  .SH RETURN VALUE
>  .TP
>  .BR strncpy ()
> 
> 
> However, note how many branches we need to make a function that handles
> all corner cases.  Is it still faster than strnlen+memcpy?  stpncpy must
> be heavily optimized for that.  Also, strnlen(3) might be optimized out
> by the compiler in many cases, so maybe in real code it would be better
> to use memcpy.  I'd very much like to see some numbers.

A benchmark would show the performance; it shouldn't take many lines of code to time this in a loop.

strnlen_s is in Annex K of the C standard, but strnlen hasn't made it in yet, not even in C23.
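Where strnlen(3) isn't available, a minimal ISO C fallback can be built on memchr(3); portable_strnlen() below is a hypothetical name. It relies on the C11 guarantee that memchr() behaves as if it reads the characters sequentially and stops at the first match, so it does not read past the terminating null byte of s.

```c
#include <string.h>

/* Hypothetical portable stand-in for strnlen(3), using only ISO C:
   the length of s, but never more than maxlen. */
static size_t
portable_strnlen(const char *s, size_t maxlen)
{
    const char  *nul = memchr(s, '\0', maxlen);

    return nul != NULL ? (size_t) (nul - s) : maxlen;
}
```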

Kind regards
Jonny

* Re: strncpy clarify result may not be null terminated
  2023-11-10 11:18                     ` Alejandro Colomar
@ 2023-11-11  7:55                       ` Oskari Pirhonen
  0 siblings, 0 replies; 138+ messages in thread
From: Oskari Pirhonen @ 2023-11-11  7:55 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Matthew House, Jonny Grant, linux-man

On Fri, Nov 10, 2023 at 12:18:56 +0100, Alejandro Colomar wrote:
> Hi Oskari,
> 
> On Fri, Nov 10, 2023 at 01:06:44AM -0600, Oskari Pirhonen wrote:
> > On Thu, Nov 09, 2023 at 13:23:14 +0100, Alejandro Colomar wrote:
> > > Don't worry.  strncpy(3) won't be deprecated, thanks to tar(1).  ;)
> > > 
> > 
> > Just please don't tar and feather [1] the people who use it ;)
> 
> Hmmm, it just caught me after a year fixing broken strncpy(3) calls.  I
> was a bit unfair.  I'm sorry if I wasn't so nice.  Hopefully, we've all
> learnt something about string-copying functions.  :)
> 

Indeed we have. This whole thread became much more informative than I
could've anticipated. And we got better wording for strncpy(3)
too :)

> > > We could maybe add a list of ways people have tried to be clever with
> > > strncpy(3) in the past and failed, and then explain why those uses are
> > > broken.  This could be in a BUGS section.
> > > 
> > 
> > This would be a very fun read.
> 
> I'll write it then!  :D
> 
> > 
> > ... snip ...
> > 
> > > > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
> > > > > it's not correct code.  It's rather dangerous code that just happens to
> > > > > not be vulnerable most of the time.
> > > > 
> > > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be
> > > > immune to off-by-one bugs?
> > > 
> > > Slightly.  Here's the typical use of strlen(3)+strcpy(3):
> > > 
> > > if (strlen(src) >= dsize)
> > > 	goto error;
> > > strcpy(dst, src);
> > > 
> > > There's no +1 or -1 in that code, so it's hard to make an off-by-one
> > > mistake.  Okay, you may have seen that it has a '>=', which one could
> > > accidentally replace by a '>', causing an off-by-one.  I'd wrap that
> > > thing in a strxcpy() wrapper so you avoid repetition. 
> > > 
> > 
> > Might I go so far as to recommend strnlen(3) instead of strlen(3)? That
> > way, instead of blindly looking for a null terminator, you stop after a
> > predetermined max length. Especially nice for untrusted input where you
> > can't make assumptions on the "fitness for a purpose" of what's being
> > fed in.
> > 
> >     if (src == NULL || strnlen(src, dsize) == dsize)
> >         goto error;
> >     strcpy(dst, src);
> 
> A NULL check shouldn't be necessary (no other copying functions have,
> and that's not a big deal with them, although I have mixed feelings
> about things like memcpy(dst, NULL, 0)).
> 
> About strnlen(3), you're right, and Paul also pointed that out.  See the
> other mail I sent to the list with an inline implementation of strxcpy()
> using strnlen(3).
> 

Yep. I saw it just before replying to this message.

> > 
> > This, of course, assumes you have POSIX at your disposal.
> 
> I always assume this.  If not, please ask your vendor to provide a POSIX
> layer.  Or at least the parts of POSIX that can be implemented in a
> free-standing implementation.  Or stop using that vendor.
> 
> > 
> > I'm writing this before going to bed. I did briefly sanity check it with
> > a simple test prog, but it would be quite ironic if I missed something
> > wouldn't it...
> 
> Looks good at first glance.  :)
> 

Dev 1: It passes all tests.
Dev 2: Ship it.
Users: *proceed to break it anyway*

- Oskari


* Re: strncpy clarify result may not be null terminated
  2023-11-10 16:06                   ` Matthew House
  2023-11-10 17:48                     ` Alejandro Colomar
@ 2023-11-11 20:55                     ` Jonny Grant
  2023-11-11 21:15                       ` Jonny Grant
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-11 20:55 UTC (permalink / raw)
  To: Matthew House, Alejandro Colomar; +Cc: linux-man



On 10/11/2023 16:06, Matthew House wrote:
> On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@kernel.org> wrote:
>>> So one can interpret strncpy(3) as copying a prefix of a character sequence
>>> into a buffer (and zero-filling the remainder), in which case you're
>>> correct that truncation cannot be detected. But the function is formally
>>> defined as copying a prefix of a string into a buffer (and zero-filling the
>>> remainder), in which case the string has been truncated if the buffer
>>> doesn't end in a null byte afterward. It's just that one may not care about
>>> the terminating null byte being truncated if the user of the result just
>>> wants the initial character sequence.
>>
>> Yes, with the ISO C definition of strncpy(3), you can detect truncation.
>> The problem is that while my definition of it is complete, the
>> definition by ISO C makes it an incomplete function (to complete its
>> functionality in copying strings, you need to add an explicit '\0'
>> after the call).  So I prefer mine, and for self-consistency, it can't
>> report truncation.
> 
> Personally, I'm a pragmatist, and I like to see it as kind of a duality: it
> can be used as part of a routine that copies part of a string and reports
> truncation, and it can also be used as a complete routine that copies part
> of a character sequence but can't report truncation. That reflects how it's
> used in practice. And it would hardly be the first such duality in C,
> either, given things like the fundamental practice of manipulating
> arbitrary objects as if they're character arrays.
> 
> (Some of these other dualities are similarly infamous in their room for
> error, e.g., forgetting to multiply by the element size when calling
> malloc(3), which I have often been guilty of myself. And still, a worrying
> amount of code neglects to test for multiplication overflow when doing
> this, even when the length comes from an untrusted source. Yet somehow I
> haven't seen any calls for a mallocarray(3) function to replace it. Ditto
> with memset(3), which can and has caused actual hard-to-notice bugs due to
> the first few elements looking correct even if the provided length is too
> short.)
> 
> But you're entitled to your opinion on how it ought to be best represented
> in the man page, as long as the immediate shortcoming of the function w.r.t
> producing strings is made very clear, even to readers who aren't in the
> habit of contemplating formal definitions. I'm satisfied by your patch in
> that regard.
> 
>>> That's a nice library that I didn't know about! Unfortunately, I don't
>>> think it's a very viable option for the long tail of small libraries I've
>>> referred to, which generally don't have any sub-dependencies of their own,
>>> apart from those provided by the platform.
>>>
>>> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their
>>> configure scripts from whatever build system you're using (in such a way
>>> that libbsd can locate libmd), ensuring they're safe for cross-compilation
>>> if that's a goal, ensuring you bundle them in a way that respects their
>>> license terms, and ensuring that any user of your library links to the two
>>> dependencies and doesn't duplicate them. At that point, rolling your own
>>> strlcpy(3) equivalent definitely sounds like less mental load, at least to
>>> me.
>>
>> Yes, if you had 0 deps, it might be simpler to add your implementation.
>> Although it's a tricky function to implement, so I'd be careful.  If you
>> need to roll your own, I would go for a simpler function; maybe a
>> wrapper over strlen(3)+strcpy(3).
> 
> Such a wrapper would indeed be useful for detecting truncation, but a full
> strlcpy(3) equivalent would be necessary for permitting the truncation and
> continuing, which is the behavior of the majority of existing strncpy(3)-
> based code.
> 
> I don't deny that this truncation behavior is often done dubiously and
> rarely receives enough scrutiny, but a significant chunk of the uses really
> are just building an informative string which won't cause any harm if
> truncated, and installing additional control flow to handle truncation
> errors in places where there currently isn't any can introduce its own
> bugs.

Truncation seems risky; I can't think of many good use-cases for it. If it's a file path, truncation means the path is no longer accurate. A song title in a music player could perhaps be truncated acceptably, just displaying the first x characters, though even that doesn't feel great. Another case is a safety check on strings longer than expected: a song title longer than the 255 bytes the format allows could be truncated (it probably means a missing NUL in the file, or a corrupt file).

>>> I didn't see this as an issue in practice when I was reviewing all those
>>> existing usages of strncpy(3). The vast majority were used in the midst of
>>> simple string manipulation, where the destination buffer starts as
>>> uninitialized or zeroed out, and ultimately gets passed into a user
>>> expecting an ordinary null-terminated string.
>>>
>>> (One exception was a few functions that used strncpy(dst, "", len) to zero
>>
>> Holy crap!  Didn't these programmers know bzero(3) or memset(3)?  :D

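Perhaps that strncpy could be optimized out if the modified memory isn't read again afterwards, and the same is true of memset(). A non-elidable variant such as explicit_memset() (NetBSD), explicit_bzero(3) (glibc, OpenBSD), or C23's memset_explicit() may be needed in that situation.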
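As a sketch only, for platforms with none of those functions: a common fallback calls memset() through a volatile function pointer, so the compiler cannot prove which function is called and drop the store. secure_zero() and memset_v are hypothetical names, and ISO C does not guarantee this trick works; the library functions are preferable where available.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: a volatile pointer to memset(), so each call must
   re-read the pointer and cannot be treated as a known dead store. */
static void *(*const volatile memset_v)(void *, int, size_t) = memset;

/* Hypothetical: zero n bytes at p in a way optimizers are unlikely
   (but not guaranteed by ISO C) to remove. */
static void
secure_zero(void *p, size_t n)
{
    memset_v(p, 0, n);
}
```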

>>> out the buffer, which is thankfully pretty obvious. Another exception was
>>> the functions that actually used strncpy(3) to produce a null-padded
>>> character sequence, e.g., when writing a value into a section of a binary.
>>> But in general, I found that it's usually not difficult to tell when a
>>> usage is being clever enough that the null padding might be significant.)
>>>
>>> In fact, the greater confusion came from the surprisingly common practice
>>> of using strncpy(3) like it's memcpy(3), by giving it the known length of
>>
>> It gets better!  :D
> 
> In all these cases, I think the function naming really is having somewhat
> of a psychological effect: the authors are wrangling with strthis(3) and
> strthat(3) for dozens of lines, so they'd find it scary to start mixing it
> up with mem*(3) functions ("I'm working with C strings, not with byte
> arrays!"), or perhaps they don't even consider it. They'd rather remain
> with strncpy(3), even when it means they have to manually append it with a
> null terminator or another string. But I'm no psychoanalyst, so take that
> with a big grain of salt.
> 
> (Meanwhile, in my own code, I try to work with pointer-and-length arrays
> whenever possible instead of fooling around with null terminators and all
> their off-by-one fun, so I've become leery of using any str*(3) functions
> apart from strlen(3) and strnlen(3).)

Do you mean passing a size_t around for the length of the src string? That saves needing to read memory counting bytes, which is a performance boost on big strings. Accessing memory to read or write unnecessarily is a performance drag.
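A minimal sketch of the pointer-and-length style, using a hypothetical struct strview and helper names: the length is computed once, and copying the data out as a C string needs only a bounds check and memcpy(3), with no scan for the terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-and-length string: no terminator to count. */
struct strview {
    const char  *ptr;
    size_t      len;
};

/* Measure a C string once, up front. */
static struct strview
strview_from_cstr(const char *s)
{
    return (struct strview){ s, strlen(s) };
}

/*
 * Copy sv into dst (dsize bytes) as a null-terminated string.
 * Returns false, leaving dst untouched, if it doesn't fit.
 */
static bool
strview_copy(char *restrict dst, size_t dsize, struct strview sv)
{
    if (sv.len >= dsize)         /* need room for the null byte too */
        return false;

    memcpy(dst, sv.ptr, sv.len);
    dst[sv.len] = '\0';
    return true;
}
```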

I saw you mention off-by-one errors. I recall seeing some old code bases, decades ago, that allocated an extra 2 bytes just to avoid crashes in their buggy code; pretty bad stuff.

Kind regards
Jonny

* Re: strncpy clarify result may not be null terminated
  2023-11-10 22:14                                 ` Paul Eggert
@ 2023-11-11 21:13                                   ` Alejandro Colomar
  2023-11-11 22:20                                     ` Paul Eggert
  2023-11-12  9:52                                     ` Jonny Grant
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-11 21:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
> On 2023-11-10 11:52, Alejandro Colomar wrote:
> 
> > Do you have any numbers?
> 
> It depends on size of course. With programs like 'tar' (one of the few
> programs that actually needs something like strncpy) the destination buffer
> is usually fairly small (32 bytes or less) though some of them are 100
> bytes. I used 16 bytes in the following shell transcript:
> 
> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
> 
> strnlen+strcpy:
> 
> real	0m0.411s
> user	0m0.411s
> sys	0m0.000s
> 
> strnlen+memcpy:
> 
> real	0m0.392s
> user	0m0.388s
> sys	0m0.004s
> 
> strncpy:
> 
> real	0m0.300s
> user	0m0.300s
> sys	0m0.000s
> 
> stpncpy:
> 
> real	0m0.326s
> user	0m0.326s
> sys	0m0.000s
> 
> strlcpy:
> 
> real	0m0.623s
> user	0m0.623s
> sys	0m0.000s
> 
> 
> ... where a.out was generated by compiling the attached program with gcc -O2
> on Ubuntu 23.10 64-bit on a Xeon W-1350.
> 
> I wouldn't take these numbers all that seriously, as microbenchmarks like
> these are not that informative these days. Still, for a typical case one
> should not assume strncpy must be slower merely because it has more work to
> do; quite the contrary.

Thanks for the benchmark!  Yeah, I won't take it as the last word, but
it shows the growth order (and its cause) of the different alternatives.

I'd like to point out some curious things about it:

-  strnlen+strcpy is slower than strnlen+memcpy.

   The compiler has all the information necessary here, so I don't see
   why it doesn't optimize the strcpy(3) into a simple memcpy(3).
   AFAICS, it's a missed optimization, even with -O3.

-  strncpy is slower than stpncpy on my computer.

   stpncpy is in fact the fastest call on my computer.

   Was strncpy(3) optimized in a recent version of glibc that you have?
   I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
   just luck?  I'm curious.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 16 100000000 abcdefghijk $i;
	  done;

	strnlen+strcpy:

	real	0m0.188s
	user	0m0.184s
	sys	0m0.004s

	strnlen+memcpy:

	real	0m0.148s
	user	0m0.148s
	sys	0m0.000s

	strncpy:

	real	0m0.157s
	user	0m0.157s
	sys	0m0.000s

	stpncpy:

	real	0m0.135s
	user	0m0.135s
	sys	0m0.000s

	memccpy:

	real	0m0.208s
	user	0m0.208s
	sys	0m0.000s

	strlcpy:

	real	0m0.322s
	user	0m0.322s
	sys	0m0.000s

-  strlcpy(3) is very heavy, much more than I expected.  See some tests
   with larger strings below.  The main growth of strlcpy(3) comes from
   slen, since it must scan the whole source to return strlen(src).

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.242s
	user	0m0.242s
	sys	0m0.000s

	strnlen+memcpy:

	real	0m0.190s
	user	0m0.186s
	sys	0m0.004s

	strncpy:

	real	0m0.174s
	user	0m0.173s
	sys	0m0.000s

	stpncpy:

	real	0m0.170s
	user	0m0.166s
	sys	0m0.004s

	memccpy:

	real	0m0.253s
	user	0m0.249s
	sys	0m0.004s

	strlcpy:

	real	0m1.385s
	user	0m1.385s
	sys	0m0.000s

-  strncpy(3) also gets heavy compared to strnlen+memcpy.  Considering
   how small the difference with memcpy is for small strings, I wouldn't
   recommend it instead of memcpy, except for micro-optimizations.  The
   main growth of strncpy(3) comes from dsize, since it zero-fills the
   rest of the buffer.

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.234s
	user	0m0.233s
	sys	0m0.001s

	strnlen+memcpy:

	real	0m0.192s
	user	0m0.192s
	sys	0m0.000s

	strncpy:

	real	0m0.268s
	user	0m0.268s
	sys	0m0.000s

	stpncpy:

	real	0m0.267s
	user	0m0.267s
	sys	0m0.000s

	memccpy:

	real	0m0.257s
	user	0m0.256s
	sys	0m0.001s

	strlcpy:

	real	0m1.574s
	user	0m1.574s
	sys	0m0.000s

	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
		echo; echo $i:;
		time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
	  done;

	strnlen+strcpy:

	real	0m0.227s
	user	0m0.227s
	sys	0m0.000s

	strnlen+memcpy:

	real	0m0.190s
	user	0m0.190s
	sys	0m0.000s

	strncpy:

	real	0m1.400s
	user	0m1.399s
	sys	0m0.000s

	stpncpy:

	real	0m1.398s
	user	0m1.398s
	sys	0m0.000s

	memccpy:

	real	0m0.256s
	user	0m0.256s
	sys	0m0.000s

	strlcpy:

	real	0m1.184s
	user	0m1.184s
	sys	0m0.000s


-  strnlen(3)+memcpy(3) becomes the fastest once dsize reaches a few
   hundred bytes (256 in these tests), and is only roughly 10% to 30%
   slower than the fastest alternative for smaller buffers.

   It is also the most semantically correct approach (together with
   strnlen+strcpy), since it avoids unnecessary dead code (padding).
   This is the approach the manual pages should recommend first.

   However, it can be useful to document typical alternatives to prevent
   users from making mistakes, especially since some micro-optimizations
   may favor strncpy(3).

Cheers,
Alex   

> #include <stdlib.h>
> #include <string.h>
> 
> 
> int
> main (int argc, char **argv)
> {
>   if (argc != 5)
>     return 2;
>   long bufsize = atol (argv[1]);
>   char *buf = malloc (bufsize);
>   long n = atol (argv[2]);
>   char const *a = argv[3];
>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	{
> 	  if (strnlen (a, bufsize) == bufsize)
> 	    return 1;
> 	  strcpy (buf, a);
> 	}
>     }
>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	{
> 	  size_t alen = strnlen (a, bufsize);
> 	  if (alen == bufsize)
> 	    return 1;
> 	  memcpy (buf, a, alen + 1);
> 	}
>     }
>   else if (strcmp (argv[4], "strncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (strncpy (buf, a, bufsize)[bufsize - 1])
> 	  return 1;
>     }
>   else if (strcmp (argv[4], "stpncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (stpncpy (buf, a, bufsize) == buf + bufsize)
> 	  return 1;
>     }

I've added the following one for completeness, especially now that
memccpy(3) will be in C2x.

  else if (strcmp (argv[4], "memccpy") == 0)
    {
      for (long i = 0; i < n; i++)
	if (memccpy (buf, a, 0, bufsize) == NULL)
	  return 1;
    }

>   else if (strcmp (argv[4], "strlcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (strlcpy (buf, a, bufsize) == bufsize)

This should have been >= bufsize, right?

> 	  return 1;
>     }
>   else
>     return 2;
> }


-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 20:55                     ` Jonny Grant
@ 2023-11-11 21:15                       ` Jonny Grant
  2023-11-11 22:36                         ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-11 21:15 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man

Alejandro

I was reading again
https://man7.org/linux/man-pages/man7/string_copying.7.html

Sharing some comments. I realise this isn't the latest man page; if you have a newer one online I could read that. I was reading man-pages 6.04, so perhaps some of these are already updated.


A) Could simplify and remove the "This function" and "These functions" that start each function description.

B) "RETURN VALUE" has the text before each function, rather than after as would be the convention from "DESCRIPTION", I suggest to move the return value text after each function name.

Could make it like https://man7.org/linux/man-pages/man3/string.3.html

C) In the examples, it's good that stpecpy() checks for NULL pointers; the others don't yet, though.

D) strlcpy says
"These functions force a SIGSEGV if the src pointer is not a string."
How does it determine the pointer isn't a string?

E) Are the functions mentioned here, like ustpcpy(), standardized by POSIX, or used in any libc?

F) 
char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
I know the 'restrict' keyword, but I haven't seen this way of specifying the size of the 'dst' array by using the parameter 'sz'. Is this in wide use in APIs? I remember C99 lets us write char ptr[static 1] to say the pointer must point to at least 1 element.

Saw a few pages started to write out functions like
size_t strnlen(const char s[.maxlen], size_t maxlen);

Is this just for documentation? Usually it would be: const char s[static maxlen]

G) "Because these functions ask for the length, and a string is by
nature composed of a character sequence of the same length plus a
terminating null byte, a string is also accepted as input."

I suggest to adjust the order so it doesn't start with a fragment:

"A string is also accepted as input, because these functions ask
for the length, and a string is by nature composed of a character
sequence of the same length plus a terminating null byte."

Could simplify and remove "by nature".

Unrelated man page strncpy, noticed this.

SEE ALSO
Could this refer to strcpy(3) and string(3) at the bottom?
https://man7.org/linux/man-pages/man3/strncpy.3.html

With kind regards
Jonny




^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 21:13                                   ` Alejandro Colomar
@ 2023-11-11 22:20                                     ` Paul Eggert
  2023-11-12  9:52                                     ` Jonny Grant
  1 sibling, 0 replies; 138+ messages in thread
From: Paul Eggert @ 2023-11-11 22:20 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-11 13:13, Alejandro Colomar wrote:
>     Was strncpy(3) optimized in a recent version of glibc that you have?

Ubuntu 23.10 currently uses glibc 2.38-1ubuntu6. Fortification is on by 
default, so __builtin___strncpy_chk is involved.

Again, I wouldn't take these numbers too seriously. It's just a 
microbenchmark.

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 21:15                       ` Jonny Grant
@ 2023-11-11 22:36                         ` Alejandro Colomar
  2023-11-11 23:19                           ` Alejandro Colomar
  2023-11-17 21:46                           ` Jonny Grant
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-11 22:36 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

[-- Attachment #1: Type: text/plain, Size: 8669 bytes --]

Hi Jonny,

On Sat, Nov 11, 2023 at 09:15:12PM +0000, Jonny Grant wrote:
> Alejandro
> 
> I was reading again
> https://man7.org/linux/man-pages/man7/string_copying.7.html
> 
> Sharing some comments. I realise this isn't the latest man page; if you have a newer one online I could read that. I was reading man-pages 6.04, so perhaps some of these are already updated.

You can check this one:

<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7>
also available here:
<https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7>

And of course, you can install them from source, or read them from the
repository itself.

> A) Could simplify and remove the "This function" and "These functions" that start each function description.

Fixed; thanks.

<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=53ea8765ed7f9733abf96e86df89619dc3d203ef>

> 
> B) "RETURN VALUE" has the text before each function, rather than after as would be the convention from "DESCRIPTION", I suggest to move the return value text after each function name.

Fixed; thanks.

<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=76316bd6f98c58d70c2330f7d2a945aac7c76dd8>

> 
> Could make it like https://man7.org/linux/man-pages/man3/string.3.html
> 
> C) In the examples, it's good that stpecpy() checks for NULL pointers; the others don't yet, though.

The reason is interesting.  I also designed a similar function based on
snprintf(3), which can be chained with this one.  Since that one can
return NULL, and to reduce the number of times one needs to check for
errors, I added the NULL check.

alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpeprintf .
./lib/stpeprintf.h:inline char *
stpeprintf(char *dst, char *end, const char *restrict fmt, ...)
{
	char     *p;
	va_list  ap;

	va_start(ap, fmt);
	p = vstpeprintf(dst, end, fmt, ap);
	va_end(ap);

	return p;
}
alx@debian:~/src/shadow/shadow/master$ grepc -tfd vstpeprintf .
./lib/stpeprintf.h:inline char *
vstpeprintf(char *dst, char *end, const char *restrict fmt, va_list ap)
{
	int        len;
	ptrdiff_t  size;

	if (dst == end)
		return end;
	if (dst == NULL)
		return NULL;

	size = end - dst;
	len = vsnprintf(dst, size, fmt, ap);

	if (len == -1)
		return NULL;
	if (len >= size)
		return end;

	return dst + len;
}
alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpecpy .
./lib/stpecpy.h:inline char *
stpecpy(char *dst, char *end, const char *restrict src)
{
	bool    trunc;
	char    *p;
	size_t  dsize, dlen, slen;

	if (dst == end)
		return end;
	if (dst == NULL)
		return NULL;

	dsize = end - dst;
	slen = strnlen(src, dsize);
	trunc = (slen == dsize);
	dlen = slen - trunc;

	p = mempcpy(dst, src, dlen);
	*p = '\0';

	return p + trunc;
}


Then you can use them like this:


            end = buf + sizeof(buf);
            p = buf;
            p = stpecpy(p, end, "Hello ");
            p = stpeprintf(p, end, "%d realms", 9);
            p = stpecpy(p, end, "!");
            if (p == end) {
                p--;
                goto toolong;
            }
            len = p - buf;
            puts(buf);


Regarding other string-copying functions, NULL is not inherent to them,
so I'm not sure if they should have explicit NULL checks.  Why would
these functions receive a null pointer?  The main possibility is that
the programmer forgot to check some malloc(3) call, which should receive
a different treatment from a failed copy, normally.

> D) strlcpy says
> "These functions force a SIGSEGV if the src pointer is not a string."
> How does it determine the pointer isn't a string?

By calling strlen(src).  If it isn't a string, it'll continue reading,
and likely crash due to an unbounded read.  However, the SIGSEGV isn't
guaranteed, since it may find a 0 well before crashing, so I removed
that text.  It is both a feature and a bug of these functions: they can
catch programming errors where one passes a character sequence where a
string is expected, and crash the program to noisily report the
programmer error.  But that also makes them very slow, as Paul said.

> 
> E) Are the functions mentioned here, like ustpcpy(), standardized by POSIX, or used in any libc?

No.  They are my inventions, like stpecpy().  It seems I forgot to add a
"This function is not provided by any library" in some of them.

Fixed; thanks.
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=9848ac50ceb6cc4d786b3899ee4626959e5f1d81>

> 
> F) 
> char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
> I know the 'restrict' keyword, but I haven't seen this way of specifying the size of the 'dst' array by using the parameter 'sz'. Is this in wide use in APIs? I remember C99 lets us write char ptr[static 1] to say the pointer must point to at least 1 element.

It still means the same thing.  If you use array notation, the
restrict qualifier must be placed inside the brackets.  The following
two declarations are equivalent C code:

	void foo(int *p, int *restrict x);
	void foo(int *p, int x[restrict 7]);

Since I didn't use 'static', ISO C ignores the array size.
GCC, however, is more reasonable and understands it.  To GCC, there's
not much difference between the following:

	[[gnu::nonnull]]
	void bar(int x[7]);
	void bar(int x[static 7]);

And of course, you can combine static and restrict:

	void baz(int *p, int x[static restrict 7]);

> 
> Saw a few pages started to write out functions like
> size_t strnlen(const char s[.maxlen], size_t maxlen);
> 
> Is this just for documentation? usually it would be: const char s[static maxlen]

I don't like static for array parameters.  Specifying a size for a
parameter should, even without static, tell the compiler to expect no
fewer than N elements.  This is how GCC behaves.

And static has another implication: nonnull.  IMO, nonnull is tangential
to array size, and should be specified separately with its own attribute
or qualifier.  I'd like to be able to specify the following different
cases:

	void f1(int [10]);  //  NULL, or array of size >= 10
	void f2(int [_Nonnull 10]);  // Array of size >=10

With static, I can only do the second.  Quite unreasonable.


Regarding the '.', consider the following two snippets:

	int size;  // This is the size of s[size].
	void g1(char s[size], size_t size);

You could be tricked to think that the size of s[] is the second
parameter to the function, but it's the global variable size.

	void g2(char s[size], size_t size);

Here, since there's no global size, the code won't even compile.
ISO C provides no way to use a parameter that comes later as an array
size.  We were discussing this [.identifier] syntax in linux-man@
and gcc@ as a possible extension.  We haven't yet decided on it, but
I'm previewing it as a documentation extension for now.  The rationale
for the syntax comes from its similarity with designated initializers
for structures.

> 
> G) "Because these functions ask for the length, and a string is by
> nature composed of a character sequence of the same length plus a
> terminating null byte, a string is also accepted as input."
> 
> I suggest to adjust the order so it doesn't start with a fragment:
> 
> "A string is also accepted as input, because these functions ask
> for the length, and a string is by nature composed of a character
> sequence of the same length plus a terminating null byte."
> 
> Could simplify and remove "by nature".

Yep; thanks.
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=78b2ff8c6f25654648f0fa06c310b87a7e49128e>

> 
> Unrelated man page strncpy, noticed this.
> 
> SEE ALSO
> Could this refer to strcpy(3) and string(3) at the bottom?
> https://man7.org/linux/man-pages/man3/strncpy.3.html

I removed it on purpose, because I intended to put some distance between
strncpy(3) and strings and string-copying functions like strcpy(3).

That's why I point to string_copying(7), where readers should be
educated of all of the differences.  Then, string_copying(7) has a more
complete SEE ALSO, because it has already detailed all the different
functions, and the reader is ready to read the individual pages.

Kind regards,
Alex

> 
> With kind regards
> Jonny
> 
> 
> 

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 22:36                         ` Alejandro Colomar
@ 2023-11-11 23:19                           ` Alejandro Colomar
  2023-11-17 21:46                           ` Jonny Grant
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-11 23:19 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

[-- Attachment #1: Type: text/plain, Size: 9355 bytes --]

On Sat, Nov 11, 2023 at 11:36:09PM +0100, Alejandro Colomar wrote:
> [...]
> 
> Then you can use them like this:
> 
> 
>             end = buf + sizeof(buf);
>             p = buf;
>             p = stpecpy(p, end, "Hello ");
>             p = stpeprintf(p, end, "%d realms", 9);
>             p = stpecpy(p, end, "!");

Oops, missing NULL check:

		if (p == NULL)
			goto fail;

>             if (p == end) {
>                 p--;
>                 goto toolong;
>             }
>             len = p - buf;
>             puts(buf);
> [...]



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 0/2] Expand BUGS section of string_copying(7).
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
  2023-11-04 19:33 ` Alejandro Colomar
@ 2023-11-12  9:17 ` Alejandro Colomar
  2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12  9:17 UTC (permalink / raw)
  To: linux-man; +Cc: Alejandro Colomar, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 462 bytes --]

Hi,

After Paul showed important problems with strlcpy(3) (and strlcat(3)),
I've written something in string_copying(7)'s BUGS section to warn
against them.

Cheers,
Alex

Alejandro Colomar (2):
  string_copying.7: BUGS: *cat(3) functions aren't always bad
  string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance
    problems

 man7/string_copying.7 | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
  2023-11-04 19:33 ` Alejandro Colomar
  2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
@ 2023-11-12  9:18 ` Alejandro Colomar
  2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12  9:18 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
	DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

The compiler will sometimes optimize them to normal *cpy(3) functions,
since the length of dst is usually known, if the previous *cpy(3) is
visible to the compiler.  And they provide for cleaner code.  If you
know that they'll get optimized, you could use them.

Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 1637ebc91..0254fbba6 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -592,8 +592,14 @@ .SH BUGS
 All catenation functions share the same performance problem:
 .UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
 Shlemiel the painter
 .UE .
+As a mitigation,
+compilers are able to transform some calls to catenation functions
+into normal copy functions,
+since
+.I strlen(dst)
+is usually a byproduct of the previous copy.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
                   ` (2 preceding siblings ...)
  2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-12  9:18 ` Alejandro Colomar
  2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12  9:18 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
	DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

Also point to BUGS from other sections that talk about these functions.

These functions are doomed due to the design decision of mirroring
snprintf(3)'s return value.  They must return strlen(src), which makes
them terribly slow, and vulnerable to DoS if an attacker can control
strlen(src).

A better design would have been to return -1 when truncating.

Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0254fbba6..cb3910db0 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -226,9 +226,9 @@ .SS Truncate or not?
 .IP \[bu]
 .BR strlcpy (3bsd)
 and
 .BR strlcat (3bsd)
-are similar, but less efficient when chained.
+are similar, but have important performance problems; see BUGS.
 .IP \[bu]
 .BR stpncpy (3)
 and
 .BR strncpy (3)
@@ -417,8 +417,10 @@ .SS Functions
 the resulting string is truncated
 (but it is guaranteed to be null-terminated).
 They return the length of the total string they tried to create.
 .IP
+Check BUGS before using these functions.
+.IP
 .BR stpecpy (3)
 is a simpler alternative to these functions.
 .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
 .TP
@@ -598,8 +600,22 @@ .SH BUGS
 into normal copy functions,
 since
 .I strlen(dst)
 is usually a byproduct of the previous copy.
+.P
+.BR strlcpy (3)
+and
+.BR strlcat (3)
+need to read the entire
+.I src
+string,
+even if the destination buffer is small.
+This makes them vulnerable to Denial of Service (DoS) attacks
+if an attacker can control the length of the
+.I src
+string.
+And if not,
+they're still unnecessarily slow.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-11 21:13                                   ` Alejandro Colomar
  2023-11-11 22:20                                     ` Paul Eggert
@ 2023-11-12  9:52                                     ` Jonny Grant
  2023-11-12 10:59                                       ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-12  9:52 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert; +Cc: Matthew House, linux-man, GNU C Library



On 11/11/2023 21:13, Alejandro Colomar wrote:
> Hi Paul,
> 
> On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
>> On 2023-11-10 11:52, Alejandro Colomar wrote:
>>
>>> Do you have any numbers?
>>
>> It depends on size of course. With programs like 'tar' (one of the few
>> programs that actually needs something like strncpy) the destination buffer
>> is usually fairly small (32 bytes or less) though some of them are 100
>> bytes. I used 16 bytes in the following shell transcript:
>>
>> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
>> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
>>
>> strnlen+strcpy:
>>
>> real	0m0.411s
>> user	0m0.411s
>> sys	0m0.000s
>>
>> strnlen+memcpy:
>>
>> real	0m0.392s
>> user	0m0.388s
>> sys	0m0.004s
>>
>> strncpy:
>>
>> real	0m0.300s
>> user	0m0.300s
>> sys	0m0.000s
>>
>> stpncpy:
>>
>> real	0m0.326s
>> user	0m0.326s
>> sys	0m0.000s
>>
>> strlcpy:
>>
>> real	0m0.623s
>> user	0m0.623s
>> sys	0m0.000s
>>
>>
>> ... where a.out was generated by compiling the attached program with gcc -O2
>> on Ubuntu 23.10 64-bit on a Xeon W-1350.
>>
>> I wouldn't take these numbers all that seriously, as microbenchmarks like
>> these are not that informative these days. Still, for a typical case one
>> should not assume strncpy must be slower merely because it has more work to
>> do; quite the contrary.
> 
> Thanks for the benchmark!  Yeah, I won't take it as the last word, but
> it shows the growth order (and its cause) of the different alternatives.
> 
> I'd like to point out some curious things about it:
> 
> -  strnlen+strcpy is slower than strnlen+memcpy.
> 
>    The compiler has all the information necessary here, so I don't see
>    why it's not optimizing out the strcpy(3) into a simple memcpy(3).
>    AFAICS, it's a missed optimization.  Even with -O3, it misses the
>    optimization.
> 
> -  strncpy is slower than stpncpy in my computer.
> 
>    stpncpy is in fact the fastest call in my computer.
> 
>    Was strncpy(3) optimized in a recent version of glibc that you have?
>    I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
>    just luck?  I'm curious.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 16 100000000 abcdefghijk $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.188s
> 	user	0m0.184s
> 	sys	0m0.004s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.148s
> 	user	0m0.148s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m0.157s
> 	user	0m0.157s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.135s
> 	user	0m0.135s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.208s
> 	user	0m0.208s
> 	sys	0m0.000s
> 
> 	strlcpy:
> 
> 	real	0m0.322s
> 	user	0m0.322s
> 	sys	0m0.000s
> 
> -  strlcpy(3) is very heavy.  Much more than I expected.  See some tests
>    with larger strings.  The main growth of strlcpy(3) comes from slen.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.242s
> 	user	0m0.242s
> 	sys	0m0.000s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.190s
> 	user	0m0.186s
> 	sys	0m0.004s
> 
> 	strncpy:
> 
> 	real	0m0.174s
> 	user	0m0.173s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.170s
> 	user	0m0.166s
> 	sys	0m0.004s
> 
> 	memccpy:
> 
> 	real	0m0.253s
> 	user	0m0.249s
> 	sys	0m0.004s
> 
> 	strlcpy:
> 
> 	real	0m1.385s
> 	user	0m1.385s
> 	sys	0m0.000s
> 
> -  strncpy(3) also gets heavy compared to strnlen+memcpy.
>    Considering how small the difference with memcpy is for small
>    strings, I wouldn't recommend it instead of memcpy, except for
>    micro-optimizations.  The main growth of strncpy(3) comes from dsize.
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.234s
> 	user	0m0.233s
> 	sys	0m0.001s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.192s
> 	user	0m0.192s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m0.268s
> 	user	0m0.268s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m0.267s
> 	user	0m0.267s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.257s
> 	user	0m0.256s
> 	sys	0m0.001s
> 
> 	strlcpy:
> 
> 	real	0m1.574s
> 	user	0m1.574s
> 	sys	0m0.000s
> 
> 	$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
> 		echo; echo $i:;
> 		time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
> 	  done;
> 
> 	strnlen+strcpy:
> 
> 	real	0m0.227s
> 	user	0m0.227s
> 	sys	0m0.000s
> 
> 	strnlen+memcpy:
> 
> 	real	0m0.190s
> 	user	0m0.190s
> 	sys	0m0.000s
> 
> 	strncpy:
> 
> 	real	0m1.400s
> 	user	0m1.399s
> 	sys	0m0.000s
> 
> 	stpncpy:
> 
> 	real	0m1.398s
> 	user	0m1.398s
> 	sys	0m0.000s
> 
> 	memccpy:
> 
> 	real	0m0.256s
> 	user	0m0.256s
> 	sys	0m0.000s
> 
> 	strlcpy:
> 
> 	real	0m1.184s
> 	user	0m1.184s
> 	sys	0m0.000s
> 
> 
> -  strnlen(3)+memcpy(3) becomes the fastest when dsize grows a bit over
>    a few hundred bytes, and is only a few 10%'s slower than the fastest
>    for smaller buffers.
> 
>    It is also the most semantically correct (together with
>    strnlen+strcpy), avoiding unnecessary dead code (padding).  This
>    should get the main backing from the manual pages.
> 
>    However, it can be useful to document typical alternatives to prevent
>    mistakes from users.  Especially, since some micro-optimizations may
>    favor uses of strncpy(3).
> 
> Cheers,
> Alex   
> 
>> #include <stdlib.h>
>> #include <string.h>
>>
>>
>> int
>> main (int argc, char **argv)
>> {
>>   if (argc != 5)
>>     return 2;
>>   long bufsize = atol (argv[1]);
>>   char *buf = malloc (bufsize);
>>   long n = atol (argv[2]);
>>   char const *a = argv[3];
>>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	{
>> 	  if (strnlen (a, bufsize) == bufsize)
>> 	    return 1;
>> 	  strcpy (buf, a);
>> 	}
>>     }
>>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	{
>> 	  size_t alen = strnlen (a, bufsize);
>> 	  if (alen == bufsize)
>> 	    return 1;
>> 	  memcpy (buf, a, alen + 1);
>> 	}
>>     }
>>   else if (strcmp (argv[4], "strncpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (strncpy (buf, a, bufsize)[bufsize - 1])
>> 	  return 1;
>>     }
>>   else if (strcmp (argv[4], "stpncpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (stpncpy (buf, a, bufsize) == buf + bufsize)
>> 	  return 1;
>>     }
> 
> I've added the following one for completeness.  Especially now that
> it'll be in C2x.
> 
>   else if (strcmp (argv[4], "memccpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
> 	if (memccpy (buf, a, 0, bufsize) == NULL)
> 	  return 1;
>     }
> 
>>   else if (strcmp (argv[4], "strlcpy") == 0)
>>     {
>>       for (long i = 0; i < n; i++)
>> 	if (strlcpy (buf, a, bufsize) == bufsize)
> 
> This should have been >= bufsize, right?
> 
>> 	  return 1;
>>     }
>>   else
>>     return 2;
>> }
> 
> 

Maybe we're gonna need a bigger benchmark.

There are probably existing studies. Or we could patch something like SQLite Benchmark to utilise each string function just for measurements. Hopefully it would move around at least 2GB of strings, to give some meaningful comparison timings.

As Paul mentioned, strlcpy is a poor choice for processing strings. Could rely on their guidance as they already measured.
https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html

Maybe the strlcpy API is easier, safer for programmers; but the compiler can't figure out that the programmer already knew src string length. So the strlcpy does a strlen() and wastes time reading over memory. If the src length is known, can just memcpy.


When I've benchmarked things, reducing memory reads and writes boosted performance. I also looked at the cycles taken; of course, cache and alignment play a part too.

Maybe your man page could suggest that programmers keep track of the src size, to save the cost of the strlen()?

At least the strlen functions are optimized:
glibc/strnlen.c calls memchr() to search for '\0', and memchr searches 4 bytes at a time.
glibc/strlen.c searches 4 bytes at a time.

In glibc/strlcpy.c, is there a reason __strlcpy() overwrites the last byte twice when truncating?

memcpy (dest, src, size);
dest[size - 1] = '\0';

Kind regards, Jonny
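
For reference, the memccpy(3) loop added to the benchmark above only detects truncation, by testing for NULL. A minimal sketch of a complete truncating copy built on memccpy(3) follows, assuming a POSIX (XSI) or C23 <string.h>; copy_with_memccpy() is a hypothetical name, not a library function. memccpy(3) stops right after copying the first '\0', so it never reads past the end of src, but on truncation it leaves dst unterminated, so the caller has to terminate it:

```c
#define _XOPEN_SOURCE 700	/* memccpy(3) is XSI in POSIX; C23 also has it */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, which is dsize bytes long.  Returns true if src
 * fit, or false if it was truncated (or dsize is 0).  dst is always
 * null-terminated when dsize > 0.
 */
static bool
copy_with_memccpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	if (dsize == 0)
		return false;

	/* Non-NULL: the '\0' was copied within dsize bytes, so src fit. */
	if (memccpy(dst, src, '\0', dsize) != NULL)
		return true;

	/* NULL: dsize bytes were copied with no '\0' among them. */
	dst[dsize - 1] = '\0';
	return false;
}
```

With a char buf[8], copying "1234567" succeeds, while copying "12345678" returns false and leaves "1234567" in buf.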


* Re: strncpy clarify result may not be null terminated
  2023-11-12  9:52                                     ` Jonny Grant
@ 2023-11-12 10:59                                       ` Alejandro Colomar
  2023-11-12 20:49                                         ` Paul Eggert
  2023-11-17 21:57                                         ` Jonny Grant
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 10:59 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man, GNU C Library

[-- Attachment #1: Type: text/plain, Size: 4572 bytes --]

Hi Jonny,

On Sun, Nov 12, 2023 at 09:52:20AM +0000, Jonny Grant wrote:
[... some micro-benchmarks...]

> 
> Maybe we're gonna need a bigger benchmark.

Not really.

> 
> Probably there existing studies. Or could patch something like SQLite
> Benchmark to utilise each string function just for measurements.
> Hopefully it moves around at least 2GB of strings to give some
> meaningful comparison timings.

I wasn't so interested in the small differences between functions.
What this micro-benchmark showed clearly, without needing much more info
to be conclusive, is the first order of growth of each of the functions:

-  strlcpy(3)'s first order growth corresponds to strlen(src).  That's
   due to returning strlen(src), which proves to be a poor API.

-  strncpy(3)'s first order growth corresponds to sizeof(dst).  That's
   of course due to the zeroing.  If sizeof(dst) is kept very small, you
   could live with it.  When the size grows to more or less 4 KiB, this
   drag becomes meaningful.

-  strnlen(3)+*cpy() first order growth corresponds to
   strnlen(src, sizeof(dst)), which is the fastest order of growth
   you can get from a truncating string-copying function (except if you
   keep track of your slen manually and call directly memcpy(3)).

Of course, first order of growth ignores second order of growth and so
on, which for small inputs can be important.  That is, O(x^3) is bigger
than O(x^2), but x^3 + x^2 can be smaller than 5*x^2 for small x.
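
A minimal sketch of where strncpy(3)'s sizeof(dst) growth comes from, which also shows the caveat that started this thread. It assumes a 16-byte buffer; count_nul() and nul_bytes_after_strncpy() are hypothetical helpers, not library functions. A short src leaves the rest of the buffer zero-filled, while a src of 16 or more characters leaves no terminating null byte at all:

```c
#include <stddef.h>
#include <string.h>

/* Count the null bytes among the first n bytes of buf. */
static size_t
count_nul(const char *buf, size_t n)
{
	size_t  count = 0;

	for (size_t i = 0; i < n; i++)
		count += (buf[i] == '\0');
	return count;
}

/*
 * Copy src into a 16-byte buffer with strncpy(3) and report how many
 * of its 16 bytes are '\0' afterwards.  strncpy(3) always writes all
 * 16 bytes: the copied characters, then '\0' padding up to the end.
 */
static size_t
nul_bytes_after_strncpy(const char *src)
{
	char  buf[16];

	/* Prefill, so any byte strncpy(3) skipped would not be '\0'. */
	memset(buf, 'x', sizeof(buf));
	strncpy(buf, src, sizeof(buf));
	return count_nul(buf, sizeof(buf));
}
```

With these definitions, nul_bytes_after_strncpy("ab") returns 14 (the padding), and nul_bytes_after_strncpy("abcdefghijklmnopq") returns 0 (an unterminated result).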

> 
> As Paul mentioned, strlcpy is a poor choice for processing strings.
> Could rely on their guidance as they already measured.
> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html

Indeed.  I've added important notices in BUGS about it, and recommended
against.

> 
> Maybe the strlcpy API is easier, safer for programmers; but the
> compiler can't figure out that the programmer already knew src string
> length.  So the strlcpy does a strlen() and wastes time reading over
> memory.  If the src length is known, can just memcpy.

I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
suffer its problems.  It should be even safer and easier to use, and its
first order of growth is better.  I'll send a patch for review in a
moment.

> When I've benchmarked things, reducing the memory accesses for read,
> write boosted performance, also looked at the cycles taken, of course
> cache and alignment all play a part too.

If one wants to micro-optimize for their use case, it's none of my
business.  I provide a function that should be safe and relatively fast
for all use cases, which libc doesn't.

> Maybe could suggest in your man page programmers should keep track of
> the src size ? - to save the cost of the strlen().

No.  Optimizations are not my business.  Writing good APIs should make
these optimizations low value so that they aren't done, except for the
most performance-critical programs.

The problem comes when libc doesn't provide anything usable, and the
user has no guidance on where to start.  Then, programmers start being
clever, usually too clever.  That's why I think the man-pages should go
ahead and write wrapper functions such as strtcpy() and stpecpy()
around libc functions; these wrappers should provide a fast and safe
starting point for most programs.

It's true that memcpy(3) is the fastest function one can use, but it
requires the programmer to be rather careful with the lengths of the
strings.  I don't think keeping track of all those little details is
what the common programmer should do.

> 
> At least the strlen functions are optimized:
> glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time.
> glibc/strlen.c searches 4 bytes at a time.
> 
> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
> 
> memcpy (dest, src, size);
> dest[size - 1] = '\0';

-1s in the source code invite off-by-one bugs.  APIs should be
written so that common use doesn't involve manually writing -1 if
possible.

I acknowledge the performance benefits of this construction, and have
used it myself in NGINX code, but I also find it very dangerous, which
is why I recommend using a wrapper over it:

	char *
	ustr2stp(char *restrict dst, const char *restrict src, size_t len)
	{
		char  *p;

		p = mempcpy(dst, src, len);
		*p = '\0';

		return p;
	}
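
A portable sketch of the wrapper above, for systems without the GNU mempcpy(3): the end pointer is computed from memcpy(3)'s return value instead. join_user_host() is a hypothetical caller showing the length-tracking style discussed in this thread, where every length is already known, so nothing is rescanned and no -1 appears at the call sites. The caller must still guarantee that dst has room for all the bytes plus the final '\0':

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy exactly len bytes of src to dst, append '\0', and return a
 * pointer to that '\0' so that calls can be chained.  dst must have
 * room for len + 1 bytes.
 */
static char *
ustr2stp(char *restrict dst, const char *restrict src, size_t len)
{
	char  *p;

	p = (char *) memcpy(dst, src, len) + len;
	*p = '\0';

	return p;
}

/* Hypothetical caller: build "user@host" from pieces of known length. */
static char *
join_user_host(char *dst, const char *user, size_t ulen,
               const char *host, size_t hlen)
{
	char  *p;

	p = ustr2stp(dst, user, ulen);
	p = ustr2stp(p, "@", 1);
	p = ustr2stp(p, host, hlen);

	return p;
}
```

Since len is an explicit byte count, ustr2stp(buf, "Hello world", 5) copies just the prefix "Hello".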

Cheers,
Alex

> 
> Kind regards, Jonny

-- 
<https://www.alejandro-colomar.es/>


* [PATCH v2 0/3] Improve string_copying(7)
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
                   ` (3 preceding siblings ...)
  2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
  2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
  To: linux-man, Guillem Jover
  Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant,
	DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk,
	Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson,
	Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 907 bytes --]


Hi,

v2:

-  Patches 1/3 and 2/3 are identical to v1, except that I CCd libbsd's
   maintainer (Guillem) in 2/3 so he's aware that we're documenting BUGS
   for strlcpy(3).  Since the strlcpy(3bsd) manual page is part of
   libbsd, it may be interesting to also add a BUGS section in that
   page.

-  Add 3/3, which adds strtcpy(3), a function almost identical to
   strscpy(9) and very similar to strlcpy(3), but without
   strlcpy(3)'s bugs.

Cheers,
Alex

Alejandro Colomar (3):
  string_copying.7: BUGS: *cat(3) functions aren't always bad
  string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance
    problems
  strtcpy.3, string_copying.7: Add strtcpy(3)

 man3/strtcpy.3        |   1 +
 man7/string_copying.7 | 121 +++++++++++++++++++++++++++++++-----------
 2 files changed, 92 insertions(+), 30 deletions(-)
 create mode 100644 man3/strtcpy.3

-- 
2.42.0



* [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
                   ` (4 preceding siblings ...)
  2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
  2023-11-17 21:43   ` Jonny Grant
  2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
  2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar
  7 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
	Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
	Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

The compiler will sometimes optimize them to normal *cpy(3) functions,
since the length of dst is usually known, if the previous *cpy(3) is
visible to the compiler.  And they make for cleaner code.  If you
know that they'll get optimized, you could use them.

Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 1637ebc91..0254fbba6 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -592,8 +592,14 @@ .SH BUGS
 All catenation functions share the same performance problem:
 .UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/
 Shlemiel the painter
 .UE .
+As a mitigation,
+compilers are able to transform some calls to catenation functions
+into normal copy functions,
+since
+.I strlen(dst)
+is usually a byproduct of the previous copy.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0
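
To illustrate the mitigation described in this patch, here is a minimal sketch of the two shapes involved: a chain of catenations, and the equivalent chain of stpcpy(3) calls that carries the end pointer forward. The function names are hypothetical. Whether a particular compiler rewrites the first form into something like the second depends on the compiler, its version, and the optimization level:

```c
#define _XOPEN_SOURCE 700	/* expose stpcpy(3) (POSIX.1-2008) */
#include <string.h>

/*
 * Catenation: every strcat(3) call must first find the end of buf
 * again (Shlemiel the painter).
 */
static void
build_with_strcat(char *buf)
{
	strcpy(buf, "Hello");
	strcat(buf, ", ");
	strcat(buf, "world!");
}

/*
 * Same result with stpcpy(3): each call returns a pointer to the '\0'
 * it wrote, so the next copy starts there and strlen(buf) is never
 * recomputed.
 */
static void
build_with_stpcpy(char *buf)
{
	char  *p = buf;

	p = stpcpy(p, "Hello");
	p = stpcpy(p, ", ");
	stpcpy(p, "world!");
}
```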



* [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
                   ` (5 preceding siblings ...)
  2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-12 11:26 ` Alejandro Colomar
  2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar
  7 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
	Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
	Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 2634 bytes --]

Also point to BUGS from other sections that talk about these functions.

These functions are doomed due to the design decision of mirroring
snprintf(3)'s return value.  They must return strlen(src), which makes
them terribly slow, and vulnerable to DoS if an attacker can control
strlen(src).

A better design would have been to return -1 when truncating.

Reported-by: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man7/string_copying.7 | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/man7/string_copying.7 b/man7/string_copying.7
index 0254fbba6..cb3910db0 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -226,9 +226,9 @@ .SS Truncate or not?
 .IP \[bu]
 .BR strlcpy (3bsd)
 and
 .BR strlcat (3bsd)
-are similar, but less efficient when chained.
+are similar, but have important performance problems; see BUGS.
 .IP \[bu]
 .BR stpncpy (3)
 and
 .BR strncpy (3)
@@ -417,8 +417,10 @@ .SS Functions
 the resulting string is truncated
 (but it is guaranteed to be null-terminated).
 They return the length of the total string they tried to create.
 .IP
+Check BUGS before using these functions.
+.IP
 .BR stpecpy (3)
 is a simpler alternative to these functions.
 .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
 .TP
@@ -598,8 +600,22 @@ .SH BUGS
 into normal copy functions,
 since
 .I strlen(dst)
 is usually a byproduct of the previous copy.
+.P
+.BR strlcpy (3)
+and
+.BR strlcat (3)
+need to read the entire
+.I src
+string,
+even if the destination buffer is small.
+This makes them vulnerable to Denial of Service (DoS) attacks
+if an attacker can control the length of the
+.I src
+string.
+And if not,
+they're still unnecessarily slow.
 .\" ----- EXAMPLES :: -------------------------------------------------/
 .SH EXAMPLES
 The following are examples of correct use of each of these functions.
 .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/
-- 
2.42.0



* [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3)
  2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant
                   ` (6 preceding siblings ...)
  2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
@ 2023-11-12 11:27 ` Alejandro Colomar
  7 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 11:27 UTC (permalink / raw)
  To: linux-man
  Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert,
	Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen,
	Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg,
	G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu,
	Andreas Schwab

[-- Attachment #1: Type: text/plain, Size: 7496 bytes --]

Add this new truncating string-copying function.  It intends to fully
replace strlcpy(3), which has important bugs (documented in the
preceding commit).

It is almost identical to the Linux kernel's strscpy(9), so reduce the
documentation of strscpy(9) in this page to the minimum, giving
preference to strtcpy(3).  Provide a reference implementation, since no
libc provides it.

Providing an easy, safe, and relatively fast truncating string-copying
function should prevent users from rolling their own, in which they
might introduce bugs accidentally.  We already made enough mistakes
while discussing these functions, so it's certainly not something that
should be written often.

Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Jonny Grant <jg@jguk.org>
Cc: DJ Delorie <dj@redhat.com>
Cc: Matthew House <mattlloydhouse@gmail.com>
Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com>
Cc: Thorsten Kukuk <kukuk@suse.com>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Zack Weinberg <zack@owlfolio.org>
Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com>
Cc: Carlos O'Donell <carlos@redhat.com>
Cc: Xi Ruoyao <xry111@xry111.site>
Cc: Stefan Puiu <stefan.puiu@gmail.com>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Guillem Jover <guillem@hadrons.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>
---
 man3/strtcpy.3        |  1 +
 man7/string_copying.7 | 97 ++++++++++++++++++++++++++++++-------------
 2 files changed, 69 insertions(+), 29 deletions(-)
 create mode 100644 man3/strtcpy.3

diff --git a/man3/strtcpy.3 b/man3/strtcpy.3
new file mode 100644
index 000000000..beb850746
--- /dev/null
+++ b/man3/strtcpy.3
@@ -0,0 +1 @@
+.so man7/string_copying.7
diff --git a/man7/string_copying.7 b/man7/string_copying.7
index cb3910db0..4f609e480 100644
--- a/man7/string_copying.7
+++ b/man7/string_copying.7
@@ -6,8 +6,9 @@
 .\" ----- NAME :: -----------------------------------------------------/
 .SH NAME
 stpcpy,
 strcpy, strcat,
+strtcpy,
 stpecpy,
 strlcpy, strlcat,
 stpncpy,
 strncpy,
@@ -30,8 +31,11 @@ .SS Strings
 // Chain-copy a string with truncation.
 .BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src );
 .P
 // Copy/catenate a string with truncation.
+.BI "size_t strtcpy(char " dst "[restrict ." sz "], \
+const char *restrict " src ,
+.BI "               size_t " sz );
 .BI "size_t strlcpy(char " dst "[restrict ." sz "], \
 const char *restrict " src ,
 .BI "               size_t " sz );
 .BI "size_t strlcat(char " dst "[restrict ." sz "], \
@@ -220,10 +224,10 @@ .SS Truncate or not?
 .P
 Functions that truncate:
 .IP \[bu] 3
 .BR stpecpy (3)
-is the most efficient string copy function that performs truncation.
-It only requires to check for truncation once after all chained calls.
+.IP \[bu]
+.BR strtcpy (3)
 .IP \[bu]
 .BR strlcpy (3bsd)
 and
 .BR strlcat (3bsd)
@@ -326,8 +330,10 @@ .SS String vs character sequence
 .IP \[bu]
 .BR strcpy (3),
 .BR strcat (3)
 .IP \[bu]
+.BR strtcpy (3)
+.IP \[bu]
 .BR stpecpy (3)
 .IP \[bu]
 .BR strlcpy (3bsd),
 .BR strlcat (3bsd)
@@ -390,12 +396,24 @@ .SS Functions
 The return value is useless.
 .IP
 .BR stpcpy (3)
 is a faster alternative to these functions.
+.\" ----- DESCRIPTION :: Functions :: strtcpy(3) ----------------------/
+.TP
+.BR strtcpy (3)
+Copy the input string into a destination string.
+If the destination buffer isn't large enough to hold the copy,
+the resulting string is truncated
+(but it is guaranteed to be null-terminated).
+It returns the length of the string,
+or \-1 if it truncated.
+.IP
+This function is not provided by any library;
+see EXAMPLES for a reference implementation.
 .\" ----- DESCRIPTION :: Functions :: stpecpy(3) ----------------------/
 .TP
 .BR stpecpy (3)
-Copy the input string into a destination string.
+Chain-copy the input string into a destination string.
 If the destination buffer,
 limited by a pointer to its end,
 isn't large enough to hold the copy,
 the resulting string is truncated
@@ -419,10 +437,12 @@ .SS Functions
 They return the length of the total string they tried to create.
 .IP
 Check BUGS before using these functions.
 .IP
+.BR strtcpy (3)
+and
 .BR stpecpy (3)
-is a simpler alternative to these functions.
+are better alternatives to these functions.
 .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/
 .TP
 .BR stpncpy (3)
 Copy the input string into
@@ -542,8 +562,17 @@ .SH RETURN VALUE
 .BR ustpcpy (3)
 A pointer to one after the last character
 in the destination character sequence.
 .TP
+.BR strtcpy (3)
+The length of the string.
+When truncation occurs, it returns \-1.
+When
+.I sz
+is
+.BR 0 ,
+it also returns \-1.
+.TP
 .BR strlcpy (3bsd)
 .TQ
 .BR strlcat (3bsd)
 The length of the total string that they tried to create
@@ -562,25 +591,14 @@ .SH RETURN VALUE
 which is useless.
 .\" ----- NOTES :: strscpy(9) -----------------------------------------/
 .SH NOTES
 The Linux kernel has an internal function for copying strings,
-which is similar to
-.BR stpecpy (3),
-except that it can't be chained:
-.TP
-.BR strscpy (9)
-Copy the input string into a destination string.
-If the destination buffer,
-limited by its size,
-isn't large enough to hold the copy,
-the resulting string is truncated
-(but it is guaranteed to be null-terminated).
-It returns the length of the destination string, or
+.BR strscpy (9),
+which is identical to
+.BR strtcpy (3),
+except that it returns
 .B \-E2BIG
-on truncation.
-.IP
-.BR stpecpy (3)
-is a simpler and faster alternative to this function.
+instead of \-1.
 .\" ----- CAVEATS :: --------------------------------------------------/
 .SH CAVEATS
 Don't mix chain calls to truncating and non-truncating functions.
 It is conceptually wrong
@@ -640,8 +658,17 @@ .SH EXAMPLES
 strcat(buf, "!");
 len = strlen(buf);
 puts(buf);
 .EE
+.\" ----- EXAMPLES :: strtcpy(3) --------------------------------------/
+.TP
+.BR strtcpy (3)
+.EX
+len = strtcpy(buf, "Hello world!", sizeof(buf));
+if (len == \-1)
+    goto toolong;
+puts(buf);
+.EE
 .\" ----- EXAMPLES :: stpecpy(3) --------------------------------------/
 .TP
 .BR stpecpy (3)
 .EX
@@ -671,17 +698,8 @@ .SH EXAMPLES
 if (len >= sizeof(buf))
     goto toolong;
 puts(buf);
 .EE
-.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/
-.TP
-.BR strscpy (9)
-.EX
-len = strscpy(buf, "Hello world!", sizeof(buf));
-if (len == \-E2BIG)
-    goto toolong;
-puts(buf);
-.EE
 .\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/
 .TP
 .BR stpncpy (3)
 .EX
@@ -765,8 +783,29 @@ .SS Implementations
 .in +4n
 .EX
 /* This code is in the public domain. */
 \&
+.\" ----- EXAMPLES :: Implementations :: strtcpy(3) -------------------/
+ssize_t
+.IR strtcpy "(char *restrict dst, const char *restrict src, size_t sz)"
+{
+    bool    trunc;
+    char    *p;
+    size_t  dlen, slen;
+\&
+    if (sz == 0)
+        return \-1;
+\&
+    slen = strnlen(src, sz);
+    trunc = (slen == sz);
+    dlen = slen \- trunc;
+\&
+    p = mempcpy(dst, src, dlen);
+    *p = \[aq]\e0\[aq];
+\&
+    return trunc ? \-1 : slen;
+}
+\&
 .\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/
 char *
 .IR stpecpy "(char *dst, char end[0], const char *restrict src)"
 {
-- 
2.42.0
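
A compilable sketch of the strtcpy() reference implementation in this patch, assuming POSIX.1-2008 for strnlen(3) and ssize_t. It uses the synopsis's parameter name sz throughout and replaces the GNU mempcpy(3) with memcpy(3); otherwise it follows the patch. It reads at most sz bytes of src, which is what gives it the strnlen(src, sizeof(dst)) growth discussed earlier in the thread:

```c
#define _POSIX_C_SOURCE 200809L	/* strnlen(3), ssize_t */
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/*
 * Copy src into dst, which is sz bytes long.  Returns strlen(src) if
 * it fit, or -1 if it was truncated (or sz is 0).  dst is always
 * null-terminated when sz > 0.
 */
static ssize_t
strtcpy(char *restrict dst, const char *restrict src, size_t sz)
{
	bool    trunc;
	size_t  dlen, slen;

	if (sz == 0)
		return -1;

	slen = strnlen(src, sz);	/* never looks past sz bytes */
	trunc = (slen == sz);
	dlen = slen - trunc;		/* leave room for the '\0' */

	memcpy(dst, src, dlen);
	dst[dlen] = '\0';

	return trunc ? -1 : (ssize_t) slen;
}
```

For a char buf[8], strtcpy(buf, "Hello", sizeof(buf)) returns 5, while strtcpy(buf, "Hello world!", sizeof(buf)) returns -1 and leaves "Hello w" in buf.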



* Re: strncpy clarify result may not be null terminated
  2023-11-12 10:59                                       ` Alejandro Colomar
@ 2023-11-12 20:49                                         ` Paul Eggert
  2023-11-12 21:00                                           ` Alejandro Colomar
  2023-11-13 23:46                                           ` Jonny Grant
  2023-11-17 21:57                                         ` Jonny Grant
  1 sibling, 2 replies; 138+ messages in thread
From: Paul Eggert @ 2023-11-12 20:49 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man

[dropping libc-alpha since this is only about the man pages]

On 2023-11-12 02:59, Alejandro Colomar wrote:

> I think the man-pages should go
> ahead and write wrapper functions such as strtcpy() and stpecpy()
> around libc functions; these wrappers should provide a fast and safe
> starting point for most programs.

It's OK for man pages to give these in EXAMPLES sections. However, the 
man pages currently go too far in this direction. Currently, if I type 
"man stpecpy", I get a man page with a synopsis and it looks to me like 
glibc supports stpecpy(3) just like it supports stpcpy(3). But glibc 
doesn't do that, as stpecpy is merely a man-pages invention: although 
the source code for stpecpy is in the EXAMPLES section of 
string_copying(7), you can't use stpecpy in an app without 
copy-and-pasting the man page's source into your code.

It's not just stpecpy. For example, there is no ustr2stp function in 
glibc, but "man ustr2stp" acts as if there is one.

The man pages should describe the library that exists, not the library 
that some of us would rather have.


> It's true that memcpy(3) is the fastest function one can use, but it
> requires the programmer to be rather careful with the lengths of the
> strings.  I don't think keeping track of all those little details is
> what the common programmer should do.

Unfortunately, C is not designed for string use that's that convenient. 
If you want safe and efficient use of possibly-long C strings, keeping 
track of lengths is generally the best way to do it.


>> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
>>
>> memcpy (dest, src, size);
>> dest[size - 1] = '\0';
> 
> -1's in the source code make up for off-by-one bugs.

The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is 
defined to null-terminate the result if sz!=0, so that particular "-1" 
isn't a bug. (Perhaps you meant that the strlcpy spec itself is buggy? 
It wasn't clear to me.)

That "last byte, twice" question is: why is the last argument to memcpy 
"size" and not "size - 1" which would be equally correct? The answer is 
performance: memcpy often works faster when copying a number of bytes 
that is a multiple of a smallish power of two, and "size" is more likely 
than "size - 1" to be such a multiple.
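
A minimal sketch of the strlcpy(3) truncation path Paul describes, assuming strlcpy(3) semantics (return strlen(src); always terminate when size != 0). The name my_strlcpy is hypothetical, and the code is modeled on the snippet quoted above rather than copied from glibc:

```c
#include <string.h>

/* Hypothetical strlcpy(3)-like function.  Returns strlen(src). */
size_t
my_strlcpy(char *restrict dst, const char *restrict src, size_t size)
{
    size_t slen = strlen(src);

    if (slen < size) {
        /* Fits: copy the string and its terminator. */
        memcpy(dst, src, slen + 1);
    } else if (size != 0) {
        /* Truncating: copy a full 'size' bytes (more likely to be a
         * memcpy-friendly power-of-two length than size - 1), then
         * overwrite the last byte with the terminator. */
        memcpy(dst, src, size);
        dst[size - 1] = '\0';
    }
    return slen;
}
```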


* Re: strncpy clarify result may not be null terminated
  2023-11-12 20:49                                         ` Paul Eggert
@ 2023-11-12 21:00                                           ` Alejandro Colomar
  2023-11-12 21:45                                             ` Alejandro Colomar
  2023-11-13 23:46                                           ` Jonny Grant
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 21:00 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man

On Sun, Nov 12, 2023 at 12:49:44PM -0800, Paul Eggert wrote:
> [dropping libc-alpha since this is only about the man pages]
> 
> On 2023-11-12 02:59, Alejandro Colomar wrote:
> 
> > I think the man-pages should go
> > ahead and write wrapper functions such as strtcpy() and stpecpy()
> > around libc functions; these wrappers should provide a fast and safe
> > starting point for most programs.
> 
> It's OK for man pages to give these in EXAMPLES sections. However, the man
> pages currently go too far in this direction. Currently, if I type "man
> stpecpy", I get a man page with a synopsis and it looks to me like glibc
> supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do
> that, as stpecpy is merely a man-pages invention: although the source code
> for stpecpy is in the EXAMPLES section of string_copying(7), you can't use
> stpecpy in an app without copy-and-pasting the man page's source into your
> code.
> 
> It's not just stpecpy. For example, there is no ustr2stp function in glibc,
> but "man ustr2stp" acts as if there is one.

Yeah, I've thought of removing those links.  Will do it.

> 
> The man pages should describe the library that exists, not the library that
> some of us would rather have.
> 
> 
> > It's true that memcpy(3) is the fastest function one can use, but it
> > requires the programmer to be rather careful with the lengths of the
> > strings.  I don't think keeping track of all those little details is
> > what the common programmer should do.
> 
> Unfortunately, C is not designed for string use that's that convenient. If
> you want safe and efficient use of possibly-long C strings, keeping track of
> lengths is generally the best way to do it.
> 
> 
> > > glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
> > > 
> > > memcpy (dest, src, size);
> > > dest[size - 1] = '\0';
> > 
> > -1's in the source code make up for off-by-one bugs.
> 
> The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is
> defined to null-terminate the result if sz!=0, so that particular "-1" isn't
> a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't
> clear to me.)

I didn't mean this code has a bug.  I meant that writing this code all
the time is prone to bugs, because one may forget the -1 in some of the
cases.

And yes, the strlcpy(3) spec is buggy in that it forces a pattern that
is prone to off-by-one bugs: to check for truncation, one must use '>=',
which one may mistype as '>' (or even '==').  It would have been much
better to return -1 on truncation, to have a simple == -1 check as most
libc functions.

Any function that requires writing hundreds of 'size - 1', or hundreds
of '>=' should at least be wrapped.  If that use is the only intended
use of the function (as is the case for snprintf(3) and strlcpy(3)), it's a bad
API.
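
To illustrate the '>=' pattern, here is a sketch using snprintf(3), which follows the same return convention as strlcpy(3): it returns the length it would have written. The helper name copy_checked is hypothetical; it converts that convention into the kind of -1-on-truncation return suggested above.

```c
#include <stdio.h>

/* Hypothetical helper: copy src into dst (dsize bytes) via snprintf(3).
 * Returns the copied length, or -1 on truncation or output error. */
int
copy_checked(char *dst, size_t dsize, const char *src)
{
    int len = snprintf(dst, dsize, "%s", src);

    if (len < 0)
        return -1;                  /* output error */
    /* The string fits only if len < dsize.  Writing '>' here would
     * miss the len == dsize case, where the last character was
     * replaced by the terminator. */
    if ((size_t) len >= dsize)
        return -1;
    return len;
}
```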

Cheers,
Alex

> 
> That "last byte, twice" question is: why is the last argument to memcpy
> "size" and not "size - 1" which would be equally correct? The answer is
> performance: memcpy often works faster when copying a number of bytes that
> is a multiple of a smallish power of two, and "size" is more likely than
> "size - 1" to be such a multiple.
> 

-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-12 21:00                                           ` Alejandro Colomar
@ 2023-11-12 21:45                                             ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-12 21:45 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man

On Sun, Nov 12, 2023 at 10:00:06PM +0100, Alejandro Colomar wrote:
> On Sun, Nov 12, 2023 at 12:49:44PM -0800, Paul Eggert wrote:
> > [dropping libc-alpha since this is only about the man pages]
> > 
> > On 2023-11-12 02:59, Alejandro Colomar wrote:
> > 
> > > I think the man-pages should go
> > > ahead and write wrapper functions such as strtcpy() and stpecpy()
> > > around libc functions; these wrappers should provide a fast and safe
> > > starting point for most programs.
> > 
> > It's OK for man pages to give these in EXAMPLES sections. However, the man
> > pages currently go too far in this direction. Currently, if I type "man
> > stpecpy", I get a man page with a synopsis and it looks to me like glibc
> > supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do
> > that, as stpecpy is merely a man-pages invention: although the source code
> > for stpecpy is in the EXAMPLES section of string_copying(7), you can't use
> > stpecpy in an app without copy-and-pasting the man page's source into your
> > code.
> > 
> > It's not just stpecpy. For example, there is no ustr2stp function in glibc,
> > but "man ustr2stp" acts as if there is one.
> 
> Yeah, I've thought of removing those links.  Will do it.
> 
> > 
> > The man pages should describe the library that exists, not the library that
> > some of us would rather have.
> > 
> > 
> > > It's true that memcpy(3) is the fastest function one can use, but it
> > > requires the programmer to be rather careful with the lengths of the
> > > strings.  I don't think keeping track of all those little details is
> > > what the common programmer should do.
> > 
> > Unfortunately, C is not designed for string use that's that convenient. If
> > you want safe and efficient use of possibly-long C strings, keeping track of
> > lengths is generally the best way to do it.
> > 
> > 
> > > > glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
> > > > 
> > > > memcpy (dest, src, size);
> > > > dest[size - 1] = '\0';
> > > 
> > > -1's in the source code make up for off-by-one bugs.
> > 
> > The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is
> > defined to null-terminate the result if sz!=0, so that particular "-1" isn't
> > a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't
> > clear to me.)
> 
> I didn't mean this code has a bug.  I meant that writing this code all
> the time is prone to bugs, because one may forget the -1 in some of the
> cases.


Ahh, I hadn't noticed that was part of the implementation of strlcpy(3).
I thought it was some pattern showing how to use memcpy(3) to copy
strings.  I was saying that such a pattern would be a bad thing to write
all the time.

But yeah, inside strlcpy(3) it's fine, and I don't think strlcpy(3) is
bad in that regard.  The only problem I see in strlcpy(3) is the return
value.

> 
> And yes, the strlcpy(3) spec is buggy in that it forces a pattern that
> is prone to off-by-one bugs: to check for truncation, one must use '>=',
> which one may mistype as '>' (or even '==').  It would have been much
> better to return -1 on truncation, to have a simple == -1 check as most
> libc functions.
> 
> Any function that requires writing hundreds of 'size - 1', or hundreds
> of '>=' should at least be wrapped.  If that use is the only intended
> use of the function (as is the case for snprintf(3) and strlcpy(3)), it's a bad
> API.
> 
> Cheers,
> Alex
> 
> > 
> > That "last byte, twice" question is: why is the last argument to memcpy
> > "size" and not "size - 1" which would be equally correct? The answer is
> > performance: memcpy often works faster when copying a number of bytes that
> > is a multiple of a smallish power of two, and "size" is more likely than
> > "size - 1" to be such a multiple.
> > 
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

* Re: strncpy clarify result may not be null terminated
  2023-11-10 17:48                     ` Alejandro Colomar
@ 2023-11-13 15:01                       ` Matthew House
  0 siblings, 0 replies; 138+ messages in thread
From: Matthew House @ 2023-11-13 15:01 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, linux-man

On Fri, Nov 10, 2023 at 12:48 PM Alejandro Colomar <alx@kernel.org> wrote:
> On Fri, Nov 10, 2023 at 11:06:00AM -0500, Matthew House wrote:
> > As I learned, the typical use of strcpy(3) (at least 80% of uses in my
> > estimation) is actually copying a string into a new buffer, not an existing
> > buffer. And that does need a +1 to calculate a size to pass to the
> > allocation function, and usually a lot more +s if it's going to be
>
> If you strcpy(3) to a new buffer, you'd usually strdup(3), no?  Unless
> it's part of a larger object.

Indeed, it's part of a larger object much more often than not. Empirically,
the idiomatic pattern in the strcpy(3)-using codebases I checked is stuff
like:

    char *key = ..., *value = ...;
    size_t dsize = strlen(key) + 1;
    if (value) dsize += 2 + strlen(value);
    char *dst = malloc(dsize);
    if (!dst) return NULL;
    strcpy(dst, key);
    if (value != NULL) {
        strcat(dst, ", ");
        strcat(dst, value);
    }
    return dst;

(Or similar with sprintf(3), when the sequence is fixed at compile time.
Combinations of strcat(3) and sprintf(3) are also common.)
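
Following Paul's advice to keep track of lengths, the same key/value construction can compute each length once and copy with memcpy(3). This is a sketch: join_key_value is a hypothetical name, and it assumes the summed sizes cannot overflow size_t.

```c
#include <stdlib.h>
#include <string.h>

/* Build "key" or "key, value" in a freshly allocated buffer.
 * Each length is computed once; no strcat(3) rescans. */
char *
join_key_value(const char *key, const char *value)
{
    size_t  klen = strlen(key);
    size_t  vlen = value ? strlen(value) : 0;
    size_t  dsize = klen + (value ? 2 + vlen : 0) + 1;
    char    *dst, *p;

    dst = malloc(dsize);
    if (dst == NULL)
        return NULL;

    p = dst;
    memcpy(p, key, klen);
    p += klen;
    if (value != NULL) {
        memcpy(p, ", ", 2);
        p += 2;
        memcpy(p, value, vlen);
        p += vlen;
    }
    *p = '\0';              /* p == dst + dsize - 1 here */
    return dst;
}
```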

And even in the case of only copying a single string, strdup(3) is not an
option for any codebase using functions other than malloc(3)/free(3) for
allocation; they either have to write a custom wrapper (very rare in
practice), or just allocate strlen(src) + 1 bytes inline and strcpy(3) it,
as the limiting case of the general strcpy(3)/strcat(3) pattern.

Also, strdup(3) isn't in current ISO C (yes, I know it's in C23, but I'm
still a pessimist), so it isn't directly portable to Windows' CRT without
conditionally #defining it as an alias of _strdup(), which probably scares
off a few potential users.

> > Relatedly, as I also learned from all the manual strdup(3)-like snippets
> > that use a custom allocator, the typical library author is deathly allergic
> > to writing a custom wrapper over anything that isn't an allocation
> > function; they'll repeat the entirety of the logic inline as many times as
> > it takes. So I don't buy that most people would be replacing numerous calls
> > to strncpy(3) with calls to a unified wrapper function that can be
> > inspected and fixed all in one place, as you seem to suggest in your later
> > email.
>
> I try to avoid cowboy programmers, but we know it's impossible.  I just
> do what I can.  But cowboy programmers will nevertheless continue to
> exist and negate reality.
>
> <https://github.com/nginx/unit/issues/795>
> <https://github.com/nginx/unit/issues/804>
> <https://github.com/nginx/unit/issues/923>
>
> The responses from a programmer from nginx are gems, doubting that UB is
> a problem, or even suggesting implementing a cosmetic patch instead of
> fixing an API.  You can read those links if you want some fun.

I don't deny that 'cowboy programmers' who disregard the formal rules in
favor of their own mental models, then blame the compiler devs, standards
authors, et al. if they ever get bitten, can be a real problem in the C
community, and targeting their specific preferences isn't always practical.
(Some of them still do have valid points, though; it's not an axiom that
all instances of UB or unspecified behavior currently in the standards are
necessarily a net good, as a few of the cowboys' opponents seem to
overzealously imply.)

But I also don't think that the very common preference for repeatedly
inlining code over writing a custom wrapper can simply be brushed off as
solely being held by such careless programmers. I can think of at least a
couple scenarios where it can make some sense even for careful programmers.

First, many teams writing libraries don't have much in the way of coherent
top-down control over the general layout of the codebase: every programmer
works primarily on their own functionality, while trying not to trample
over the work of their peers. So it can be especially difficult to set up a
central file of utility functions and keep them fully stable and available.
Instead, if a programmer just sticks purely to the platform-provided
functions, they have the assurance of fully-consistent behavior, at the
cost of the mental overhead of correctly writing patterns on top of them.

Second, some code is optimized for being very literally reused, by directly
transplanting functions from one project to another. For instance, CPython
has a few files transplanted from other FOSS libraries in this way, used as
fallbacks for mostly-but-not-entirely-portable APIs. But if such code
referred to project-specific wrappers, then all the wrappers would have to
be copied as well to get everything to work; thus, it's again valuable to
stick to common platform APIs.

More generally, if strncpy(3), short of being deprecated, became (e.g.)
strongly discouraged and heavily linted against in clang-tidy and the big
IDEs, to the point that library authors are pushed to get rid of it one way
or another, then I'd expect to see many more inlined memcpy(3)-based
replacements than foolproof wrappers. And even if some of them can be
blamed on 'cowboy programmers', inlined patterns represent enough of the
general codebase that we'd all have to read it anyway, which is not
something I'd prefer over working through strncpy(3)'s faults.

> > Also, the typical use of strncpy(3) by far is to allow a truncated string
> > rather than raising an error on truncation, and in that use case, it makes
> > no difference whether or not the size inside the strncpy(3) call has a -1.
>
> True; that's a benign off-by-one cancer.  But still a cancer.

I don't see it that way. Both versions make some amount of logical sense.
With a -1 inside the strncpy(3) call, you're taking a raw prefix of a
string, then appending a null terminator to the prefix in case it doesn't
have one. Without a -1 inside a strncpy(3) call, you're again taking a raw
prefix, then truncating again by one more byte to ensure that a null
terminator is present.

The only real question is the size that it really ought to be truncated to
(assuming that truncation makes sense in the first place), but usually
that's just "whatever size fills as much of the buffer as possible".
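
The two truncating idioms described above can be written side by side. Both helpers are hypothetical and assume dsize > 0; for any source string they leave byte-for-byte identical contents in dst, although the second always writes the final byte twice.

```c
#include <string.h>

/* Variant 1: take a raw prefix of at most dsize - 1 bytes,
 * then append the terminator. */
void
trunc_copy_minus1(char *dst, const char *src, size_t dsize)
{
    strncpy(dst, src, dsize - 1);
    dst[dsize - 1] = '\0';
}

/* Variant 2: take a raw prefix of at most dsize bytes,
 * then truncate one more byte by overwriting it with '\0'. */
void
trunc_copy_full(char *dst, const char *src, size_t dsize)
{
    strncpy(dst, src, dsize);
    dst[dsize - 1] = '\0';
}
```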

> > Certainly, it can be quite a task to figure out whether the fields are
> > actually read, if the API is poorly specified; without going through its
> > entire implementation, any of those "unused" fields could be copied around
> > or compared before being discarded, making it dangerous to leave them
> > uninitialized. But need we add a comment to every one of those memset(3)
> > calls, "I'm unsure whether this zeroing is significant at all"? Perhaps
> > such a comment might be helpful, if there really is reason to suspect that
> > the API is nefarious, but I've hardly ever seen stuff like that in
> > practice.
>
> Maybe it's because in the code I've worked with, there were actual calls
> to strncpy(3) where the zeroing matters, and they're disguised between
> other strncpy(3) calls, which make it all a funny amusement park.
>
> If you _only_ use strings, and wrap strncpy(3) in a wrapper that
> protects against off-by-ones, it would be acceptable, I must say.  It's
> just that I don't find that code when I see strncpy(3) calls.  Maybe I
> don't look at the right code bases.

My condolences! But yeah, basically all codebases I've ever looked at,
including the ones I reviewed for typical strncpy(3) usage, really do tend
to use plain, ordinary C strings all the way; some even have comments
reminding not to depend on strncpy(3)'s zero-padding, on account of a few
misbehaving implementations without it. I recall seeing one library a while
back that zero-padded all strings up to a multiple of 8 bytes for SIMD
purposes, but IIRC, that one used entirely custom functions for string
manipulation, and limited use of standard functions to reading the strings.

> > It forces you to do extra work, the same way strcpy(3) forces you to do
> > extra work.
>
> strncpy(3) still requires you to know your buffer sizes.  So any dangers
> of strcpy(3) in that regard should be shared by strncpy(3).  No?

What I was trying to say with my whole anti-strcpy(3) diatribe is, it's a
very good thing that strncpy(3) requires you to know your buffer size!
strcpy(3), strcat(3), and sprintf(3) share the danger that you can use them
*even without knowing your buffer size* and putting in the extra work.
Thus, library authors can and have frequently written clever things like

    void write_string_to_buffer(char *buf, const char *key, int value) {
        sprintf(buf, "%s, %d\n", key, value);
    }

where the required buffer size is known neither to the caller nor the
callee; callers just all coincidentally happen to use a large-enough
buffer, even though the requirement isn't documented anywhere. And with
enough callers, it becomes very likely to mess this up somewhere and
actually expose a buffer overwrite, as I mentioned a few times in my list.
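
For contrast, a size-aware version of that helper makes the buffer-size requirement part of the interface, so neither side can silently rely on an undocumented size. This is a sketch; the name write_string_to_buffer_n and the 0/-1 return convention are assumptions, not taken from any codebase.

```c
#include <stdio.h>

/* Hypothetical size-aware variant: the caller passes the buffer size,
 * and truncation is reported instead of overflowing the buffer.
 * Returns 0 on success, -1 on truncation or output error. */
int
write_string_to_buffer_n(char *buf, size_t size, const char *key, int value)
{
    int len = snprintf(buf, size, "%s, %d\n", key, value);

    if (len < 0 || (size_t) len >= size)
        return -1;
    return 0;
}
```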

Meanwhile, with strncpy(3), which requires the destination size to be set
in stone, the only dangers are memcpy(3)-like uses where it turns out the
source string isn't always long enough; truncating uses where truncation
is logically inappropriate, or where the string is truncated too far;
truncation-detecting uses where some source strings are needlessly
rejected; cleverness in deciding when to append the null terminator; normal
off-by-one errors; and, of course, your fear of secret reliance on the zero
padding.

Most of these are strictly local dangers, that can be diagnosed mainly by
looking at the call to strncpy(3) and the immediate use of the destination
buffer. The only exceptions are certain memcpy(3)-like uses, which can rely
on the code that's creating the source string to make it long enough, and
secret reliance on zero padding, which appears rare to me in practice.

But strcpy(3)'s biggest and most frequent danger is a global danger that
necessarily forces you to scour the codebase to track down all the callers
and make sure that the source ultimately fits in the destination. And many
codebases even consider this perfectly legitimate, e.g., by having some
common INTERNAL_BUFFER_SIZE that they implicitly expect the source string
to adhere to! That's why I say that strcpy(3)'s dangers are not really
shared by strncpy(3).

Thank you,
Matthew House

* Re: strncpy clarify result may not be null terminated
  2023-11-12 20:49                                         ` Paul Eggert
  2023-11-12 21:00                                           ` Alejandro Colomar
@ 2023-11-13 23:46                                           ` Jonny Grant
  1 sibling, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-13 23:46 UTC (permalink / raw)
  To: Paul Eggert, Alejandro Colomar; +Cc: Matthew House, linux-man



On 12/11/2023 20:49, Paul Eggert wrote:
> [dropping libc-alpha since this is only about the man pages]
> 
> On 2023-11-12 02:59, Alejandro Colomar wrote:
> 
>> I think the man-pages should go
>> ahead and write wrapper functions such as strtcpy() and stpecpy()
>> around libc functions; these wrappers should provide a fast and safe
>> starting point for most programs.
> 
> It's OK for man pages to give these in EXAMPLES sections. However, the man pages currently go too far in this direction. Currently, if I type "man stpecpy", I get a man page with a synopsis and it looks to me like glibc supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do that, as stpecpy is merely a man-pages invention: although the source code for stpecpy is in the EXAMPLES section of string_copying(7), you can't use stpecpy in an app without copy-and-pasting the man page's source into your code.
> 
> It's not just stpecpy. For example, there is no ustr2stp function in glibc, but "man ustr2stp" acts as if there is one.
> 
> The man pages should describe the library that exists, not the library that some of us would rather have.
> 
> 
>> It's true that memcpy(3) is the fastest function one can use, but it
>> requires the programmer to be rather careful with the lengths of the
>> strings.  I don't think keeping track of all those little details is
>> what the common programmer should do.
> 
> Unfortunately, C is not designed for string use that's that convenient. If you want safe and efficient use of possibly-long C strings, keeping track of lengths is generally the best way to do it.
> 
> 
>>> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
>>>
>>> memcpy (dest, src, size);
>>> dest[size - 1] = '\0';
>>
>> -1's in the source code make up for off-by-one bugs.
> 
> The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is defined to null-terminate the result if sz!=0, so that particular "-1" isn't a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't clear to me.)
> 
> That "last byte, twice" question is: why is the last argument to memcpy "size" and not "size - 1" which would be equally correct? The answer is performance: memcpy often works faster when copying a number of bytes that is a multiple of a smallish power of two, and "size" is more likely than "size - 1" to be such a multiple.
> 

Thank you for your reply. I see what you mean: many programmers think about sizes and would make their dest buffer, say, 32 bytes, so when truncation occurs it makes sense to make the most of that and copy quickly, even if that means writing the null terminator on top of the last byte written. Probably someone measured strlcpy with these truncating calls and saw a lot of convenient power-of-2 sizes coming through.

Personally, I'm not sure a partial copy is of much use when strings are truncated. Since strlcpy detects truncation anyway, an API like this could just return an error and not copy anything. Then the programmer would have a chance to realloc() and copy the full string.

The strlcpy API returns src_length, even when it's truncated and didn't write src_length+1 bytes to dest, how misleading. Shame strlcpy can't be [[deprecated]].
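
The all-or-nothing API suggested above could be sketched as follows. Both function names are hypothetical: copy_whole() copies only when the whole string and its terminator fit, and copy_or_grow() shows the realloc(3)-and-retry path a caller could take.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen(3) */
#include <stdlib.h>
#include <string.h>

/* Copy src into dst only if it fits (with '\0') in dsize bytes;
 * otherwise leave dst untouched.  Returns 0 on success, -1 if not. */
int
copy_whole(char *dst, size_t dsize, const char *src)
{
    size_t slen = strnlen(src, dsize);

    if (slen == dsize)
        return -1;                /* no room for the terminator */
    memcpy(dst, src, slen + 1);
    return 0;
}

/* Try the copy; on failure, grow the heap buffer to fit and copy.
 * buf may be NULL.  Returns the (possibly moved) buffer, or NULL on
 * allocation failure, in which case the old buf is still valid. */
char *
copy_or_grow(char *buf, size_t *dsize, const char *src)
{
    if (copy_whole(buf, *dsize, src) == 0)
        return buf;

    size_t  need = strlen(src) + 1;
    char    *p = realloc(buf, need);

    if (p == NULL)
        return NULL;
    memcpy(p, src, need);
    *dsize = need;
    return p;
}
```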

I'm sure everyone may have read these posts before about strlcpy, just sharing while I remember:

Ulrich Drepper frowned upon strlcpy:
https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053.html

"This is horribly inefficient BSD crap.  Using these function only
leads to other errors.  Correct string handling means that you always
know how long your strings are and therefore you can you memcpy
(instead of strcpy).

Beside, those who are using strcat or variants deserved to be punished."

The rest of the thread is also interesting.

Kind regards, Jonny

* Re: [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad
  2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
@ 2023-11-17 21:43   ` Jonny Grant
  2023-11-18  0:25     ` Signing all patches and email to this list Matthew House
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-17 21:43 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Paul Eggert, linux-man


On 12/11/2023 11:26, Alejandro Colomar wrote:
> The compiler will sometimes optimize them to normal *cpy(3) functions,
> since the length of dst is usually known, if the previous *cpy(3) is
> visible to the compiler.  And they provide for cleaner code.  If you
> know that they'll get optimized, you could use them.

May I ask, is there an example or document that shows this optimization by the compiler? Perhaps a godbolt link?

So it's a strcat() optimized to a strcpy()?

I know gcc might unroll and just include the values of the string bytes.

Kind regards, Jonny

* Re: strncpy clarify result may not be null terminated
  2023-11-11 22:36                         ` Alejandro Colomar
  2023-11-11 23:19                           ` Alejandro Colomar
@ 2023-11-17 21:46                           ` Jonny Grant
  2023-11-18  9:37                             ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar
  2023-11-18  9:44                             ` NULL safety " Alejandro Colomar
  1 sibling, 2 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-17 21:46 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man

Thank you for your swift replies Alejandro and incorporating changes.

On 11/11/2023 22:36, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Sat, Nov 11, 2023 at 09:15:12PM +0000, Jonny Grant wrote:
>> Alejandro
>>
>> I was reading again
>> https://man7.org/linux/man-pages/man7/string_copying.7.html
>>
>> Sharing some comments, I realise not latest man page, if you have a new one online I could read that. I was reading man-pages 6.04, perhaps some already updated.
> 
> You can check this one:
> 
> <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7>
> also available here:
> <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7>
> 
> And of course, you can install them from source, or read them from the
> repository itself.

That's good if you have your online PDF version of unreleased versions I could read through.
 
>> A) Could simplify and remove the "This function" and "These functions" that start each function description.
> 
> Fixed; thanks.
> 
> <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=53ea8765ed7f9733abf96e86df89619dc3d203ef>
> 
>>
>> B) "RETURN VALUE" has the text before each function, rather than after as would be the convention from "DESCRIPTION", I suggest to move the return value text after each function name.
> 
> Fixed; thanks.
> 
> <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=76316bd6f98c58d70c2330f7d2a945aac7c76dd8>
> 
>>
>> Could make it like https://man7.org/linux/man-pages/man3/string.3.html
>>
>> C) In the examples, it's good that stpecpy() checks for NULL pointers; the others don't yet, though.
> 
> The reason is interesting.  I also designed a similar function based on
> snprintf(3), which can be chained with this one.  Since that one can
> return NULL, and to reduce the number of times one needs to check for
> errors, I added the NULL check.

That's good, any API that allocates memory could in theory return NULL, like strdup() too.

> alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpeprintf .
> ./lib/stpeprintf.h:inline char *
> stpeprintf(char *dst, char *end, const char *restrict fmt, ...)
> {
> 	char     *p;
> 	va_list  ap;
> 
> 	va_start(ap, fmt);
> 	p = vstpeprintf(dst, end, fmt, ap);
> 	va_end(ap);
> 
> 	return p;
> }
> alx@debian:~/src/shadow/shadow/master$ grepc -tfd vstpeprintf .
> ./lib/stpeprintf.h:inline char *
> vstpeprintf(char *dst, char *end, const char *restrict fmt, va_list ap)
> {
> 	int        len;
> 	ptrdiff_t  size;
> 
> 	if (dst == end)
> 		return end;
> 	if (dst == NULL)
> 		return NULL;
> 
> 	size = end - dst;
> 	len = vsnprintf(dst, size, fmt, ap);
> 
> 	if (len == -1)
> 		return NULL;
> 	if (len >= size)
> 		return end;
> 
> 	return dst + len;
> }
> alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpecpy .
> ./lib/stpecpy.h:inline char *
> stpecpy(char *dst, char *end, const char *restrict src)
> {
> 	bool    trunc;
> 	char    *p;
> 	size_t  dsize, dlen, slen;
> 
> 	if (dst == end)
> 		return end;
> 	if (dst == NULL)
> 		return NULL;
> 
> 	dsize = end - dst;
> 	slen = strnlen(src, dsize);
> 	trunc = (slen == dsize);
> 	dlen = slen - trunc;
> 
> 	p = mempcpy(dst, src, dlen);
> 	*p = '\0';
> 
> 	return p + trunc;
> }
> 
> 
> Then you can use them like this:
> 
> 
>     end = buf + sizeof(buf);
>     p = buf;
>     p = stpecpy(p, end, "Hello ");
>     p = stpeprintf(p, end, "%d realms", 9);
>     p = stpecpy(p, end, "!");
>     if (p == end) {
>         p--;
>         goto toolong;
>     }
>     len = p - buf;
>     puts(buf);
> 
> 
> Regarding other string-copying functions, NULL is not inherent to them,
> so I'm not sure if they should have explicit NULL checks.  Why would
> these functions receive a null pointer?  The main possibility is that
> the programmer forgot to check some malloc(3) call, which should receive
> a different treatment from a failed copy, normally.
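
For experimenting with the chaining shown above, a self-contained sketch of stpecpy() and stpeprintf() follows. It assumes plain (non-inline) definitions are acceptable, replaces GNU mempcpy(3) with memcpy(3), and treats any negative vsnprintf(3) result as an error rather than only -1.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen(3) */
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy src at dst, never writing at or past end.  Returns a pointer
 * to the new terminator, end on truncation, or NULL if dst is NULL. */
char *
stpecpy(char *dst, char *end, const char *restrict src)
{
    bool    trunc;
    size_t  dsize, dlen, slen;

    if (dst == end)             /* an earlier call already truncated */
        return end;
    if (dst == NULL)            /* propagate an earlier error */
        return NULL;

    dsize = end - dst;
    slen = strnlen(src, dsize);
    trunc = (slen == dsize);
    dlen = slen - trunc;

    memcpy(dst, src, dlen);
    dst[dlen] = '\0';

    return dst + dlen + trunc;  /* equals end when truncated */
}

/* printf-style counterpart with the same chaining contract. */
char *
stpeprintf(char *dst, char *end, const char *restrict fmt, ...)
{
    int        len;
    ptrdiff_t  size;
    va_list    ap;

    if (dst == end)
        return end;
    if (dst == NULL)
        return NULL;

    size = end - dst;
    va_start(ap, fmt);
    len = vsnprintf(dst, size, fmt, ap);
    va_end(ap);

    if (len < 0)
        return NULL;
    if (len >= size)
        return end;
    return dst + len;
}
```

With a 16-byte buffer, the chain quoted above produces "Hello 9 realms!" and leaves p pointing at its terminator; with a smaller buffer, every call after the first truncation simply returns end.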

Perhaps it's just my point of view. In safety-critical software I always do my best to ensure that no code calls an API with a null pointer when it expects a valid pointer: the null pointer constant is defined by the C standard, yet APIs that require a pointer have undefined behaviour if they are passed NULL. Conversely, I make my APIs check for NULL (if they require a valid pointer) and reject it with an error. That covers all bases (corrupt data files can occur that we can't anticipate), so issues can be logged and there is no core dump. I'd rather display a "USB device error 51" message on a UI than suffer a core dump that turns off a piece of safety-critical equipment or sends it into a restart death loop.

I recall you mentioned [[gnu::nonnull]], aka __attribute__((nonnull)), which is an optimizer hint that the API will always be called with a valid pointer. There is also returns_nonnull.

The difficulty is the optimizer will remove any NULL pointer constant checks within those APIs (if there were any). The side effect is a useful compiler warning, if the compiler figures out someone is passing NULL.

So in a safety critical system we must wrap all such APIs, to put back in the null pointer constant checks.
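A minimal sketch of such a wrapper, with strnlen(3) standing in for any API that must not receive a null pointer. The name checked_strnlen() and its "-1 with errno = EINVAL" convention are hypothetical, chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* strnlen(3), ssize_t, SSIZE_MAX */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper: report a null pointer as an error (EINVAL)
 * instead of passing it to strnlen(3), where it is undefined behavior.
 * The wrapper itself is not declared nonnull, so the compiler may not
 * assume s != NULL and drop the check.
 */
static ssize_t
checked_strnlen(const char *s, size_t maxlen)
{
	if (s == NULL) {
		errno = EINVAL;
		return -1;
	}
	if (maxlen > (size_t) SSIZE_MAX)
		maxlen = SSIZE_MAX;     /* keep the result representable */
	return (ssize_t) strnlen(s, maxlen);
}
```

The caller can then log the error and keep running instead of crashing.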

> 
>> D) strlcpy says
>> "These functions force a SIGSEGV if the src pointer is not a string."
>> How does it determine the pointer isn't a string?
> 
> By calling strlen(src).  If it isn't a string, it'll continue reading,
> and likely crash due to an unbound read.  However, the SIGSEGV isn't
> guaranteed, since it may find a 0 well before crashing, so I removed
> that text.  It is a feature and a bug of these functions: they can find
> programming errors where one passes a character sequence where a string
> is expected, and crash the program to noisily report the programmer
> error.  But that also makes it very slow, as Paul said.
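To show why the full strlen(src) is unavoidable: strlcpy(3) returns strlen(src) so that callers can detect truncation with `ret >= size`. Below is a minimal sketch of those semantics under the hypothetical name my_strlcpy(); it is not glibc's or OpenBSD's implementation.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of strlcpy(3) semantics.  The return value is strlen(src),
 * so the whole source is scanned even when only a few bytes fit;
 * if src is not a string, that scan runs past its end.
 */
static size_t
my_strlcpy(char *restrict dst, const char *restrict src, size_t size)
{
	size_t  slen, n;

	slen = strlen(src);             /* always reads all of src */
	if (size == 0)
		return slen;

	n = slen < size - 1 ? slen : size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return slen;                    /* >= size means truncated */
}
```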

Ok I see what you mean. It's good you took out that line, I recall there was even a raise(SIGSEGV) in the implementation in a previous version of the man page.

I wish programmers who need performance would keep track of the length of their strings alongside the pointer, to avoid all these strlen() calls. Then we'd only need strnlen() to sanity-check buffers given by external libraries.

There are so many variations on this idea of avoiding C strings with a NUL terminator.

Using a 'struct sbuf' to contain the string buffer
https://man.freebsd.org/cgi/man.cgi?query=sbuf&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&format=html

C++ has all its STL containers, like std::string.

Other APIs prefer start_ptr, end_ptr (the one after the last character), probably they should also keep the current allocated buffer size, or always do a realloc() when appending.
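A minimal sketch of the "keep the length and buffer size with the pointer" idea. It follows the spirit of sbuf(9) or std::string but not their APIs; struct strbuf and strbuf_append() are hypothetical names, and size_t overflow is ignored for brevity. Appending is a single memcpy(3) of a known length, with no strlen(3) over the existing contents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-tracking string buffer; start as {NULL, 0, 0}. */
struct strbuf {
	char    *data;  /* NUL-terminated whenever non-NULL */
	size_t  len;    /* bytes in use, excluding the NUL */
	size_t  size;   /* bytes allocated */
};

/* Append len bytes from src.  Returns 0, or -1 if realloc(3) fails. */
static int
strbuf_append(struct strbuf *sb, const char *src, size_t len)
{
	if (sb->len + len + 1 > sb->size) {
		size_t  newsize = (sb->len + len + 1) * 2;
		char    *p = realloc(sb->data, newsize);

		if (p == NULL)
			return -1;      /* sb is left unchanged */
		sb->data = p;
		sb->size = newsize;
	}
	memcpy(sb->data + sb->len, src, len);
	sb->len += len;
	sb->data[sb->len] = '\0';
	return 0;
}
```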

Others may think differently, that's fine, not all uses of C are the same target.

>>
>> E) Are these functions mentioned like ustpcpy() standardized by POSIX? or in use in a libc?
> 
> No.  They are my inventions, like stpecpy().  It seems I forgot to add a
> "This function is not provided by any library" in some of them.
> 
> Fixed; thanks.
> <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=9848ac50ceb6cc4d786b3899ee4626959e5f1d81>
> 
>>
>> F) 
>> char *stpncpy(char dst[restrict .sz], const char *restrict src,
>>                       size_t sz);
>> I know the 'restrict' keyword, but haven't seen this way it attempts to specify the size of the 'dst' array by using the parameter 'sz' is this in wide use in APIs? I remember C11 let us specify  char ptr[static 1] to say the pointer must be at least 1 element in this example
> 
> It continues meaning the same thing.  If you use array notation, the
> restrict must be placed inside the brackets.  The following two snippets
> are equivalent C code:
> 
> 	void foo(int *p, int *restrict x);
> 	void foo(int *p, int x[restrict 7]);
> 
> Since I didn't use 'static', to ISO C the array notation is ignored.
> GCC, however, will be reasonable and understand it.  To GCC, there's not
> much difference between the following:
> 
> 	[[gnu::nonnull]]
> 	void bar(int x[7]);
> 	void bar(int x[static 7]);
> 
> And of course, you can combine static and restrict:
> 
> 	void baz(int *p, int x[static restrict 7]);
> 
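The equivalence can be checked by the compiler. After parameter adjustment, both declarations of foo() below have the type `void (int *, int *restrict)`, so redeclaring the function both ways must compile (some GCC versions may still warn about the mixed array/pointer forms, which is a diagnostic, not a type conflict). The function bodies are placeholders added for illustration.

```c
#include <assert.h>

/* Same function type; a real mismatch would be a "conflicting types" error. */
void foo(int *p, int *restrict x);
void foo(int *p, int x[restrict 7]);

/* static and restrict combined: x points to at least 7 ints. */
void baz(int *p, int x[static restrict 7]);

void
foo(int *p, int *restrict x)
{
	*p = x[0] + x[6];
}

void
baz(int *p, int x[static restrict 7])
{
	foo(p, x);
}
```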
>>
>> Saw a few pages started to write out functions like
>> size_t strnlen(const char s[.maxlen], size_t maxlen);
>>
>> Is this just for documentation? usually it would be: const char s[static maxlen]
> 
> I don't like static for array parameters.  Specifying a size for a
> parameter should similarly signify to the compiler that it should expect
> no less than N elements.  This is how GCC behaves.
> 
> And static has another implication: nonnull.  IMO, nonnull is tangential
> to array size, and should be specified separately with its own attribute
> or qualifier.  I'd like to be able to specify the following different
> cases:
> 
> 	void f1(int [10]);  //  NULL, or array of size >= 10
> 	void f2(int [_Nonnull 10]);  // Array of size >=10
> 
> With static, I can only do the second.  Quite unreasonable.
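With only ISO C, the two cases can be approximated as below: a plain `[10]` parameter lets the callee accept NULL and check for it, while `[static 10]` makes passing NULL undefined behavior (compilers may warn about a literal NULL argument). sum10() and sum10_nonnull() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* a: NULL, or an array of at least 10 ints.  The callee checks. */
static bool
sum10(const int a[10], int *sum)
{
	int  s = 0;

	if (a == NULL)
		return false;
	for (size_t i = 0; i < 10; i++)
		s += a[i];
	*sum = s;
	return true;
}

/*
 * a: a non-null array of at least 10 ints.  Passing NULL is undefined
 * behavior and the compiler may assume a != NULL, so a NULL check here
 * could be optimized away.
 */
static int
sum10_nonnull(const int a[static 10])
{
	int  s = 0;

	for (size_t i = 0; i < 10; i++)
		s += a[i];
	return s;
}
```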
> 
> 
> Regarding the '.', consider the following two snippets:
> 
> 	int size;  // This is the size of s[size].
> 	void g1(char s[size], size_t size);
> 
> You could be tricked into thinking that the size of s[] is the second
> parameter to the function, but it's the global variable size.
> 
> 	void g2(char s[size], size_t size);
> 
> Here, since there's no global size, the code won't even compile.
> There's no way to use a parameter that comes later as a size, conforming
> to ISO C.  We were discussing this [.identifier] syntax in linux-man@
> and gcc@, as a possible extension.  We haven't yet decided on it, but
> I'm previewing it as a documentation extension for now.  The rationale
> for the syntax comes from similarity with designated initializers for
> structures.
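For contrast, the only ISO C way to let one parameter size another is to declare the size first, the opposite of the usual libc argument order. count_char() below is a hypothetical example of that form; it uses a variably modified parameter type, so it needs a compiler that supports those.

```c
#include <assert.h>
#include <stddef.h>

/* ISO C: the size must come before the array parameter that uses it. */
static size_t
count_char(size_t n, const char s[n], char c)
{
	size_t  count = 0;

	for (size_t i = 0; i < n; i++)
		if (s[i] == c)
			count++;
	return count;
}
```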

That would be good if it got in ISO C.

>> G) "Because these functions ask for the length, and a string is by
>> nature composed of a character sequence of the same length plus a
>> terminating null byte, a string is also accepted as input."
>>
>> I suggest to adjust the order so it doesn't start with a fragment:
>>
>> "A string is also accepted as input, because these functions ask
>> for the length, and a string is by nature composed of a character
>> sequence of the same length plus a terminating null byte."
>>
>> Could simplify and remove "by nature".
> 
> Yep; thanks.
> <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=78b2ff8c6f25654648f0fa06c310b87a7e49128e>
> 
>>
>> Unrelated man page strncpy, noticed this.
>>
>> SEE ALSO
>> Could this refer to strcpy(3) and string(3) at the bottom?
>> https://man7.org/linux/man-pages/man3/strncpy.3.html
> 
> I removed it on purpose, because I intended to put some distance between
> strncpy(3), and strings and string-copying functions like strcpy(3).
> 
> That's why I point to string_copying(7), where readers should be
> educated of all of the differences.  Then, string_copying(7) has a more
> complete SEE ALSO, because it has already detailed all the different
> functions, and the reader is ready to read the individual pages.
> 
> Kind regards,
> Alex

Fair enough. We've all shared a lot going over strnlen and other points! Man pages are all better as a result of all your efforts.

Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-12 10:59                                       ` Alejandro Colomar
  2023-11-12 20:49                                         ` Paul Eggert
@ 2023-11-17 21:57                                         ` Jonny Grant
  2023-11-18 10:12                                           ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-17 21:57 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man



On 12/11/2023 10:59, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Sun, Nov 12, 2023 at 09:52:20AM +0000, Jonny Grant wrote:
> [... some micro-benchmarks...]
> 
>>
>> Maybe we're gonna need a bigger benchmark.
> 
> Not really.
> 
>>
>> Probably there existing studies. Or could patch something like SQLite
>> Benchmark to utilise each string function just for measurements.
>> Hopefully it moves around at least 2GB of strings to give some
>> meaningful comparison timings.
> 
> I wasn't so interested in the small differences between functions.
> What this micro-benchmark showed clearly, without needing much more info
> to be conclusive, is the first order of growth of each of the functions:
> 
> -  strlcpy(3)'s first order growth corresponds to strlen(src).  That's
>    due to returning strlen(src), which proves to be a poor API.
> 
> -  strncpy(3)'s first order growth corresponds to sizeof(dst).  That's
>    of course due to the zeroing.  If sizeof(dst) is kept very small, you
>    could live with it.  When the size grows to more or less 4 KiB, this
>    drag becomes meaningful.
> 
> -  strnlen(3)+*cpy() first order growth corresponds to
>    strnlen(src, sizeof(dst)), which is the fastest order of growth
>    you can get from a truncating string-copying function (except if you
>    keep track of your slen manually and call directly memcpy(3)).
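Both the zero padding behind strncpy(3)'s cost and the unterminated result this thread started from can be observed directly. A small sketch; the helpers fill_and_strncpy() and all_zero() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fill buf[8] with 'X', then strncpy(3) n bytes of src into it. */
static void
fill_and_strncpy(char buf[8], const char *src, size_t n)
{
	memset(buf, 'X', 8);
	strncpy(buf, src, n);
}

/* True if buf[from..to) contains only '\0' bytes. */
static bool
all_zero(const char *buf, size_t from, size_t to)
{
	for (size_t i = from; i < to; i++)
		if (buf[i] != '\0')
			return false;
	return true;
}
```

`fill_and_strncpy(buf, "ab", 8)` writes "ab" followed by six '\0' bytes: all 8 bytes are written regardless of the source length. `fill_and_strncpy(buf, "abcdefgh", 4)` writes "abcd" with no terminating null byte, so buf[4] is still 'X'.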

That's a really good point, keeping track of the length (and buffer size) and then just using memcpy.
The copy time should be closer to the number of bytes read and written.

> 
> Of course, first order of growth ignores second order of growth and so
> on, which for small inputs can be important.  That is, O(x^3) is bigger
> than O(x^2), but x^3 + x^2 can be smaller than 5*x^2 for small x.
> 
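Worked out for that example (x > 0), the cubic total stays below five times the quadratic term exactly while x < 4:

```latex
x^3 + x^2 < 5x^2 \iff x^3 < 4x^2 \iff x < 4 \qquad (x > 0)
```

For instance, at x = 3: 27 + 9 = 36 < 45; at x = 4 both sides equal 80.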
>>
>> As Paul mentioned, strlcpy is a poor choice for processing strings.
>> Could rely on their guidance as they already measured.
>> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html
> 
> Indeed.  I've added important notices in BUGS about it, and recommended
> against

Saw glibc have (11) functions listed as a poor choice for string processing

> 
>>
>> Maybe the strlcpy API is easier, safer for programmers; but the
>> compiler can't figure out that the programmer already knew src string
>> length.  So the strlcpy does a strlen() and wastes time reading over
>> memory.  If the src length is known, can just memcpy.
> 
> I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
> suffer its problems.  It should be even safer and easier to use, and its
> first order of growth is better.  I'll send a patch for review in a
> moment.

I did take a look at strtcpy but it calls strnlen(), reading over memory.

> 
>> When I've benchmarked things, reducing the memory accesses for read,
>> write boosted performance, also looked at the cycles taken, of course
>> cache and alignment all play a part too.
> 
> If one wants to micro-optimize for their use case, it's none of my
> business.  I provide a function that should be safe and relatively fast
> for all use cases, which libc doesn't.
> 
>> Maybe could suggest in your man page programmers should keep track of
>> the src size ? - to save the cost of the strlen().
> 
> No.  Optimizations are not my business.  Writing good APIs should make
> these optimizations low value so that they aren't done, except for the
> most performance-critical programs.
> 
> The problem comes when libc doesn't provide anything usable, and the
> user has no guidance on where to start.  Then, programmers start being
> clever, usually too clever.  That's why I think the man-pages should go
> ahead and write wrapper functions such as strtcpy() and stpecpy()
> around libc functions; these wrappers should provide a fast and safe
> starting point for most programs.
> 
> It's true that memcpy(3) is the fastest function one can use, but it
> requires the programmer to be rather careful with the lengths of the
> strings.  I don't think keeping track of all those little details is
> what the common programmer should do.

That's true, high-performance users probably create their own bespoke solutions.
strtcpy probably takes the src size?

> 
>>
>> At least the strlen functions are optimized:
>> glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time.
>> glibc/strlen.c searches 4 bytes at a time.
>>
>> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice?
>>
>> memcpy (dest, src, size);
>> dest[size - 1] = '\0';
> 
> -1s in the source code make for off-by-one bugs.  APIs should be
> written so that common use doesn't involve manually writing -1 if
> possible.

What way do you feel they should be doing it?

> 
> I acknowledge the performance benefits of this construction, and have
> used it myself in NGINX code, but I also find it very dangerous, which
> is why I recommend using a wrapper over it:
> 
> 	char *
> 	ustr2stp(char *restrict dst, const char *restrict src, size_t len)
> 	{
> 		char  *p;
> 
> 		p = mempcpy(dst, src, len);
> 		*p = '\0';
> 
> 		return p;
> 	}
> 
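A self-contained usage sketch of the ustr2stp() wrapper above, for when the source length is already known (mempcpy(3) is a GNU extension, hence _GNU_SOURCE). Unlike the bare `memcpy(); dest[size - 1] = '\0';` idiom, the caller never writes a -1 by hand, and because ustr2stp() returns a pointer to the terminating NUL, calls chain. The caller must still ensure dst has room; join() is a hypothetical name.

```c
#define _GNU_SOURCE             /* for mempcpy(3) */
#include <assert.h>
#include <string.h>

/* Copy of ustr2stp() as quoted above. */
static char *
ustr2stp(char *restrict dst, const char *restrict src, size_t len)
{
	char  *p;

	p = mempcpy(dst, src, len);
	*p = '\0';

	return p;
}

/*
 * Hypothetical caller: join two character sequences of known length.
 * Only alen and blen bytes are read, so a and b need not be strings.
 * Precondition: dst has room for alen + blen + 1 bytes.
 */
static size_t
join(char *dst, const char *a, size_t alen, const char *b, size_t blen)
{
	char  *p;

	p = ustr2stp(dst, a, alen);
	p = ustr2stp(p, b, blen);
	return p - dst;
}
```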
> Cheers,
> Alex
> 
>>
>> Kind regards, Jonny
> 

Kind regards, Jonny


* Re: Signing all patches and email to this list
  2023-11-17 21:43   ` Jonny Grant
@ 2023-11-18  0:25     ` Matthew House
  2023-11-18 23:24       ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Matthew House @ 2023-11-18  0:25 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Alejandro Colomar, Paul Eggert, linux-man

On Fri, Nov 17, 2023 at 4:43 PM Jonny Grant <jg@jguk.org> wrote:
> On 12/11/2023 11:26, Alejandro Colomar wrote:
> > The compiler will sometimes optimize them to normal *cpy(3) functions,
> > since the length of dst is usually known, if the previous *cpy(3) is
> > visible to the compiler.  And they provide for cleaner code.  If you
> > know that they'll get optimized, you could use them.
>
> May I ask, is there an example or document that shows this optimization by the compiler? Perhaps a godbolt link?
>
> So it's a strcat() optimized to a strcpy()?
>
> I know gcc might unroll and just include the values of the string bytes.
>
> Kind regards, Jonny

See <https://godbolt.org/z/e34fWrTGf>. If a function computes the strlen()
of the destination before calling strcat(), without modifying its value
between the two calls, GCC will replace the strcat() with a strcpy(). If a
function computes the strlen() of both the source and the destination, GCC
will further replace the strcat() with a memcpy(), and possibly inline the
memcpy() if the size is short enough. It will also remember the increased
length of the destination for any future strcat() calls, to accommodate
strcpy(), strcat(), strcat(), ... chains. This is implemented in the
strlen_pass::handle_builtin_strcat() function in gcc/tree-ssa-strlen.cc.
Neither Clang nor MSVC appears to implement any similar optimization.
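The source shape that optimization targets looks roughly like this. It is only a sketch: whether a given GCC version actually rewrites the strcat(3) calls depends on the version and flags, so inspect the generated code (for example with `gcc -O2 -S`) rather than assuming it. append_two() is a hypothetical name, and its unchecked destination size is exactly the kind of hidden size Matthew warns about.

```c
#include <assert.h>
#include <string.h>

/*
 * dst must already hold a string and have room for the result.
 * strlen(dst) and strlen(a) are computed and dst is not modified
 * before the strcat(3) calls, which is the pattern that, per the
 * explanation above, lets GCC turn the first strcat() into a
 * memcpy() at dst + dlen and the second into a strcpy().
 */
static size_t
append_two(char *dst, const char *a, const char *b)
{
	size_t  dlen = strlen(dst);
	size_t  alen = strlen(a);

	strcat(dst, a);
	strcat(dst, b);
	return dlen + alen + strlen(b);
}
```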

Nevertheless, I would be extremely wary of recommending the bare strcpy(3),
strcat(3), and sprintf(3) functions on the basis of "providing for cleaner
code". By permitting the programmer to perform the copy with no immediate
knowledge of the source and destination sizes, the functions open up a
unique opportunity for squirreling away the guaranteed sizes in distant and
opaque parts of the codebase. And this antipattern isn't a rare exception,
but shows up in nearly every library that makes extensive use of the
functions.

Thank you,
Matthew House


* PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-17 21:46                           ` Jonny Grant
@ 2023-11-18  9:37                             ` Alejandro Colomar
  2023-11-19  0:22                               ` Deri
  2023-11-18  9:44                             ` NULL safety " Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-18  9:37 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Deri James

On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote:
> Thank you for your swift replies Alejandro and incorporating changes.

:-)

> >> I was reading again
> >> https://man7.org/linux/man-pages/man7/string_copying.7.html
> >>
> >> Sharing some comments, I realise not latest man page, if you have a new one online I could read that. I was reading man-pages 6.04, perhaps some already updated.
> > 
> > You can check this one:
> > 
> > <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7>
> > also available here:
> > <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7>
> > 
> > And of course, you can install them from source, or read them from the
> > repository itself.
> 
> That's good if you have your online PDF version of unreleased versions I could read through.

I have that as a goal, but need some help.  The thing is: we have
<./scripts/LinuxManBook/>, which contains a Perl script and some helper
stuff for it.  It was contributed by gropdf(1)'s maintainer Deri James.

Currently, that script does a lot of magic which produces the book from
all of the pages.

I'd like to be able to split the script into several smaller scripts
that can be run on each page, and then another script that merges all of
them into the single PDF file.  That would be something I can merge into
the Makefiles so that we can run a `make build-pdf` and if I touch a
single page, it would only update the relevant part, reusing as much as
possible from previous runs.

Since I don't understand Perl, and don't know much of gropdf(1) either,
I need help.

Maybe Deri or Branden can help with that.  If anyone else understands it
and can also help, that's very welcome too!

Then I could install a hook in my server that runs

	$ make build-pdf docdir=/srv/www/...

Cheers,
Alex


-- 
<https://www.alejandro-colomar.es/>


* NULL safety (was: strncpy clarify result may not be null terminated)
  2023-11-17 21:46                           ` Jonny Grant
  2023-11-18  9:37                             ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-18  9:44                             ` Alejandro Colomar
  2023-11-18 23:21                               ` NULL safety Jonny Grant
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-18  9:44 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

Hi Jonny,

On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote:
> > Regarding other string-copying functions, NULL is not inherent to them,
> > so I'm not sure if they should have explicit NULL checks.  Why would
> > these functions receive a null pointer?  The main possibility is that
> > the programmer forgot to check some malloc(3) call, which should receive
> > a different treatment from a failed copy, normally.
> 
> Perhaps it's just my point of view. In safety critical software I always do my best to ensure no code calls an API with the null pointer constant - when it's expecting a valid pointer. Given that the null pointer constant is defined in the C standard, even if APIs have undefined behaviour if they require a pointer but are passed a NULL. So the converse is I make APIs check for NULL (if they require a valid pointer) and reject with an error. Covers all bases (there can be corrupt data files occurring that we can't anticipate), so issues can be logged, and no core dump. I'd rather display a "USB device error 51" message on a UI than suffer a core dump which turns off a piece of safety critical equipment or sends it into a restart death loop.
> 
> I recall you mentioned [[gnu::nonnull]] aka __attribute__((nonnull)) which is an optimizer hint the API will always be called with a valid pointer. There is also returns_nonnull.
> 
> The difficulty is the optimizer will remove any NULL pointer constant checks within those APIs (if there were any). The side effect is a useful compiler warning, if the compiler figures out someone is passing NULL.
> 
> So in a safety critical system we must wrap all such APIs, to put back in the null pointer constant checks.

There's Clang's qualifier _Nonnull, which is not a hint to the
optimizer.  It is an attempt to have null correctness similar to how we
have const correctness.  It still has little support, even from Clang
itself.  It has some important problem: it applies to the pointer, not
to the pointee, but pointer qualifiers are discarded easily.  A better
design would make it a pointee qualifier.  Hopefully, this will some day
be there to end all NULL discussions.  Until then, yeah, NULL is a
dangerous part of the language.
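One way to use Clang's _Nonnull today without breaking other compilers is a macro that expands to nothing elsewhere. NONNULL and my_strlen() are hypothetical names; as noted above, the qualifier applies to the pointer itself, so it is easily lost when the pointer is copied.

```c
#include <assert.h>
#include <string.h>

/* Clang's nullability qualifier, or nothing on other compilers. */
#if defined(__clang__)
# define NONNULL  _Nonnull
#else
# define NONNULL
#endif

/* _Nonnull goes after the '*': it qualifies the pointer, not the chars. */
static size_t
my_strlen(const char *NONNULL s)
{
	return strlen(s);
}
```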

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-17 21:57                                         ` Jonny Grant
@ 2023-11-18 10:12                                           ` Alejandro Colomar
  2023-11-18 23:03                                             ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-18 10:12 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man

Hi Jonny,

On Fri, Nov 17, 2023 at 09:57:39PM +0000, Jonny Grant wrote:
> > -  strlcpy(3)'s first order growth corresponds to strlen(src).  That's
> >    due to returning strlen(src), which proves to be a poor API.
> > 
> > -  strncpy(3)'s first order growth corresponds to sizeof(dst).  That's
> >    of course due to the zeroing.  If sizeof(dst) is kept very small, you
> >    could live with it.  When the size grows to more or less 4 KiB, this
> >    drag becomes meaningful.
> > 
> > -  strnlen(3)+*cpy() first order growth corresponds to
> >    strnlen(src, sizeof(dst)), which is the fastest order of growth
> >    you can get from a truncating string-copying function (except if you
> >    keep track of your slen manually and call directly memcpy(3)).
> 
> That's a really good point, keeping track of the length (and buffer size) and then just using memcpy.
> The copy time should be closer to the number of bytes read and written.

Actually, the performance of memcpy(3) should also be on the order of
strnlen(src, sizeof(dst)), so it should always take similar times
compared to strnlen(3)+*cpy().  It is only that it will always be
slightly faster due to avoiding a second read, but only by a few percent.
Nothing like 10x, which can easily happen with strlcpy(3) or strncpy(3).

> > Of course, first order of growth ignores second order of growth and so
> > on, which for small inputs can be important.  That is, O(x^3) is bigger
> > than O(x^2), but x^3 + x^2 can be smaller than 5*x^2 for small x.
> > 
> >>
> >> As Paul mentioned, strlcpy is a poor choice for processing strings.
> >> Could rely on their guidance as they already measured.
> >> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html
> > 
> > Indeed.  I've added important notices in BUGS about it, and recommended
> > against
> 
> Saw glibc have (11) functions listed as a poor choice for string processing

They list many functions as poor choices for string processing.  The
problem is that they list those functions for string processing.  I went
a bit further and de-listed some: We don't list strncpy(3) or strncat(3)
as functions that process strings, but rather as something else.  And
they are actually good functions for processing that something else.

The problem with strlcpy(3) is that it's a function that is designed to
process strings, and being bad at processing strings makes it a bad
function period.

> >> Maybe the strlcpy API is easier, safer for programmers; but the
> >> compiler can't figure out that the programmer already knew src string
> >> length.  So the strlcpy does a strlen() and wastes time reading over
> >> memory.  If the src length is known, can just memcpy.
> > 
> > I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
> > suffer its problems.  It should be even safer and easier to use, and its
> > first order of growth is better.  I'll send a patch for review in a
> > moment.
> 
> I did take a look at strtcpy but it calls strnlen(), reading over memory.

That's just a few % slower than memcpy(3).  Don't expect memcpy(3) to be
much faster than this.  strtcpy() reads twice writes once; memcpy(3)
reads once writes once.  So you can expect memcpy(3) to be constantly
33% faster (very roughly).

If you implement your own strtcpy() in assembly, maybe you can get
something that's in the single-digit % slower than memcpy(3), similar to
strcpy(3).

> >> When I've benchmarked things, reducing the memory accesses for read,
> >> write boosted performance, also looked at the cycles taken, of course
> >> cache and alignment all play a part too.
> > 
> > If one wants to micro-optimize for their use case, it's none of my
> > business.  I provide a function that should be safe and relatively fast
> > for all use cases, which libc doesn't.
> > 
> >> Maybe could suggest in your man page programmers should keep track of
> >> the src size ? - to save the cost of the strlen().
> > 
> > No.  Optimizations are not my business.  Writing good APIs should make
> > these optimizations low value so that they aren't done, except for the
> > most performance-critical programs.
> > 
> > The problem comes when libc doesn't provide anything usable, and the
> > user has no guidance on where to start.  Then, programmers start being
> > clever, usually too clever.  That's why I think the man-pages should go
> > ahead and write wrapper functions such as strtcpy() and stpecpy()
> > around libc functions; these wrappers should provide a fast and safe
> > starting point for most programs.
> > 
> > It's true that memcpy(3) is the fastest function one can use, but it
> > requires the programmer to be rather careful with the lengths of the
> > strings.  I don't think keeping track of all those little details is
> > what the common programmer should do.
> 
> That's true, high-performance users probably create their own bespoke solutions.
> strtcpy probably takes the src size?

No.  strtcpy() takes the dst size.

ssize_t
strtcpy(char dst[restrict dsize], const char *restrict src, size_t dsize);

This function doesn't care about the src size.  It requires that it's
either a string, or a character array larger than dst.  In both cases,
it means that the internal calculation of slen = strnlen(src, dsize)
will never overrun the buffer, while costing only a small time.
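A sketch of a function with that behavior, assuming (the thread does not spell this out) that it returns the copied length, or -1 on truncation while still leaving a truncated, NUL-terminated string in dst. It follows the same strnlen(3) + copy steps as the stpecpy() quoted earlier, but it is not the implementation from the patch. It uses plain pointer parameters because the `[restrict dsize]` form is not ISO C, and it requires dsize > 0.

```c
#define _POSIX_C_SOURCE 200809L   /* strnlen(3), ssize_t */
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

static ssize_t
strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	bool    trunc;
	size_t  dlen, slen;

	assert(dsize > 0);

	slen = strnlen(src, dsize);     /* reads at most dsize bytes of src */
	trunc = (slen == dsize);
	dlen = slen - trunc;            /* leave room for the NUL */

	memcpy(dst, src, dlen);
	dst[dlen] = '\0';

	return trunc ? -1 : (ssize_t) slen;
}
```

Because strnlen(3) stops after dsize bytes, src may also be a character array that is not a string, as long as it is at least dsize bytes long.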


-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-18 10:12                                           ` Alejandro Colomar
@ 2023-11-18 23:03                                             ` Jonny Grant
  0 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-18 23:03 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man



On 18/11/2023 10:12, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Fri, Nov 17, 2023 at 09:57:39PM +0000, Jonny Grant wrote:
>>> -  strlcpy(3)'s first order growth corresponds to strlen(src).  That's
>>>    due to returning strlen(src), which proves to be a poor API.
>>>
>>> -  strncpy(3)'s first order growth corresponds to sizeof(dst).  That's
>>>    of course due to the zeroing.  If sizeof(dst) is kept very small, you
>>>    could live with it.  When the size grows to more or less 4 KiB, this
>>>    drag becomes meaningful.
>>>
>>> -  strnlen(3)+*cpy() first order growth corresponds to
>>>    strnlen(src, sizeof(dst)), which is the fastest order of growth
>>>    you can get from a truncating string-copying function (except if you
>>>    keep track of your slen manually and call directly memcpy(3)).
>>
>> That's a really good point, keeping track of the length (and buffer size) and then just using memcpy.
>> The copy time should be closer to the number of bytes read and written.
> 
> Actually, the performance of memcpy(3) should also be on the order of
> strnlen(src, sizeof(dst)), so it should always take similar times
> compared to strnlen(3)+*cpy().  It is only that it will always be
> slightly faster due to avoiding a second read, but only by a few percent.
> Nothing like 10x, which can easily happen with strlcpy(3) or strncpy(3).
> 
>>> Of course, first order of growth ignores second order of growth and so
>>> on, which for small inputs can be important.  That is, O(x^3) is bigger
>>> than O(x^2), but x^3 + x^2 can be smaller than 5*x^2 for small x.
>>>
>>>>
>>>> As Paul mentioned, strlcpy is a poor choice for processing strings.
>>>> Could rely on their guidance as they already measured.
>>>> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html
>>>
>>> Indeed.  I've added important notices in BUGS about it, and recommended
>>> against
>>
>> Saw glibc have (11) functions listed as a poor choice for string processing
> 
> They list many functions as poor choices for string processing.  The
> problem is that they list those functions for string processing.  I went
> a bit further and de-listed some: We don't list strncpy(3) or strncat(3)
> as functions that process strings, but rather as something else.  And
> they are actually good functions for processing that something else.
> 
> The problem with strlcpy(3) is that it's a function that is designed to
> process strings, and being bad at processing strings makes it a bad
> function period.
> 
>>>> Maybe the strlcpy API is easier, safer for programmers; but the
>>>> compiler can't figure out that the programmer already knew src string
>>>> length.  So the strlcpy does a strlen() and wastes time reading over
>>>> memory.  If the src length is known, can just memcpy.
>>>
>>> I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
>>> suffer its problems.  It should be even safer and easier to use, and its
>>> first order of growth is better.  I'll send a patch for review in a
>>> moment.
>>
>> I did take a look at strtcpy but it calls strnlen(), reading over memory.
> 
> That's just a few % slower than memcpy(3).  Don't expect memcpy(3) to be
> much faster than this.  strtcpy() reads twice writes once; memcpy(3)
> reads once writes once.  So you can expect memcpy(3) to be constantly
> 33% faster (very roughly).

Probably there are benchmarks, measurements comparing those functions which use strnlen() to those that just do memcpy()? Would be interesting to hear what the time is to do those reads & writes, or just do writes.
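No numbers are given in this thread, so here is only a rough micro-benchmark sketch for comparing the two approaches on one machine. copy_strnlen(), copy_known() and bench() are hypothetical helpers. Results depend heavily on CPU, libc, buffer size and caching, and compilers may partly optimize such loops, so any timings should be treated with caution.

```c
#define _POSIX_C_SOURCE 200809L   /* strnlen(3), clock_gettime(2) */
#include <assert.h>
#include <string.h>
#include <time.h>

/* strnlen(3) + memcpy(3): reads src twice, writes dst once.  dsize > 0. */
static size_t
copy_strnlen(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen = strnlen(src, dsize - 1);

	memcpy(dst, src, slen);
	dst[slen] = '\0';
	return slen;
}

/* Length already known: reads src once, writes dst once. */
static size_t
copy_known(char *restrict dst, const char *restrict src, size_t slen)
{
	memcpy(dst, src, slen);
	dst[slen] = '\0';
	return slen;
}

static double
now(void)
{
	struct timespec  ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

enum { LEN = 4096 };

static char  bench_src[LEN + 1], bench_dst[LEN + 1];
static volatile size_t  sink;   /* keeps the results observable */

/* Time iters copies of a LEN-byte string with each method, in seconds. */
static void
bench(int iters, double *t_strnlen, double *t_known)
{
	double  t0;

	memset(bench_src, 'a', LEN);
	bench_src[LEN] = '\0';

	t0 = now();
	for (int i = 0; i < iters; i++)
		sink = copy_strnlen(bench_dst, bench_src, sizeof(bench_dst));
	*t_strnlen = now() - t0;

	t0 = now();
	for (int i = 0; i < iters; i++)
		sink = copy_known(bench_dst, bench_src, LEN);
	*t_known = now() - t0;
}
```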

> If you implement your own strtcpy() in assembly, maybe you can get
> something that's in the single-digit % slower than memcpy(3), similar to
> strcpy(3).
> 
>>>> When I've benchmarked things, reducing the memory accesses for read,
>>>> write boosted performance, also looked at the cycles taken, of course
>>>> cache and alignment all play a part too.
>>>
>>> If one wants to micro-optimize for their use case, it's none of my
>>> business.  I provide a function that should be safe and relatively fast
>>> for all use cases, which libc doesn't.
>>>
>>>> Maybe could suggest in your man page programmers should keep track of
>>>> the src size ? - to save the cost of the strlen().
>>>
>>> No.  Optimizations are not my business.  Writing good APIs should make
>>> these optimizations low value so that they aren't done, except for the
>>> most performance-critical programs.
>>>
>>> The problem comes when libc doesn't provide anything usable, and the
>>> user has no guidance on where to start.  Then, programmers start being
>>> clever, usually too clever.  That's why I think the man-pages should go
>>> ahead and write wrapper functions such as strtcpy() and stpecpy()
>>> around libc functions; these wrappers should provide a fast and safe
>>> starting point for most programs.
>>>
>>> It's true that memcpy(3) is the fastest function one can use, but it
>>> requires the programmer to be rather careful with the lengths of the
>>> strings.  I don't think keeping track of all those little details is
>>> what the common programmer should do.
>>
>> That's true, high-performance users probably create their own bespoke solutions.
>> strtcpy probably takes the src size?
> 
> No.  strtcpy() takes the dst size.
> 
> ssize_t
> strtcpy(char dst[restrict dsize], const char *restrict src, size_t dsize);
> 
> This function doesn't care about the src size.  It requires that it's
> either a string, or a character array larger than dst.  In both cases,
> it means that the internal calculation of slen = strnlen(src, dsize)
> will never overrun the buffer, while costing only a small time.

Ok I see, I would rather use something that allowed the src_len to be specified, to save that strnlen() cost.

Kind regards
Jonny
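
For readers following along, the strtcpy() contract described above (only dsize is passed, and slen = strnlen(src, dsize) bounds how much of src is read) can be sketched as below. This is a minimal sketch, not the code from string_copying(7): the error convention (-1 with errno set to E2BIG on truncation, ENOBUFS for a zero-size buffer) is an assumption, and memcpy(3) stands in for the GNU mempcpy(3) for portability.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <errno.h>
#include <string.h>
#include <sys/types.h>           /* ssize_t */

/*
 * Sketch of a truncating string copy that only takes the destination
 * size.  Assumed convention: returns the length of the copied string,
 * or -1 with errno = E2BIG if src did not fit (dst is still
 * null-terminated), or -1 with errno = ENOBUFS if dsize is 0.
 */
ssize_t
strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  slen, dlen;
    int     trunc;

    if (dsize == 0) {
        errno = ENOBUFS;
        return -1;
    }

    /* Never reads more than dsize bytes of src, so a non-terminated
       src that is at least dsize bytes long is not overrun. */
    slen = strnlen(src, dsize);
    trunc = (slen == dsize);
    dlen = trunc ? dsize - 1 : slen;

    memcpy(dst, src, dlen);
    dst[dlen] = '\0';            /* always null-terminate */

    if (trunc) {
        errno = E2BIG;
        return -1;
    }
    return (ssize_t) slen;
}
```

The cost Jonny mentions is the strnlen(3) pass: src is read once to find its length and once more by memcpy(3), while dst is written once.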


^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: NULL safety
  2023-11-18  9:44                             ` NULL safety " Alejandro Colomar
@ 2023-11-18 23:21                               ` Jonny Grant
  2023-11-24 22:25                                 ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-18 23:21 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man



On 18/11/2023 09:44, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote:
>>> Regarding other string-copying functions, NULL is not inherent to them,
>>> so I'm not sure if they should have explicit NULL checks.  Why would
>>> these functions receive a null pointer?  The main possibility is that
>>> the programmer forgot to check some malloc(3) call, which should receive
>>> a different treatment from a failed copy, normally.
>>
>> Perhaps it's just my point of view. In safety critical software I always do my best to ensure no code calls an API with the null pointer constant when it's expecting a valid pointer. The null pointer constant is defined in the C standard, yet APIs that require a pointer have undefined behaviour if they are passed NULL. So, conversely, I make APIs check for NULL (if they require a valid pointer) and reject it with an error. That covers all bases (corrupt data files can occur that we can't anticipate), so issues can be logged and there is no core dump. I'd rather display a "USB device error 51" message on a UI than suffer a core dump which turns off a piece of safety critical equipment or sends it into a restart death loop.
>>
>> I recall you mentioned [[gnu::nonnull]] aka __attribute__((nonnull)) which is an optimizer hint the API will always be called with a valid pointer. There is also returns_nonnull.
>>
>> The difficulty is the optimizer will remove any NULL pointer constant checks within those APIs (if there were any). The side effect is a useful compiler warning, if the compiler figures out someone is passing NULL.
>>
>> So in a safety critical system we must wrap all such APIs, to put back in the null pointer constant checks.
> 
> There's Clang's qualifier _Nonnull, which is not a hint to the
> optimizer.  It is an attempt to have null correctness similar to how we
> have const correctness.  It still has little support, even from Clang
> itself.  It has some important problem: it applies to the pointer, not
> to the pointee, but pointer qualifiers are discarded easily.  A better
> design would make it a pointee qualifier.  Hopefully, this will some day
> be there to end all NULL discussions.  Until then, yeah, NULL is a
> dangerous part of the language.
> 
> Cheers,
> Alex
> 

I saw Christopher Bazley was talking about this. As I understand it, _Nonnull is milder than attribute nonnull. _Nonnull probably helps with static analysis, but doesn't optimize out any code checking if(ptr == NULL) return -1;

Saw this, did you get traction with your proposal?

https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269?page=2


You're right, NULL is a dangerous part of the language. Part of the C spec does state that passing NULL to a library function that doesn't document accepting it is undefined behaviour. What happens then is up to each implementation, and most don't check for it, which is fine; it's their choice. NULL is pretty easy to check for in a wrapper, simpler than catching use-after-free pointers at runtime the way Valgrind or AddressSanitizer do.

Paul Eggert drew my attention to this in C23:

7.1.4  Use of library functions


"If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after default argument promotion) not expected by a function with a variable number of arguments, the behavior is undefined."
 
Kind regards
Jonny
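
Jonny's approach of wrapping library calls so that invalid arguments are rejected instead of being passed on (the undefined-behaviour case in the C23 7.1.4 text above) might look like the sketch below. The wrapper name and the error codes are hypothetical; the distinct negative codes mirror the idea of logging where a failure was detected. It only catches null pointers and an oversized copy, not dangling or overlapping pointers. The wrapper itself should not be declared with [[gnu::nonnull]], since, as noted above, that attribute can let the optimizer drop the very checks it contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes; each failure gets its own value so the
   caller can log which check fired. */
enum copy_status {
    COPY_OK       =  0,
    COPY_NULL_DST = -1,
    COPY_NULL_SRC = -2,
    COPY_TOO_BIG  = -3,
};

/*
 * checked_memcpy: hypothetical wrapper that refuses arguments which
 * would make memcpy(3) undefined behaviour (null pointers) or overflow
 * dst (n > dsize).  On failure nothing is written.
 */
int
checked_memcpy(void *dst, size_t dsize, const void *src, size_t n)
{
    if (dst == NULL)
        return COPY_NULL_DST;
    if (src == NULL)
        return COPY_NULL_SRC;
    if (n > dsize)
        return COPY_TOO_BIG;

    memcpy(dst, src, n);
    return COPY_OK;
}
```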




^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: Signing all patches and email to this list
  2023-11-18  0:25     ` Signing all patches and email to this list Matthew House
@ 2023-11-18 23:24       ` Jonny Grant
  0 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-18 23:24 UTC (permalink / raw)
  To: Matthew House; +Cc: Alejandro Colomar, Paul Eggert, linux-man



On 18/11/2023 00:25, Matthew House wrote:
> On Fri, Nov 17, 2023 at 4:43 PM Jonny Grant <jg@jguk.org> wrote:
>> On 12/11/2023 11:26, Alejandro Colomar wrote:
>>> The compiler will sometimes optimize them to normal *cpy(3) functions,
>>> since the length of dst is usually known, if the previous *cpy(3) is
>>> visible to the compiler.  And they provide for cleaner code.  If you
>>> know that they'll get optimized, you could use them.
>>
>> May I ask, is there an example or document that shows this optimization by the compiler? Perhaps a godbolt link?
>>
>> So it's a strcat() optimized to a strcpy()?
>>
>> I know gcc might unroll and just include the values of the string bytes.
>>
>> Kind regards, Jonny
> 
> See <https://godbolt.org/z/e34fWrTGf>. If a function computes the strlen()
> of the destination before calling strcat(), without modifying its value
> between the two calls, GCC will replace the strcat() with a strcpy(). If a
> function computes the strlen() of both the source and the destination, GCC
> will further replace the strcat() with a memcpy(), and possibly inline the
> memcpy() if the size is short enough. It will also remember the increased
> length of the destination for any future strcat() calls, to accommodate for
> strcpy(), strcat(), strcat(), ... chains. This is implemented in the
> strlen_pass::handle_builtin_strcat() function in gcc/tree-ssa-strlen.cc.
> Neither Clang nor MSVC appears to implement any similar optimization.

That's great it optimizes, thank you for sharing the information.

> Nevertheless, I would be extremely wary of recommending the bare strcpy(3),
> strcat(3), and sprintf(3) functions on the basis of "providing for cleaner
> code". By permitting the programmer to perform the copy with no immediate
> knowledge of the source and destination sizes, the functions open up a
> unique opportunity for squirreling away the guaranteed sizes in distant and
> opaque parts of the codebase. And this antipattern isn't a rare exception,
> but shows up in nearly every library that makes extensive use of the
> functions.
> 
> Thank you,
> Matthew House
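
The strcat(3) to strcpy(3) to memcpy(3) rewrite Matthew describes can be illustrated at source level. The three hypothetical functions below compute the same result; whether GCC actually emits the cheaper forms depends on the compiler version and optimization level, which is what the godbolt link above lets you check. None of them checks the capacity of dst, which is exactly the hazard raised in the paragraph above.

```c
#include <string.h>

/* Form 1: what the programmer writes.  strlen(dst) is computed and dst
   is not modified before strcat(3), the pattern described above.
   Caller must guarantee that dst has room for the result. */
size_t
append_cat(char *dst, const char *src)
{
    size_t dlen = strlen(dst);

    strcat(dst, src);
    return dlen + strlen(src);
}

/* Form 2: the end of dst is already known, so strcat(3) becomes a
   strcpy(3) starting at dst + dlen. */
size_t
append_cpy(char *dst, const char *src)
{
    size_t dlen = strlen(dst);

    strcpy(dst + dlen, src);
    return dlen + strlen(src);
}

/* Form 3: both lengths are known, so the copy becomes a memcpy(3) of
   slen + 1 bytes (the +1 copies the terminating null byte). */
size_t
append_mem(char *dst, const char *src)
{
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);

    memcpy(dst + dlen, src, slen + 1);
    return dlen + slen;
}
```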

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: strncpy clarify result may not be null terminated
  2023-11-10 13:15                             ` Alejandro Colomar
@ 2023-11-18 23:40                               ` Jonny Grant
  2023-11-20 11:56                                 ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-18 23:40 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man



On 10/11/2023 13:15, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Fri, Nov 10, 2023 at 11:36:20AM +0000, Jonny Grant wrote:
>>
>>
>> On 10/11/2023 05:36, Paul Eggert wrote:
>>> On 2023-11-09 15:48, Alejandro Colomar wrote:
>>>> I'd then just use strlen(3)+strcpy(3), avoiding
>>>> strncpy(3).
>>>
>>> But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead.
>>>
>>> The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer.
>>>
>>> Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements".
>>
>> As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls:
>>
>> 1) cost of any initial strnlen() reading memory to determine input src size
>> 2) accepts a src_max_size to actually try to copy from src
>> 3) does not truncate by writing anything to the buffer if there isn't enough space in the dest_max_size to fit src_max_size
>> 4) check for NULL pointers
>> 5) probably other thing I've overlooked
>>
>> Something like this API:
>> int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t * dest_written);
>> These sizes are including any NUL terminating byte.
>>
>> 0 on success, or an error code like EINVAL, or ERANGE if it would truncate
> 
> -  Linux kernel's strscpy() returns -E2BIG if it would truncate.  You
>    may want to follow suit if you want such an errno(3) code.

That is good: E2BIG if dest_max_size can't accommodate src_max_size.

> 
>    However, I think it's simpler to return the "standard" user-space
>    error return value: -1.
> 
>    If you'd need to distinguish error reasons, you could distinguish
>    error codes, but for a string-copying function I think it's not so
>    useful.

In the past I've used different values, e.g. -1 to -5, because the test version of this function I made detects 5 different errors; the application just needs to check for 0 for success. (The different error returns are useful when the issue is logged, to see where in the function the error was detected.)
 
> -  Why specify the src buffer size?  If you're copying strings, then you
>    know it'll be null-terminated, so strnlen(3) will not overrun.

The application should know the src buffer size, given that it allocated the buffer. That saves the performance cost of strnlen().

> If
>    you're not copying strings, then you'll need a different function
>    that reads from a non-string.  The only standard such function is
>    strncat(3), which reads from a fixed-width null-padded buffer, and
>    writes to a string.  You may want to write a function similar to
>    strncat(3) that doesn't catenate, if you want to just copy; I call
>    that function zustr2stp(), and you can find an implementation in
>    string_copying(7).
> 
> -  You can reuse the return value for the dest_written value with
>    ssize_t.  Just return -1 on error and the string length on success.
>    That's how most libc functions behave.

Sounds good.
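
Folding Alex's suggestions into Jonny's proposed API (ssize_t return with -1 on error, E2BIG instead of truncating, null checks) gives something like the sketch below. The name str_copy_len is hypothetical, and so is one key assumption: the caller passes slen, the length of src excluding the terminating null byte, rather than the buffer size. That is what actually lets the function skip strnlen(3); a buffer size alone would not say where the string ends.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/*
 * str_copy_len: hypothetical sketch.  Assumes slen is the length of
 * src NOT counting its terminating null byte, already known to the
 * caller.  Returns slen on success; on failure returns -1, sets errno,
 * and writes nothing to dst:
 *   EINVAL  dst or src is a null pointer
 *   E2BIG   the string plus its terminator does not fit in dsize
 */
ssize_t
str_copy_len(char *restrict dst, size_t dsize,
             const char *restrict src, size_t slen)
{
    if (dst == NULL || src == NULL) {
        errno = EINVAL;
        return -1;
    }
    /* Refuse rather than truncate (this also covers dsize == 0). */
    if (slen >= dsize) {
        errno = E2BIG;
        return -1;
    }

    memcpy(dst, src, slen);   /* src[slen] itself is never read */
    dst[slen] = '\0';
    return (ssize_t) slen;
}
```

Because nothing is written on failure, dst keeps its previous contents when the copy is refused.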

> 
> -  Regarding NULL checks, it depends on how you program.  I wouldn't add
>    them, but if you want to avoid crashes at all costs, it may be
>    necessary for you.  You could do a wrapper over strxcpy():
> 
> 
> 	inline ssize_t
> 	strxcpy0(char *restrict dst, const char *restrict src, size_t dsize)
> 	{
> 		if (dst == NULL || src == NULL)
> 			return -1;
> 
> 		return strxcpy(dst, src, dsize);
> 	}
> 
>    I used 0 in the name to mark that this function checks for null
>    pointers.
> 
> Cheers,
> Alex
> 
>>
>> All comments welcome.
>>
>> Kind regards, Jonny
> 

Kind regards, Jonny
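
Since this thread is about strncpy(3) results not being null-terminated, here is one common idiom for using it on a small fixed-size buffer while detecting truncation. It is a sketch assuming dsize > 0, and not necessarily the exact approach Paul suggested earlier in the thread. strncpy(3) reads at most dsize bytes of src, so it avoids the unbounded read of strlen(3), but it always writes all dsize bytes (padding with null bytes), and dst ends in a null byte only if src was shorter than dsize.

```c
#include <stdbool.h>
#include <string.h>

/*
 * copy_fixed: hypothetical helper.  Copies src into dst[dsize] with
 * strncpy(3), forces null termination, and returns false if src had
 * to be truncated.  Requires dsize > 0.
 */
bool
copy_fixed(char *dst, const char *src, size_t dsize)
{
    strncpy(dst, src, dsize);

    /* If the last byte is not null, src had at least dsize characters
       and strncpy(3) left dst unterminated. */
    if (dst[dsize - 1] != '\0') {
        dst[dsize - 1] = '\0';
        return false;
    }
    return true;
}
```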

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-18  9:37                             ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar
@ 2023-11-19  0:22                               ` Deri
  2023-11-19  1:19                                 ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Deri @ 2023-11-19  0:22 UTC (permalink / raw)
  To: Jonny Grant, Alejandro Colomar; +Cc: linux-man

On Saturday, 18 November 2023 09:37:17 GMT Alejandro Colomar wrote:
> On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote:
> > Thank you for your swift replies Alejandro and incorporating changes.
> 
> :-)
> 
> > >> I was reading again
> > >> https://man7.org/linux/man-pages/man7/string_copying.7.html
> > >> 
> > >> Sharing some comments, I realise not latest man page, if you have a new
> > >> one online I could read that. I was reading man-pages 6.04, perhaps
> > >> some already updated.
> > >
> > > You can check this one:
> > > 
> > > <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7>
> > > also available here:
> > > <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7>
> > > 
> > > And of course, you can install them from source, or read them from the
> > > repository itself.
> > 
> > That's good if you have your online PDF version of unreleased versions I
> > could read through.
> I have that as a goal, but need some help.  The thing is: we have
> <./scripts/LinuxManBook/>, which contains a Perl script and some helper
> stuff for it.  It was contributed by gropdf(1)'s maintainer Deri James.
> 
> Currently, that script does a lot of magic which produces the book from
> all of the pages.
> 
> I'd like to be able to split the script into several smaller scripts
> that can be run on each page, and then another script that merges all of
> them into the single PDF file.  That would be something I can merge into
> the Makefiles so that we can run a `make build-pdf` and if I touch a
> single page, it would only update the relevant part, reusing as much as
> possible from previous runs.

Hi Alex,

I assume you are thinking this will make production more efficient (quicker). 
The time saved would be absolutely minimal. It is obvious that to produce a 
pdf containing all the man pages then all the man pages have to be consumed by 
groff, not just the page which has changed. On my system this takes about 18 
seconds to produce the 2800+ pages of the book. Of this, a quarter of a second 
is consumed by the "magic" part of the script, the rest of the 18 seconds is 
consumed by calls to groff and gropdf. So any splitting of the perl script is 
only going to have an effect on the quarter of a second!

I don't understand why the perl script can't be included in your make file as 
part of build-pdf target. Presumably it would be dependent on running after 
the scripts which add the revision label and date to each man page.

> Since I don't understand Perl, and don't know much of gropdf(1) either,
> I need help.
> 
> Maybe Deri or Branden can help with that.  If anyone else understands it
> and can also help, that's very welcome too!

You are probably better placed to add the necessaries to your makefile. You 
would then just need to remember to make build-pdf any time you alter one of 
the source man pages. Since you are manually running my script to produce the 
pdf, it should not be difficult to automate it in a makefile.

> Then I could install a hook in my server that runs
> 
> 	$ make build-pdf docdir=/srv/www/...

And wait 18s each time the hook is actioned!! Or, set the build to place the 
generated pdf somewhere in /srv/www/... and include the build in your normal 
workflow when a man page is changed.

Cheers

Deri

> Cheers,
> Alex





^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-19  0:22                               ` Deri
@ 2023-11-19  1:19                                 ` Alejandro Colomar
  2023-11-19  9:29                                   ` Alejandro Colomar
  2023-11-19 16:21                                   ` Deri
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-19  1:19 UTC (permalink / raw)
  To: Deri; +Cc: Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 3480 bytes --]

Hi Deri!

On Sun, Nov 19, 2023 at 12:22:56AM +0000, Deri wrote:
> Hi Alex,
> 
> I assume you are thinking this will make production more efficient (quicker). 

Not necessarily.  The main reason is that I want to be able to inspect
and understand every little step of the groff pipeline.  See for example
how I build a pdf from a single page:

	$ touch man2/membarrier.2
	$ make build-pdf   
	PRECONV	.tmp/man/man2/membarrier.2.tbl
	TBL	.tmp/man/man2/membarrier.2.eqn
	EQN	.tmp/man/man2/membarrier.2.pdf.troff
	TROFF	.tmp/man/man2/membarrier.2.pdf.set
	GROPDF	.tmp/man/man2/membarrier.2.pdf

That helps debug the pipeline, and also learn about it.

If that helps parallelize some tasks, then that'll be welcome.

> The time saved would be absolutely minimal. It is obvious that to produce a 
> pdf containing all the man pages then all the man pages have to be consumed by 
> groff, not just the page which has changed.

But do you need to run the entire pipeline, or can you reuse most of it?
I can process in parallel much faster, with `make -jN ...`.  I guess
the .pdf.troff files can be reused; maybe even the .pdf.set ones?

Could you change the script at least to produce intermediary files as in
the pipeline shown above?  As many as possible would be excellent.

> On my system this takes about 18 
> seconds to produce the 2800+ pages of the book. Of this, a quarter of a second 
> is consumed by the "magic" part of the script, the rest of the 18 seconds is 
> consumed by calls to groff and gropdf.

But how much of that work needs to be on a single process?  I bought a
new CPU with 24 cores.  Gotta use them all  :D

> So any splitting of the perl script is 
> only going to have an effect on the quarter of a second!
> 
> I don't understand why the perl script can't be included in your make file as 
> part of build-pdf target.

It can.  I just prefer to be strict about the Makefile having "one rule
per each file", while currently the script generates 4 files (T, two
.Z's, and the .pdf).

> Presumably it would be dependent on running after 
> the scripts which add the revision label and date to each man page.

I only set the revision and date on dist tarballs.  For the git HEAD
book, I'd keep the (unreleased) version and (date).  So, no worries
there.

> 
> > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > I need help.
> > 
> > Maybe Deri or Branden can help with that.  If anyone else understands it
> > and can also help, that's very welcome too!
> 
> You are probably better placed to add the necessaries to your makefile. You 
> would then just need to remember to make build-pdf any time you alter one of 
> the source man pages. Since you are manually running my script to produce the 
> pdf, it should not be difficult to automate it in a makefile.
> 
> > Then I could install a hook in my server that runs
> > 
> > 	$ make build-pdf docdir=/srv/www/...
> 
> And wait 18s each time the hook is actioned!! Or, set the build to place the 
> generated pdf somewhere in /srv/www/... and include the build in your normal 
> workflow when a man page is changed.

Hmm.  I still hope some of it can be parallelized, but 18s could be
reasonable, if the server does that in the background after pushing.
My old raspberry pi would burn, but the new computer should handle that
just fine.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-19  1:19                                 ` Alejandro Colomar
@ 2023-11-19  9:29                                   ` Alejandro Colomar
  2023-11-19 16:21                                   ` Deri
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-19  9:29 UTC (permalink / raw)
  To: Deri; +Cc: Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 4024 bytes --]

On Sun, Nov 19, 2023 at 02:19:43AM +0100, Alejandro Colomar wrote:
> Hi Deri!
> 
> On Sun, Nov 19, 2023 at 12:22:56AM +0000, Deri wrote:
> > Hi Alex,
> > 
> > I assume you are thinking this will make production more efficient (quicker). 
> 
> Not necessarily.  The main reason is that I want to be able to inspect
> and understand every little step of the groff pipeline.  See for example
> how I build a pdf from a single page:
> 
> 	$ touch man2/membarrier.2
> 	$ make build-pdf   
> 	PRECONV	.tmp/man/man2/membarrier.2.tbl
> 	TBL	.tmp/man/man2/membarrier.2.eqn
> 	EQN	.tmp/man/man2/membarrier.2.pdf.troff
> 	TROFF	.tmp/man/man2/membarrier.2.pdf.set
> 	GROPDF	.tmp/man/man2/membarrier.2.pdf
> 
> That helps debug the pipeline, and also learn about it.
> 
> If that helps parallelize some tasks, then that'll be welcome.
> 
> > The time saved would be absolutely minimal. It is obvious that to produce a 
> > pdf containing all the man pages then all the man pages have to be consumed by 
> > groff, not just the page which has changed.
> 
> But do you need to run the entire pipeline, or can you reuse most of it?
> I can process in parallel much faster, with `make -jN ...`.  I guess
> the .pdf.troff files can be reused; maybe even the .pdf.set ones?
> 
> Could you change the script at least to produce intermediary files as in
> the pipeline shown above?  As many as possible would be excellent.

And if you could then split the Perl script so that it is composed of
several subscripts called by the main script, and each subscript produces
exactly one file, that'd be great.  I could call each of those smaller
scripts in a Makefile rule.

> 
> > On my system this takes about 18 
> > seconds to produce the 2800+ pages of the book. Of this, a quarter of a second 
> > is consumed by the "magic" part of the script, the rest of the 18 seconds is 
> > consumed by calls to groff and gropdf.
> 
> But how much of that work needs to be on a single process?  I bought a
> new CPU with 24 cores.  Gotta use them all  :D
> 
> > So any splitting of the perl script is 
> > only going to have an effect on the quarter of a second!
> > 
> > I don't understand why the perl script can't be included in your make file as 
> > part of build-pdf target.
> 
> It can.  I just prefer to be strict about the Makefile having "one rule
> per each file", while currently the script generates 4 files (T, two
> .Z's, and the .pdf).
> 
> > Presumably it would be dependent on running after 
> > the scripts which add the revision label and date to each man page.
> 
> I only set the revision and date on dist tarballs.  For the git HEAD
> book, I'd keep the (unreleased) version and (date).  So, no worries
> there.
> 
> > 
> > > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > > I need help.
> > > 
> > > Maybe Deri or Branden can help with that.  If anyone else understands it
> > > and can also help, that's very welcome too!
> > 
> > You are probably better placed to add the necessaries to your makefile. You 
> > would then just need to remember to make build-pdf any time you alter one of 
> > the source man pages. Since you are manually running my script to produce the 
> > pdf, it should not be difficult to automate it in a makefile.
> > 
> > > Then I could install a hook in my server that runs
> > > 
> > > 	$ make build-pdf docdir=/srv/www/...
> > 
> > And wait 18s each time the hook is actioned!! Or, set the build to place the 
> > generated pdf somewhere in /srv/www/... and include the build in your normal 
> > workflow when a man page is changed.
> 
> Hmm.  I still hope some of it can be parallelized, but 18s could be
> reasonable, if the server does that in the background after pushing.
> My old raspberry pi would burn, but the new computer should handle that
> just fine.
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 138+ messages in thread

* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-19  1:19                                 ` Alejandro Colomar
  2023-11-19  9:29                                   ` Alejandro Colomar
@ 2023-11-19 16:21                                   ` Deri
  2023-11-19 20:58                                     ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Deri @ 2023-11-19 16:21 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, linux-man

On Sunday, 19 November 2023 01:19:43 GMT Alejandro Colomar wrote:
> Hi Deri!
> 
> On Sun, Nov 19, 2023 at 12:22:56AM +0000, Deri wrote:
> > Hi Alex,
> > 
> > I assume you are thinking this will make production more efficient (quicker).
> 
> Not necessarily.  The main reason is that I want to be able to inspect
> and understand every little step of the groff pipeline.  See for example
> how I build a pdf from a single page:
> 
> 	$ touch man2/membarrier.2
> 	$ make build-pdf
> 	PRECONV	.tmp/man/man2/membarrier.2.tbl
> 	TBL	.tmp/man/man2/membarrier.2.eqn
> 	EQN	.tmp/man/man2/membarrier.2.pdf.troff
> 	TROFF	.tmp/man/man2/membarrier.2.pdf.set
> 	GROPDF	.tmp/man/man2/membarrier.2.pdf
> 
> That helps debug the pipeline, and also learn about it.
> 
> If that helps parallelize some tasks, then that'll be welcome.

Hi Alex,

Doing it that way actually stops the jobs being run in parallel! Each step 
completes before the next step starts, whereas if you let groff build the 
pipeline all the processes are run in parallel. Using separate steps may be 
desirable for "understanding every little step of the groff pipeline", (and 
may aid debugging an issue), but once such knowledge is obtained it is 
probably better to leave the pipelining to groff, in a production environment.

> > The time saved would be absolutely minimal. It is obvious that to produce a
> > pdf containing all the man pages then all the man pages have to be consumed by
> > groff, not just the page which has changed.
> 
> But do you need to run the entire pipeline, or can you reuse most of it?
> I can process in parallel much faster, with `make -jN ...`.  I guess
> the .pdf.troff files can be reused; maybe even the .pdf.set ones?
> 
> Could you change the script at least to produce intermediary files as in
> the pipeline shown above?  As many as possible would be excellent.

Perhaps it would help if I explain the stages of my script. First, a look at 
what the script needs to do to produce a pdf of all man pages. There are too 
many files to put every man page's filename on a single command line, and 
groff has no mechanism for passing a list of filenames, so the first job is 
to concatenate all the separate files into one input file for groff. While 
we are doing that, we add the "magic sauce" which makes all the pdf links in 
the book and sorts out the aliases which point to another man page.

After this is done there is a single troff file, called LMB.man, which is the 
file groff is going to process. In the script you should see something like 
this:-

my $temp='LMB.man';

[...]

my $format='pdf';
my $paper=$fpaper ||';
my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
my $front='LMBfront.t';
my $frontdit='LMBfront.set';
my $mandit='LinuxManBook.set';
my $book="LinuxManBook.$format";

system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");
system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 | LC_ALL=C grep '^\\. *ds' | groff -T$format $cmdstring - $temp -Z > $mandit");
system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -p$paper  > $book");

(This includes changes by Brian Inglis.) If you remove the lines which 
call system you will end up with just the single file LMB.man (in about a 
quarter of a second). You can treat this file just the same as your single 
page example if you want to.

The first system call creates the title page from the troff source file 
LMBfront.t and produces LMBfront.set, this can be added to your makefile as an 
entirely separate rule depending on whether the .set file needs to be built.

The second and third system calls are the calls to groff which could be put 
into your makefile or split into separate stages to avoid parallelism.

The second system call produces LinuxManBook.set and the third system call 
combines this with LMBfront.set to produce the pdf.

The "./" in the third system call is because I gave you a pre-release gropdf, 
you may be using the released 1.23.0 gropdf now.

> > On my system this takes about 18
> > seconds to produce the 2800+ pages of the book. Of this, a quarter of a
> > second is consumed by the "magic" part of the script, the rest of the 18
> > seconds is consumed by calls to groff and gropdf.
> 
> But how much of that work needs to be on a single process?  I bought a
> new CPU with 24 cores.  Gotta use them all  :D

I realise you are having difficulty in letting go of your idea of re-using 
previous work, rather than starting afresh each time. Imagine a single word 
change in one man page causes it to grow from 2 pages to 3, so all links to 
pages after this changed entry would be one page adrift. This is why very 
little previous work is useful, and why the whole book has to be dealt with as 
a single process. If each entry was processed separately, as you would like to 
use all your shiny new cores, how would the process dealing with accept(2) 
know which page socket(2) would be on when it adds it as a link in the text. I 
hope you can see that at some point it has to be treated as a homogeneous whole 
in order to calculate correct links between entries.
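
Deri's point about links can be made concrete with a toy model: each entry's first page is the sum of the page counts of all entries before it (a prefix sum), so a change in one entry moves the start page of every later entry. The function below is hypothetical and is not taken from the LinuxManBook script; in the real build, page counts only become known after groff has formatted each page.

```c
#include <stddef.h>

/*
 * start_pages: hypothetical toy model.  pages[i] is the number of
 * pages entry i occupies; start[i] receives the 1-based page on which
 * entry i begins.  A link to entry i must point at start[i], which
 * depends on every entry before it.
 */
void
start_pages(const int pages[], size_t n, int start[])
{
    int next = 1;

    for (size_t i = 0; i < n; i++) {
        start[i] = next;
        next += pages[i];
    }
}
```

For example, with page counts {2, 4, 1} the entries start on pages 1, 3 and 7; if the first entry grows from 2 pages to 3, they start on 1, 4 and 8, so every link to the later entries changes.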

> > So any splitting of the perl script is
> > only going to have an effect on the quarter of a second!
> > 
> > I don't understand why the perl script can't be included in your make file
> > as part of build-pdf target.
> 
> It can.  I just prefer to be strict about the Makefile having "one rule
> per each file", while currently the script generates 4 files (T, two
> .Z's, and the .pdf).

I explained above how to separate them, so that the script only generates 
LMB.man and the system calls move to the makefile.

> > Presumably it would be dependent on running after
> > the scripts which add the revision label and date to each man page.
> 
> I only set the revision and date on dist tarballs.  For the git HEAD
> book, I'd keep the (unreleased) version and (date).  So, no worries
> there.

Given that you seem to intend to offer these interim books as a download, it 
would make sense if they included either a date or git commit ID to 
differentiate them; if someone queries something, it would be useful to know 
exactly what they were looking at.

Cheers 

Deri

> > > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > > I need help.
> > > 
> > > Maybe Deri or Branden can help with that.  If anyone else understands it
> > > and can also help, that's very welcome too!
> > 
> > You are probably better placed to add the necessaries to your makefile. You
> > would then just need to remember to make build-pdf any time you alter one of
> > the source man pages. Since you are manually running my script to produce the
> > pdf, it should not be difficult to automate it in a makefile.
> > 
> > > Then I could install a hook in my server that runs
> > > 
> > > 	$ make build-pdf docdir=/srv/www/...
> > 
> > And wait 18s each time the hook is actioned!! Or, set the build to place
> > the generated pdf somewhere in /srv/www/... and include the build in your
> > normal workflow when a man page is changed.
> 
> Hmm.  I still hope some of it can be parallelized, but 18s could be
> reasonable, if the server does that in the background after pushing.
> My old raspberry pi would burn, but the new computer should handle that
> just fine.

I'm confused. The 18s is how long it takes to generate the book, so if the 
book is built in response to an access to a particular url the http server 
can't start "pushing" for the 18s; then add on the transfer time for the pdf 
and I suspect you will have a lot of aborted transfers. Additionally, the 
script, and any makefile equivalent you write, is not designed for concurrent 
invocation, so if two people visit the same url within the 18 second window 
neither user will receive a valid pdf.

I advise the build becomes part of your workflow after making changes, and 
then place the pdf in a location where it can be served by the http server.

Your model of slicing and dicing man pages to be processed individually is 
doable using a website to serve the individual pages, see:-

http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept

This is running on a 1" cube no more powerful than a raspberry pi 3. The 
difference is that the "magic sauce" added to each man page sets the links to 
external http calls back to itself to produce another man page, rather than 
internal links to another part of the pdf. You can get an index of all the man 
pages, on the (very old) system, here.

http://chuzzlewit.co.uk/

Cheers 

Deri

> Cheers,
> Alex






* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-19 16:21                                   ` Deri
@ 2023-11-19 20:58                                     ` Alejandro Colomar
  2023-11-20  0:46                                       ` G. Branden Robinson
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-19 20:58 UTC (permalink / raw)
  To: Deri; +Cc: Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 11139 bytes --]

On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote:
> > 	$ touch man2/membarrier.2
> > 	$ make build-pdf
> > 	PRECONV	.tmp/man/man2/membarrier.2.tbl
> > 	TBL	.tmp/man/man2/membarrier.2.eqn
> > 	EQN	.tmp/man/man2/membarrier.2.pdf.troff
> > 	TROFF	.tmp/man/man2/membarrier.2.pdf.set
> > 	GROPDF	.tmp/man/man2/membarrier.2.pdf
> > 
> > That helps debug the pipeline, and also learn about it.
> > 
> > If that helps parallelize some tasks, then that'll be welcome.
> 
> Hi Alex,

Hi Deri,

> Doing it that way actually stops the jobs being run in parallel! Each step 

Hmm, kind of makes sense.

> completes before the next step starts, whereas if you let groff build the 
> pipeline all the processes are run in parallel. Using separate steps may be 
> desirable for "understanding every little step of the groff pipeline", (and 

Still a useful thing for our build system.

> may aid debugging an issue), but once such knowledge is obtained it is 
> probably better to leave the pipelining to groff, in a production environment.

Unless performance is really a problem, I prefer the understanding and
debugging aid.  It'll help not only me, but others who see the project
and would like to learn how all this magic works.

> > > The time saved would be absolutely minimal. It is obvious that to produce
> > > a
> > > pdf containing all the man pages then all the man pages have to be
> > > consumed by groff, not just the page which has changed.
> > 
> > But do you need to run the entire pipeline, or can you reuse most of it?
> > I can process in parallel much faster, with `make -jN ...`.  I guess
> > the .pdf.troff files can be reused; maybe even the .pdf.set ones?
> > 
> > Could you change the script at least to produce intermediary files as in
> > the pipeline shown above?  As many as possible would be excellent.
> 
> Perhaps it would help if I explain the stages of my script. First a look at 
> what the script needs to do to produce a pdf of all man pages. There are too 
> many files to produce a single command line with all the filenames of each 
> man, groff has no mechanism for passing a list of filenames, so first job is 

You can always `find ... | xargs cat | troff /dev/stdin`

> to concatenate all the separate files into one input file for groff. And while 
> we are doing that, add the "magic sauce" which makes all the pdf links in the 
> book and sorts out the aliases which point to another man page.

Yep, I think I partially understood that part of the script today.  It's
what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces and
passes to groff, right?

> After this is done there is a single troff file, called LMB.man, which is the 

That's what's currently called LinuxManBook.Z, right?

> file groff is going to process. In the script you should see something like 
> this:-
> 
> my $temp='LMB.man';

I don't.  Maybe you have a slightly different version of it?

> [...]
> 
> my $format='pdf';
> my $paper=$fpaper ||';
> my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
> my $front='LMBfront.t';
> my $frontdit='LMBfront.set';
> my $mandit='LinuxManBook.set';
> my $book="LinuxManBook.$format";
> 
> system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");

This creates the front page .set file

> system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 | 
> LC_ALL=C grep '^\\. *ds' |

This creates the bookmarks, right?

> groff -T$format $cmdstring - $temp -Z > $mandit");

And this is the main .set file.

> system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -p$paper  > $book");

And finally we have the book.

> 
> (This includes changes by Brian Inglish ts). If you remove the lines which 
> call system you will end up with just the single file LMB.man (in about a 
> quarter of a second). You can treat this file just the same as your single 
> page example if you want to.
> 
> The first system call creates the title page from the troff source file 
> LMBfront.t and produces LMBfront.set, this can be added to your makefile as an 
> entirely separate rule depending on whether the .set file needs to be built.
> 
> The second and third system calls are the calls to groff which could be put 
> into your makefile or split into separate stages to avoid parallelism.
> 
> The second system call produces LinuxManBook.set and the third system call combines 
> this with LMBfront.set to produce the pdf.
> 
> The "./" in the third system call is because I gave you a pre-release gropdf, 
> you may be using the released 1.23.0 gropdf now.
> 
> > > On my system this takes about 18
> > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a
> > > second is consumed by the "magic" part of the script, the rest of the 18
> > > seconds is consumed by calls to groff and gropdf.
> > 
> > But how much of that work needs to be on a single process?  I bought a
> > new CPU with 24 cores.  Gotta use them all  :D
> 
> I realise you are having difficulty in letting go of your idea of re-using 
> previous work, rather than starting afresh each time. Imagine a single word 
> change in one man page causes it to grow from 2 pages to 3, so all links to 
> pages after this changed entry would be one page adrift. This is why very 
> little previous work is useful, and why the whole book has to be dealt with as 
> a single process.

Does such a change need re-running troff(1)?  Or is gropdf(1) enough?

My problem is probably that I don't know what's done by `gropdf`, and
what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
didn't need to know about the entire book, and that only gropdf(1) would
need that.

> If each entry was processed separately, as you would like to 
> use all your shiny new cores, how would the process dealing with accept(2) 
> know which page socket(2) would be on when it adds it as a link in the text. I 
> hope you can see that at some point it has to be treated as a homogeneous whole 
> in order to calculate correct links between entries.
> 
> > > So any splitting of the perl script is
> > > only going to have an effect on the quarter of a second!
> > > 
> > > I don't understand why the perl script can't be included in your make file
> > > as part of build-pdf target.
> > 
> > It can.  I just prefer to be strict about the Makefile having "one rule
> > per each file", while currently the script generates 4 files (T, two
> > .Z's, and the .pdf).
> 
> Explained how to separate above so that the script only generates LMB.man and 
> the system calls moved to the makefile.

Thanks!

> > > Presumably it would be dependent on running after
> > > the scripts which add the revision label and date to each man page.
> > 
> > I only set the revision and date on dist tarballs.  For the git HEAD
> > book, I'd keep the (unreleased) version and (date).  So, no worries
> > there.
> 
> Given that you seem to intend to offer these interim books as a download, it 
> would make sense if they included either a date or git commit ID to 
> differentiate them; if someone queries something, it would be useful to know 
> exactly what they were looking at.

The books for releases are available at

<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf>

(replace the version numbers for other versions, or navigate the dirs)
I need to document that in the README of the project.

For git HEAD, I plan to have something like

<https://www.alejandro-colomar.es/share/dist/man-pages/git/man-pages-HEAD.pdf>

It's mainly intended for easily checking what git HEAD looks like, and
discard that later.  If the audience asks for version numbers, though,
I could provide `git describe` versions and dates in the pages.

> Cheers 
> 
> Deri
> 
> > > > Since I don't understand Perl, and don't know much of gropdf(1) either,
> > > > I need help.
> > > > 
> > > > Maybe Deri or Branden can help with that.  If anyone else understands it
> > > > and can also help, that's very welcome too!
> > > 
> > > You are probably better placed to add the necessaries to your makefile.
> > > You
> > > would then just need to remember to make build-pdf any time you alter one
> > > of the source man pages. Since you are manually running my script to
> > > produce the pdf, it should not be difficult to automate it in a makefile.
> > > 
> > > > Then I could install a hook in my server that runs
> > > > 
> > > > 	$ make build-pdf docdir=/srv/www/...
> > > 
> > > And wait 18s each time the hook is actioned!! Or, set the build to place
> > > the generated pdf somewhere in /srv/www/... and include the build in your
> > > normal workflow when a man page is changed.
> > 
> > Hmm.  I still hope some of it can be parallelized, but 18s could be
> > reasonable, if the server does that in the background after pushing.
> > My old raspberry pi would burn, but the new computer should handle that
> > just fine.
> 
> I'm confused. The 18s is how long it takes to generate the book, so if the 
> book is built in response to an access to a particular url the http server 
> can't start "pushing" for the 18s; then add on the transfer time for the pdf 
> and I suspect you will have a lot of aborted transfers. Additionally, the 
> script, and any makefile equivalent you write, is not designed for concurrent 
> invocation, so if two people visit the same url within the 18 second window 
> neither user will receive a valid pdf.

No, my intention is that whenever I `git push` via SSH, the receiving
server runs `make build-book-pdf` after receiving the changes.  That is
run after the git SSH connection has closed, so I wouldn't notice.

HTTP connections wouldn't trigger anything in my server, except Nginx
serving the file, of course.

> I advise the build becomes part of your workflow after making changes, and 
> then place the pdf in a location where it can be served by the http server.
> 
> Your model of slicing and dicing man pages to be processed individually is 
> doable using a website to serve the individual pages, see:-
> 
> http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
> 
> This is running on a 1" cube no more powerful than a raspberry pi 3. The 
> difference is that the "magic sauce" added to each man page sets the links to 
> external http calls back to itself to produce another man page, rather than 
> internal links to another part of the pdf. You can get an index of all the man 
> pages, on the (very old) system, here.
> 
> http://chuzzlewit.co.uk/

Yep, I've seen that server :)
Long term I also intend to provide one-page PDFs and HTML files of the
pages.  Although I prefer pre-generating them, instead of on-demand.
Maybe a git hook, or maybe a cron job that re-generates them once a day
or so.

Cheers,
Alex

> 
> Cheers 
> 
> Deri

-- 
<https://www.alejandro-colomar.es/>


* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-19 20:58                                     ` Alejandro Colomar
@ 2023-11-20  0:46                                       ` G. Branden Robinson
  2023-11-20  9:43                                         ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: G. Branden Robinson @ 2023-11-20  0:46 UTC (permalink / raw)
  To: Alejandro Colomar, Deri; +Cc: Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 3642 bytes --]

Hi Alex and Deri,

I'm going to address just a few small parts of this message...

At 2023-11-19T21:58:03+0100, Alejandro Colomar wrote:
> You can always `find ... | xargs cat | troff /dev/stdin`

...not if you need to preprocess any of the input.  With tbl(1), for
instance.

> My problem is probably that I don't know what's done by `gropdf`, and
> what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
> didn't need to know about the entire book, and that only gropdf(1)
> would need that.

This stuff is documented in groff's Texinfo manual, and in the groff(1)
and roff(7) man pages.

Here's an excerpt of the last.

Using roff
       When you read a man page, often a roff is the program rendering
       it.  Some roff implementations provide wrapper programs that make
       it easy to use the roff system from the shell’s command line.
       These can be specific to a macro package, like mmroff(1), or more
       general.  groff(1) provides command‐line options sparing the user
       from constructing the long, order‐dependent pipelines familiar to
       AT&T troff users.  Further, a heuristic program, grog(1), is
       available to infer from a document’s contents which groff
       arguments should be used to process it.

   The roff pipeline
       A typical roff document is prepared by running one or more
       processors in series, followed by a formatter program and then
       an output driver (or “device postprocessor”).  Commonly, these
       programs are structured into a pipeline; that is, each is run in
       sequence such that the output of one is taken as the input to the
       next, without passing through secondary storage.  (On non‐Unix
       systems, pipelines may have to be simulated with temporary
       files.)

        $ preproc1 < input‐file | preproc2 | ... | troff [option] ... \
            | output‐driver

       Once all preprocessors have run, they deliver pure roff language
       input to the formatter, which in turn generates a document in a
       page description language that is then interpreted by a
       postprocessor for viewing, printing, or further processing.

gropdf(1) is the output driver for the PDF "device".  So "groff -T pdf
input.tr" and "troff -T pdf input.tr | gropdf" are equivalent.

(Yes, you still need the `-T pdf` arguments, even to troff proper.

roff(7) again:

Concepts
[...]
       When a device‐independent roff formatter starts up, it obtains
       information about the device for which it is preparing output
       from the latter’s description file (see groff_font(5)).  An
       essential property is the length of the output line, such as “6.5
       inches”.
)

> > Your model of slicing and dicing man pages to be processed
> > individually is doable using a website to serve the individual
> > pages, see:-
> > 
> > http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
> > 
> > This is running on a 1" cube no more powerful than a raspberry pi 3.
> > The difference is that the "magic sauce" added to each man page sets
> > the links to external http calls back to itself to produce another
> > man page, rather than internal links to another part of the pdf. You
> > can get an index of all the man pages, on the (very old) system,
> > here.
> > 
> > http://chuzzlewit.co.uk/
> 
> Yep, I've seen that server :)

Is it just me, or are the fonts not getting embedded in the PDFs
generated by chuzzlewit?  They look fine on my desktop machine but
pretty bad on my Android tablet.

Regards,
Branden


* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)
  2023-11-20  0:46                                       ` G. Branden Robinson
@ 2023-11-20  9:43                                         ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-20  9:43 UTC (permalink / raw)
  To: G. Branden Robinson; +Cc: Deri, Jonny Grant, linux-man

[-- Attachment #1: Type: text/plain, Size: 4450 bytes --]

Hi Branden,

On Sun, Nov 19, 2023 at 06:46:29PM -0600, G. Branden Robinson wrote:
> Hi Alex and Deri,
> 
> I'm going to address just a few small parts of this message...
> 
> At 2023-11-19T21:58:03+0100, Alejandro Colomar wrote:
> > You can always `find ... | xargs cat | troff /dev/stdin`
> 
> ...not if you need to preprocess any of the input.  With tbl(1), for
> instance.

What I mean is that I can preprocess individually:

find ... | while IFS= read -r f; do eqn "$f" > "$f.troff"; done

And only process together in a single invocation what _needs_ to be done
in a single invocation:

find ... | xargs cat | gropdf /dev/stdin

I guess that preprocessors can be run per-file.
I know that gropdf(1) must be run with the entire book as input.
But I don't know if `troff -Tpdf` needs to see the entire book at once,
or if it can process each file separately.

In my laptop, the pipeline for building the Linux Man Book takes 23.3 s.
I've split the processing of the book so that I produce every
intermediary file in the pipeline (except pic(1), which I think we don't
need).  From that, I've seen the times it takes for each program to do
its job (and importantly, the overall time wasn't slower; it took again
23.3 s): preconv(1) takes 0.04 s; tbl(1) takes 0.06 s; eqn(1) takes
0.05 s; troff(1) takes 2.8 s; and gropdf(1) takes 17.6 s.

The time taken by gropdf(1) is mandatory, since it can't process the
individual files separately.  But if we can reduce the time taken by all
other programs close to 0, it would be good.  It depends on which
programs need to see the entire book, and which can process each file
separately.

Nevertheless, I think it's interesting to process the book per-file, as
much as possible, even if the overall time won't change significantly.
It is a good documentation of what needs to be processed together and
what not, when building a PDF document with groff.

> > My problem is probably that I don't know what's done by `gropdf`, and
> > what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
> > didn't need to know about the entire book, and that only gropdf(1)
> > would need that.
> 
> This stuff is documented in groff's Texinfo manual, and in the groff(1)
> and roff(7) man pages.
> 
> Here's an excerpt of the last.
> 
> Using roff
>        When you read a man page, often a roff is the program rendering
>        it.  Some roff implementations provide wrapper programs that make
>        it easy to use the roff system from the shell’s command line.
>        These can be specific to a macro package, like mmroff(1), or more
>        general.  groff(1) provides command‐line options sparing the user
>        from constructing the long, order‐dependent pipelines familiar to
>        AT&T troff users.  Further, a heuristic program, grog(1), is
>        available to infer from a document’s contents which groff
>        arguments should be used to process it.
> 
>    The roff pipeline
>        A typical roff document is prepared by running one or more
>        processors in series, followed by a a formatter program and then
>        an output driver (or “device postprocessor”).  Commonly, these
>        programs are structured into a pipeline; that is, each is run in
>        sequence such that the output of one is taken as the input to the
>        next, without passing through secondary storage.  (On non‐Unix
>        systems, pipelines may have to be simulated with temporary
>        files.)
> 
>         $ preproc1 < input‐file | preproc2 | ... | troff [option] ... \
>             | output‐driver
> 
>        Once all preprocessors have run, they deliver pure roff language
>        input to the formatter, which in turn generates a document in a
>        page description language that is then interpreted by a
>        postprocessor for viewing, printing, or further processing.
> 
> gropdf(1) is the output driver for the PDF "device".  So "groff -T pdf
> input.tr" and "troff -T pdf input.tr | gropdf" are equivalent.
> 
> (Yes, you still need the `-T pdf` arguments, even to troff proper.

This doesn't answer my doubt.  For generating a book, does troff(1) need
to see the entire book, or is it enough if gropdf(1) does?  My guess is
that troff(1) also needs to see the entire book, but I don't know for
sure.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-18 23:40                               ` Jonny Grant
@ 2023-11-20 11:56                                 ` Jonny Grant
  2023-11-20 15:12                                   ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-20 11:56 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man

BTW, GCC has a useful warning for truncation that may help code bases that use strncpy, you've probably seen this and the article, just sharing for completeness.

warning: ‘__builtin_strncpy’ output truncated before terminating nul copying XYZ bytes from a string of the same length [-Wstringop-truncation]


Martin's article from 2018
https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8#forming_truncated_strings_with_snprintf


* Re: strncpy clarify result may not be null terminated
  2023-11-20 11:56                                 ` Jonny Grant
@ 2023-11-20 15:12                                   ` Alejandro Colomar
  2023-11-20 23:08                                     ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-20 15:12 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man

[-- Attachment #1: Type: text/plain, Size: 1133 bytes --]

Hi Jonny,

On Mon, Nov 20, 2023 at 11:56:40AM +0000, Jonny Grant wrote:
> BTW, GCC has a useful warning for truncation that may help code bases that use strncpy, you've probably seen this and the article, just sharing for completeness.

It's actually the opposite.  GCC's warnings about strncpy(3) are
nefarious, as it warns in valid uses of strncpy(3) for writing a
null-padded character sequence (the use for which strncpy(3) was
designed), recommending the bogus use as a function for copying
truncated strings.

> 
> warning: ‘__builtin_strncpy’ output truncated before terminating nul copying XYZ bytes from a string of the same length [-Wstringop-truncation]
> 
> 
> Martin's article from 2018
> https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8#forming_truncated_strings_with_snprintf

I discussed with Martin about this, IIRC, and he told me they had to
decide which use of strncpy(3) to support, with the side effect that
other uses would be warned about, and they chose the one that I think is
bogus.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: strncpy clarify result may not be null terminated
  2023-11-20 15:12                                   ` Alejandro Colomar
@ 2023-11-20 23:08                                     ` Jonny Grant
  2023-11-20 23:42                                       ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-20 23:08 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man



On 20/11/2023 15:12, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Mon, Nov 20, 2023 at 11:56:40AM +0000, Jonny Grant wrote:
>> BTW, GCC has a useful warning for truncation that may help code bases that use strncpy, you've probably seen this and the article, just sharing for completeness.
> 
> It's actually the opposite.  GCC's warnings about strncpy(3) are
> nefarious, as it warns in valid uses of strncpy(3) for writing a
> null-padded character sequence (the use for which strncpy(3) was
> designed), recommending the bogus use as a function for copying
> truncated strings.

You're right, I can see this warning is issued for valid uses of strncpy(3) to copy a sequence of characters, (without even a single NUL pad). It does not warn when the byte sequence count includes a NUL byte.

>>
>> warning: ‘__builtin_strncpy’ output truncated before terminating nul copying XYZ bytes from a string of the same length [-Wstringop-truncation]
>>
>>
>> Martin's article from 2018
>> https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8#forming_truncated_strings_with_snprintf
> 
> I discussed with Martin about this, IIRC, and he told me they had to
> decide which use of strncpy(3) to support, with the side effect that
> other uses would be warned about, and they chose the one that I think is
> bogus.

Fair enough.


While I remember, the strlcpy discussion has been going on for over 20 years.

https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053.html

https://news.ycombinator.com/item?id=6940601


Kind regards, Jonny


* Re: strncpy clarify result may not be null terminated
  2023-11-20 23:08                                     ` Jonny Grant
@ 2023-11-20 23:42                                       ` Alejandro Colomar
  0 siblings, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-20 23:42 UTC (permalink / raw)
  To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man

[-- Attachment #1: Type: text/plain, Size: 1240 bytes --]

Hi Jonny,

On Mon, Nov 20, 2023 at 11:08:58PM +0000, Jonny Grant wrote:
> > I discussed with Martin about this, IIRC, and he told me they had to
> > decide which use of strncpy(3) to support, with the side effect that
> > other uses would be warned about, and they chose the one that I think is
> > bogus.
> 
> Fair enough.

To be fair with Martin and GCC, the uses of strncpy(3) that I consider
correct are so trivial that those warnings are unnecessary, since one
should always use sizeof(dst) in the call, which can be done by a
wrapper macro

	#define STRNCPY(dst, src)  strncpy(dst, src, nitems(dst))

which is precisely what I did in shadow-utils.  With this, the chances
of getting the size wrong are 0, so I'd just turn off those warnings.

Since strncpy(3) should always be used for writing to a fixed-size
array, it's likely to be an actual array, of which you can take the
size with nitems().  At least in shadow-utils, all calls have been
replaced by that macro.  I'm curious if all uses are similarly trivial
in tar(1).

So if this warning helps those who misuse strncpy(3) to at least misuse
it safely, then it's a partially-good thing.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>


* Re: NULL safety
  2023-11-18 23:21                               ` NULL safety Jonny Grant
@ 2023-11-24 22:25                                 ` Alejandro Colomar
  2023-11-25  0:57                                   ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-24 22:25 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

[-- Attachment #1: Type: text/plain, Size: 957 bytes --]

Hi Jonny,

On Sat, Nov 18, 2023 at 11:21:00PM +0000, Jonny Grant wrote:
> I saw Christopher Bazley was talking about this. As I understand it, _Nonnull is milder than attribute nonnull. _Nonnull probably helps with static analysis, but doesn't optimize out any code checking if(ptr == NULL) return -1;
> 
> Saw this, did you get traction with your proposal?
> 
> https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269?page=2

I didn't follow up with that.  I'd first like to be able to try Clang's
static analyzer with _Nullable, to be able to play with it.  An
_Optional qualifier would only be usable by something like -fanalyzer,
or Clang's analyzer, since it needs to avoid false positives that are
quite complex.  It's not a warning that you'd want in -Wall.

And since Clang's analyzer isn't easy to use, I'm not working on that
until they make it easier.

Cheers,
Alex


-- 
<https://www.alejandro-colomar.es/>


* Re: NULL safety
  2023-11-24 22:25                                 ` Alejandro Colomar
@ 2023-11-25  0:57                                   ` Jonny Grant
  0 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-25  0:57 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man



On 24/11/2023 22:25, Alejandro Colomar wrote:
> Hi Jonny,
> 
> On Sat, Nov 18, 2023 at 11:21:00PM +0000, Jonny Grant wrote:
>> I saw Christopher Bazley was talking about this. As I understand it, _Nonnull is milder than attribute nonnull. _Nonnull probably helps with static analysis, but doesn't optimize out any code checking if(ptr == NULL) return -1;
>>
>> Saw this, did you get traction with your proposal?
>>
>> https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269?page=2
> 
> I didn't follow up with that.  I'd first like to be able to try Clang's
> static analyzer with _Nullable, to be able to play with it.  An
> _Optional qualifier would only be usable by something like -fanalyzer,
> or Clang's analyzer, since it needs to avoid false positives that are
> quite complex.  It's not a warning that you'd want in -Wall.
> 
> And since Clang's analyzer isn't easy to use, I'm not working on that
> until they make it easier.

Ok I see. GCC's -fanalyzer is useful I find, I've not tried Clang.

I made my own compile_assert() that may or may not be of use for the things you are working on; it works in GCC, and it's just like regular code. I use it to check for things like NULL pointers or overflows at compile time, rather than at runtime like assert().

https://github.com/jonnygrant/compile_assert

There will be some false positives in complex areas of code. It's quite simple: it just uses the tooling GCC already provides to catch things at compile time that static_assert() can't. Anyway, I'd be interested to hear any feedback if you do try it.

Cheers, Jonny
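A minimal sketch of the distinction quoted above between the nonnull attribute and the _Nonnull qualifier, assuming GCC or Clang. The function names len_attr and len_qual and the NONNULL_QUAL macro are hypothetical, made up for illustration. GCC documents that the nonnull attribute lets the optimizer assume the argument is never NULL (these optimizations can be disabled with -fno-delete-null-pointer-checks), while Clang's documentation says _Nonnull does not make a NULL argument undefined behavior. Whether a given check is actually removed depends on compiler and optimization level.

```c
#include <stddef.h>
#include <string.h>

/*
 * _Nonnull is a Clang extension; GCC does not accept it, so this
 * hypothetical NONNULL_QUAL macro expands to nothing on other compilers.
 */
#if defined(__clang__)
#  define NONNULL_QUAL _Nonnull
#else
#  define NONNULL_QUAL
#endif

/*
 * Attribute form (GCC and Clang): argument 1 must never be NULL.
 * Passing NULL is then undefined, so when optimizing the compiler may
 * treat the check below as dead code and delete it.
 * GCC may also warn about this comparison (-Wnonnull-compare).
 */
__attribute__((nonnull(1)))
int len_attr(const char *s)
{
    if (s == NULL)
        return -1;
    return (int) strlen(s);
}

/*
 * Qualifier form: a type annotation used for diagnostics and static
 * analysis; it does not make NULL undefined, so the check is kept.
 */
int len_qual(const char *NONNULL_QUAL s)
{
    if (s == NULL)
        return -1;
    return (int) strlen(s);
}
```

Both functions behave identically for valid strings; the difference only shows up in what the compiler is allowed to assume when a NULL slips through.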

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-09 11:08                                 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar
  2023-11-09 14:06                                   ` catenate vs concatenate Jonny Grant
@ 2023-11-27 14:33                                   ` Zack Weinberg
  2023-11-27 15:08                                     ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Zack Weinberg @ 2023-11-27 14:33 UTC (permalink / raw)
  To: Alejandro Colomar, Jonny Grant
  Cc: Paul Eggert, Carlos O'Donell, GNU libc development,
	'linux-man'

[all attribution deleted because it was so tangled I couldn't make
sense of it]

>> Rather than "catenation", in my experience "concatenation" is the
>> common term
...
> We began fighting this pomposity before v7. There has only been
> backsliding since. "Catenate" is crisper, means the same thing,

[English pedant mode on]

"Concatenate" is the correct term; "catenate" means something completely
different, probably "hang between two posts like a chain".  You can't
chop prefixes off a Latinate word and have it still mean the same thing.

[English pedant mode off]

Also, and much more importantly, "concatenate" is used at least 100x
more often than "catenate" in modern English, and that means it's the
word that a randomly selected reader of the manpages is more likely to
know, and, therefore, the word that the manpages should be using.

https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3

zw

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 14:33                                   ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg
@ 2023-11-27 15:08                                     ` Alejandro Colomar
  2023-11-27 15:13                                       ` Alejandro Colomar
  2023-11-27 16:59                                       ` G. Branden Robinson
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-27 15:08 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

Hi Zack,

On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> [all attribution deleted because it was so tangled I couldn't make
> sense of it]
> 
> >> Rather than "catenation", in my experience "concatenation" is the
> >> common term

The above was Jonny Grant.

> > We began fighting this pomposity before v7. There has only been
> > backsliding since. "Catenate" is crisper, means the same thing,

The above was Doug McIlroy.

> [English pedant mode on]
> 
> "Concatenate" is the correct term; "catenate" means something completely
> different, probably "hang between two posts like a chain".  You can't
> chop prefixes off a Latinate word and have it still mean the same thing.

[Latin pedant mode on]

Concatenate comes from the Latin concatenare.  The prefix "con-" means
"join", "together", and "catena" means "chain".
<https://en.wiktionary.org/wiki/concatenate>

Catenate comes from the Latin catenare, which, AFAICS, seems to be a
synonym.  It just drops the redundant "con-" prefix, since "catena"
already implies it.
<https://en.wiktionary.org/wiki/catenate>

English isn't as prone as other Latin languages to having such synonyms,
where one of them simply adds a redundant prefix or suffix, but Catalan
or Spanish, for example, have several such cases.

[Latin pedant mode off]

> [English pedant mode off]
> 
> Also, and much more importantly, "concatenate" is used at least 100x
> more often than "catenate" in modern English, and that means it's the
> word that a randomly selected reader of the manpages is more likely to
> know, and, therefore, the word that the manpages should be using.
> 
> https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3

Heh, Paul sent a patch changing it to "append", which I applied since
it reads better, even if it loses the cat/catenate mnemonic.  :)

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 15:08                                     ` Alejandro Colomar
@ 2023-11-27 15:13                                       ` Alejandro Colomar
  2023-11-27 16:59                                       ` G. Branden Robinson
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-27 15:13 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

On Mon, Nov 27, 2023 at 04:08:17PM +0100, Alejandro Colomar wrote:
> Hi Zack,
> 
> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> > [all attribution deleted because it was so tangled I couldn't make
> > sense of it]
> > 
> > >> Rather than "catenation", in my experience "concatenation" is the
> > >> common term
> 
> The above was Jonny Grant.
> 
> > > We began fighting this pomposity before v7. There has only been
> > > backsliding since. "Catenate" is crisper, means the same thing,
> 
> The above was Doug McIlroy.
> 
> > [English pedant mode on]
> > 
> > "Concatenate" is the correct term; "catenate" means something completely
> > different, probably "hang between two posts like a chain".  You can't
> > chop prefixes off a Latinate word and have it still mean the same thing.
> 
> [Latin pedant mode on]
> 
> concatenate comes from the Latin concatenare.  The prefix "con-" means
> "join", "together", and "catena" means "chain".
> <https://en.wiktionary.org/wiki/concatenate>
> 
> catenate comes from the Latin catenare, which AFAICS, seems a synonym.
> It just drops the redundant "con-" prefix, since "catena" already
> implies it.
> <https://en.wiktionary.org/wiki/catenate>
> 
> English isn't as propense as other Latin languages to have such synonyms

s/other//

> where one of them simply adds a redundant prefix or suffix, but Catalan
> or Spanish for example have several such cases.
> 
> [Latin pedant mode off]
> 
> > [English pedant mode off]
> > 
> > Also, and much more importantly, "concatenate" is used at least 100x
> > more often than "catenate" in modern English, and that means it's the
> > word that a randomly selected reader of the manpages is more likely to
> > know, and, therefore, the word that the manpages should be using.
> > 
> > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
> 
> Heh, Paul sent a patch for changing it to append, which I applied, since
> it reads better, even if it removes the mnemonics of cat for catenate.  :)
> 
> Cheers,
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>



-- 
<https://www.alejandro-colomar.es/>

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 15:08                                     ` Alejandro Colomar
  2023-11-27 15:13                                       ` Alejandro Colomar
@ 2023-11-27 16:59                                       ` G. Branden Robinson
  2023-11-27 18:35                                         ` Zack Weinberg
  1 sibling, 1 reply; 138+ messages in thread
From: G. Branden Robinson @ 2023-11-27 16:59 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Zack Weinberg, Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> > [all attribution deleted because it was so tangled I couldn't make
> > sense of it]

This elision was pretty poor form, given that one of the people whose
attribution (and opinion) Zack discarded was a relevant authority: M.
Douglas McIlroy, an alum of the Bell Labs Computing Science Research
Center and editor of the Seventh Edition Unix Programmer's Manual.

> > > We began fighting this pomposity before v7. There has only been
> > > backsliding since. "Catenate" is crisper, means the same thing,
> 
> The above was Doug McIlroy.
> 
> > [English pedant mode on]
> > 
> > "Concatenate" is the correct term; "catenate" means something
> > completely different, probably "hang between two posts like a
> > chain".  You can't chop prefixes off a Latinate word and have it
> > still mean the same thing.

In some cases, you can.  Witness the case of "flammable"/"inflammable",
which are synonymous.  The former term arose because the prefix "in-"
alters meaning in multiple ways in English[1] (maybe Latin, too).  The
coinage of "flammable" later became important in the labeling and
transport of hazardous materials.  Some pedants must despair of this
linguistic innovation, perhaps viewing the prospect of handlers of such
materials burning to death as a just punishment for their lack of
morphological and etymological sophistication.  If you don't want to die
like a prole, get an English degree, eh?[2]

Here, the "con-" prefix is duplicative.  It doesn't pay its freight.

> > [English pedant mode off]

When one discards all other authorities, all that remains is one's own.
I trust we can recognize the parallels here with Dunning-Krugeresque
self-regard.

> > Also, and much more importantly, "concatenate" is used at least 100x
> > more often than "catenate" in modern English, and that means it's
> > the word that a randomly selected reader of the manpages is more
> > likely to know, and, therefore, the word that the manpages should be
> > using.

Man pages are specialized technical literature demanding a bespoke
vocabulary.  Some employment of jargon is inescapable, even necessary.
In any case, "catenate" has ~50 years of attestation in this domain
alone, which constitutes approximately the entire history of Unix
discourse.

If you apply this sort of frequency analysis to contrast man page and
general English corpora more broadly, I predict that you'll find many
candidates for terminological replacement that you would _not_ embrace.

For instance...[3]

https://books.google.com/ngrams/graph?content=open+source%2Cfree+software&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3
https://books.google.com/ngrams/graph?content=emacs%2Cvi&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3

Zack also overlooks the process by which speakers and readers of a
language grapple with unfamiliar words that they encounter unexpectedly.
Before undertaking to reach for dictionaries (online or otherwise), many
readers morphophonemically analyze them to see if they can infer their
meanings from familiar components.[4]

> > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3
> 
> Heh, Paul sent a patch for changing it to append, which I applied,
> since it reads better, even if it removes the mnemonics of cat for
> catenate.  :)

In Unix culture, one will need to remain conversant with the term
"catenate" to know why cat(1) is not named "concat(1)".  ;-)

"Concatenate" may end up prevailing even in *nix man pages; languages do
not necessarily evolve in directions that maximize lexical economy.[5]

But to change one's usage based on the break room reasoning put on offer
in this thread is a terrible idea.

Regards,
Branden

[1] https://www.saturdayeveningpost.com/2023/02/in-a-word-flammable-inflammable-and-nonflammable/

[2] ...where the first-order factor in determining your academic merit
    will be your facility with the ideas of 20th-century French
    political philosophers.

[3] One can complain that the second example suffers from a confounding
    effect given one of the terms' appearance as a roman numeral.
    Precisely.  Google Ngram Viewer is not sensitive to context.  Zack's
    use of it is a makeweight recourse to cloak an opinion grounded on
    personal preference in a shroud of false objectivity.

[4] I see this practice offered as advice in numerous resources, and it
    reflects my own approach as a native English speaker who acquired
    language before the availability of computerized (let alone
    hyperlinked) dictionaries in the home, but in a perfunctory search I
    couldn't turn up any _studies_ of what readers _actually do_.  One
    technique that could arise from Zack's approach would be to obtain
    an English word list sorted by frequency, strike off known words
    until encountering an unfamiliar one, learn it, then resume the
    process until the unfamiliar word that actually came up is reached.
    (This way you can be more confident in your own writing and speech
    that you don't use an obscure word where a more common one
    suffices.)  How well do we suppose such a process might work?

[5] certainly not if _my_ emails play any part in that evolution <drum fill>

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 16:59                                       ` G. Branden Robinson
@ 2023-11-27 18:35                                         ` Zack Weinberg
  2023-11-27 23:45                                           ` G. Branden Robinson
  0 siblings, 1 reply; 138+ messages in thread
From: Zack Weinberg @ 2023-11-27 18:35 UTC (permalink / raw)
  To: G. Branden Robinson, Alejandro Colomar
  Cc: Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote:
> At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
>> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
>> > [English pedant mode on]
>> >
>> > "Concatenate" is the correct term; "catenate" means something
>> > completely different, probably "hang between two posts like a
>> > chain".  You can't chop prefixes off a Latinate word and have it
>> > still mean the same thing.
>
> In some cases, you can.  Witness the case of "flammable"/"inflammable",
> which are synonymous.

Yeah, and (after seeing Alejandro's reply) I did look up both
"concatenate" and "catenate" and find that they are synonymous in
English and both are attested from the 1600s.

**But I had to look that up.**

I cannot recall ever encountering the word "catenate" prior to this
thread, and my knee-jerk reaction was "typo."  Based on actual
experience trying, and mostly failing, to teach college undergraduates
to read man pages, I believe someone new to English technical
documentation would have a different, much more troublesome knee-jerk
reaction: "There must be some subtle reason why this documentation is
using an unfamiliar term 'catenate', instead of 'concatenate' that I
already know." Followed by wasting a bunch of time trying to research
that unfamiliar term, and when they find it's an exact synonym, adding
another tick mark to their mental tally for "manpages are badly written
and hard to understand."

> Man pages are specialized technical literature demanding a bespoke
> vocabulary.  Some employment of jargon is inescapable, even necessary.
> In any case, "catenate" has ~50 years of attestation in this domain
> alone, which constitutes approximately the entire history of Unix
> discourse.

This is no excuse.  Specialized technical jargon is only appropriate
when there is an actual difference in meaning.  (Thus, your "open
source" vs "free software" counterpoint is bogus.)

> Zack also overlooks the process by which speakers and readers of a
> language grapple with unfamiliar words that they encounter
> unexpectedly. Before undertaking to reach for dictionaries (online or
> otherwise), many readers morphophonemically analyze them to see if
> they can infer their meanings from familiar components.[4]

In grappling with general literature, yes.  In grappling with technical
writing, *no*, and again I am speaking from direct experience as an
educator.  Readers who encounter an unfamiliar word in technical
documents will most probably assume that the word has a precise meaning
that they must learn, and that they *cannot* deduce that meaning from
context. If they can't find a definition -- and they might not even try
looking in a general dictionary, since they may assume that the relevant
definition is too specialized to appear there; also it seems to me that
schoolchildren are not being taught how to use dictionaries anymore --
*they will give up on the entire document*.

Yes, this is bad.  It's an instance of learned helplessness, and it's
going to take decades and major educational reform at the grade-school
level to fix.  But there's one thing we, authors of technical documents,
can do about it right now, and that is embrace plain talk.  For example,
whenever there really is no difference of meaning, the most common word
in general usage is the word that should be used.

> In Unix culture, one will need to remain conversant with the term
> "catenate" to know why cat(1) is not named "concat(1)".  ;-)

This is how I would teach it: 'concat' is too long for Kernighan
and Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already
in use as an abbreviation for 'console' (not in Unix itself, but in
other contemporary OSes); and 'cat' is the next three letters of
"concatenate".  So that's what they picked.

zw

* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated)
  2023-11-27 18:35                                         ` Zack Weinberg
@ 2023-11-27 23:45                                           ` G. Branden Robinson
  0 siblings, 0 replies; 138+ messages in thread
From: G. Branden Robinson @ 2023-11-27 23:45 UTC (permalink / raw)
  To: Zack Weinberg
  Cc: Alejandro Colomar, Jonny Grant, Paul Eggert, Carlos O'Donell,
	GNU libc development, 'linux-man'

Hi Zack,

At 2023-11-27T13:35:01-0500, Zack Weinberg wrote:
> On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote:
> > At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote:
> >> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote:
> >> > [English pedant mode on]
> >> >
> >> > "Concatenate" is the correct term; "catenate" means something
> >> > completely different, probably "hang between two posts like a
> >> > chain".  You can't chop prefixes off a Latinate word and have it
> >> > still mean the same thing.
> >
> > In some cases, you can.  Witness the case of
> > "flammable"/"inflammable", which are synonymous.
> 
> Yeah, and (after seeing Alejandro's reply) I did look up both
> "concatenate" and "catenate" and find that they are synonymous in
> English and both are attested from the 1600s.
> 
> **But I had to look that up.**

That's not a bug.  When we stop learning, our brains die.

> I cannot recall ever encountering the word "catenate" prior to this
> thread, and my knee-jerk reaction was "typo."

The patellar reflex is not a reliable guide to purposeful development.

> Based on actual experience trying, and mostly failing, to teach
> college undergraduates to read man pages,

I empathize with you here.  I have a bit of background in teaching and a
bit more in man page composition.  Over the years my emotional response
to being frustrated that I have to quote a man page to other software
professionals in an email or message board has evolved into relief that
I have material of reasonable quality to quote to people...when that
happens.  Sometimes a person raises an issue and my internal Gilbert
Gottfried yells, "you FOOL![1]  That's plainly documented in--wait, uh,
give me a second.  Uh...sh*t, I need to write a patch to this man page."

> I believe someone new to English technical documentation would have a
> different, much more troublesome knee-jerk reaction: "There must be
> some subtle reason why this documentation is using an unfamiliar term
> 'catenate', instead of 'concatenate' that I already know." Followed by
> wasting a bunch of time trying to research that unfamiliar term, and
> when they find it's an exact synonym, adding another tick mark to
> their mental tally for "manpages are badly written and hard to
> understand."

I think your hypothesis is sorely in need of testing.  My own feeling is
that unfamiliarity with standard English vocabulary is well down the
list of things that people find frustrating about man pages, if we take
the product of annoyance level times the number of people perceiving a
defect.

> > Man pages are specialized technical literature demanding a bespoke
> > vocabulary.  Some employment of jargon is inescapable, even
> > necessary.  In any case, "catenate" has ~50 years of attestation in
> > this domain alone, which constitutes approximately the entire
> > history of Unix discourse.
> 
> This is no excuse.  Specialized technical jargon is only appropriate
> when there is an actual difference in meaning.  (Thus, your "open
> source" vs "free software" counterpoint is bogus.)

I offered them in a tongue-in-cheek effort at humor.  I don't regard
"Emacs" and "vi" as synonymous, either.  Also I know they'll take away
your GNU card if you claim "open source" and "free software"
equivalence.[2]

Analogously, "disenfranchise" and "disfranchise" are also synonymous,
and I prefer the latter to the former for the same reason, popularity be
damned.

> > Before undertaking to reach for dictionaries (online or otherwise),
> > many readers morphophonemically analyze them to see if they can
> > infer their meanings from familiar components.
> 
> In grappling with general literature, yes.  In grappling with
> technical writing, *no*, and again I am speaking from direct
> experience as an educator.  Readers who encounter an unfamiliar word
> in technical documents will most probably assume that the word has a
> precise meaning that they must learn, and that they *cannot* deduce
> that meaning from context.

If that's the case, then our field is doing a crap job at terminology
selection.  (Stop the presses, right?)

> If they can't find a definition -- and they might not even try looking
> in a general dictionary, since they may assume that the relevant
> definition is too specialized to appear there; also it seems to me
> that schoolchildren are not being taught how to use dictionaries
> anymore

Enough of them seem to be using urbandictionary.com that the concept
remains familiar.

> -- *they will give up on the entire document*.
> 
> Yes, this is bad.  It's an instance of learned helplessness, and it's
> going to take decades and major educational reform at the grade-school
> level to fix.  But there's one thing we, authors of technical
> documents, can do about it right now, and that is embrace plain talk.
> For example, whenever there really is no difference of meaning, the
> most common word in general usage is the word that should be used.

Again I'm going to have to disagree with you.  Where we can
morphologically simplify without loss of meaning, I think that fits a
meaning of "plain talk" that is reasonably robust across the many
cultural contexts in which English is used.  Your popularity metric is
vulnerable to sampling biases, particularly of the geographical sort.
And the plainer the talk, the more it is exposed to confounding regional
factors.  When I moved to Australia, I had a frustrating experience at
the grocery store.  I needed to replace a light bulb.  No sign anywhere in
the store helped me.  While searching fruitlessly, I vaguely noted a
sign for "globes", and a thought that didn't quite reach the top of my
brain observed that globes are a damned weird thing to sell in a
grocery--but hey, it's Australia, maybe they need a _reminder_ that
they're hanging from the Earth's underbelly.[9]  After a few more
minutes, these two threads joined.

Q:  How many seppos does it take to screw in a light bulb?
A:  What's gardening got to do with it?

> > In Unix culture, one will need to remain conversant with the term
> > "catenate" to know why cat(1) is not named "concat(1)".  ;-)
> 
> This is how I would teach it: 'concat' is too long for Kernighan and
> Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already in
> use as an abbreviation for 'console' (not in Unix itself, but in other
> contemporary OSes); and 'cat' is the next three letters of
> "concatenate".  So that's what they picked.

Please don't teach that.  There's a lot about it I find dubious.

1.  Thompson was the primary human force for extreme terseness in Unix
    culture, as far as I can tell from my readings in CSRC history.
    (There were other technical and ergonomic forces driving it, like
    low line speeds and the Fortran linker on the PDP-11--which C
    initially re-used--being limited to six significant characters in
    external identifiers.)  Kernighan's own writings suggest that he
    preferred clear labels over cryptic ones (see his _The Elements of
    Programming Style_, with Plauger; _Software Tools_, also with
    Plauger; and _The Unix Programming Environment_, with Pike).  I
    speculate that Thompson reasoned that he'd never need more than
    26*26 commands anyway, so there was no reason to use an encoding
    space larger than that to denote them.[3]

2.  "ASR33" is a misleading misnomer in a couple of respects.  You're
    referring to a Western Electric Teletype Model 33.  "ASR" is neither
    a manufacturer nor a model, but a configuration option.
    Specifically, "ASR" devices didn't have keyboards--just a paper tape
    punch and reader--so they were not much used for Unix development.
    "KSR" (keyboard send and receive) was the relevant configuration.

3.  The Bell Labs CSRC didn't use Model 33s anyway.  Western Electric
    was also part of the Bell monopoly, and by late 1972 at the latest,
    Labs personnel got to drive Cadillacs--the Model 37, and moreover
    the ones used to produce Unix had the "Greek" character set
    extension.[4]  You will find references to both devices in the
    Seventh Edition man pages, but the terminal driver was "tuned for
    Teletype Model 37's"[5], and troff(1) named it as a supported
    terminal device rather than the 33.[6] That said, Model 33s were
    supported, and widely used at Unix installations outside the Labs.

4.  Your deployment of "CON" to refer to the console device may be
    anachronistic.  I can't find any evidence that Multics used this
    name for it.  I'm not familiar enough with IBM's OS offerings over
    the decades to be able to navigate online material about it.  Many
    people likely know that MS-DOS called its console device that, but
    cat(1) is about a decade older than that product.[7][8]

Regards
Branden

[1] https://www.youtube.com/watch?v=2NpTmKmWdzk
[2] https://www.gnu.org/philosophy/open-source-misses-the-point.en.html

[3] I base this surmise on more than an attempt at mind reading.  See
    the first footnote on page 6 of McIlroy's "A Research Unix Reader".
    https://www.cs.dartmouth.edu/~doug/reader.pdf

[4] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V3/man/man7/greek.7
[5] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man4/tty.4
[6] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/troff.1
[7] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1
[8] https://www.os2museum.com/wp/dos/dos-1-0-and-1-1/

[9] I'm teasing.  I'd have loved an "upside-down" globe, not least as a
    reminder that the melting of the Antarctic ice sheets will pour
    inundating destruction down on most of us thanks to the superior
    qualities of billionaires.  I already had a counter-clockwise clock,
    but didn't take it with me to Oz.  Also the moon is wrong there.

2023-11-09 18:11                     ` Paul Eggert
2023-11-09 23:48                       ` Alejandro Colomar
2023-11-10  5:36                         ` Paul Eggert
2023-11-10 11:05                           ` Alejandro Colomar
2023-11-10 11:47                             ` Alejandro Colomar
2023-11-10 17:58                             ` Paul Eggert
2023-11-10 18:36                               ` Alejandro Colomar
2023-11-10 20:19                                 ` Alejandro Colomar
2023-11-10 23:44                                   ` Jonny Grant
2023-11-10 19:52                               ` Alejandro Colomar
2023-11-10 22:14                                 ` Paul Eggert
2023-11-11 21:13                                   ` Alejandro Colomar
2023-11-11 22:20                                     ` Paul Eggert
2023-11-12  9:52                                     ` Jonny Grant
2023-11-12 10:59                                       ` Alejandro Colomar
2023-11-12 20:49                                         ` Paul Eggert
2023-11-12 21:00                                           ` Alejandro Colomar
2023-11-12 21:45                                             ` Alejandro Colomar
2023-11-13 23:46                                           ` Jonny Grant
2023-11-17 21:57                                         ` Jonny Grant
2023-11-18 10:12                                           ` Alejandro Colomar
2023-11-18 23:03                                             ` Jonny Grant
2023-11-10 11:36                           ` Jonny Grant
2023-11-10 13:15                             ` Alejandro Colomar
2023-11-18 23:40                               ` Jonny Grant
2023-11-20 11:56                                 ` Jonny Grant
2023-11-20 15:12                                   ` Alejandro Colomar
2023-11-20 23:08                                     ` Jonny Grant
2023-11-20 23:42                                       ` Alejandro Colomar
2023-11-10 11:23                     ` Jonny Grant
2023-11-09 12:23                 ` Alejandro Colomar
2023-11-09 12:35                   ` Alejandro Colomar
2023-11-10  7:06                   ` Oskari Pirhonen
2023-11-10 11:18                     ` Alejandro Colomar
2023-11-11  7:55                       ` Oskari Pirhonen
2023-11-10 16:06                   ` Matthew House
2023-11-10 17:48                     ` Alejandro Colomar
2023-11-13 15:01                       ` Matthew House
2023-11-11 20:55                     ` Jonny Grant
2023-11-11 21:15                       ` Jonny Grant
2023-11-11 22:36                         ` Alejandro Colomar
2023-11-11 23:19                           ` Alejandro Colomar
2023-11-17 21:46                           ` Jonny Grant
2023-11-18  9:37                             ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar
2023-11-19  0:22                               ` Deri
2023-11-19  1:19                                 ` Alejandro Colomar
2023-11-19  9:29                                   ` Alejandro Colomar
2023-11-19 16:21                                   ` Deri
2023-11-19 20:58                                     ` Alejandro Colomar
2023-11-20  0:46                                       ` G. Branden Robinson
2023-11-20  9:43                                         ` Alejandro Colomar
2023-11-18  9:44                             ` NULL safety " Alejandro Colomar
2023-11-18 23:21                               ` NULL safety Jonny Grant
2023-11-24 22:25                                 ` Alejandro Colomar
2023-11-25  0:57                                   ` Jonny Grant
2023-11-10 10:40               ` strncpy clarify result may not be null terminated Stefan Puiu
2023-11-10 11:06                 ` Jonny Grant
2023-11-10 11:20                 ` Alejandro Colomar
2023-11-12  9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar
2023-11-12  9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-12  9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar
2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar
2023-11-17 21:43   ` Jonny Grant
2023-11-18  0:25     ` Signing all patches and email to this list Matthew House
2023-11-18 23:24       ` Jonny Grant
2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar
2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar