* strncpy clarify result may not be null terminated @ 2023-11-04 11:27 Jonny Grant 2023-11-04 19:33 ` Alejandro Colomar ` (7 more replies) 0 siblings, 8 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-04 11:27 UTC (permalink / raw) To: linux-man, Alejandro Colomar (man-pages) Hello I have a suggestion for strncpy. C23 draft states this caveat for strncpy. "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null- terminated." https://man7.org/linux/man-pages/man3/strncpy.3.html "If the destination buffer, limited by its size, isn't large enough to hold the copy, the resulting character sequence is truncated. " How about clarifying this as: "If the destination buffer, limited by its size, isn't large enough to hold the copy, the resulting character sequence is truncated; where there is no null terminating byte in the first n characters the result will not be null terminated. " Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant @ 2023-11-04 19:33 ` Alejandro Colomar 2023-11-04 21:18 ` Jonny Grant 2023-11-05 21:16 ` Jonny Grant 2023-11-12 9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar ` (6 subsequent siblings) 7 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-04 19:33 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 1940 bytes --] Hi Jonny, On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote: > Hello > I have a suggestion for strncpy. > > C23 draft states this caveat for strncpy. > > "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null- > terminated." > > > https://man7.org/linux/man-pages/man3/strncpy.3.html > > "If the destination buffer, limited by its size, isn't large > enough to hold the copy, the resulting character sequence is > truncated. " The use of the term "character sequence" instead of "string" isn't casual. A "string" is a sequence of zero or more non-zero characters, followed by exactly one NUL. A "character sequence" is a sequence of zero or more non-zero characters, period. To be clearer in that regard, the CAVEATS section of the same page says this: CAVEATS The name of these functions is confusing. These functions pro‐ duce a null‐padded character sequence, not a string (see string_copying(7)). Saying that these functions don't produce a string should warn anyone thinking it would. The page string_copying(7) goes into more detail. > > How about clarifying this as: > > > "If the destination buffer, limited by its size, isn't large > enough to hold the copy, the resulting character sequence is > truncated; where there is no null terminating byte in the first n > characters the result will not be null terminated. " strncpy(3) should !*NEVER*! 
be used to produce a string. I don't think that should be conditional. Your suggested change could induce to the mistake of thinking that strncpy(3) is useful if the size of the buffer is enough. Do not ever use that function for producing strings. Use something else, like strlcpy(3), strcpy(3), or stpecpy(3). Cheers, Alex > > Kind regards, Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
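The distinction drawn above between a "string" and a "character sequence" can be checked directly. What follows is a minimal sketch in C, not text from the manual page. The helper names holds_string() and copy4_is_string() are invented for this illustration; only strncpy(3) and memchr(3) are standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf[0..n) contains a null byte, i.e. the buffer holds a string. */
static bool
holds_string(const char *buf, size_t n)
{
	return memchr(buf, '\0', n) != NULL;
}

/*
 * Hypothetical helper: copy src into a 4-byte buffer with strncpy(3)
 * and report whether the result can safely be read as a string.
 */
static bool
copy4_is_string(const char *src)
{
	char  a[4];

	/* Writes exactly sizeof(a) bytes: the characters, then null padding. */
	strncpy(a, src, sizeof(a));

	return holds_string(a, sizeof(a));
}
```

With these helpers, copy4_is_string("asd") is true because one padding byte fits, while copy4_is_string("asdf") is false: all four characters are present, yet no null byte follows them inside the buffer.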
* Re: strncpy clarify result may not be null terminated 2023-11-04 19:33 ` Alejandro Colomar @ 2023-11-04 21:18 ` Jonny Grant 2023-11-05 1:36 ` Alejandro Colomar 2023-11-05 21:16 ` Jonny Grant 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-04 21:18 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man On 04/11/2023 19:33, Alejandro Colomar wrote: > Hi Jonny, > > On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote: >> Hello >> I have a suggestion for strncpy. >> >> C23 draft states this caveat for strncpy. >> >> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null- >> terminated." >> >> >> https://man7.org/linux/man-pages/man3/strncpy.3.html >> >> "If the destination buffer, limited by its size, isn't large >> enough to hold the copy, the resulting character sequence is >> truncated. " > > The use of the term "character sequence" instead of "string" isn't > casual. A "string" is a sequence of zero or more non-zero characters, > followed by exactly one NUL. A "character sequence" is a sequence of > zero or more non-zero characters, period. Ok that's good to know. C23 calls it those "array", POSIX too. POSIX explains if the array is a string (ie null terminated) it pads with nulls, I'll paste it below: https://pubs.opengroup.org/onlinepubs/009696899/functions/strncpy.html "If the array pointed to by s2 is a string that is shorter than n bytes, null bytes shall be appended to the copy in the array pointed to by s1, until n bytes in all are written." > > To be clearer in that regard, the CAVEATS section of the same page says > this: > > CAVEATS > The name of these functions is confusing. These functions pro‐ > duce a null‐padded character sequence, not a string (see > string_copying(7)). > > Saying that these functions don't produce a string should warn anyone > thinking it would. The page string_copying(7) goes into more detail. 
> >> >> How about clarifying this as: >> >> >> "If the destination buffer, limited by its size, isn't large >> enough to hold the copy, the resulting character sequence is >> truncated; where there is no null terminating byte in the first n >> characters the result will not be null terminated. " > > strncpy(3) should !*NEVER*! be used to produce a string. > I don't think that should be conditional. Your suggested change could > induce to the mistake of thinking that strncpy(3) is useful if the size > of the buffer is enough. Do not ever use that function for producing > strings. Use something else, like strlcpy(3), strcpy(3), or stpecpy(3). Just documentation feedback based on C23, not writing code today. Perhaps you may have seen Michael Kerrisk's article about the risks with strlcpy. https://lwn.net/Articles/507319/ re strcpy doesn't that risk buffer overruns? That's surely a cyber security risk? strlcpy is also bad in certain ways, it breaks ISO TR24731 "Do not unexpectedly truncate strings", can cause overruns and crashes. I guess if you feel strncpy should "never be used to produce a string" you could describe that somewhere with an explanation in an article. You didn't mention why you feel it is not useful even if the size of the buffer is enough - including a null terminator I hope! strncpy_s is a better solution, not widely available, and not part of glibc. That's another debate. Is stpecpy standardised? If you can send me an online manual for it, I'll take a look. Regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-04 21:18 ` Jonny Grant @ 2023-11-05 1:36 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-05 1:36 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 7865 bytes --] Hi Jonny, On Sat, Nov 04, 2023 at 09:18:08PM +0000, Jonny Grant wrote: > On 04/11/2023 19:33, Alejandro Colomar wrote: > > Hi Jonny, > > > > On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote: > >> Hello > >> I have a suggestion for strncpy. > >> > >> C23 draft states this caveat for strncpy. > >> > >> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null- > >> terminated." > >> > >> > >> https://man7.org/linux/man-pages/man3/strncpy.3.html > >> > >> "If the destination buffer, limited by its size, isn't large > >> enough to hold the copy, the resulting character sequence is > >> truncated. " > > > > The use of the term "character sequence" instead of "string" isn't > > casual. A "string" is a sequence of zero or more non-zero characters, > > followed by exactly one NUL. A "character sequence" is a sequence of > > zero or more non-zero characters, period. > > Ok that's good to know. C23 calls it those "array", POSIX too. POSIX explains if the array is a string (ie null terminated) it pads with nulls, I'll paste it below: > > https://pubs.opengroup.org/onlinepubs/009696899/functions/strncpy.html > > "If the array pointed to by s2 is a string that is shorter than n bytes, null bytes shall be appended to the copy in the array pointed to by s1, until n bytes in all are written." By array, C23 and POSIX (AFAICS) refer to the array of char (so, a `char []`) that holds the data, and not to the data itself. By character sequence, I refer to the data, which consists of characters in the range [1, 255] (zero or more of them).
Note that a character sequence doesn't contain null characters. The padding that strncpy(3) writes after the character sequence is not part of the character sequence, even though it is contained in the character array. > > To be clearer in that regard, the CAVEATS section of the same page says > > this: > > > > CAVEATS > > The name of these functions is confusing. These functions pro‐ > > duce a null‐padded character sequence, not a string (see > > string_copying(7)). > > > > Saying that these functions don't produce a string should warn anyone > > thinking it would. The page string_copying(7) goes into more detail. > > > >> > >> How about clarifying this as: > >> > >> > >> "If the destination buffer, limited by its size, isn't large > >> enough to hold the copy, the resulting character sequence is > >> truncated; where there is no null terminating byte in the first n > >> characters the result will not be null terminated. " > > > > strncpy(3) should !*NEVER*! be used to produce a string. > > I don't think that should be conditional. Your suggested change could > > induce to the mistake of thinking that strncpy(3) is useful if the size > > of the buffer is enough. Do not ever use that function for producing > > strings. Use something else, like strlcpy(3), strcpy(3), or stpecpy(3). > > Just documentation feedback based on C23, not writing code today. > > Perhaps you may have seen Michael Kerrisk's article about the risks with strlcpy. > https://lwn.net/Articles/507319/ Yes. I believe Michael's article and I agree on most terms. That article, though, is a bit outdated, and recent versions of _FORTIFY_SOURCE (see ftm(7)) have changed things significantly. > > re strcpy doesn't that risk buffer overruns? That's surely a cyber security risk? Not so much if you use _FORTIFY_SOURCE. The feature probably still has a few corner cases that it cannot detect, but I'm going to guess that they are few.
> strlcpy is also bad in certain ways, it breaks ISO TR24731 "Do not unexpectedly truncate strings", can cause overruns and crashes. And does strncpy(3) do any better? It also truncates, so it necessarily shares the same problems that strlcpy(3) has. And then it has its own. - strlcpy(3) truncates the resulting string, which most of the time is bad, and a bug if the return value is ignored. However, the return value tells if there was truncation. - strncpy(3) truncates the resulting character sequence (it's not null-terminated, so it's not a string), _and_ it can't report truncation via the return value. See for yourself: char a[4]; strncpy(a, "asdf", sizeof(a)); There was no truncation, since the entire data is available in the resulting character sequence. However, there's still the bug if you try to read that as a string. > > I guess if you feel strncpy should "never be used to produce a string" you could describe that somewhere with an explanation in an article. You didn't mention why you feel it is not useful even if the size of the buffer is enough - including a null terminator I hope! Yes. The article, or explanation, you can find it in string_copying(7), a manual page that I wrote recently to address precisely this. Regarding why: - In case you don't want truncation, and prefer to abort, it is usually preferable to call strcpy(3) and rely on _FORTIFY_SOURCE. Only if you have doubts about the ability of _FORTIFY_SOURCE to know the buffer size, you should use a different function (continue reading for that). Such a case would be if you do very obscure operations to get a buffer and the compiler will be blind to it. - In case you want truncation, which is seldom, you need to use strlcpy(3), which is the only standard function that creates a truncated string.
- In case you don't want truncation, and don't have _FORTIFY_SOURCE available (or you know it won't be able to handle a specific case), or you don't want to crash your program and want to simply report an error, you also need to use strlcpy(3), which detects truncation easily, so you can check for that and report an error. But there's no case where you want a string and the most suitable call would be strncpy(3); it is never the best function. Except when you don't want a string, of course. If you're working with utmp(5), then go ahead and use that function. But for new interfaces, you should not design them so that they use this function. utmp(5) and strncpy(3) should be a mistake of the past, not to be repeated. > > strncpy_s is a better solution, not widely available, and not part of glibc. That's another debate. No, it's not. strncpy_s(3)'s interface is rather bad. It is a function to catch programmer errors, by adding another parameter that the programmer has to write. What if the programmer makes an error while writing the new argument of these _s functions? Kaboom. _FORTIFY_SOURCE accomplishes the same task, but the size is calculated internally by the implementation, which means the programmer can't write a bug in the code that is trying to prevent bugs. Here's an article on these Annex K interfaces: <https://open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm> > > Is stpecpy standardised? If you can send me an online manual for it, I'll take a look. No it's not. It's similar to strlcpy(3), but designed to chain better. So, if you just need to call strlcpy(3), it's probably simpler to do it. But if you need to call strlcat(3), then you may consider stpecpy(3) a better alternative. The main difference is that with strlcpy(3) + strlcat(3), you need to check for truncation after every call, while with stpecpy(3) you only need to check once after the last call.
Also, it's simpler (less tricky) to implement (although now that strlcpy(3) is standard, it's less of a problem). You can find stpecpy(3) documented, with an implementation, in the string_copying(7) manual page. Cheers, Alex > > Regards, Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
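The "copy, detect truncation, report an error" pattern described in the message above can be sketched in portable C. This is only an illustration under stated assumptions: strlcpy(3) is not available in every libc, so the sketch swaps in snprintf(3), whose return value also reveals truncation. The name copy_string() is hypothetical and does not come from string_copying(7).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from any man page): copy src into dst,
 * a buffer of dsize bytes, always producing a string when dsize > 0.
 *
 * Returns 0 on success, or -1 if dsize is 0 or the copy was truncated.
 * On truncation, dst holds a shortened but null-terminated string.
 */
static int
copy_string(char *dst, size_t dsize, const char *src)
{
	int  len;

	if (dsize == 0)
		return -1;

	/* snprintf(3) returns the length it would have written. */
	len = snprintf(dst, dsize, "%s", src);
	if (len < 0 || (size_t) len >= dsize)
		return -1;

	return 0;
}
```

A caller checks the result once and reports the error instead of silently carrying on with a shortened string, which is the same check one would make on the return value of strlcpy(3).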
* Re: strncpy clarify result may not be null terminated 2023-11-04 19:33 ` Alejandro Colomar 2023-11-04 21:18 ` Jonny Grant @ 2023-11-05 21:16 ` Jonny Grant 2023-11-05 23:31 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-05 21:16 UTC (permalink / raw) To: Michael Kerrisk; +Cc: linux-man, Alejandro Colomar On 04/11/2023 19:33, Alejandro Colomar wrote: > Hi Jonny, > > On Sat, Nov 04, 2023 at 11:27:44AM +0000, Jonny Grant wrote: >> Hello >> I have a suggestion for strncpy. >> >> C23 draft states this caveat for strncpy. >> >> "373) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null- >> terminated." >> >> >> https://man7.org/linux/man-pages/man3/strncpy.3.html >> >> "If the destination buffer, limited by its size, isn't large >> enough to hold the copy, the resulting character sequence is >> truncated. " > > The use of the term "character sequence" instead of "string" isn't > casual. A "string" is a sequence of zero or more non-zero characters, > followed by exactly one NUL. A "character sequence" is a sequence of > zero or more non-zero characters, period. > > To be clearer in that regard, the CAVEATS section of the same page says > this: > > CAVEATS > The name of these functions is confusing. These functions pro‐ > duce a null‐padded character sequence, not a string (see > string_copying(7)). > > Saying that these functions don't produce a string should warn anyone > thinking it would. The page string_copying(7) goes into more detail. > >> >> How about clarifying this as: >> >> >> "If the destination buffer, limited by its size, isn't large >> enough to hold the copy, the resulting character sequence is >> truncated; where there is no null terminating byte in the first n >> characters the result will not be null terminated. " > > strncpy(3) should !*NEVER*! be used to produce a string. > I don't think that should be conditional. 
Your suggested change could > induce to the mistake of thinking that strncpy(3) is useful if the size > of the buffer is enough. Do not ever use that function for producing > strings. Use something else, like strlcpy(3), strcpy(3), or stpecpy(3). > > Cheers, > Alex > >> >> Kind regards, Jonny Michael, what do you think about this documentation suggestion I have made. Interested to hear your opinion. Should the man page follow the C spec description of the strncpy function and how when it copies the arrays, it may leave the resulting array of characters not terminated, and warn about this pitfall. C99 had this, and it is still there in latest C23 draft - worth clarifying on strncpy(3)? "7.21.2.4 The strncpy function" "269) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null-terminated." Note, I'm not using strncpy myself, it's a documentation clarification proposal. Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-05 21:16 ` Jonny Grant @ 2023-11-05 23:31 ` Alejandro Colomar 2023-11-07 11:52 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-05 23:31 UTC (permalink / raw) To: Jonny Grant; +Cc: Michael Kerrisk, linux-man [-- Attachment #1: Type: text/plain, Size: 2240 bytes --] Hi Jonny, On Sun, Nov 05, 2023 at 09:16:25PM +0000, Jonny Grant wrote: > Michael, what do you think about this documentation suggestion I have made. Interested to hear your opinion. > > Should the man page follow the C spec description of the strncpy function and how when it copies the arrays, it may leave the resulting array of characters not terminated, and warn about this pitfall. > > C99 had this, and it is still there in latest C23 draft - worth clarifying on strncpy(3)? > > "7.21.2.4 The strncpy function" > > "269) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will > not be null-terminated." What ISO C has said and continues to say about strncpy(3) is the actual harmful stuff, which has led many programmers to believe strncpy(3) was useful at all for producing strings. The problem I see with what ISO C says about strncpy(3) is that it treats it as a string-copying function. If you treat strncpy(3) as a string-copying function, then it is really broken and should be removed from libc. However, its functionality is still useful for those cases where you don't want a string, which is the only reason I didn't mark the function as [[deprecated]]. > > Note, I'm not using strncpy myself, it's a documentation clarification proposal. I think it could be useful to add a note that one should first read the CAVEATS section and string_copying(7) and only then read this page. 
diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 index 239a2eb7e..c7bb79028 100644 --- a/man3/stpncpy.3 +++ b/man3/stpncpy.3 @@ -37,6 +37,12 @@ .SH SYNOPSIS _GNU_SOURCE .fi .SH DESCRIPTION +.IR Note : +These functions are probably not what you want. +Read CAVEATS below, +and +.BR string_copying (7). +.PP These functions copy the string pointed to by .I src into a null-padded character sequence at the fixed-width buffer pointed to by Is this scary enough? Do you think this would tell readers to never use this function unless they know what they're doing (and even when they think they do, they probably don't)? Cheers, Alex > > Kind regards > Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-05 23:31 ` Alejandro Colomar @ 2023-11-07 11:52 ` Jonny Grant 2023-11-07 13:23 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-07 11:52 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Michael Kerrisk, linux-man On 05/11/2023 23:31, Alejandro Colomar wrote: > Hi Jonny, > > On Sun, Nov 05, 2023 at 09:16:25PM +0000, Jonny Grant wrote: >> Michael, what do you think about this documentation suggestion I have made. Interested to hear your opinion. >> >> Should the man page follow the C spec description of the strncpy function and how when it copies the arrays, it may leave the resulting array of characters not terminated, and warn about this pitfall. >> >> C99 had this, and it is still there in latest C23 draft - worth clarifying on strncpy(3)? >> >> "7.21.2.4 The strncpy function" >> >> "269) Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will >> not be null-terminated." > > What ISO C has said and continues to say about strncpy(3) is the actual > harmful stuff, which has led many programmers to believe strncpy(3) was > useful at all for producing strings. > > The problem I see with what ISO C says about strncpy(3) is that it > treats it as a string-copying function. If you treat strncpy(3) as a > string-copying function, then it is really broken and should be removed > from libc. > > However, its functionality is still useful for those cases where you > don't want a string, which is the only reason I didn't mark the function > as [[deprecated]]. > >> >> Note, I'm not using strncpy myself, it's a documentation clarification proposal. > > I think it could be useful to add a note that one should first read the > CAVEATS section and string_copying(7) and only then read this page. 
> > > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 > index 239a2eb7e..c7bb79028 100644 > --- a/man3/stpncpy.3 > +++ b/man3/stpncpy.3 > @@ -37,6 +37,12 @@ .SH SYNOPSIS > _GNU_SOURCE > .fi > .SH DESCRIPTION > +.IR Note : > +These functions are probably not what you want. > +Read CAVEATS below, > +and > +.BR string_copying (7). > +.PP > These functions copy the string pointed to by > .I src > into a null-padded character sequence at the fixed-width buffer pointed to by > > > Is this scary enough? Do you think this would tell readers to never use > this function unless they know what they're doing (and even when they > think they do, they probably don't)? > > Cheers, > Alex > >> >> Kind regards >> Jonny > Alejandro, We see things differently, I'm on the C standard side on this one. Would any information change your mind? With kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-07 11:52 ` Jonny Grant @ 2023-11-07 13:23 ` Alejandro Colomar 2023-11-07 14:19 ` Jonny Grant 2023-11-08 2:12 ` strncpy clarify result may not be null terminated Matthew House 0 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-07 13:23 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 420 bytes --] On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: > We see things differently, I'm on the C standard side on this one. Would any information change your mind? It's difficult to say, but I doubt it. But let me ask you something: In what cases would you find strncpy(3) appropriate to use, and why? Maybe if I understand that it helps. Kind regards, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-07 13:23 ` Alejandro Colomar @ 2023-11-07 14:19 ` Jonny Grant 2023-11-07 16:17 ` Alejandro Colomar 2023-11-08 2:12 ` strncpy clarify result may not be null terminated Matthew House 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-07 14:19 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man On 07/11/2023 13:23, Alejandro Colomar wrote: > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: >> We see things differently, I'm on the C standard side on this one. Would any information change your mind? > > It's difficult to say, but I doubt it. But let me ask you something: > In what cases would you find strncpy(3) appropriate to use, and why? > Maybe if I understand that it helps. > > Kind regards, > Alex I don't find strncpy appropriate - that's why I proposed a change to clarify the known defect in the man page of strncpy that C99 describes. Worth reading my first email if you're unclear. If you doubt the esteemed C standards, I won't add anything further. Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-07 14:19 ` Jonny Grant @ 2023-11-07 16:17 ` Alejandro Colomar 2023-11-07 17:00 ` Jonny Grant 2023-11-08 6:18 ` Oskari Pirhonen 0 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-07 16:17 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 4236 bytes --] Hi Jonny, On Tue, Nov 07, 2023 at 02:19:56PM +0000, Jonny Grant wrote: > > > On 07/11/2023 13:23, Alejandro Colomar wrote: > > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: > >> We see things differently, I'm on the C standard side on this one. Would any information change your mind? > > > > It's difficult to say, but I doubt it. But let me ask you something: > > In what cases would you find strncpy(3) appropriate to use, and why? > > Maybe if I understand that it helps. > > > > Kind regards, > > Alex > > I don't find strncpy appropriate Would any information change your mind in this regard? Let me show you some structure to which you should write using strncpy(3): $ man utmp | sed 's/^ //' | grepc -h utmp struct utmp { short ut_type; /* Type of record */ pid_t ut_pid; /* PID of login process */ char ut_line[UT_LINESIZE]; /* Device name of tty - "/dev/" */ char ut_id[4]; /* Terminal name suffix, or inittab(5) ID */ char ut_user[UT_NAMESIZE]; /* Username */ char ut_host[UT_HOSTSIZE]; /* Hostname for remote login, or kernel version for run-level messages */ struct exit_status ut_exit; /* Exit status of a process marked as DEAD_PROCESS; not used by Linux init(1) */ /* The ut_session and ut_tv fields must be the same size when compiled 32- and 64-bit. This allows data files and shared memory to be shared between 32- and 64-bit applications. 
*/ #if __WORDSIZE == 64 && defined __WORDSIZE_COMPAT32 int32_t ut_session; /* Session ID (getsid(2)), used for windowing */ struct { int32_t tv_sec; /* Seconds */ int32_t tv_usec; /* Microseconds */ } ut_tv; /* Time entry was made */ #else long ut_session; /* Session ID */ struct timeval ut_tv; /* Time entry was made */ #endif int32_t ut_addr_v6[4]; /* Internet address of remote host; IPv4 address uses just ut_addr_v6[0] */ char __unused[20]; /* Reserved for future use */ }; The fields 'ut_line', 'ut_user', and 'ut_host' are fixed-width character arrays without a terminating NUL. I wish this API hadn't been designed this way, and thus that strncpy(3) wouldn't be useful for writing to these structures, but we got what we got. strcpy(3) and strlcpy(3) will both try to write a NUL byte, thus not being able to use the last byte. I would happily waste that last byte, but then if you write portable shadow utils that are compatible with other software that may have written those fields previously, you need to be able to support that last character, and so you need strncpy(3). >- that's why I proposed a change to clarify the known defect in the man page of strncpy that C99 describes. Worth reading my first email if you're unclear. I would love to find this API useless, and in that case, I'd go further and add [[deprecated]] in the synopsis, and write a heavy statement in a BUGS section. But I can't do that while it's still a good function in some cases (even if those cases are bad design, such as utmp(5)). On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's being deprecated, so maybe we could consider deprecating strncpy(3). If I see enough proof that all APIs that require this function are deprecated, I'll happily declare the function deprecated as well.
(in fact I already did some time ago, but then found this use with utmp(5), which is why I removed the deprecation; see <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>). Cheers, Alex > > If you doubt the esteemed C standards, I won't add anything further. > Kind regards Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
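The fixed-width field use case from the message above can be sketched as follows. To stay portable, the sketch does not touch the real struct utmp: struct record, NAME_WIDTH, record_set_name() and record_name_len() are all hypothetical names standing in for a field such as ut_user and its width.

```c
#include <stddef.h>
#include <string.h>

#define NAME_WIDTH  8  /* hypothetical width, standing in for UT_NAMESIZE */

/* Hypothetical record, loosely modeled on the fixed-width fields of utmp(5). */
struct record {
	char  name[NAME_WIDTH];  /* null-padded; NOT necessarily null-terminated */
};

/* Store name as a null-padded character sequence, using every byte if needed. */
static void
record_set_name(struct record *r, const char *name)
{
	strncpy(r->name, name, sizeof(r->name));
}

/* Length of the stored character sequence; never reads past the field. */
static size_t
record_name_len(const struct record *r)
{
	const char  *nul;

	nul = memchr(r->name, '\0', sizeof(r->name));

	return nul ? (size_t) (nul - r->name) : sizeof(r->name);
}
```

A reader of such a field must bound every access by the field width, for example printf("%.*s\n", (int) record_name_len(&r), r.name), rather than passing r.name to a function that expects a string.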
* Re: strncpy clarify result may not be null terminated 2023-11-07 16:17 ` Alejandro Colomar @ 2023-11-07 17:00 ` Jonny Grant 2023-11-07 17:20 ` Alejandro Colomar 2023-11-08 6:18 ` Oskari Pirhonen 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-07 17:00 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man Your comments don't relate to aligning the man page to C99 spec. Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-07 17:00 ` Jonny Grant @ 2023-11-07 17:20 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-07 17:20 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 537 bytes --] On Tue, Nov 07, 2023 at 05:00:19PM +0000, Jonny Grant wrote: > Your comments don't relate to aligning the man page to C99 spec. No, and blindly repeating what the spec says isn't positive in itself. My comments align with recommending safe use of libc functions, and recommending against using bogus functions. For reading the spec, we already have the spec. I only want to add information if it is useful. I welcome you to convince me that it's useful. Thanks, Alex > Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-07 16:17 ` Alejandro Colomar 2023-11-07 17:00 ` Jonny Grant @ 2023-11-08 6:18 ` Oskari Pirhonen 2023-11-08 9:51 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Oskari Pirhonen @ 2023-11-08 6:18 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 1306 bytes --] On Tue, Nov 07, 2023 at 17:17:29 +0100, Alejandro Colomar wrote: > > I would love to find this API useless, and in that case, I'd go further > and add [[deprecated]] in the synopsis, and write a heavy statement in a > BUGS section. But I can't do that while it's still a good function in > some cases (even if those cases are bad design, such as utmp(5)). > > On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's > being deprecated, so maybe we could consider deprecating strncpy(3). > > If I see enough proof that all APIs that require this function are > deprecated, I'll happily declare the function deprecated as well. > (in fact I already did some time ago, but then found this use with > utmp(5), which is why I removed the deprecation; see > <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>). > If you ask me, I'd not mark libc functions as deprecated without some kind of consensus from the libc maintainers too. They may not go so far as to add the `deprecated` attribute in their own headers, at least not yet at that point in time, but some kind of written "Yes, please don't use this function" would be nice to have before marking them in the man pages. - Oskari [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 6:18 ` Oskari Pirhonen @ 2023-11-08 9:51 ` Alejandro Colomar 2023-11-08 9:59 ` Thorsten Kukuk ` (2 more replies) 0 siblings, 3 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 9:51 UTC (permalink / raw) To: libc-alpha, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 1930 bytes --] On Wed, Nov 08, 2023 at 12:18:09AM -0600, Oskari Pirhonen wrote: > On Tue, Nov 07, 2023 at 17:17:29 +0100, Alejandro Colomar wrote: > > > > I would love to find this API useless, and in that case, I'd go further > > and add [[deprecated]] in the synopsis, and write a heavy statement in a > > BUGS section. But I can't do that while it's still a good function in > > some cases (even if those cases are bad design, such as utmp(5)). > > > > On the other hand, utmp(5) has other issues, like Y2038, and AFAIR it's > > being deprecated, so maybe we could consider deprecating strncpy(3). > > > > If I see enough proof that all APIs that require this function are > > deprecated, I'll happily declare the function deprecated as well. > > (in fact I already did some time ago, but then found this use with > > utmp(5), which is why I removed the deprecation; see > > <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/man3/strncpy.3?id=30d458d1a6261221bad15e58f1862e0dda24f4a0>). > > > > If you ask me, I'd not mark libc functions as deprecated without some > kind of consesnsus from the libc maintainers too. They may not go so far > as to add the `deprecated` attribute in their own headers, at least not > yet at that point in time, but some kind of written "Yes, please don't > use this function" would be nice to have before marking them in the man > pages. Okay, let's ask them. Hi glibc developers, strncpy(3) is useful to write to fixed-width buffers like `struct utmp` and `struct utmpx`. Is there any other libc API that needs strncpy(3)? 
Of those two APIs (utmp and utmpx) and any other that need strncpy(3), are those deprecated, or is any such API still good for new code? If all APIs that need strncpy(3) are deprecated, I propose recommending against its use in new code. Thanks, Alex > > - Oskari -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 9:51 ` Alejandro Colomar @ 2023-11-08 9:59 ` Thorsten Kukuk 2023-11-08 15:09 ` Alejandro Colomar [not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com> 2023-11-08 14:06 ` Zack Weinberg 2023-11-08 19:04 ` DJ Delorie 2 siblings, 2 replies; 138+ messages in thread From: Thorsten Kukuk @ 2023-11-08 9:59 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man On Wed, Nov 08, Alejandro Colomar wrote: > strncpy(3) is useful to write to fixed-width buffers like `struct utmp` > and `struct utmpx`. Is there any other libc API that needs strncpy(3)? > Of those two APIs (utmp and utmpx) and any other that need strncpy(3), > are those deprecated, or is any such API still good for new code? Everything around utmp/utmpx/wtmp/lastlog is deprecated. openSUSE Tumbleweed and MicroOS no longer use or support them, and fresh installations don't have those files anymore. So new code should not use utmp/utmpx/wtmp/lastlog anymore. Alternatives are e.g. systemd-logind/wtmpdb/lastlog2. Thorsten -- Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich (HRB 36809, AG Nürnberg) ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 9:59 ` Thorsten Kukuk @ 2023-11-08 15:09 ` Alejandro Colomar [not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com> 1 sibling, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 15:09 UTC (permalink / raw) To: Thorsten Kukuk; +Cc: libc-alpha, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 899 bytes --] On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote: > On Wed, Nov 08, Alejandro Colomar wrote: > > > strncpy(3) is useful to write to fixed-width buffers like `struct utmp` > > and `struct utmpx`. Is there any other libc API that needs strncpy(3)? > > Of those two APIs (utmp and utmpx) and any other that need strncpy(3), > > are those deprecated, or is any such API still good for new code? > Hi Thorsten! > Everything around utmp/utmpx/wtmp/lastlog is deprecated. Is this a Linux-specific thing? Do you know if the BSDs also deprecated utmpx? Thanks, Alex > > openSUSE Tumbleweed and MicroOS are no longer using nor supporting them > and fresh installations don't have that files anymore. > So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives > are e.g. systemd-logind/wtmpdb/lastlog2. -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated [not found] ` <6bcad2492ab843019aa63895beaea2ce@DB6PR04MB3255.eurprd04.prod.outlook.com> @ 2023-11-08 15:44 ` Thorsten Kukuk 2023-11-08 17:26 ` Adhemerval Zanella Netto 0 siblings, 1 reply; 138+ messages in thread From: Thorsten Kukuk @ 2023-11-08 15:44 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man On Wed, Nov 08, Alejandro Colomar wrote: > On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote: > > On Wed, Nov 08, Alejandro Colomar wrote: > > > > > strncpy(3) is useful to write to fixed-width buffers like `struct utmp` > > > and `struct utmpx`. Is there any other libc API that needs strncpy(3)? > > > Of those two APIs (utmp and utmpx) and any other that need strncpy(3), > > > are those deprecated, or is any such API still good for new code? > > > > Hi Thorsten! > > > Everything around utmp/utmpx/wtmp/lastlog is deprecated. > > Is this a Linux-specific thing? Do you know if the BSDs also deprecated > utmpx? Beside the design issues of the interface, which are generic, the Y2038 issue is more or less glibc specific and a result of supporting 32bit and 64bit userland at the same time. For most other implementations I'm aware of there is no Y2038 problem, either because they don't support utmp/utmpx/... like musl libc, or they were able to switch to a 64bit time variable or used that already. So no need to change anything. For BSD I don't really know the situation, but as far as I know, they don't have the problem and thus no need to change anything. Thorsten > Thanks, > Alex > > > > > openSUSE Tumbleweed and MicroOS are no longer using nor supporting them > > and fresh installations don't have that files anymore. > > So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives > > are e.g. systemd-logind/wtmpdb/lastlog2. 
> > -- > <https://www.alejandro-colomar.es/> -- Thorsten Kukuk, Distinguished Engineer, Senior Architect, Future Technologies SUSE Software Solutions Germany GmbH, Frankenstraße 146, 90461 Nuernberg, Germany Managing Director: Ivo Totev, Andrew McDonald, Werner Knoblich (HRB 36809, AG Nürnberg) ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 15:44 ` Thorsten Kukuk @ 2023-11-08 17:26 ` Adhemerval Zanella Netto 0 siblings, 0 replies; 138+ messages in thread From: Adhemerval Zanella Netto @ 2023-11-08 17:26 UTC (permalink / raw) To: Thorsten Kukuk, Alejandro Colomar; +Cc: libc-alpha, Jonny Grant, linux-man On 08/11/23 12:44, Thorsten Kukuk wrote: > On Wed, Nov 08, Alejandro Colomar wrote: > >> On Wed, Nov 08, 2023 at 09:59:11AM +0000, Thorsten Kukuk wrote: >>> On Wed, Nov 08, Alejandro Colomar wrote: >>> >>>> strncpy(3) is useful to write to fixed-width buffers like `struct utmp` >>>> and `struct utmpx`. Is there any other libc API that needs strncpy(3)? >>>> Of those two APIs (utmp and utmpx) and any other that need strncpy(3), >>>> are those deprecated, or is any such API still good for new code? >>> >> >> Hi Thorsten! >> >>> Everything around utmp/utmpx/wtmp/lastlog is deprecated. >> >> Is this a Linux-specific thing? Do you know if the BSDs also deprecated >> utmpx? > > Beside the design issues of the interface, which are generic, the Y2038 > issue is more or less glibc specific and a result of supporting 32bit > and 64bit userland at the same time. > For most other implementations I'm aware of there is no Y2038 problem, > either because they don't support utmp/utmpx/... like musl libc, or they > were able to switch to a 64bit time variable or used that already. > So no need to change anything. In fact the glibc utmp y2038 support depends on the ABI: some 64-bit ABIs decided to be compatible with 32 bits so that the utmp files could be read/parsed by both ABIs (defined by __WORDSIZE_TIME64_COMPAT32). This required the ut_tv field to be defined not as a 'struct timeval', but rather as a similar struct with a 32-bit tv_sec (yes, it is a mess and I am not sure why it was considered a good idea back then). 
It means that for the 64-bit ABIs that define __WORDSIZE_TIME64_COMPAT32 (mips, riscv, s390, sparc, powerpc, and x86) the utmp ABI is broken regarding y2038 support. The ut_tv is also defined depending on the time_t at build time (_TIME_BITS), so if you have programs with different time_t support, they won't correctly access the utmp (gnulib seems to have some overrides to fix it). Fixing those issues would require a lot of work that I don't think is worth it for an API with some inherent implementation flaws [1] (most likely it would require a complete rewrite, which logind basically did). That's why I am leaning towards completely removing the glibc implementation and mimicking what musl did (a no-op implementation that returns -1/ENOTSUP where applicable). [1] https://sourceware.org/bugzilla/show_bug.cgi?id=24492 > For BSD I don't really know the situation, but as far as I know, they > don't have the problem and thus no need to change anything. > > Thorsten > >> Thanks, >> Alex >> >>> >>> openSUSE Tumbleweed and MicroOS are no longer using nor supporting them >>> and fresh installations don't have that files anymore. >>> So new code should not use utmp/utmp/wtmp/lastlog anymore. Alternatives >>> are e.g. systemd-logind/wtmpdb/lastlog2. >> >> -- >> <https://www.alejandro-colomar.es/> > > > ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 9:51 ` Alejandro Colomar 2023-11-08 9:59 ` Thorsten Kukuk @ 2023-11-08 14:06 ` Zack Weinberg 2023-11-08 15:07 ` Alejandro Colomar 2023-11-08 19:04 ` DJ Delorie 2 siblings, 1 reply; 138+ messages in thread From: Zack Weinberg @ 2023-11-08 14:06 UTC (permalink / raw) To: Alejandro Colomar, GNU libc development, Jonny Grant, 'linux-man' >> If you ask me, I'd not mark libc functions as deprecated without some >> kind of consesnsus from the libc maintainers too. ... > Okay, let's ask them. ... > Hi glibc developers, > > strncpy(3) ... Speaking only for myself, I would be very reluctant to declare any standardized function "deprecated" by glibc unless the relevant standards have also made that declaration. This goes double for anything that was in C89. Also speaking only for myself, the Linux manpages are welcome to discourage the use of any function that you feel is not a wise choice for new programs, but the word "deprecated" should be reserved for cases where there really has been a declaration of deprecation by us and/or the standards. The word "obsolete" should also be used very cautiously; it's broader, but I personally would only use it in situations where there is a direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r). In the specific cases we're discussing: I would definitely like to see a BUGS or NOTES section in the strncpy(3) manpage, warning people that it's probably not what they want and recommending use of strlen+memcpy instead. I don't know enough about the utmp(x) situation to have a strong opinion, but I do think the manpages need to be very clear that this particular proposed replacement for utmp(x) is Linux-specific and still somewhat experimental. zw ^ permalink raw reply [flat|nested] 138+ messages in thread
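A minimal sketch of the strlen+memcpy approach recommended in the message above, for readers who want to see it spelled out. The helper name `copy_string` and its return convention are assumptions made for illustration; they do not come from the thread or from any libc. The idea is to measure the source first and refuse the copy when it does not fit, so the destination is always either a valid string or untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a libc function): copy the string src into
 * dst, which has room for dsize bytes.  Returns 0 on success, or -1 if
 * src plus its terminating NUL does not fit; in that case dst is left
 * untouched.
 */
int
copy_string(char *dst, size_t dsize, const char *src)
{
	size_t len;

	assert(dst != NULL && src != NULL);

	len = strlen(src);
	if (len >= dsize)
		return -1;	/* Would not fit: report it, don't truncate. */

	memcpy(dst, src, len + 1);	/* The +1 copies the terminating NUL. */
	return 0;
}
```

A caller would write something like `if (copy_string(buf, sizeof(buf), input) == -1)` and handle the error explicitly, instead of silently continuing with a truncated or unterminated buffer.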
* Re: strncpy clarify result may not be null terminated 2023-11-08 14:06 ` Zack Weinberg @ 2023-11-08 15:07 ` Alejandro Colomar 2023-11-08 19:45 ` G. Branden Robinson 2023-11-08 21:35 ` Carlos O'Donell 0 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 15:07 UTC (permalink / raw) To: Zack Weinberg; +Cc: GNU libc development, Jonny Grant, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 3528 bytes --] Hi Zack! On Wed, Nov 08, 2023 at 09:06:48AM -0500, Zack Weinberg wrote: > >> If you ask me, I'd not mark libc functions as deprecated without some > >> kind of consesnsus from the libc maintainers too. > ... > > Okay, let's ask them. > ... > > Hi glibc developers, > > > > strncpy(3) > ... > > Speaking only for myself, I would be very reluctant to declare any > standardized function "deprecated" by glibc unless the relevant > standards have also made that declaration. This goes double for > anything that was in C89. I understand your point of view, but disagree with it. Deprecation by ISO C or POSIX takes very very long. We had gets(3) for decades until they realized it should be removed from the standards. STANDARDS POSIX.1‐2008. HISTORY C89, POSIX.1‐2001. LSB deprecates gets(). POSIX.1‐2008 marks gets() obsolescent. ISO C11 removes the specification of gets() from the C language, and since glibc 2.16, glibc header files don’t expose the function declaration if the _ISOC11_SOURCE feature test macro is defined. So we had it in ISO C in C89 and C99, and only in C11 they realized it had to be removed. POSIX hasn't even removed it yet! I won't hesitate to kill a function just because of bureaucracy. The standard, especially C89, was just a reflection of the commonalities of most implementations. It was a burden of implementations to add new stuff or to remove existing stuff. Later revisions of the standards invented more, though. 
In this case, since ISO C has no APIs that use strncpy(3), it could (and should) already deprecate strncpy(3) from ISO C. POSIX still needs it while it keeps utmpx(5), because there's no other way to correctly write to the fixed-width buffers within struct utmpx. > > Also speaking only for myself, the Linux manpages are welcome to > discourage the use of any function that you feel is not a wise choicei > for new programs, but the word "deprecated" should be reserved for > cases where there really has been a declaration of deprecation by us > and/or the standards. If a function is deprecated by a standard or other entity, that will be reflected in the STANDARDS or HISTORY section. For deprecation by the manual itself, the SYNOPSIS (and BUGS) sections are fine. In the end, the word 'deprecate' isn't any magic. From WordNet (r) 3.0 (2006) [wn]: deprecate v 1: express strong disapproval of; deplore That term applies to strncpy(3). > The word "obsolete" should also be used very cautiously; it's broader, > but I personally would only use it in situations where there is a > direct replacement (e.g. sigaction replaces signal, strsep replaces strtok and strtok_r). > > In the specific cases we're discussing: I would definitely like to see > a BUGS or NOTES section in the strncpy(3) manpage, warning people that > it's probably not what they want and recommending use of strlen+memcpy > instead. I don't know enough about the utmp(x) situation to have a > strong opinion, but I do think the manpages need to be very clear that > this particular proposed replacement for utmp(x) is Linux-specific and > still somewhat experimental. But yes, we need to make sure that the APIs that need strncpy(3) are all deprecated. If other Unix systems still need utmpx or similar stuff, strncpy(3) will still be necessary. 
Cheers, Alex > > zw -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 15:07 ` Alejandro Colomar @ 2023-11-08 19:45 ` G. Branden Robinson 2023-11-08 21:35 ` Carlos O'Donell 1 sibling, 0 replies; 138+ messages in thread From: G. Branden Robinson @ 2023-11-08 19:45 UTC (permalink / raw) To: 'linux-man' [-- Attachment #1: Type: text/plain, Size: 9838 bytes --] [bouncing a copy to linux-man with PDF attachment stripped] Hi Alex, At 2023-11-08T16:07:42+0100, Alejandro Colomar wrote: > I understand your point of view, but disagree with it. Deprecation by > ISO C or POSIX takes very very long. We had gets(3) for decades until > they realized it should be removed from the standards. I think it likely that the humans involved in the decision-making processes realized that gets(3) _should_ be removed a long time before it actually was. It is often difficult to get to the truth of why there is so much inertia, particularly when large commercial vendors are involved; such entities have long traditions of opacity. Sometimes it is because they send relatively clueless people as representatives to the standards body, because they don't value standards development as "real work" (how does it generate profit?), because it's a handy place to dump someone who's been awarded a sinecure--or who annoys many colleagues but isn't worth the effort to fire, or because that person is on an unstated mission to frustrate a market rival and doesn't care what the collateral damage is. My favorite example of the last is when Groupe Bull sent a fool[1] to the ISO 8859 standardization group. DEC's MCS (multinational character set) was a sound candidate to become ISO 8859-1 as-was, but it must have been thought that this would be "handing a victory" to DEC, so the Bull representative--one source says it was a Belgian--endorsed disruptive changes that made the encoding objectively worse for representation of standard French script. 
Sometimes Gallic chauvinism has to take a back seat to giving Maynard, Massachusetts a poke in the eye. Source attached. It's in French. I therefore think it's beneficial for you to pursue your campaign against strncpy(). Vested interests cling to interfaces for reasons they won't disclose, and cargo-cult programmers will employ them for reasons they don't understand. One of the fruits of discussions like these is that we can get the actual technical merits and demerits of such interfaces on the record. > So we had it in ISO C in C89 and C99, and only in C11 they realized it > had to be removed. POSIX hasn't even removed it yet! I won't > hesitate to kill a function just because of bureaucracy. You can't kill it; implementations will retain it practically forever to keep old code compiling. But you can sometimes scare away the cargo cultists by lighting yourself on fire and waving your arms. > The standard, especially C89, was just a reflection of the > commonalities of most implementation. It was a burden of > implementations to add new stuff or to remove existing stuff. Later > revisions of the standards invented more, though. And for what it's worth, Dennis Ritchie thought they lost the plot by doing so.[2] I admire a great deal of what Ritchie achieved, but I'm not confident he made the right call there. One elitist explanation I've seen ventured is that Bell Labs simply had inherently smarter people than most other software development shops could gather. _Maybe_ there is some truth to that, but I would venture a hypothesis less grounded on individual characteristics. The CSRC was a _research_ environment. It was emphatically not about measuring productivity by counting lines of code, or "moving fast and breaking stuff", or how many "Ship It" boxes you've ticked on your projects in the last year. 
Google was pretty explicit that suitability for production-line code output was a design objective for the Go language.[3] They had hired tons upon tons of smart people but found that it was hard to get their "ship it" metrics satisfactorily high when driving all their newly hired sheep through the mine fields of C (and especially C++[4]) programming. An old adage says, "it's a poor workman who blames his tools". But when nearly every worker to whom you give a set of tools struggles with high failure rates, it's time to question the fitness of those tools for the objective you have in mind. So Google did, and attempted to recreate for software engineers what Frederick Winslow Taylor achieved for factory laborers a hundred years ago. If there's less room for individual initiative, creativity, or insight, too bad--those don't keep the share price up.[5] You're a grunt. GBTW. > In this case, since ISO C has no APIs that use strncpy(3), it could > (and should) already deprecate strncpy(3) from ISO C. POSIX still > needs it while it keeps utmpx(5), because there's no other way to > correctly write to the fixed-width buffers within struct utmpx. I would like to emphasize that a fixed-width buffer is inherently an uneasy fit with C-style strings in the first place. The major selling point of null-terminated strings is their length flexibility. They are the entire reason we don't use Pascal-style strings, upon which C coders eagerly spit (too easily, when they embarrass themselves with strncpy()). And yet fixed-width buffers are traditionally ubiquitous in C, especially in the days before the GNU Coding Standards (and programmers' frequent desires for generality and adaptability) spurred C codes to use dynamic allocation much more aggressively. Why were these practices in tension in a language as purportedly shot through with genius as C was? Because, in my opinion, it was a bit of unfinished business in the language. 
This is why malloc(3) and free(3) are managed by the runtime rather than defined in the language proper. Back in 1970s and 1980s, "everybody knew" that you couldn't have safe dynamic memory allocation without a garbage collector, and there was no way to have a garbage collector run deterministically in general, a fatal flaw in real-time applications. (Even then, there were alternatives to throwing everything explicitly onto the heap.[6]) Thanks to particular improvements in compiler development (originally intended for code optimization), static analysis tools, an influential (if under-recognized) research programming language called Cyclone,[7] and a new language--Rust--that is making the fruits of these improvements available to a wide audience, we're learning to be better programmers. ...against the resistance of C grognards, who of course vociferously oppose deprecation of strncpy(3), because (they claim) it never caused _them_ any problems. > > Also speaking only for myself, the Linux manpages are welcome to > > discourage the use of any function that you feel is not a wise > > choicei for new programs, but the word "deprecated" should be > > reserved for cases where there really has been a declaration of > > deprecation by us and/or the standards. > > If a function is deprecated by a standard or other entity, that will be > reflected in the STANDARDS or HISTORY section. For deprecation by the > manual itself, the SYNOPSIS (and BUGS) sections are fine. In the end, > the word 'deprecate' isn't any magic. > > From WordNet (r) 3.0 (2006) [wn]: > > deprecate > v 1: express strong disapproval of; deplore > > That term applies to strncpy(3). Yes, but Zack raises a good point. Deprecation by ISO, by POSIX, by the glibc developers, and by the Linux man-pages project are all different things, and they all have different implications for portability. It is helpful for the everyday C programmer to know which of those implications to infer. 
Were I in your shoes, I would use the term "discourage". "The Linux man-pages project discourages use of strncpy() {for the reasons listed above, because ...}." > But yes, we need to make sure that the APIs that need strncpy(3) are > all deprecated. If other Unix systems still need utmpx or similar > stuff, strncpy(3) will still be necessary. You might also say this: "The deprecated strncpy(3) is mainly used in conjunction with other deprecated interfaces, like utmpx(5)." Regards, Branden [1] The term "moron" also comes to mind. Too strong a term? Just applying Hanlon's Razor here. [2] https://www.computerworld.com/article/2826125/the-future-according-to-dennis-ritchie--a-2000-interview-.html?page=2 This, followed by his death, is why there's never been a third edition of _The C Programming Language_, which I guess continues to be a best-seller for its publisher, even though it's not a good idea for newcomers to C to learn from it, any more than Kernighan & Pike's _The Unix Programming Environment_ is. (Once you've acquired a little historical perspective, they're _excellent_ resources!) [3] https://go.dev/talks/2012/splash.article Just read every sentence containing the word "productive". [4] https://google.github.io/styleguide/cppguide.html [5] That has to await your elevation to the C-suite, where more marketing dollars will be spent burnishing your reputation as a "genius" than any level of personal productivity could conceivably justify. See, e.g., Steve Jobs. Silicon Valley's thought leaders are on a work slowdown, you see--their compensation ratio needs to be higher[8] or they won't turn their massive brains to the trivial problems of cold fusion or room-temperature superconductors. Atlas ain't shrugging yet, but he's leaning over really far, shooting you a meaningful look, and clucking about the dire precedent set by this year's UAW strike. Where are the Pinkertons when you need them? And what's Erik Prince up to these days? 
[6] https://docs.adacore.com/gnat_ugx-docs/html/gnat_ugx/gnat_ugx/the_stacks.html [7] https://en.wikipedia.org/wiki/Cyclone_(programming_language) [8] https://www.epi.org/publication/ceo-pay-in-2021/ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 15:07 ` Alejandro Colomar 2023-11-08 19:45 ` G. Branden Robinson @ 2023-11-08 21:35 ` Carlos O'Donell 2023-11-08 22:11 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Carlos O'Donell @ 2023-11-08 21:35 UTC (permalink / raw) To: Alejandro Colomar, Zack Weinberg Cc: GNU libc development, Jonny Grant, 'linux-man' On 11/8/23 10:07, Alejandro Colomar wrote: > So we had it in ISO C in C89 and C99, and only in C11 they realized it > had to be removed. POSIX hasn't even removed it yet! I won't hesitate > to kill a function just because of bureaucracy. Attempting to get consensus at an international level, across cultural boundaries, use cases, workloads, and developer workflows is difficult and not intended to be bureaucracy for the sake of bureaucracy. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 21:35 ` Carlos O'Donell @ 2023-11-08 22:11 ` Alejandro Colomar 2023-11-08 23:31 ` Paul Eggert 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 22:11 UTC (permalink / raw) To: Carlos O'Donell Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 968 bytes --] On Wed, Nov 08, 2023 at 04:35:12PM -0500, Carlos O'Donell wrote: > On 11/8/23 10:07, Alejandro Colomar wrote: > > So we had it in ISO C in C89 and C99, and only in C11 they realized it > > had to be removed. POSIX hasn't even removed it yet! I won't hesitate > > to kill a function just because of bureaucracy. > > Attempting to get consensus at an international level, across cultural boundaries, > use cases, workloads, and developer workflows is difficult and not intended to be > bureaucracy for the sake of bureaucracy. Hi Carlos! I understand that, and respect ISO's work. I just don't think we need, as GNU or Linux projects, to be restricted to the decisions of ISO. We can realize that certain functions are bad, and mark them as deprecated in our scope. If others want to imitate (ISO might even take it as "prior art"), then great. Cheers, Alex > > -- > Cheers, > Carlos. > -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 22:11 ` Alejandro Colomar @ 2023-11-08 23:31 ` Paul Eggert 2023-11-09 0:29 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Paul Eggert @ 2023-11-08 23:31 UTC (permalink / raw) To: Alejandro Colomar, Carlos O'Donell Cc: Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man' On 11/8/23 14:11, Alejandro Colomar wrote: > I just don't think we need, > as GNU or Linux projects, to be restricted to the decisions of ISO. We > can realize that certain functions are bad, and mark them as deprecated > in our scope. There's enough use of strncpy for the intended use (smallish fixed size character arrays that are null padded, not null terminated) that saying it's deprecated would likely cause more trouble than it's worth. It's not just utmp and tar; it's also socket programming (sun_path) and I'm sure other stuff. Were we designing the C library from scratch I'd agree with you: in that context, strncpy would clearly be more trouble than it's worth. But now that we're stuck with strncpy we have better things to do than try to deprecate it. Instead of saying "deprecate" I suggest we say something like "This function is generally a poor choice for processing strings" and point to the longer man page about strings in general. That's what the glibc manual does and it works reasonably well. ^ permalink raw reply [flat|nested] 138+ messages in thread
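To make the "intended use" described above concrete, here is a hedged sketch of writing to and reading from a small fixed-width, null-padded field. The `struct record`, `FIELD_WIDTH`, and both helper functions are hypothetical and chosen only for illustration; they are not taken from utmp, tar, or sun_path. The point is that strncpy(3) fills the whole field (padding with null bytes), and that the reader must bound its scan by the field width because a full field carries no terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_WIDTH 8	/* Illustrative width, not taken from any real API. */

/* Hypothetical record with a fixed-width, null-padded name field. */
struct record {
	char name[FIELD_WIDTH];	/* A character sequence, not a string. */
};

/*
 * Store src in the field.  strncpy() pads short input with null bytes
 * and silently drops anything past FIELD_WIDTH bytes; a full field has
 * no terminating NUL.
 */
void
record_set_name(struct record *r, const char *src)
{
	assert(r != NULL && src != NULL);
	strncpy(r->name, src, sizeof(r->name));
}

/*
 * Convert the field back into a real string in dst, which must have room
 * for FIELD_WIDTH + 1 bytes.  Returns the length of the resulting string.
 */
size_t
record_get_name(char *dst, size_t dsize, const struct record *r)
{
	const char *nul;
	size_t len;

	assert(dst != NULL && r != NULL && dsize > FIELD_WIDTH);

	/* Bounded scan: never read past the end of the field. */
	nul = memchr(r->name, '\0', sizeof(r->name));
	len = (nul != NULL) ? (size_t)(nul - r->name) : sizeof(r->name);

	memcpy(dst, r->name, len);
	dst[len] = '\0';
	return len;
}
```

Treating `name` as a string anywhere else (for example passing it to strlen(3) or printing it with a plain `%s`) would read out of bounds whenever the field is full, which is the hazard the thread is about.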
* Re: strncpy clarify result may not be null terminated 2023-11-08 23:31 ` Paul Eggert @ 2023-11-09 0:29 ` Alejandro Colomar 2023-11-09 10:13 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 0:29 UTC (permalink / raw) To: Paul Eggert Cc: Carlos O'Donell, Zack Weinberg, GNU libc development, Jonny Grant, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 1811 bytes --] Hi Paul, On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote: > On 11/8/23 14:11, Alejandro Colomar wrote: > > I just don't think we need, > > as GNU or Linux projects, to be restricted to the decisions of ISO. We > > can realize that certain functions are bad, and mark them as deprecated > > in our scope. > > There's enough use of strncpy for the intended use (smallish fixed size > character arrays that are null padded, not null terminated) that saying it's > deprecated would likely cause more trouble than it's worth. It's not just > utmp and tar; it's also socket programming (sun_path) and I'm sure other > stuff. > > Were we designing the C library from scratch I'd agree with you: in that > context, strncpy would clearly be more trouble than it's worth. But now that > we're stuck with strncpy we have better things to do than try to deprecate > it. No, no, I'm not trying to deprecate it. I was just saying that *iff* all of its uses were dead, I'd deprecate it. But they're clearly not dead, so it's a perfect function for those cases. > > Instead of saying "deprecate" I suggest we say something like "This function > is generally a poor choice for processing strings" and point to the longer > man page about strings in general. That's what the glibc manual does and it > works reasonably well. Yes, I've done something like this. string_copying(7) recommends avoiding fixed-width null-padded buffers in APIs. But for those use cases that already exist, this is the function to use. 
I'm also refusing to document how to (mis)use this function for truncating strings. If one wants to truncate strings, they'll need functions that were designed to do that (e.g., strlcpy(3)). Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
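As a rough illustration of what "a function designed to truncate strings" does differently from strncpy(3), the sketch below always produces a null-terminated string and reports the untruncated length. It is not strlcpy(3) itself: copy_trunc is a hypothetical name, written in plain ISO C so it builds where strlcpy(3) is unavailable (glibc only gained strlcpy(3) in version 2.38). Unlike strlcpy(3), this sketch assumes the destination size is nonzero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string src into dst[dsize], truncating
 * if necessary, and always null-terminate the result.
 * Returns strlen(src), in the style of strlcpy(3), so the caller can
 * detect truncation by checking (ret >= dsize). */
size_t
copy_trunc(char *dst, const char *src, size_t dsize)
{
    size_t slen, n;

    assert(dsize > 0);   /* assumption: room for at least the terminator */

    slen = strlen(src);
    n = slen < dsize ? slen : dsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return slen;
}
```

A caller would typically write `if (copy_trunc(buf, s, sizeof(buf)) >= sizeof(buf))` and treat that branch as the truncation case.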
* Re: strncpy clarify result may not be null terminated 2023-11-09 0:29 ` Alejandro Colomar @ 2023-11-09 10:13 ` Jonny Grant 2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar 2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar 0 siblings, 2 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-09 10:13 UTC (permalink / raw) To: Alejandro Colomar, Paul Eggert Cc: Carlos O'Donell, Zack Weinberg, GNU libc development, 'linux-man' On 09/11/2023 00:29, Alejandro Colomar wrote: > Hi Pail, > > On Wed, Nov 08, 2023 at 03:31:38PM -0800, Paul Eggert wrote: >> On 11/8/23 14:11, Alejandro Colomar wrote: >>> I just don't think we need, >>> as GNU or Linux projects, to be restricted to the decisions of ISO. We >>> can realize that certain functions are bad, and mark them as deprecated >>> in our scope. >> >> There's enough use of strncpy for the intended use (smallish fixed size >> character arrays that are null padded, not null terminated) that saying it's >> deprecated would likely cause more trouble than it's worth. It's not just >> utmp and tar; it's also socket programming (sun_path) and I'm sure other >> stuff. >> >> Were we designing the C library from scratch I'd agree with you: in that >> context, strncpy would clearly be more trouble than it's worth. But now that >> we're stuck with strncpy we have better things to do than try to deprecate >> it. > > No, no, I'm not trying to deprecate it. I was just saying that *iff* > all of its uses were dead, I'd deprecate it. But they're clearly not > dead, so it's a perfect function for those cases. > >> >> Instead of saying "deprecate" I suggest we say something like "This function >> is generally a poor choice for processing strings" and point to the longer >> man page about strings in general. That's what the glibc manual does and it >> works reasonably well. > > Yes, I've done something like this. 
string_copying(7) recommends > avoiding fixed-width null-padded buffers in APIs. But for those use > cases that already exist, this is the function to use. https://man7.org/linux/man-pages/man7/string_copying.7.html Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too. How about following the style of the other man pages that put the notes about each function below them? (rather than above) https://man7.org/linux/man-pages/man3/string.3.html size_t strlen(const char *s); Return the length of the string s. At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different: // Copy/catenate a string. char *strcpy(char *restrict dst, const char *restrict src); char *strcat(char *restrict dst, const char *restrict src); Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-09 10:13 ` Jonny Grant @ 2023-11-09 11:08 ` Alejandro Colomar 2023-11-09 14:06 ` catenate vs concatenate Jonny Grant 2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg 2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar 1 sibling, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 11:08 UTC (permalink / raw) To: Jonny Grant Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 796 bytes --] Hi Jonny, On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote: > https://man7.org/linux/man-pages/man7/string_copying.7.html > Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too. Here's why: <https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/> Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500): >> concatenate > > We began fighting this pomposity before v7. There has only been > backsliding since.. > "Catenate" is crisper, means the same thing, and concurs with the "cat" command. > I invite you to join the battle for simplicity. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate 2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar @ 2023-11-09 14:06 ` Jonny Grant 2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg 1 sibling, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-09 14:06 UTC (permalink / raw) To: Alejandro Colomar Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg, GNU libc development, 'linux-man' On 09/11/2023 11:08, Alejandro Colomar wrote: > Hi Jonny, > > On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote: >> https://man7.org/linux/man-pages/man7/string_copying.7.html >> Rather than "catenation", in my experience "concatenation" is the common term to explain what it does. There are quite a few on that page. Probably other man pages too. > > Here's why: > <https://lore.kernel.org/linux-man/CAKH6PiUrQzb7vRZxUs0742WnfaLpcUec0QfdJQJ5Di8LqFg+NA@mail.gmail.com/> > > Douglas McIlroy wrote (Wed, 14 Dec 2022 11:22:05 -0500): >>> concatenate >> >> We began fighting this pomposity before v7. There has only been >> backsliding since.. >> "Catenate" is crisper, means the same thing, and concurs with the "cat" command. >> I invite you to join the battle for simplicity. > > Cheers, > Alex > Looks like it's already been discussed. Where a term is already in use, the question is whether to change the commonly used term. Technical documents seem to mostly use 'concatenate'. Looks like people have already decided on going with 'catenate'. Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar 2023-11-09 14:06 ` catenate vs concatenate Jonny Grant @ 2023-11-27 14:33 ` Zack Weinberg 2023-11-27 15:08 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Zack Weinberg @ 2023-11-27 14:33 UTC (permalink / raw) To: Alejandro Colomar, Jonny Grant Cc: Paul Eggert, Carlos O'Donell, GNU libc development, 'linux-man' [all attribution deleted because it was so tangled I couldn't make sense of it] >> Rather than "catenation", in my experience "concatenation" is the >> common term ... > We began fighting this pomposity before v7. There has only been > backsliding since. "Catenate" is crisper, means the same thing, [English pedant mode on] "Concatenate" is the correct term; "catenate" means something completely different, probably "hang between two posts like a chain". You can't chop prefixes off a Latinate word and have it still mean the same thing. [English pedant mode off] Also, and much more importantly, "concatenate" is used at least 100x more often than "catenate" in modern English, and that means it's the word that a randomly selected reader of the manpages is more likely to know, and, therefore, the word that the manpages should be using. https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3 zw ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-27 14:33 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Zack Weinberg @ 2023-11-27 15:08 ` Alejandro Colomar 2023-11-27 15:13 ` Alejandro Colomar 2023-11-27 16:59 ` G. Branden Robinson 0 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-27 15:08 UTC (permalink / raw) To: Zack Weinberg Cc: Jonny Grant, Paul Eggert, Carlos O'Donell, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 2002 bytes --] Hi Zack, On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote: > [all attribution deleted because it was so tangled I couldn't make > sense of it] > > >> Rather than "catenation", in my experience "concatenation" is the > >> common term The above was Jonny Grant. > > We began fighting this pomposity before v7. There has only been > > backsliding since. "Catenate" is crisper, means the same thing, The above was Doug McIlroy. > [English pedant mode on] > > "Concatenate" is the correct term; "catenate" means something completely > different, probably "hang between two posts like a chain". You can't > chop prefixes off a Latinate word and have it still mean the same thing. [Latin pedant mode on] concatenate comes from the Latin concatenare. The prefix "con-" means "join", "together", and "catena" means "chain". <https://en.wiktionary.org/wiki/concatenate> catenate comes from the Latin catenare, which AFAICS, seems a synonym. It just drops the redundant "con-" prefix, since "catena" already implies it. <https://en.wiktionary.org/wiki/catenate> English isn't as propense as other Latin languages to have such synonyms where one of them simply adds a redundant prefix or suffix, but Catalan or Spanish for example have several such cases. 
[Latin pedant mode off] > [English pedant mode off] > > Also, and much more importantly, "concatenate" is used at least 100x > more often than "catenate" in modern English, and that means it's the > word that a randomly selected reader of the manpages is more likely to > know, and, therefore, the word that the manpages should be using. > > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3 Heh, Paul sent a patch for changing it to append, which I applied, since it reads better, even if it removes the mnemonics of cat for catenate. :) Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-27 15:08 ` Alejandro Colomar @ 2023-11-27 15:13 ` Alejandro Colomar 2023-11-27 16:59 ` G. Branden Robinson 1 sibling, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-27 15:13 UTC (permalink / raw) To: Zack Weinberg Cc: Jonny Grant, Paul Eggert, Carlos O'Donell, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 2245 bytes --] On Mon, Nov 27, 2023 at 04:08:17PM +0100, Alejandro Colomar wrote: > Hi Zack, > > On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote: > > [all attribution deleted because it was so tangled I couldn't make > > sense of it] > > > > >> Rather than "catenation", in my experience "concatenation" is the > > >> common term > > The above was Jonny Grant. > > > > We began fighting this pomposity before v7. There has only been > > > backsliding since. "Catenate" is crisper, means the same thing, > > The above was Doug McIlroy. > > > [English pedant mode on] > > > > "Concatenate" is the correct term; "catenate" means something completely > > different, probably "hang between two posts like a chain". You can't > > chop prefixes off a Latinate word and have it still mean the same thing. > > [Latin pedant mode on] > > contatenate comes from the Latin concatenare. The prefix "con-" means > "join", "together", and "catena" means "chain". > <https://en.wiktionary.org/wiki/concatenate> > > catenate comes from the Latin catenare, which AFAICS, seems a synonym. > It just drops the redundant "con-" prefix, since "catena" already > implies it. > <https://en.wiktionary.org/wiki/catenate> > > English isn't as propense as other Latin languages to have such synonyms s/other// > where one of them simply adds a redundant prefix or suffix, but Catalan > or Spanish for example have several such cases. 
> > [Latin pedant mode off] > > > [English pedant mode off] > > > > Also, and much more importantly, "concatenate" is used at least 100x > > more often than "catenate" in modern English, and that means it's the > > word that a randomly selected reader of the manpages is more likely to > > know, and, therefore, the word that the manpages should be using. > > > > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3 > > Heh, Paul sent a patch for changing it to append, which I applied, since > it reads better, even if it removes the mnemonics of cat for catenate. :) > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-27 15:08 ` Alejandro Colomar 2023-11-27 15:13 ` Alejandro Colomar @ 2023-11-27 16:59 ` G. Branden Robinson 2023-11-27 18:35 ` Zack Weinberg 1 sibling, 1 reply; 138+ messages in thread From: G. Branden Robinson @ 2023-11-27 16:59 UTC (permalink / raw) To: Alejandro Colomar Cc: Zack Weinberg, Jonny Grant, Paul Eggert, Carlos O'Donell, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 5481 bytes --] At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote: > On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote: > > [all attribution deleted because it was so tangled I couldn't make > > sense of it] This elision was pretty poor form, given that one of the people whose attribution (and opinion) Zack discarded was a relevant authority: M. Douglas McIlroy, an alum of the Bell Labs Computing Science Research Center and editor of the Seventh Edition Unix Programmer's Manual. > > > We began fighting this pomposity before v7. There has only been > > > backsliding since. "Catenate" is crisper, means the same thing, > > The above was Doug McIlroy. > > > [English pedant mode on] > > > > "Concatenate" is the correct term; "catenate" means something > > completely different, probably "hang between two posts like a > > chain". You can't chop prefixes off a Latinate word and have it > > still mean the same thing. In some cases, you can. Witness the case of "flammable"/"inflammable", which are synonymous. The former term arose because the prefix "in-" alters meaning in multiple ways in English[1] (maybe Latin, too). The coinage of "flammable" later became important in the labeling and transport of hazardous materials. Some pedants must despair of this linguistic innovation, perhaps viewing the prospect of handlers of such materials burning to death as a just punishment for their lack of morphological and etymological sophistication. 
If you don't want to die like a prole, get an English degree, eh?[2] Here, the "con-" prefix is duplicative. It doesn't pay its freight. > > [English pedant mode off] When one discards all other authorities, all that remains is one's own. I trust we can recognize the parallels here with Dunning-Krugeresque self-regard. > > Also, and much more importantly, "concatenate" is used at least 100x > > more often than "catenate" in modern English, and that means it's > > the word that a randomly selected reader of the manpages is more > > likely to know, and, therefore, the word that the manpages should be > > using. Man pages are specialized technical literature demanding a bespoke vocabulary. Some employment of jargon is inescapable, even necessary. In any case, "catenate" has ~50 years of attestation in this domain alone, which constitutes approximately the entire history of Unix discourse. If you apply this sort of frequency analysis to contrast man page and general English corpora more broadly, I predict that you'll find many candidates for terminological replacement that you would _not_ embrace. For instance...[3] https://books.google.com/ngrams/graph?content=open+source%2Cfree+software&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3 https://books.google.com/ngrams/graph?content=emacs%2Cvi&year_start=1980&year_end=2019&corpus=en-2019&smoothing=3 Zack also overlooks the process by which speakers and readers of a language grapple with unfamiliar words that they encounter unexpectedly. Before undertaking to reach for dictionaries (online or otherwise), many readers morphophonemically analyze them to see if they can infer their meanings from familiar components.[4] > > https://books.google.com/ngrams/graph?content=concatenate%2Ccatenate&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3 > > Heh, Paul sent a patch for changing it to append, which I applied, > since it reads better, even if it removes the mnemonics of cat for > catenate. 
:) In Unix culture, one will need to remain conversant with the term "catenate" to know why cat(1) is not named "concat(1)". ;-) "Concatenate" may end up prevailing even in *nix man pages; languages do not necessarily evolve in directions that maximize lexical economy.[5] But to change one's usage based on the break room reasoning put on offer in this thread is a terrible idea. Regards, Branden [1] https://www.saturdayeveningpost.com/2023/02/in-a-word-flammable-inflammable-and-nonflammable/ [2] ...where the first-order factor in determining your academic merit will be your facility with the ideas of 20th-century French political philosophers. [3] One can complain that the second example suffers from a confounding effect given one of the terms' appearance as a roman numeral. Precisely. Google Ngram Viewer is not sensitive to context. Zack's use of it is a makeweight recourse to cloak an opinion grounded on personal preference in a shroud of false objectivity. [4] I see this practice offered as advice in numerous resources, and it reflects my own approach as a native English speaker who acquired language before the availability of computerized (let alone hyperlinked) dictionaries in the home, but in a perfunctory search I couldn't turn up any _studies_ of what readers _actually do_. One technique that could arise from Zack's approach would be to obtain an English word list sorted by frequency, strike off known words until encountering an unfamiliar one, learn it, then resume the process until the unfamiliar word that actually came up is reached. (This way you can be more confident in your own writing and speech that you don't use an obscure word where a more common one suffices.) How well do we suppose such a process might work? [5] certainly not if _my_ emails play any part in that evolution <drum fill> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-27 16:59 ` G. Branden Robinson @ 2023-11-27 18:35 ` Zack Weinberg 2023-11-27 23:45 ` G. Branden Robinson 0 siblings, 1 reply; 138+ messages in thread From: Zack Weinberg @ 2023-11-27 18:35 UTC (permalink / raw) To: G. Branden Robinson, Alejandro Colomar Cc: Jonny Grant, Paul Eggert, Carlos O'Donell, GNU libc development, 'linux-man' On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote: > At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote: >> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote: >> > [English pedant mode on] >> > >> > "Concatenate" is the correct term; "catenate" means something >> > completely different, probably "hang between two posts like a >> > chain". You can't chop prefixes off a Latinate word and have it >> > still mean the same thing. > > In some cases, you can. Witness the case of "flammable"/inflammable", > which are synonymous. Yeah, and (after seeing Alejandro's reply) I did look up both "concatenate" and "catenate" and find that they are synonymous in English and both are attested from the 1600s. **But I had to look that up.** I cannot recall ever encountering the word "catenate" prior to this thread, and my knee-jerk reaction was "typo." Based on actual experience trying, and mostly failing, to teach college undergraduates to read man pages, I believe someone new to English technical documentation would have a different, much more troublesome knee-jerk reaction: "There must be some subtle reason why this documentation is using an unfamiliar term 'catenate', instead of 'concatenate' that I already know." Followed by wasting a bunch of time trying to research that unfamiliar term, and when they find it's an exact synonym, adding another tick mark to their mental tally for "manpages are badly written and hard to understand." > Man pages are specialized technical literature demanding a bespoke > vocabulary. 
Some employment of jargon is inescapable, even necessary. > In any case, "catenate" has ~50 years of attestation in this domain > alone, which constitutes approximately the entire history of Unix > discourse. This is no excuse. Specialized technical jargon is only appropriate when there is an actual difference in meaning. (Thus, your "open source" vs "free software" counterpoint is bogus.) > Zack also overlooks the process by which speakers and readers of a > language grapple with unfamiliar words that they encounter > unexpectedly. Before undertaking to reach for dictionaries (online or > otherwise), many readers morphophonemically analyze them to see if > they can infer their meanings from familiar components.[4] In grappling with general literature, yes. In grappling with technical writing, *no*, and again I am speaking from direct experience as an educator. Readers who encounter an unfamiliar word in technical documents will most probably assume that the word has a precise meaning that they must learn, and that they *cannot* deduce that meaning from context. If they can't find a definition -- and they might not even try looking in a general dictionary, since they may assume that the relevant definition is too specialized to appear there; also it seems to me that schoolchildren are not being taught how to use dictionaries anymore -- *they will give up on the entire document*. Yes, this is bad. It's an instance of learned helplessness, and it's going to take decades and major educational reform at the grade-school level to fix. But there's one thing we, authors of technical documents, can do about it right now, and that is embrace plain talk. For example, whenever there really is no difference of meaning, the most common word in general usage is the word that should be used. > In Unix culture, one will need to remain conversant with the term > "catenate" to know why cat(1) is not named "concat(1)". 
;-) This is how I would teach it: 'concat' is too long for Kernighan and Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already in use as an abbreviation for 'console' (not in Unix itself, but in other contemporary OSes); and 'cat' is the next three letters of "concatenate". So that's what they picked. zw ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: catenate vs concatenate (was: strncpy clarify result may not be null terminated) 2023-11-27 18:35 ` Zack Weinberg @ 2023-11-27 23:45 ` G. Branden Robinson 0 siblings, 0 replies; 138+ messages in thread From: G. Branden Robinson @ 2023-11-27 23:45 UTC (permalink / raw) To: Zack Weinberg Cc: Alejandro Colomar, Jonny Grant, Paul Eggert, Carlos O'Donell, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 10006 bytes --] Hi Zack, At 2023-11-27T13:35:01-0500, Zack Weinberg wrote: > On Mon, Nov 27, 2023, at 11:59 AM, G. Branden Robinson wrote: > > At 2023-11-27T16:08:17+0100, Alejandro Colomar wrote: > >> On Mon, Nov 27, 2023 at 09:33:56AM -0500, Zack Weinberg wrote: > >> > [English pedant mode on] > >> > > >> > "Concatenate" is the correct term; "catenate" means something > >> > completely different, probably "hang between two posts like a > >> > chain". You can't chop prefixes off a Latinate word and have it > >> > still mean the same thing. > > > > In some cases, you can. Witness the case of > > "flammable"/inflammable", which are synonymous. > > Yeah, and (after seeing Alejandro's reply) I did look up both > "concatenate" and "catenate" and find that they are synonymous in > English and both are attested from the 1600s. > > **But I had to look that up.** That's not a bug. When we stop learning, our brains die. > I cannot recall ever encountering the word "catenate" prior to this > thread, and my knee-jerk reaction was "typo." The patellar reflex is not a reliable guide to purposeful development. > Based on actual experience trying, and mostly failing, to teach > college undergraduates to read man pages, I empathize with you here. I have a bit of background in teaching and a bit more in man page composition. 
Over the years my emotional response to being frustrated that I have to quote a man page to other software professionals in an email or message board has evolved into relief that I have material of reasonable quality to quote to people...when that happens. Sometimes a person raises an issue and my internal Gilbert Gottfried yells, "you FOOL![1] That's plainly documented in--wait, uh, give me a second. Uh...sh*t, I need to write a patch to this man page." > I believe someone new to English technical documentation would have a > different, much more troublesome knee-jerk reaction: "There must be > some subtle reason why this documentation is using an unfamiliar term > 'catenate', instead of 'concatenate' that I already know." Followed by > wasting a bunch of time trying to research that unfamiliar term, and > when they find it's an exact synonym, adding another tick mark to > their mental tally for "manpages are badly written and hard to > understand." I think your hypothesis is sorely in need of testing. My own feeling is that unfamiliarity with standard English vocabulary is well down the list of things that people find frustrating about man pages, if we take the product of annoyance level times the number of people perceiving a defect. > > Man pages are specialized technical literature demanding a bespoke > > vocabulary. Some employment of jargon is inescapable, even > > necessary. In any case, "catenate" has ~50 years of attestation in > > this domain alone, which constitutes approximately the entire > > history of Unix discourse. > > This is no excuse. Specialized technical jargon is only appropriate > when there is an actual difference in meaning. (Thus, your "open > source" vs "free software" counterpoint is bogus.) I offered them in a tongue-in-cheek effort at humor. I don't regard "Emacs" and "vi" as synonymous, either. 
Also I know they'll take away your GNU card if you claim "open source" and "free software" equivalence.[2] Analogously, "disenfranchise" and "disfranchise" are also synonymous, and I prefer the latter to the former for the same reason, popularity be damned. > > Before undertaking to reach for dictionaries (online or otherwise), > > many readers morphophonemically analyze them to see if they can > > infer their meanings from familiar components. > > In grappling with general literature, yes. In grappling with > technical writing, *no*, and again I am speaking from direct > experience as an educator. Readers who encounter an unfamiliar word > in technical documents will most probably assume that the word has a > precise meaning that they must learn, and that they *cannot* deduce > that meaning from context. If that's the case, then our field is doing a crap job at terminology selection. (Stop the presses, right?) > If they can't find a definition -- and they might not even try looking > in a general dictionary, since they may assume that the relevant > definition is too specialized to appear there; also it seems to me > that schoolchildren are not being taught how to use dictionaries > anymore Enough of them seem to be using urbandictionary.com that the concept remains familiar. > -- *they will give up on the entire document*. > > Yes, this is bad. It's an instance of learned helplessness, and it's > going to take decades and major educational reform at the grade-school > level to fix. But there's one thing we, authors of technical > documents, can do about it right now, and that is embrace plain talk. > For example, whenever there really is no difference of meaning, the > most common word in general usage is the word that should be used. Again I'm going to have to disagree with you. 
Where we can morphologically simplify without loss of meaning, I think that fits a meaning of "plain talk" that is reasonably robust across the many cultural contexts in which English is used. Your popularity metric is vulnerable to sampling biases, particularly of the geographical sort. And the plainer the talk, the more it is exposed to confounding regional factors. When I moved to Australia, I had a frustrating experience at the grocery store. I needed to replace a light bulb. No sign anywhere in the store helped me. While searching fruitlessly, I vaguely noted a sign for "globes", and a thought that didn't quite reach the top of my brain observed that globes are a damned weird thing to sell in a grocery--but hey, it's Australia, maybe they need a _reminder_ that they're hanging from the Earth's underbelly.[9] After a few more minutes, these two threads joined. Q: How many seppos does it take to screw in a light bulb? A: What's gardening got to do with it? > > In Unix culture, one will need to remain conversant with the term > > "catenate" to know why cat(1) is not named "concat(1)". ;-) > > This is how I would teach it: 'concat' is too long for Kernighan and > Ritchie's 1970s (or more precisely ASR33) tastes; 'con' was already in > use as an abbreviation for 'console' (not in Unix itself, but in other > contemporary OSes); and 'cat' is the next three letters of > "concatenate". So that's what they picked. Please don't teach that. There's a lot about it I find dubious. 1. Thompson was the primary human force for extreme terseness in Unix culture, as far as I can tell from my readings in CSRC history. (There were other technical and ergonomic forces driving it, like low line speeds and the Fortran linker on the PDP-11--which C initially re-used--being limited to six significant characters in external identifiers.) 
Kernighan's own writings suggest that he preferred clear labels over cryptic ones (see his _The Elements of Programming Style_, with Plauger; _Software Tools_, also with Plauger; and _The Unix Programming Environment_, with Pike). I speculate that Thompson reasoned that he'd never need more than 26*26 commands anyway, so there was no reason to use an encoding space larger than that to denote them.[3] 2. "ASR33" is a misleading misnomer in a couple of respects. You're referring to a Western Electric Teletype Model 33. "ASR" is neither a manufacturer nor a model, but a configuration option. Specifically, "ASR" devices didn't have keyboards--just a paper tape punch and reader--so they were not much used for Unix development. "KSR" (keyboard send and receive) was the relevant configuration. 3. The Bell Labs CSRC didn't use Model 33s anyway. Western Electric was also part of the Bell monopoly, and by late 1972 at the latest, Labs personnel got to drive Cadillacs--the Model 37, and moreover the ones used to produce Unix had the "Greek" character set extension.[4] You will find references to both devices in the Seventh Edition man pages, but the terminal driver was "tuned for Teletype Model 37's"[5], and troff(1) named it as a supported terminal device rather than the 33.[6] That said, Model 33s were supported, and widely used at Unix installations outside the Labs. 4. Your deployment of "CON" to refer to the console device may be anachronistic. I can't find any evidence that Multics used this name for it. I'm not familiar enough with IBM's OS offerings over the decades to be able to navigate online material about it. Many people likely know that MS-DOS called its console device that, but cat(1) is about a decade older than that product.[7][8] Regards Branden [1] https://www.youtube.com/watch?v=2NpTmKmWdzk [2] https://www.gnu.org/philosophy/open-source-misses-the-point.en.html [3] I base this surmise on more than an attempt at mind reading. 
See the first footnote on page 6 of McIlroy's "A Research Unix Reader". https://www.cs.dartmouth.edu/~doug/reader.pdf [4] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V3/man/man7/greek.7 [5] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man4/tty.4 [6] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/troff.1 [7] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1 [8] https://www.os2museum.com/wp/dos/dos-1-0-and-1-1/ [9] I'm teasing. I'd have loved an "upside-down" globe, not least as a reminder that the melting of the Antarctic ice sheets will pour inundating destruction down on most of us thanks to the superior qualities of billionaires. I already had a counter-clockwise clock, but didn't take it with me to Oz. Also the moon is wrong there. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-09 10:13 ` Jonny Grant 2023-11-09 11:08 ` catenate vs concatenate (was: strncpy clarify result may not be null terminated) Alejandro Colomar @ 2023-11-09 11:13 ` Alejandro Colomar 2023-11-09 14:05 ` Jonny Grant 1 sibling, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 11:13 UTC (permalink / raw) To: Jonny Grant Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 1169 bytes --] Hi Jonny, On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote: > On 09/11/2023 00:29, Alejandro Colomar wrote: > How about following the style of the other man pages that put the notes about each function below them? (rather than above) > https://man7.org/linux/man-pages/man3/string.3.html > > size_t strlen(const char *s); > Return the length of the string s. > > > At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different: > > // Copy/catenate a string. > char *strcpy(char *restrict dst, const char *restrict src); > char *strcat(char *restrict dst, const char *restrict src); The reason for this presentation is that I want to first look at what they do, and only then look at the function you need to do that. So, if you want to copy from a character sequence into a string, you search for that, and it will tell you what functions you can use for that (strncat(3) is the only standard one). If you want to search for a specific function, you can always search with '/strncpy'. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-09 11:13 ` strncpy clarify result may not be null terminated Alejandro Colomar @ 2023-11-09 14:05 ` Jonny Grant 2023-11-09 15:04 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-09 14:05 UTC (permalink / raw) To: Alejandro Colomar Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg, GNU libc development, 'linux-man' On 09/11/2023 11:13, Alejandro Colomar wrote: > Hi Jonny, > > On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote: >> On 09/11/2023 00:29, Alejandro Colomar wrote: >> How about following the style of the other man pages that put the notes about each function below them? (rather than above) >> https://man7.org/linux/man-pages/man3/string.3.html >> >> size_t strlen(const char *s); >> Return the length of the string s. >> >> >> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different: >> >> // Copy/catenate a string. >> char *strcpy(char *restrict dst, const char *restrict src); >> char *strcat(char *restrict dst, const char *restrict src); > > The reason for this presentation is that I want to first look at what > they do, and only then look at the function you need to do that. That appears different to the man page convention. It looks odd especially with the extra // that I don't recall other pages having in the description, usually that would be for examples. Consistency is best, but I'll leave it with you. Kind regards Jonny > > So, if you want to copy from a character sequence into a string, you > search for that, and it will tell you what functions you can use for > that (strncat(3) is the only standard one). > > If you want to search for a specific function, you can always search > with '/strncpy'. > > Cheers, > Alex > ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-09 14:05 ` Jonny Grant @ 2023-11-09 15:04 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 15:04 UTC (permalink / raw) To: Jonny Grant Cc: Paul Eggert, Carlos O'Donell, Zack Weinberg, GNU libc development, 'linux-man' [-- Attachment #1: Type: text/plain, Size: 1545 bytes --] On Thu, Nov 09, 2023 at 02:05:38PM +0000, Jonny Grant wrote: > > > On 09/11/2023 11:13, Alejandro Colomar wrote: > > Hi Jonny, > > > > On Thu, Nov 09, 2023 at 10:13:24AM +0000, Jonny Grant wrote: > >> On 09/11/2023 00:29, Alejandro Colomar wrote: > >> How about following the style of the other man pages that put the notes about each function below them? (rather than above) > >> https://man7.org/linux/man-pages/man3/string.3.html > >> > >> size_t strlen(const char *s); > >> Return the length of the string s. > >> > >> > >> At the moment on string_copying there are // comments on the line above each function. So the presentation of the information is different: > >> > >> // Copy/catenate a string. > >> char *strcpy(char *restrict dst, const char *restrict src); > >> char *strcat(char *restrict dst, const char *restrict src); > > > > The reason for this presentation is that I want to first look at what > > they do, and only then look at the function you need to do that. > > That appears different to the man page convention. It looks odd especially with the extra // that I don't recall other pages having in the description, usually that would be for examples. Consistency is best, but I'll leave it with you. The difference is that you're comparing to man3 pages, which document specific functions. string_copying(7) instead documents how to copy strings, and specific functions are only means to that end. I'll keep it this way. 
Thanks, Alex -- <https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated 2023-11-08 9:51 ` Alejandro Colomar 2023-11-08 9:59 ` Thorsten Kukuk 2023-11-08 14:06 ` Zack Weinberg @ 2023-11-08 19:04 ` DJ Delorie 2023-11-08 19:40 ` Alejandro Colomar 2 siblings, 1 reply; 138+ messages in thread From: DJ Delorie @ 2023-11-08 19:04 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man Alejandro Colomar <alx@kernel.org> writes: > strncpy(3) is useful to write to fixed-width buffers like `struct utmp` > and `struct utmpx`. Is there any other libc API that needs strncpy(3)? Let's not limit ourselves to glibc APIs. Tar format, for example, uses fixed length fields (and my bet is that strncpy was created for it) yet tar is not part of glibc. IMHO the solution here is to document strncpy with sufficiently obvious intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should ONLY be used for its intended purpose (filling a space-padded but not null-terminated field) It is not documentation's purpose to limit programmer's creativity, just to give them an accurate representation of what the functions do. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 19:04 ` DJ Delorie @ 2023-11-08 19:40 ` Alejandro Colomar 2023-11-08 19:58 ` DJ Delorie 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 19:40 UTC (permalink / raw) To: DJ Delorie; +Cc: libc-alpha, jg, linux-man [-- Attachment #1: Type: text/plain, Size: 1849 bytes --] Hi DJ, On Wed, Nov 08, 2023 at 02:04:45PM -0500, DJ Delorie wrote: > Alejandro Colomar <alx@kernel.org> writes: > > strncpy(3) is useful to write to fixed-width buffers like `struct utmp` > > and `struct utmpx`. Is there any other libc API that needs strncpy(3)? > > Let's not limit ourselves to glibc APIs. Tar format, for example, uses > fixed length fields (and my bet is that strncpy was created for it) yet > tar is not part of glibc. > > IMHO the solution here is to document strncpy with sufficiently obvious > intent that it is NOT a length-limited strcpy (i.e. strlcpy) and should > ONLY be used for its intended purpose (filling a space-padded but not > null-terminated field) Indeed. That's what I did (I think). DESCRIPTION These functions copy the string pointed to by src into a null‐ padded character sequence at the fixed‐width buffer pointed to by dst. If the destination buffer, limited by its size, isn’t large enough to hold the copy, the resulting character sequence is truncated. ... CAVEATS The name of these functions is confusing. These functions pro‐ duce a null‐padded character sequence, not a string (see string_copying(7)). It’s impossible to distinguish truncation by the result of the call, from a character sequence that just fits the destination buffer; truncation should be detected by comparing the length of the input string with the size of the destination buffer. I refuse to add any hints that strncpy(3) is good for copying strings. > > It is not documentation's purpose to limit programmer's creativity, just > to give them an accurate representation of what the functions do. Thanks! 
Cheers, Alex -- <https://www.alejandro-colomar.es/>
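The CAVEATS text quoted in this message says truncation has to be detected by comparing the length of the input string with the size of the destination buffer. A minimal sketch of that check follows; `fill_field_trunc` is a hypothetical wrapper name used only for illustration, not a function from libc or from the man pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper (not a libc function).
 * Fill the fixed-width buffer dst (sz bytes) with bytes from the
 * string src, null-padding any unused tail, and report truncation.
 *
 * Returns true if src was longer than sz bytes, i.e. bytes were lost.
 * A src of exactly sz bytes fits: the field is full and simply has
 * no null byte in it.
 */
bool
fill_field_trunc(char *dst, size_t sz, const char *src)
{
	assert(dst != NULL && src != NULL);

	strncpy(dst, src, sz);

	/* strncpy(3) itself gives no hint, so compare lengths. */
	return strlen(src) > sz;
}
```

With a 5-byte field, `"1"` and `"12345"` both report no truncation, while `"123456"` reports truncation. The contents of the field are identical for `"12345"` and `"123456"`, which is why the length comparison is needed.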
* Re: strncpy clarify result may not be null terminated 2023-11-08 19:40 ` Alejandro Colomar @ 2023-11-08 19:58 ` DJ Delorie 2023-11-08 20:13 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: DJ Delorie @ 2023-11-08 19:58 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man

Perhaps an example that shows the problem?

EXAMPLES

    strncpy (buf, "1", 5);
    { '1', 0, 0, 0, 0 }

    strncpy (buf, "1234", 5);
    { '1', '2', '3', '4', 0 }

    strncpy (buf, "12345", 5);
    { '1', '2', '3', '4', '5' }

    strncpy (buf, "123456", 5);
    { '1', '2', '3', '4', '5' }

Maybe strcpy and strncpy shouldn't even share man pages, since they're
not as related as we once thought?
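The four cases in the message above can be checked mechanically. The sketch below is only an illustration: `FIELD_SZ`, `has_nul`, and `field_is_string` are hypothetical names introduced here, and the 5-byte width is taken from the examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FIELD_SZ 5	/* same width as in the examples above */

/* Hypothetical helper: true if the sz-byte buffer holds a null byte. */
bool
has_nul(const char *buf, size_t sz)
{
	return memchr(buf, '\0', sz) != NULL;
}

/*
 * Hypothetical helper: fill a FIELD_SZ-byte buffer from src with
 * strncpy(3) and report whether the result happens to be a string,
 * that is, whether it contains a terminating null byte.
 */
bool
field_is_string(const char *src)
{
	char buf[FIELD_SZ];

	assert(src != NULL);

	strncpy(buf, src, sizeof(buf));
	return has_nul(buf, sizeof(buf));
}
```

`field_is_string("1")` and `field_is_string("1234")` are true because at least one padding null byte lands inside the buffer; `field_is_string("12345")` and `field_is_string("123456")` are false because all five bytes are taken by data.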
* Re: strncpy clarify result may not be null terminated 2023-11-08 19:58 ` DJ Delorie @ 2023-11-08 20:13 ` Alejandro Colomar 2023-11-08 21:07 ` DJ Delorie 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 20:13 UTC (permalink / raw) To: DJ Delorie; +Cc: libc-alpha, jg, linux-man [-- Attachment #1: Type: text/plain, Size: 2647 bytes --] Hi DJ, On Wed, Nov 08, 2023 at 02:58:24PM -0500, DJ Delorie wrote: > > Perhaps an example that shows the problem? Maybe. > > EXAMPLES > > strncpy (buf, "1", 5); > { '1', 0, 0, 0, 0 } > > strncpy (buf, "1234", 5); > { '1', '2', '3', '4', 0 } > > strncpy (buf, "12345", 5); > { '1', '2', '3', '4', '5' } > > strncpy (buf, "123456", 5); > { '1', '2', '3', '4', '5' } Would you mind reading the latest versions of strcpy(3), strncpy(3), and string_copying(7), as in the git repository, and comment your thoughts? You don't even need to install the pages from git. You can read them with this: $ git clone https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/ $ cd man-pages/ $ man ./man3/strcpy.3 $ man ./man3/strncpy.3 $ man ./man7/string_copying.7 Also check the examples and suggest if anything could be clearer. Thanks! > > Maybe strcpy and strncpy shouldn't even share man pages, since they're > not as related as we once thought? They don't (anymore): $ pwd /home/alx/src/linux/man-pages/man-pages/master $ git log --oneline -1 b8584be14 (HEAD -> master, korg/master, alx/main, main) bcmp.3: wfix $ grep -e '\.TH ' -e '\.so ' man3/strcpy.3 .TH strcpy 3 (date) "Linux man-pages (unreleased)" $ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3 .so man3/strcpy.3 $ grep -e '\.TH ' -e '\.so ' man3/strncpy.3 .so man3/stpncpy.3 $ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3 .TH stpncpy 3 (date) "Linux man-pages (unreleased)" The only shared page is string_copying(7), which attempts to clarify all of this. It was only in old versions of the Linux man-pages where they shared page. 
$ pwd /home/alx/src/linux/man-pages/man-pages/5/5.13 $ git log --oneline -1 091fbf1fe (HEAD, tag: man-pages-5.13) Ready for 5.13 $ grep -e '\.TH ' -e '\.so ' man3/strcpy.3 .TH STRCPY 3 2021-03-22 "GNU" "Linux Programmer's Manual" $ grep -e '\.TH ' -e '\.so ' man3/stpcpy.3 .TH STPCPY 3 2021-03-22 "GNU" "Linux Programmer's Manual" $ grep -e '\.TH ' -e '\.so ' man3/strncpy.3 .so man3/strcpy.3 $ grep -e '\.TH ' -e '\.so ' man3/stpncpy.3 .TH STPNCPY 3 2021-03-22 "GNU" "Linux Programmer's Manual" I've spent the last year working on shadow-utils' string handling code, while at the same time wrote string_copying(7) as a complete guide to *cpy() functions, detailing what they do and what they don't, and also rewrote all the pages for these functions with shorter reference guides that refer to string_copying(7) for more details. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 20:13 ` Alejandro Colomar @ 2023-11-08 21:07 ` DJ Delorie 2023-11-08 21:50 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: DJ Delorie @ 2023-11-08 21:07 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, jg, linux-man

Alejandro Colomar <alx@kernel.org> writes:
> Would you mind reading the latest versions of strcpy(3), strncpy(3), and
> string_copying(7), as in the git repository, and comment your thoughts?

I think my examples would work well after the first CAVEATS paragraph:

    The name of these functions is confusing. These functions
    produce a null-padded character sequence, not a string (see
    string_copying(7)), like this:

    strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 }
    strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 }
    strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' }
    strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' }

> These functions copy the string pointed to by src into a null-padded
> character sequence at the fixed-width buffer pointed to by dst. If the
> destination buffer, limited by its size, isn't large enough to hold the
> copy, the resulting character sequence is truncated.

hmmm... perhaps

    These functions copy at most SZ bytes from SRC into a fixed-length
    buffer DST, padding any unwritten bytes in DST with NUL bytes.
    Specifically, if SRC has a NUL byte in the first SZ bytes, copying
    stops there and any remaining bytes in DST are filled with NUL bytes.
    If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are
    copied to DST.

This avoids the term "string" completely and emphasises the not-string
nature of the destination.

    stpncpy, strncpy - zero a fixed-width buffer and copy a string into a
    character sequence with truncation and zero the rest of it

Or "fill a fixed-width zero-padded buffer with bytes from a string"

That avoids saying "copy a string"

string_copying.7:

> For historic reasons, some standard APIs, such as utmpx(5),

Perhaps "some standard APIs and file formats, such as utmpx(5) or
tar(1)," ?

> however, those padding null bytes are not part of the character
> sequence.

add ", and may not be present if not needed." ?
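The byte-level wording proposed in this message ("copy at most SZ bytes ... padding any unwritten bytes in DST with NUL bytes") can be written out as a short loop. This is an illustrative re-implementation under the hypothetical name `my_strncpy`, meant only to make the described behaviour concrete; it is not glibc's implementation, and real code would call strncpy(3).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch of the behaviour described above
 * (hypothetical name; not the libc implementation).
 */
char *
my_strncpy(char *dst, const char *src, size_t sz)
{
	size_t  i;

	assert(dst != NULL && src != NULL);

	/* Copy bytes until a NUL is seen in src or sz bytes are written. */
	for (i = 0; i < sz && src[i] != '\0'; i++)
		dst[i] = src[i];

	/* Pad whatever is left of dst with NUL bytes (possibly nothing). */
	for (; i < sz; i++)
		dst[i] = '\0';

	return dst;
}
```

When src has no NUL in its first sz bytes, the second loop does not run at all, so dst ends up with no terminator. That is the case the thread is trying to make obvious in the documentation.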
* Re: strncpy clarify result may not be null terminated 2023-11-08 21:07 ` DJ Delorie @ 2023-11-08 21:50 ` Alejandro Colomar 2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 21:50 UTC (permalink / raw) To: DJ Delorie; +Cc: libc-alpha, jg, linux-man [-- Attachment #1: Type: text/plain, Size: 2856 bytes --] Hi DJ, On Wed, Nov 08, 2023 at 04:07:07PM -0500, DJ Delorie wrote: > Alejandro Colomar <alx@kernel.org> writes: > > Would you mind reading the latest versions of strcpy(3), strncpy(3), and > > string_copying(7), as in the git repository, and comment your thoughts? > > I think my examples would work well after the first CAVEATS paragaph: > > The name of these functions is confusing. These functions > produce a null-padded character sequence, not a string (see > string_copying(7)), like this: > > strncpy (buf, "1", 5) -> { '1', 0, 0, 0, 0 } > strncpy (buf, "1234", 5) -> { '1', '2', '3', '4', 0 } > strncpy (buf, "12345", 5) -> { '1', '2', '3', '4', '5' } > strncpy (buf, "123456", 5) -> { '1', '2', '3', '4', '5' } It fits perfectly there. And it also merges nicely with the paragraph below. > > > These functions copy the string pointed to by src into a null-padded > > character sequence at the fixed-width buffer pointed to by dst. If the > > destination buffer, limited by its size, isn't large enough to hold the > > copy, the resulting character sequence is truncated. > > hmmm... perhaps > > These functions copy at most SZ bytes from SRC into a fixed-length > buffer DST, padding any unwritten bytes in DST with NUL bytes. > Specifically, if SRC has a NUL byte in the first SZ bytes, copying > stops there and any remaining bytes in DST are filled with NUL bytes. > If there are no NUL bytes in the first SZ bytes of SRC, SZ bytes are > copied to DST. 
> > This avoids the term "string" completely and emphasises the not-string > nature of the destination. I don't like that, because it talks a lot about what the function does in terms of low-level copies of bytes. That may induce programmers to try to find an abstraction in terms of strings. > > stpncpy, strncpy - zero a fixed-width buffer and copy a string into a > character sequence with truncation and zero the rest of it > > Or "fill a fixed-width zero-padded buffer with bytes from a string" But this wording is perfect! I also used a similar wording for the description. I'll send a patch in a moment. > > That avoids saying "copy a string" Yep! > > string_copying.7: > > > For historic reasons, some standard APIs, such as utmpx(5), > > Perhaps "some standard APIs and file formats,, such as utmpx(5) or > tar(1)," ? Yes; thanks! > > > however, those padding null bytes are not part of the character > > sequence. > > add ", and may not be present if not needed." ? I'm not convinced about this one. "needed" is not the right word I think. For now, I'll add the other suggestions to a patch. Expect it in a moment. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 21:50 ` Alejandro Colomar @ 2023-11-08 22:17 ` Alejandro Colomar 2023-11-08 23:06 ` Paul Eggert ` (3 more replies) 0 siblings, 4 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 22:17 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Jonny Grant, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell [-- Attachment #1: Type: text/plain, Size: 3837 bytes --] These copy *from* a string. But the destination is a simple character sequence within an array; not a string. Suggested-by: DJ Delorie <dj@redhat.com> Cc: Jonny Grant <jg@jguk.org> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- Resending, including the mailing lists, which I forgot. 
man3/stpncpy.3 | 17 +++++++++++++---- man7/string_copying.7 | 20 ++++++++++---------- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 index b6bbfd0a3..f86ff8c29 100644 --- a/man3/stpncpy.3 +++ b/man3/stpncpy.3 @@ -6,9 +6,8 @@ .TH stpncpy 3 (date) "Linux man-pages (unreleased)" .SH NAME stpncpy, strncpy -\- zero a fixed-width buffer and -copy a string into a character sequence with truncation -and zero the rest of it +\- +fill a fixed-width null-padded buffer with bytes from a string .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) @@ -37,7 +36,7 @@ .SH SYNOPSIS _GNU_SOURCE .fi .SH DESCRIPTION -These functions copy the string pointed to by +These functions copy bytes from the string pointed to by .I src into a null-padded character sequence at the fixed-width buffer pointed to by .IR dst . @@ -110,6 +109,16 @@ .SH CAVEATS These functions produce a null-padded character sequence, not a string (see .BR string_copying (7)). +For example: +.P +.in +4n +.EX +strncpy(buf, "1", 5); // { \[aq]1\[aq], 0, 0, 0, 0 } +strncpy(buf, "1234", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 } +strncpy(buf, "12345", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] } +strncpy(buf, "123456", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] } +.EE +.in .P It's impossible to distinguish truncation by the result of the call, from a character sequence that just fits the destination buffer; diff --git a/man7/string_copying.7 b/man7/string_copying.7 index cadf1c539..0e179ba34 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -41,15 +41,11 @@ .SS Strings .\" ----- SYNOPSIS :: Null-padded character sequences --------/ .SS Null-padded character sequences .nf -// Zero a fixed-width buffer, and -// copy a string into a character sequence with truncation. -.BI "char *stpncpy(char " dst "[restrict ." 
sz "], \ +// Fill a fixed-width null-padded buffer with bytes from a string. +.BI "char *strncpy(char " dst "[restrict ." sz "], \ const char *restrict " src , .BI " size_t " sz ); -.P -// Zero a fixed-width buffer, and -// copy a string into a character sequence with truncation. -.BI "char *strncpy(char " dst "[restrict ." sz "], \ +.BI "char *stpncpy(char " dst "[restrict ." sz "], \ const char *restrict " src , .BI " size_t " sz ); .P @@ -240,14 +236,18 @@ .SS Truncate or not? .\" ----- DESCRIPTION :: Null-padded character sequences --------------/ .SS Null-padded character sequences For historic reasons, -some standard APIs, +some standard APIs and file formats, such as -.BR utmpx (5), +.BR utmpx (5) +and +.BR tar (1), use null-padded character sequences in fixed-width buffers. To interface with them, specialized functions need to be used. .P -To copy strings into them, use +To copy bytes from strings into these buffers, use +.BR strncpy (3) +or .BR stpncpy (3). .P To copy from an unterminated string within a fixed-width buffer into a string, -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar @ 2023-11-08 23:06 ` Paul Eggert 2023-11-08 23:28 ` DJ Delorie ` (2 more replies) 2023-11-09 7:23 ` Oskari Pirhonen ` (2 subsequent siblings) 3 siblings, 3 replies; 138+ messages in thread From: Paul Eggert @ 2023-11-08 23:06 UTC (permalink / raw) To: Alejandro Colomar, linux-man Cc: libc-alpha, DJ Delorie, Jonny Grant, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell On 11/8/23 14:17, Alejandro Colomar wrote: > These copy*from* a string Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string. By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 23:06 ` Paul Eggert @ 2023-11-08 23:28 ` DJ Delorie 2023-11-09 0:24 ` Alejandro Colomar 2023-11-09 14:11 ` Jonny Grant 2 siblings, 0 replies; 138+ messages in thread From: DJ Delorie @ 2023-11-08 23:28 UTC (permalink / raw) To: Paul Eggert Cc: alx, linux-man, libc-alpha, jg, mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos Paul Eggert <eggert@cs.ucla.edu> writes: > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be > a string. But it will be treated as one, for the purposes of this function. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 23:06 ` Paul Eggert 2023-11-08 23:28 ` DJ Delorie @ 2023-11-09 0:24 ` Alejandro Colomar 2023-11-09 14:11 ` Jonny Grant 2 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 0:24 UTC (permalink / raw) To: Paul Eggert Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell [-- Attachment #1: Type: text/plain, Size: 2592 bytes --] Hi Paul, On Wed, Nov 08, 2023 at 03:06:40PM -0800, Paul Eggert wrote: > On 11/8/23 14:17, Alejandro Colomar wrote: > > These copy*from* a string > > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a > string. Pedantically, true. But since it's quite rare to copy from a fixed-width null-padded array into another, I didn't want to waste space on that and possibly confuse readers. In such a case, the source buffer must be at least as large as the destination buffer, and will likely be the same size (because having fixed-width stuff, why make it different), so memcpy(3) will probably be simpler. > > By the way, have you looked at the recent (i.e., this-year) changes to the > glibc manual's string section? They're relevant. I hadn't; after your message, I have. <https://sourceware.org/glibc/manual/2.38/html_mono/libc.html#String-and-Array-Utilities> I like how it connects all the functions, and it explains the concepts and gives advice (e.g., avoid truncation as it's usually evil), and compares the different functions. However, I think it misses a few things: - strncpy(3) and strncat(3) are not related at all. They don't have the same relation that strcpy(3) and strcat(3) have. You can't write the following code in any case: strncpy(dst, foo, sizeof(dst)); strncat(dst, bar, sizeof(dst)); as you would with strcpy(3) or strlcpy(3). 
strncpy(3) and strncat(3) are opposite functions: the former reads from a string and writes to a fixed-width null-padded buffer, and the latter reads from a fixed-width buffer and writes to a string. (You can use them in other cases, pedantically, as you said above, but those cases are rather unreal.) - strncpy(3) is in a section that starts by saying: > The functions described in this section copy or concatenate the > possibly-truncated contents of a string or array to another This may mislead programmers to believe it is useful for producing strings, when it's not. In general, I would like the manual to put some more distance between these functions and the term "string". As DJ mentioned, it might be useful to mention utmp(5) and tar(1) as niche use cases for st[rp]ncpy(3). And now for some typo: - In the following sentence under "5.2 String and Array Conventions": > The array arguments and return values for these functions have type > void * or wchar_t. I believe it meant `void *` or `wchar_t *` Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
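The "opposite functions" point can be illustrated with a round trip through a fixed-width field. The sketch below assumes a made-up `struct record` with an 8-byte `name` field (standing in for something like a utmp or tar header field); `NAME_SZ`, `record_set_name`, and `record_get_name` are hypothetical names, not part of any real API.

```c
#include <assert.h>
#include <string.h>

#define NAME_SZ 8	/* hypothetical field width */

struct record {
	/* Null-padded character sequence; not necessarily terminated. */
	char  name[NAME_SZ];
};

/* String -> fixed-width null-padded field: this is strncpy(3)'s job. */
void
record_set_name(struct record *r, const char *name)
{
	assert(r != NULL && name != NULL);

	strncpy(r->name, name, sizeof(r->name));
}

/*
 * Fixed-width field -> string: this is strncat(3)'s job.
 * strncat(3) reads at most NAME_SZ bytes from the field and always
 * writes a terminating null byte, so dst needs NAME_SZ + 1 bytes.
 */
void
record_get_name(const struct record *r, char dst[NAME_SZ + 1])
{
	assert(r != NULL && dst != NULL);

	dst[0] = '\0';		/* start from an empty string */
	strncat(dst, r->name, sizeof(r->name));
}
```

Setting the name to `"abc"` and reading it back yields `"abc"`; setting it to `"abcdefghij"` stores only `abcdefgh` with no terminator in the field, and reading it back yields the string `"abcdefgh"`.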
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 23:06 ` Paul Eggert 2023-11-08 23:28 ` DJ Delorie 2023-11-09 0:24 ` Alejandro Colomar @ 2023-11-09 14:11 ` Jonny Grant 2023-11-09 14:35 ` Alejandro Colomar 2 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-09 14:11 UTC (permalink / raw) To: Paul Eggert, Alejandro Colomar, linux-man Cc: libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell On 08/11/2023 23:06, Paul Eggert wrote: > On 11/8/23 14:17, Alejandro Colomar wrote: >> These copy*from* a string > > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string. > > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant. That's a great reference page Paul, lots of useful information in the manual. https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html Re this man page: https://man7.org/linux/man-pages/man3/string.3.html Obsolete functions char *strncpy(char dest[restrict .n], const char src[restrict .n], size_t n); Copy at most n bytes from string src to dest, returning a pointer to the start of dest. It could clarify "Copy at most n bytes from string src to ARRAY dest, returning a pointer to the start of ARRAY dest." (caps for my emphasis in this email) Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 14:11 ` Jonny Grant @ 2023-11-09 14:35 ` Alejandro Colomar 2023-11-09 14:47 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 14:35 UTC (permalink / raw) To: Jonny Grant Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell [-- Attachment #1: Type: text/plain, Size: 1456 bytes --] Hi Jonny, On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote: > On 08/11/2023 23:06, Paul Eggert wrote: > > On 11/8/23 14:17, Alejandro Colomar wrote: > >> These copy*from* a string > > > > Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string. > > > > By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant. > > That's a great reference page Paul, lots of useful information in the manual. > https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html > > Re this man page: > > https://man7.org/linux/man-pages/man3/string.3.html > > Obsolete functions > char *strncpy(char dest[restrict .n], const char src[restrict .n], > size_t n); > Copy at most n bytes from string src to dest, returning a > pointer to the start of dest. Uh, I forgot about that page. I'll have a look at it and update it. At least, I need to remove that "Obsolete functions". > > > It could clarify > "Copy at most n bytes from string src to ARRAY dest, returning a > pointer to the start of ARRAY dest." I think I prefer DJ's suggestion: "Fill a fixed‐width null‐padded buffer with bytes from a string." Thanks! 
Alex > > (caps for my emphasis in this email) > > Kind regards > Jonny -- <https://www.alejandro-colomar.es/>
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 14:35 ` Alejandro Colomar @ 2023-11-09 14:47 ` Jonny Grant 2023-11-09 15:02 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-09 14:47 UTC (permalink / raw) To: Alejandro Colomar Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell On 09/11/2023 14:35, Alejandro Colomar wrote: > Hi Jonny, > > On Thu, Nov 09, 2023 at 02:11:14PM +0000, Jonny Grant wrote: >> On 08/11/2023 23:06, Paul Eggert wrote: >>> On 11/8/23 14:17, Alejandro Colomar wrote: >>>> These copy*from* a string >>> >>> Not necessarily. For example, in strncpy (DST, SRC, N), SRC need not be a string. >>> >>> By the way, have you looked at the recent (i.e., this-year) changes to the glibc manual's string section? They're relevant. >> >> That's a great reference page Paul, lots of useful information in the manual. >> https://www.gnu.org/software/libc/manual/html_node/String-and-Array-Utilities.html >> >> Re this man page: >> >> https://man7.org/linux/man-pages/man3/string.3.html >> >> Obsolete functions >> char *strncpy(char dest[restrict .n], const char src[restrict .n], >> size_t n); >> Copy at most n bytes from string src to dest, returning a >> pointer to the start of dest. > > Uh, I forgot about that page. I'll have a look at it and update it. At > least, I need to remove that "Obsolete functions". > >> >> >> It could clarify >> "Copy at most n bytes from string src to ARRAY dest, returning a >> pointer to the start of ARRAY dest." > > I think I prefer DJ's suggestion: > > "Fill a fixed‐width null‐padded buffer with bytes from a string." Better to make it clear it's null-padded after? "Fill a fixed‐width buffer with bytes from a string and pad with null bytes." I'll leave it with you. 
Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 14:47 ` Jonny Grant @ 2023-11-09 15:02 ` Alejandro Colomar 2023-11-09 17:30 ` DJ Delorie 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 15:02 UTC (permalink / raw) To: Jonny Grant Cc: Paul Eggert, linux-man, libc-alpha, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell [-- Attachment #1: Type: text/plain, Size: 756 bytes --] On Thu, Nov 09, 2023 at 02:47:05PM +0000, Jonny Grant wrote: > >> It could clarify > >> "Copy at most n bytes from string src to ARRAY dest, returning a > >> pointer to the start of ARRAY dest." > > > > I think I prefer DJ's suggestion: > > > > "Fill a fixed‐width null‐padded buffer with bytes from a string." > > Better to make it clear it's null-padded after? > > "Fill a fixed‐width buffer with bytes from a string and pad with null bytes." Yes, that looks even better. And I wasn't very happy with "bytes". Maybe: "Fill a fixed-width buffer with characters from a string and pad with null bytes." Thanks, Alex > > I'll leave it with you. > > Kind regards > Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 15:02 ` Alejandro Colomar @ 2023-11-09 17:30 ` DJ Delorie 2023-11-09 17:54 ` Andreas Schwab ` (2 more replies) 0 siblings, 3 replies; 138+ messages in thread From: DJ Delorie @ 2023-11-09 17:30 UTC (permalink / raw) To: Alejandro Colomar Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos Alejandro Colomar <alx@kernel.org> writes: > "Fill a fixed-width buffer with characters from a string and pad with > null bytes." The pedant in me says it should be NUL bytes (or NUL's), not null bytes. nul/NUL is a character, null/NULL is a pointer. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 17:30 ` DJ Delorie @ 2023-11-09 17:54 ` Andreas Schwab 2023-11-09 18:00 ` Alejandro Colomar 2023-11-09 19:42 ` Jonny Grant 2 siblings, 0 replies; 138+ messages in thread From: Andreas Schwab @ 2023-11-09 17:54 UTC (permalink / raw) To: DJ Delorie Cc: Alejandro Colomar, jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos On Nov 09 2023, DJ Delorie wrote: > The pedant in me says it should be NUL bytes (or NUL's), not null bytes. > nul/NUL is a character, null/NULL is a pointer. NUL is the ASCII abbreviation for Null (see RFC 20). -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 "And now for something completely different." ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 17:30 ` DJ Delorie 2023-11-09 17:54 ` Andreas Schwab @ 2023-11-09 18:00 ` Alejandro Colomar 2023-11-09 19:42 ` Jonny Grant 2 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 18:00 UTC (permalink / raw) To: DJ Delorie Cc: jg, eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos [-- Attachment #1: Type: text/plain, Size: 2519 bytes --] Hi DJ, On Thu, Nov 09, 2023 at 12:30:17PM -0500, DJ Delorie wrote: > Alejandro Colomar <alx@kernel.org> writes: > > "Fill a fixed-width buffer with characters from a string and pad with > > null bytes." > > The pedant in me says it should be NUL bytes (or NUL's), not null bytes. > nul/NUL is a character, null/NULL is a pointer. Here's what man-pages(7) (written by Michael Kerrisk) says: NULL, NUL, null pointer, and null byte A null pointer is a pointer that points to nothing, and is normally indicated by the constant NULL. On the other hand, NUL is the null byte, a byte with the value 0, represented in C via the character constant '\0'. The preferred term for the pointer is "null pointer" or simply "NULL"; avoid writing "NULL pointer". The preferred term for the byte is "null byte". Avoid writing "NUL", since it is too easily confused with "NULL". Avoid also the terms "zero byte" and "null character". The byte that terminates a C string should be described as "the terminating null byte"; strings may be described as "null‐terminated", but avoid the use of "NUL‐terminated". I don't necessarily agree with all of that, but mostly. I don't agree with not saying null character, because, just as we have the null wide character (L'\0'), using null character for '\0' makes it symmetric. Other than that, I mostly agree with Michael.
Here's what I think of these terms:

- NULL is a null pointer constant (just as 0 is another null pointer constant).
- A null pointer is a more generic term that includes a run-time null pointer as well.
- The null byte is 0.
- The null character, '\0', is composed of a null byte.
- The null wide character, L'\0', is composed of several null bytes.
- NUL is the ASCII name of the null byte, or is it maybe the null character here? It's a bit muddy.

I use null byte for padding, and null character for the string terminator, to make a stronger difference between strings and null-padded fixed-width arrays. I need to review string_copying(7) to make sure I was consistent in this regard.

Colloquially, I find it fine to write NULL instead of null pointer (even for non-constant cases), and NUL instead of any of "null character", "null byte", or "null wide character", but to be precise, I prefer "null something".

Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
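As a rough illustration of the terms in the message above, here is a minimal C sketch. The helper all_null_bytes() and the two variable names are made up for this example; they do not come from the thread or from any man page. It only shows that the null character occupies one null byte, while the null wide character occupies sizeof(wchar_t) of them, which is several bytes on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: return 1 if every one of the n bytes at p is a
 * null byte (a byte with the value 0), and 0 otherwise. */
static int all_null_bytes(const void *p, size_t n)
{
    const unsigned char *bytes = p;

    assert(p != NULL);  /* NULL is a null pointer constant, not a byte */
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != 0)
            return 0;
    }
    return 1;
}

/* The null character, '\0', occupies a single null byte. */
static const char null_character = '\0';

/* The null wide character, L'\0', occupies sizeof(wchar_t) null bytes. */
static const wchar_t null_wide_character = L'\0';
```

Passing the address and size of either object to all_null_bytes() is expected to return 1 on mainstream implementations. The sketch deliberately says nothing about the byte representation of a null pointer.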
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-09 17:30 ` DJ Delorie 2023-11-09 17:54 ` Andreas Schwab 2023-11-09 18:00 ` Alejandro Colomar @ 2023-11-09 19:42 ` Jonny Grant 2 siblings, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-09 19:42 UTC (permalink / raw) To: DJ Delorie, Alejandro Colomar Cc: eggert, linux-man, libc-alpha, mattlloydhouse, xxc3ncoredxx, kukuk, adhemerval.zanella, zack, g.branden.robinson, carlos On 09/11/2023 17:30, DJ Delorie wrote: > Alejandro Colomar <alx@kernel.org> writes: >> "Fill a fixed-width buffer with characters from a string and pad with >> null bytes." > > The pedant in me says it should be NUL bytes (or NUL's), not null bytes. > nul/NUL is a character, null/NULL is a pointer. > NUL would be a big improvement. Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar 2023-11-08 23:06 ` Paul Eggert @ 2023-11-09 7:23 ` Oskari Pirhonen 2023-11-09 15:20 ` [PATCH v2 1/2] " Alejandro Colomar 2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar 3 siblings, 0 replies; 138+ messages in thread From: Oskari Pirhonen @ 2023-11-09 7:23 UTC (permalink / raw) To: Alejandro Colomar Cc: linux-man, libc-alpha, DJ Delorie, Jonny Grant, Matthew House, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell [-- Attachment #1: Type: text/plain, Size: 4198 bytes --] On Wed, Nov 08, 2023 at 23:17:07 +0100, Alejandro Colomar wrote: > These copy *from* a string. But the destination is a simple character > sequence within an array; not a string. > > Suggested-by: DJ Delorie <dj@redhat.com> > Cc: Jonny Grant <jg@jguk.org> > Cc: Matthew House <mattlloydhouse@gmail.com> > Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> > Cc: Thorsten Kukuk <kukuk@suse.com> > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> > Cc: Zack Weinberg <zack@owlfolio.org> > Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com> > Cc: Carlos O'Donell <carlos@redhat.com> > Signed-off-by: Alejandro Colomar <alx@kernel.org> > --- I like the "with bytes from a string" wording. Good call. - Oskari > > Resending, including the mailing lists, which I forgot. 
> > man3/stpncpy.3 | 17 +++++++++++++---- > man7/string_copying.7 | 20 ++++++++++---------- > 2 files changed, 23 insertions(+), 14 deletions(-) > > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 > index b6bbfd0a3..f86ff8c29 100644 > --- a/man3/stpncpy.3 > +++ b/man3/stpncpy.3 > @@ -6,9 +6,8 @@ > .TH stpncpy 3 (date) "Linux man-pages (unreleased)" > .SH NAME > stpncpy, strncpy > -\- zero a fixed-width buffer and > -copy a string into a character sequence with truncation > -and zero the rest of it > +\- > +fill a fixed-width null-padded buffer with bytes from a string > .SH LIBRARY > Standard C library > .RI ( libc ", " \-lc ) > @@ -37,7 +36,7 @@ .SH SYNOPSIS > _GNU_SOURCE > .fi > .SH DESCRIPTION > -These functions copy the string pointed to by > +These functions copy bytes from the string pointed to by > .I src > into a null-padded character sequence at the fixed-width buffer pointed to by > .IR dst . > @@ -110,6 +109,16 @@ .SH CAVEATS > These functions produce a null-padded character sequence, > not a string (see > .BR string_copying (7)). > +For example: > +.P > +.in +4n > +.EX > +strncpy(buf, "1", 5); // { \[aq]1\[aq], 0, 0, 0, 0 } > +strncpy(buf, "1234", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 } > +strncpy(buf, "12345", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] } > +strncpy(buf, "123456", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] } > +.EE > +.in > .P > It's impossible to distinguish truncation by the result of the call, > from a character sequence that just fits the destination buffer; > diff --git a/man7/string_copying.7 b/man7/string_copying.7 > index cadf1c539..0e179ba34 100644 > --- a/man7/string_copying.7 > +++ b/man7/string_copying.7 > @@ -41,15 +41,11 @@ .SS Strings > .\" ----- SYNOPSIS :: Null-padded character sequences --------/ > .SS Null-padded character sequences > .nf > -// Zero a fixed-width buffer, and > -// copy a string into a character sequence with truncation. 
> -.BI "char *stpncpy(char " dst "[restrict ." sz "], \ > +// Fill a fixed-width null-padded buffer with bytes from a string. > +.BI "char *strncpy(char " dst "[restrict ." sz "], \ > const char *restrict " src , > .BI " size_t " sz ); > -.P > -// Zero a fixed-width buffer, and > -// copy a string into a character sequence with truncation. > -.BI "char *strncpy(char " dst "[restrict ." sz "], \ > +.BI "char *stpncpy(char " dst "[restrict ." sz "], \ > const char *restrict " src , > .BI " size_t " sz ); > .P > @@ -240,14 +236,18 @@ .SS Truncate or not? > .\" ----- DESCRIPTION :: Null-padded character sequences --------------/ > .SS Null-padded character sequences > For historic reasons, > -some standard APIs, > +some standard APIs and file formats, > such as > -.BR utmpx (5), > +.BR utmpx (5) > +and > +.BR tar (1), > use null-padded character sequences in fixed-width buffers. > To interface with them, > specialized functions need to be used. > .P > -To copy strings into them, use > +To copy bytes from strings into these buffers, use > +.BR strncpy (3) > +or > .BR stpncpy (3). > .P > To copy from an unterminated string within a fixed-width buffer into a string, > -- > 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* [PATCH v2 1/2] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string 2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar 2023-11-08 23:06 ` Paul Eggert 2023-11-09 7:23 ` Oskari Pirhonen @ 2023-11-09 15:20 ` Alejandro Colomar 2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar 3 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, DJ Delorie, Oskari Pirhonen, Jonny Grant, Matthew House, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao These copy *from* a string. But the destination is a simple character sequence within an array; not a string. Suggested-by: DJ Delorie <dj@redhat.com> Acked-by: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Jonny Grant <jg@jguk.org> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- Patch 1/2 is just a resend, with more CCs. Patch 2/2 is a new one further clarifying the wording, after Jonny's suggestions. 
man3/stpncpy.3 | 17 +++++++++++++---- man7/string_copying.7 | 20 ++++++++++---------- 2 files changed, 23 insertions(+), 14 deletions(-) diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 index b6bbfd0a3..f86ff8c29 100644 --- a/man3/stpncpy.3 +++ b/man3/stpncpy.3 @@ -6,9 +6,8 @@ .TH stpncpy 3 (date) "Linux man-pages (unreleased)" .SH NAME stpncpy, strncpy -\- zero a fixed-width buffer and -copy a string into a character sequence with truncation -and zero the rest of it +\- +fill a fixed-width null-padded buffer with bytes from a string .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) @@ -37,7 +36,7 @@ .SH SYNOPSIS _GNU_SOURCE .fi .SH DESCRIPTION -These functions copy the string pointed to by +These functions copy bytes from the string pointed to by .I src into a null-padded character sequence at the fixed-width buffer pointed to by .IR dst . @@ -110,6 +109,16 @@ .SH CAVEATS These functions produce a null-padded character sequence, not a string (see .BR string_copying (7)). +For example: +.P +.in +4n +.EX +strncpy(buf, "1", 5); // { \[aq]1\[aq], 0, 0, 0, 0 } +strncpy(buf, "1234", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], 0 } +strncpy(buf, "12345", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] } +strncpy(buf, "123456", 5); // { \[aq]1\[aq], \[aq]2\[aq], \[aq]3\[aq], \[aq]4\[aq], \[aq]5\[aq] } +.EE +.in .P It's impossible to distinguish truncation by the result of the call, from a character sequence that just fits the destination buffer; diff --git a/man7/string_copying.7 b/man7/string_copying.7 index cadf1c539..0e179ba34 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -41,15 +41,11 @@ .SS Strings .\" ----- SYNOPSIS :: Null-padded character sequences --------/ .SS Null-padded character sequences .nf -// Zero a fixed-width buffer, and -// copy a string into a character sequence with truncation. -.BI "char *stpncpy(char " dst "[restrict ." 
sz "], \ +// Fill a fixed-width null-padded buffer with bytes from a string. +.BI "char *strncpy(char " dst "[restrict ." sz "], \ const char *restrict " src , .BI " size_t " sz ); -.P -// Zero a fixed-width buffer, and -// copy a string into a character sequence with truncation. -.BI "char *strncpy(char " dst "[restrict ." sz "], \ +.BI "char *stpncpy(char " dst "[restrict ." sz "], \ const char *restrict " src , .BI " size_t " sz ); .P @@ -240,14 +236,18 @@ .SS Truncate or not? .\" ----- DESCRIPTION :: Null-padded character sequences --------------/ .SS Null-padded character sequences For historic reasons, -some standard APIs, +some standard APIs and file formats, such as -.BR utmpx (5), +.BR utmpx (5) +and +.BR tar (1), use null-padded character sequences in fixed-width buffers. To interface with them, specialized functions need to be used. .P -To copy strings into them, use +To copy bytes from strings into these buffers, use +.BR strncpy (3) +or .BR stpncpy (3). .P To copy from an unterminated string within a fixed-width buffer into a string, -- 2.42.0 ^ permalink raw reply related [flat|nested] 138+ messages in thread
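The four calls in the CAVEATS example added by this patch can be checked with a small C sketch. The helper fills_as() is hypothetical and exists only for this illustration; it pre-fills a 5-byte buffer with junk so that any null bytes found afterwards must have been written by strncpy(3) itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill a 5-byte buffer from src with strncpy(3) and
 * report whether the resulting bytes equal the expected ones. */
static int fills_as(const char *src, const char expected[5])
{
    char buf[5];

    assert(src != NULL);
    memset(buf, 'x', sizeof(buf));   /* junk, so padding is observable */
    strncpy(buf, src, sizeof(buf));  /* copies characters, pads with null bytes */
    return memcmp(buf, expected, sizeof(buf)) == 0;
}
```

For instance, fills_as("12345", (const char[5]){'1', '2', '3', '4', '5'}) should return nonzero: all five bytes are characters from the source and no null byte is written, so the buffer does not hold a string.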
* [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes 2023-11-08 22:17 ` [PATCH] stpncpy.3, string_copying.7: Clarify that st[rp]ncpy() do NOT produce a string Alejandro Colomar ` (2 preceding siblings ...) 2023-11-09 15:20 ` [PATCH v2 1/2] " Alejandro Colomar @ 2023-11-09 15:20 ` Alejandro Colomar 2023-11-10 5:47 ` Oskari Pirhonen 3 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 15:20 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao The previous wording could be interpreted as if the nulls were already in place. Clarify that it's this function which pads with null bytes. Also, it copies "characters" from the src string. That's a bit more specific than copying "bytes", and makes it clearer that the terminating null byte in src is not part of the copy. Suggested-by: Jonny Grant <jg@jguk.org> Cc: DJ Delorie <dj@redhat.com> Cc: Jonny Grant <jg@jguk.org> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. 
Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- man3/stpncpy.3 | 10 ++++++---- man3/string.3 | 11 ++--------- man7/string_copying.7 | 3 ++- 3 files changed, 10 insertions(+), 14 deletions(-) diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 index f86ff8c29..3cf4eb371 100644 --- a/man3/stpncpy.3 +++ b/man3/stpncpy.3 @@ -7,7 +7,8 @@ .SH NAME stpncpy, strncpy \- -fill a fixed-width null-padded buffer with bytes from a string +fill a fixed-width buffer with characters from a string +and pad with null bytes .SH LIBRARY Standard C library .RI ( libc ", " \-lc ) @@ -36,10 +37,11 @@ .SH SYNOPSIS _GNU_SOURCE .fi .SH DESCRIPTION -These functions copy bytes from the string pointed to by +These functions copy characters from the string pointed to by .I src -into a null-padded character sequence at the fixed-width buffer pointed to by -.IR dst . +into a character sequence at the fixed-width buffer pointed to by +.IR dst , +and pad with null bytes. If the destination buffer, limited by its size, isn't large enough to hold the copy, diff --git a/man3/string.3 b/man3/string.3 index aba5efd2b..bd8b342a6 100644 --- a/man3/string.3 +++ b/man3/string.3 @@ -179,21 +179,14 @@ .SH SYNOPSIS .I n bytes to .IR dest . -.SS Obsolete functions .TP .nf .BI "char *strncpy(char " dest "[restrict ." n "], \ const char " src "[restrict ." n ], .BI " size_t " n ); .fi -Copy at most -.I n -bytes from string -.I src -to -.IR dest , -returning a pointer to the start of -.IR dest . +Fill a fixed‐width buffer with characters from a string +and pad with null bytes. .SH DESCRIPTION The string functions perform operations on null-terminated strings. 
diff --git a/man7/string_copying.7 b/man7/string_copying.7 index 0e179ba34..865271c6f 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -41,7 +41,8 @@ .SS Strings .\" ----- SYNOPSIS :: Null-padded character sequences --------/ .SS Null-padded character sequences .nf -// Fill a fixed-width null-padded buffer with bytes from a string. +// Fill a fixed-width buffer with characters from a string +// and pad with null bytes. .BI "char *strncpy(char " dst "[restrict ." sz "], \ const char *restrict " src , .BI " size_t " sz ); -- 2.42.0 ^ permalink raw reply related [flat|nested] 138+ messages in thread
* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes 2023-11-09 15:20 ` [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes Alejandro Colomar @ 2023-11-10 5:47 ` Oskari Pirhonen 2023-11-10 10:47 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Oskari Pirhonen @ 2023-11-10 5:47 UTC (permalink / raw) To: Alejandro Colomar Cc: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao [-- Attachment #1: Type: text/plain, Size: 1941 bytes --] On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote: > The previous wording could be interpreted as if the nulls were already > in place. Clarify that it's this function which pads with null bytes. > > Also, it copies "characters" from the src string. That's a bit more > specific than copying "bytes", and makes it clearer that the terminating > null byte in src is not part of the copy. > > Suggested-by: Jonny Grant <jg@jguk.org> > Cc: DJ Delorie <dj@redhat.com> > Cc: Jonny Grant <jg@jguk.org> > Cc: Matthew House <mattlloydhouse@gmail.com> > Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> > Cc: Thorsten Kukuk <kukuk@suse.com> > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> > Cc: Zack Weinberg <zack@owlfolio.org> > Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com> > Cc: Carlos O'Donell <carlos@redhat.com> > Cc: Paul Eggert <eggert@cs.ucla.edu> > Cc: Xi Ruoyao <xry111@xry111.site> > Signed-off-by: Alejandro Colomar <alx@kernel.org> > --- > man3/stpncpy.3 | 10 ++++++---- > man3/string.3 | 11 ++--------- > man7/string_copying.7 | 3 ++- > 3 files changed, 10 insertions(+), 14 deletions(-) > ... snip ... 
> diff --git a/man3/string.3 b/man3/string.3 > index aba5efd2b..bd8b342a6 100644 > --- a/man3/string.3 > +++ b/man3/string.3 > @@ -179,21 +179,14 @@ .SH SYNOPSIS > .I n > bytes to > .IR dest . > -.SS Obsolete functions If you're removing this section ... > .TP > .nf > .BI "char *strncpy(char " dest "[restrict ." n "], \ > const char " src "[restrict ." n ], > .BI " size_t " n ); > .fi > -Copy at most > -.I n > -bytes from string > -.I src > -to > -.IR dest , > -returning a pointer to the start of > -.IR dest . > +Fill a fixed‐width buffer with characters from a string > +and pad with null bytes. ... shouldn't you also move the rest of this up to keep it alphabetized? - Oskari [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: [PATCH v2 2/2] stpncpy.3, string.3, string_copying.7: Clarify that st[rp]ncpy() pad with null bytes 2023-11-10 5:47 ` Oskari Pirhonen @ 2023-11-10 10:47 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 10:47 UTC (permalink / raw) To: linux-man, libc-alpha, Jonny Grant, DJ Delorie, Matthew House, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Paul Eggert, Xi Ruoyao [-- Attachment #1: Type: text/plain, Size: 2310 bytes --] On Thu, Nov 09, 2023 at 11:47:34PM -0600, Oskari Pirhonen wrote: > On Thu, Nov 09, 2023 at 16:20:39 +0100, Alejandro Colomar wrote: > > The previous wording could be interpreted as if the nulls were already > > in place. Clarify that it's this function which pads with null bytes. > > > > Also, it copies "characters" from the src string. That's a bit more > > specific than copying "bytes", and makes it clearer that the terminating > > null byte in src is not part of the copy. > > > > Suggested-by: Jonny Grant <jg@jguk.org> > > Cc: DJ Delorie <dj@redhat.com> > > Cc: Jonny Grant <jg@jguk.org> > > Cc: Matthew House <mattlloydhouse@gmail.com> > > Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> > > Cc: Thorsten Kukuk <kukuk@suse.com> > > Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> > > Cc: Zack Weinberg <zack@owlfolio.org> > > Cc: "G. Branden Robinson" <g.branden.robinson@gmail.com> > > Cc: Carlos O'Donell <carlos@redhat.com> > > Cc: Paul Eggert <eggert@cs.ucla.edu> > > Cc: Xi Ruoyao <xry111@xry111.site> > > Signed-off-by: Alejandro Colomar <alx@kernel.org> > > --- > > man3/stpncpy.3 | 10 ++++++---- > > man3/string.3 | 11 ++--------- > > man7/string_copying.7 | 3 ++- > > 3 files changed, 10 insertions(+), 14 deletions(-) > > > > ... snip ... 
> > > diff --git a/man3/string.3 b/man3/string.3 > > index aba5efd2b..bd8b342a6 100644 > > --- a/man3/string.3 > > +++ b/man3/string.3 > > @@ -179,21 +179,14 @@ .SH SYNOPSIS > > .I n > > bytes to > > .IR dest . > > -.SS Obsolete functions > > If you're removing this section ... > > > .TP > > .nf > > .BI "char *strncpy(char " dest "[restrict ." n "], \ > > const char " src "[restrict ." n ], > > .BI " size_t " n ); > > .fi > > -Copy at most > > -.I n > > -bytes from string > > -.I src > > -to > > -.IR dest , > > -returning a pointer to the start of > > -.IR dest . > > +Fill a fixed‐width buffer with characters from a string > > +and pad with null bytes. > > ... shouldn't you also move the rest of this up to keep it alphabetized? Hi Oskari, Sure! I was trying to find a pattern in the order, but didn't see it yesterday. Thanks! :) Cheers, Alex > > - Oskari -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-07 13:23 ` Alejandro Colomar 2023-11-07 14:19 ` Jonny Grant @ 2023-11-08 2:12 ` Matthew House 2023-11-08 19:33 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Matthew House @ 2023-11-08 2:12 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, linux-man On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@kernel.org> wrote: > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: > > We see things differently, I'm on the C standard side on this one. Would any information change your mind? > > It's difficult to say, but I doubt it. But let me ask you something: > In what cases would you find strncpy(3) appropriate to use, and why? > Maybe if I understand that it helps. > > Kind regards, > Alex Man pages aren't read only by people writing new code, but also by people reading and modifying existing code. And despite your preferences regarding which functions ought to be used to produce strings, it's a widespread (and correct) practice to produce a string from the character sequence created by strncpy(3). There are two ways of doing this, either by setting the last character of the destination buffer to null if you want to produce a truncated string, or by testing the last character against zero if you want to detect truncation and raise an error. 
I'm not aware of any alternative to a strncpy(3)-based snippet for producing a possibly-truncated copy of a string, except for your preferred strlcpy(3) or stpecpy(3), which aren't available to anyone without a brand-new glibc (nor, by extension, any applications or libraries that want to support people without a brand-new glibc, nor any libraries that want to support other platforms like Windows with only ISO C and POSIX-ish functions); snprintf(3), which has the insidious flaw of not supporting more than INT_MAX characters on pain of UB, and also produces a warning if the compiler notices the possible truncation; or strlen(3) + min() + memcpy(3) + manually adding a null terminator, which is certainly more explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if that poses a performance problem, but similarly opens up room for off-by-one errors.

For the sake of reference, I looked into a few big C and C++ projects to see how often a strncpy(3)-based snippet was used to produce a truncated copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note that I haven't filtered out vendored dependencies, so there's a little bit of double-counting.) It seems like most codebases that don't ban strncpy(3) use a derived snippet somewhere or another. Also, I found 3 instances in glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by checking the last character.

So these two snippets really are widespread, especially among the long tail of smaller C and C++ applications and libraries that don't perform enough string manipulation that it warrants creating a custom set of more-foolproof wrapper functions (at least, in the opinion of their authors).
Thus, since they're not going away, it would be useful for anyone reading the code to understand the concept behind how these two snippets work: that the only difference between strncpy(3)'s special "character sequence" and an ordinary C string is an additional null terminator at the end of the destination buffer. In other words, strncpy(3) doesn't create a truncated string, but it creates something which can be easily turned into a truncated string, and that's its most relevant quality for most of its uses in existing code. Further, apart from snprintf(3), there's no other portable way to produce a truncated string without manual arithmetic.

Thus, I'd also find it reasonable to highlight precisely why strncpy(3)'s output isn't a string (viz., the lack of a null terminator), instead of trying to insist that its output is worlds apart from anything string-related, especially given the volume of existing correct code that belies that notion.

Or, to answer your question, "It's appropriate to keep using strncpy(3) in existing code where it's currently used as part of creating a truncated string, and it's not especially inappropriate to use strncpy(3) in new code as part of creating a truncated string, if the code must support platforms without strlcpy(3) or similar, and if the resulting snippets are few enough and well-commented enough that they create less mental load than creating and maintaining a custom helper function."

(As an aside, I find the remark in the man page that "It's impossible to distinguish truncation by the result of the call" extremely misleading at best, since truncation can easily be distinguished by inspecting the last output character.)

Thank you, Matthew House ^ permalink raw reply [flat|nested] 138+ messages in thread
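The two strncpy(3)-based snippets described in the message above could look roughly like the following C sketch. The function names copy_truncate() and copy_checked() are hypothetical, chosen only for this illustration, and both assume a destination size greater than zero. Note that the second helper reports failure whenever the source plus its terminating null byte does not fit, which includes a source whose characters exactly fill the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst[size], truncating if needed, and
 * always leave a string in dst. The truncation is done by the explicit
 * store of the null byte, not by strncpy(3). */
static void copy_truncate(char *dst, const char *src, size_t size)
{
    assert(size > 0);
    strncpy(dst, src, size);
    dst[size - 1] = '\0';
}

/* Hypothetical helper: copy src into dst[size] and return false if src,
 * including its terminating null byte, did not fit. Because strncpy(3)
 * pads with null bytes, the last byte is nonzero only in that case. */
static bool copy_checked(char *dst, const char *src, size_t size)
{
    assert(size > 0);
    strncpy(dst, src, size);
    if (dst[size - 1] != '\0') {
        dst[size - 1] = '\0';   /* still leave a string behind */
        return false;
    }
    return true;
}
```

With a 5-byte buffer, copy_checked() accepts "1234" and rejects "12345", since the latter leaves no room for a terminator.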
* Re: strncpy clarify result may not be null terminated 2023-11-08 2:12 ` strncpy clarify result may not be null terminated Matthew House @ 2023-11-08 19:33 ` Alejandro Colomar 2023-11-08 19:40 ` Alejandro Colomar ` (2 more replies) 0 siblings, 3 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 19:33 UTC (permalink / raw) To: Matthew House; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 8926 bytes --] Hi Matthew, On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote: > On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@kernel.org> wrote: > > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: > > > We see things differently, I'm on the C standard side on this one. Would any information change your mind? > > > > It's difficult to say, but I doubt it. But let me ask you something: > > In what cases would you find strncpy(3) appropriate to use, and why? > > Maybe if I understand that it helps. > > > > Kind regards, > > Alex > > Man pages aren't read only by people writing new code, but also by people > reading and modifying existing code. And despite your preferences regarding > which functions ought to be used to produce strings, it's a widespread (and > correct) practice to produce a string from the character sequence created > by strncpy(3). There are two ways of doing this, either by setting the last > character of the destination buffer to null if you want to produce a > truncated string, or by testing the last character against zero if you want > to detect truncation and raise an error. It is not strncpy(3) who truncated, but the programmer by adding a NULL in buff[BUFSIZ - 1]. In the following snippet, strncpy(3) will not truncate: char cs[3]; strncpy(cs, "foo", 3); And yet your code doing if (cs[2] != '\0') { goto error; } would think it did. That's because you deformed strncpy(3) to implement a poor man's strlcpy(3). 
    char cs[3];

    strncpy(cs, "foo", 3);
    cs[2] = '\0';  // The truncation is here, not in strncpy(3).

> I'm not aware of any alternative to a strncpy(3)-based snippet for
> producing a possibly-truncated copy of a string, except for your preferred
> strlcpy(3) or stpecpy(3), which aren't available to anyone without a

The Linux kernel has strscpy(3), which is also good, but is not available to user space.

> brand-new glibc (nor, by extension, any applications or libraries that want

libbsd has provided strlcpy(3) since basically forever. It is a very portable library. You don't need a brand-new glibc for having strlcpy(3).

<https://libbsd.freedesktop.org/wiki/>

> to support people without a brand-new glibc, nor any libraries that want to
> support other platforms like Windows with only ISO C and POSIX-ish

If you program for Windows, it depends. If you have POSIX available, you may be able to port libbsd; I don't know. In any case, I don't care about Windows enough. You could always write your own string-copying function for Windows.

> functions); snprintf(3), which has the insidious flaw of not supporting
> more than INT_MAX characters on pain of UB, and also produces a warning if
> the compiler notices the possible truncation; or strlen(3) + min() +
> memcpy(3) + manually adding a null terminator, which is certainly more
> explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if
> that poses a performance problem, but similarly opens up room for
> off-by-one errors.

More than the performance problem, I'm worried about the maintainability of strncpy(3). When, 20 years from now, a programmer reading a piece of code full of strncpy(3) wants to migrate to a sane function like strlcpy(3) or strcpy(3), the programmer needs to understand whether the zeroing was purposeful or just accidental. Because with strlcpy(3), the code may start leaking some trailing data if the tail of the buffer is meaningful to some program.
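A minimal sketch of that concern, not taken from the thread. Here copy_trunc() is a hypothetical strlcpy(3)-like helper (written out because strlcpy(3) itself is not available everywhere), and stale_bytes_after_nul() is a hypothetical checker. After strncpy(3) the tail of the buffer is all zeros; after the strlcpy(3)-style copy, whatever was in the tail before the call is still there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy(3)-like helper: copies at most dsize - 1 bytes of
 * src, writes exactly one terminating NUL, and leaves the rest of dst
 * untouched.  Returns strlen(src), so a result >= dsize means truncation. */
static size_t
copy_trunc(char *dst, const char *src, size_t dsize)
{
    size_t slen = strlen(src);
    size_t n;

    assert(dsize > 0);  /* there must be room for the terminator */
    n = (slen < dsize) ? slen : dsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return slen;
}

/* Hypothetical checker: counts the non-zero bytes that follow the first
 * NUL in buf[0..size).  Returns 0 if there is no NUL at all. */
static size_t
stale_bytes_after_nul(const char *buf, size_t size)
{
    const char *nul = memchr(buf, '\0', size);
    size_t count = 0;

    if (nul == NULL)
        return 0;
    for (const char *p = nul + 1; p < buf + size; p++) {
        if (*p != '\0')
            count++;
    }
    return count;
}
```

With an 8-byte buffer previously filled with 'X' and the source "hi", strncpy(3) leaves no stale bytes after the terminator, while copy_trunc() leaves five. Whether that matters depends on whether anything ever reads past the terminator.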
>
> For the sake of reference, I looked into a few big C and C++ projects to
> see how often a strncpy(3)-based snippet was used to produce a truncated
> copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite
> of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in
> GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22,
> 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note
> that I haven't filtered out vendored dependencies, so there's a little bit
> of double-counting.) It seems like most codebases that don't ban strncpy(3)
> use a derived snippet somewhere or another. Also, I found 3 instances in
> glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by
> checking the last character.

I know. I've been rewriting the code handling strings in shadow-utils for the last year, and there was a lot of it. I fixed several small bugs in the process, so I recommend avoiding it.

>
> So these two snippets really are widespread, especially among the long tail
> of smaller C and C++ applications and libraries that don't perform enough
> string manipulation that it warrants creating a custom set of more-
> foolproof wrapper functions (at least, in the opinion of their authors).
> Thus, since they're not going away, it would be useful for anyone reading
> the code to understand the concept behind how these two snippets work, that
> the only difference between the strncpy(3)'s special "character sequence"
> and an ordinary C string is an additional null terminator at the end of the
> destination buffer.

This is part of string_copying(7):

    DESCRIPTION
      Terms (and abbreviations)
        string (str)
               is a sequence of zero or more non‐null characters followed by a
               null byte.

        character sequence
               is a sequence of zero or more non‐null characters.  A program
               should never use a character sequence where a string is required.
               However, with appropriate care, a string can be used in the place
               of a character sequence.
I think that is very explicit about the difference. strncpy(3) refers to that page for understanding the differences, so I think it is documented.

    strncpy(3):
    CAVEATS
         The name of these functions is confusing.  These functions produce a
         null‐padded character sequence, not a string (see string_copying(7)).

>
> In other words, strncpy(3) doesn't create a truncated string, but it
> creates something which can be easily turned into a truncated string,
> and that's its most relevant quality for most of its uses in existing code.
> Further, apart from snprintf(3), there's no other portable way to produce a
> truncated string without manual arithmetic. Thus, I'd also find it

Portable is relative. With libbsd, you can port to most POSIX systems. Windows is another story.

> reasonable to highlight precisely why strncpy(3)'s output isn't a string

How about this?:

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index d4c2ce83d..c80c8b640 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -108,7 +108,10 @@ .SH HISTORY
 .SH CAVEATS
 The name of these functions is confusing.
 These functions produce a null-padded character sequence,
-not a string (see
+not a string.
+While strings have a terminating NUL byte,
+character sequences do not have any terminating byte
+(see
 .BR string_copying (7)).
 .P
 It's impossible to distinguish truncation by the result of the call,

> (viz., the lack of a null terminator), instead of trying to insist that its
> output is worlds apart from anything string-related, especially given the
> volume of existing correct code that belies that notion.

It is not correct code. That code is doing extra work which confuses maintainers. It is a lot like writing dead code, since you're writing zeros that nobody is reading, which confuses maintainers.

Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, it's not correct code. It's rather dangerous code that just happens to not be vulnerable most of the time.
> > Or, to answer your question, "It's appropriate to keep using strncpy(3) in > existing code where it's currently used as part of creating a truncated > string, and it's not especially inappropriate to use strncpy(3) in new code > as part of creating a truncated string, if the code must support platforms > without strlcpy(3) or similar, and if the resulting snippets are few enough > and well-commented enough that they create less mental load than creating > and maintaining a custom helper function." strncpy(3) calls are never well documented. Do you add a comment in each such call saying "this zeroing is superfluous"? Probably not. > > (As an aside, I find the remark in the man page that "It's impossible to > distinguish truncation by the result of the call" extremely misleading at > best, since truncation can easily be distinguished by inspecting the last > output character.) Again, strncpy(3)'s truncation is impossible to detect. What you can detect is that your construct that resembles strlcpy(3) truncates, which is a different thing. Thanks, Alex > > Thank you, > Matthew House -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 19:33 ` Alejandro Colomar @ 2023-11-08 19:40 ` Alejandro Colomar 2023-11-09 3:13 ` Matthew House 2023-11-10 10:40 ` strncpy clarify result may not be null terminated Stefan Puiu 2 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-08 19:40 UTC (permalink / raw) To: Matthew House; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 9484 bytes --] On Wed, Nov 08, 2023 at 08:33:34PM +0100, Alejandro Colomar wrote: > Hi Matthew, > > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote: > > On Tue, Nov 7, 2023 at 8:21 AM Alejandro Colomar <alx@kernel.org> wrote: > > > On Tue, Nov 07, 2023 at 11:52:44AM +0000, Jonny Grant wrote: > > > > We see things differently, I'm on the C standard side on this one. Would any information change your mind? > > > > > > It's difficult to say, but I doubt it. But let me ask you something: > > > In what cases would you find strncpy(3) appropriate to use, and why? > > > Maybe if I understand that it helps. > > > > > > Kind regards, > > > Alex > > > > Man pages aren't read only by people writing new code, but also by people > > reading and modifying existing code. And despite your preferences regarding > > which functions ought to be used to produce strings, it's a widespread (and > > correct) practice to produce a string from the character sequence created > > by strncpy(3). There are two ways of doing this, either by setting the last > > character of the destination buffer to null if you want to produce a > > truncated string, or by testing the last character against zero if you want > > to detect truncation and raise an error. > > It is not strncpy(3) who truncated, but the programmer by adding a NULL Oops. s/NULL/NUL/ > in buff[BUFSIZ - 1]. 
In the following snippet, strncpy(3) will not > truncate: > > char cs[3]; > > strncpy(cs, "foo", 3); > > And yet your code doing if (cs[2] != '\0') { goto error; } would think > it did. That's because you deformed strncpy(3) to implement a poor > man's strlcpy(3). > > > char cs[3]; > > strncpy(cs, "foo", 3); > cs[2] = '\0'; // The truncation is here, not in strncpy(3). > > > I'm not aware of any alternative to a strncpy(3)-based snippet for > > producing a possibly-truncated copy of a string, except for your preferred > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a > > The Linux kernel has strscpy(3), which is also good, but is not > available to user space. > > > brand-new glibc (nor, by extension, any applications or libraries that want > > libbsd has provided strlcpy(3) since basically forever. It is a very > portable library. You don't need a brand-new glibc for having > strlcpy(3). > > <https://libbsd.freedesktop.org/wiki/> > > > to support people without a brand-new glibc, nor any libraries that want to > > support other platforms like Windows with only ISO C and POSIX-ish > > If you program for Windows, it depends. If you have POSIX available, > you may be able to port libbsd; I don't know. In any case, I don't > case about Windows enough. You could always write your own string- > copying function for Windows. > > > functions); snprintf(3), which has the insidious flaw of not supporting > > more than INT_MAX characters on pain of UB, and also produces a warning if > > the compiler notices the possible truncation; or strlen(3) + min() + > > memcpy(3) + manually adding a null terminator, which is certainly more > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if > > that poses a performance problem, but similarly opens up room for > > off-by-one errors. > > More than the performance problem, I'm more worried about the > maintainability of strncpy(3). 
When 20 years from now, a programmer > reading a piece of code full of strncpy(3) wants to migrate to a sane > function like strlcpy(3) or strcpy(3), the programmer needs to > understand if the zeroing was purposeful or just accidental. Because > by using strlcpy(3), it may start leaking some trailing data if the > trailing of the buffer is meaningful to some program. > > > > > For the sake of reference, I looked into a few big C and C++ projects to > > see how often a strncpy(3)-based snippet was used to produce a truncated > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note > > that I haven't filtered out vendored dependencies, so there's a little bit > > of double-counting.) It seems like most codebases that don't ban strncpy(3) > > use a derived snippet somewhere or another. Also, I found 3 instances in > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by > > checking the last character. > > I know. I've been rewriting the code handling strings in shadow-utils > for the last year, and ther was a lot of it. I fixed several small bugs > in the process, so I recommend avoiding it. > > > > > So these two snippets really are widespread, especially among the long tail > > of smaller C and C++ applications and libraries that don't perform enough > > string manipulation that it warrants creating a custom set of more- > > foolproof wrapper functions (at least, in the opinion of their authors). 
> > > > > Thus, since they're not going away, it would be useful for anyone reading > > the code to understand the concept behind how these two snippets work, that > > the only difference between the strncpy(3)'s special "character sequence" > > and an ordinary C string is an additional null terminator at the end of the > > destination buffer. > > This is part of string_copying(7): > > DESCRIPTION > Terms (and abbreviations) > string (str) > is a sequence of zero or more non‐null characters followed by a > null byte. > > character sequence > is a sequence of zero or more non‐null characters. A program > should never use a character sequence where a string is required. > However, with appropriate care, a string can be used in the place > of a character sequence. > > I think that is very explicit in the difference. strncpy(3) refers to > that page for understanding the differences, so I think it is > documented. > > strncpy(3): > CAVEATS > The name of these functions is confusing. These functions produce a > null‐padded character sequence, not a string (see string_copying(7)). > > > > > In other words, strncpy(3) doesn't create a truncated string, but it > > creates something which can be easily turned into to a truncated string, > > and that's its most relevant quality for most of its uses in existing code. > > Further, apart from snprintf(3), there's no other portable way to produce a > > truncated string without manual arithmetic. Thus, I'd also find it > > Portable is relative. With libbsd, you can port to most POSIX systems. > Windows is another story. > > > reasonable to highlight precisely why strncpy(3)'s output isn't a string > > How about this?: > > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 > index d4c2ce83d..c80c8b640 100644 > --- a/man3/stpncpy.3 > +++ b/man3/stpncpy.3 > @@ -108,7 +108,10 @@ .SH HISTORY > .SH CAVEATS > The name of these functions is confusing. 
> These functions produce a null-padded character sequence, > -not a string (see > +not a string. > +While strings have a terminating NUL byte, > +character sequences do not have any terminating byte > +(see > .BR string_copying (7)). > .P > It's impossible to distinguish truncation by the result of the call, > > > > (viz., the lack of a null terminator), instead of trying to insist that its > > output is worlds apart from anything string-related, especially given the > > volume of existing correct code that belies that notion. > > It is not correct code. That code is doing extra work which confuses > maintainers. It is a lot like writing dead code, since you're writing > zeros that nobody is reading, which confuses maintainers. > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, > it's not correct code. It's rather dangerous code that just happens to > not be vulnerable most of the time. > > > > > Or, to answer your question, "It's appropriate to keep using strncpy(3) in > > existing code where it's currently used as part of creating a truncated > > string, and it's not especially inappropriate to use strncpy(3) in new code > > as part of creating a truncated string, if the code must support platforms > > without strlcpy(3) or similar, and if the resulting snippets are few enough > > and well-commented enough that they create less mental load than creating > > and maintaining a custom helper function." > > strncpy(3) calls are never well documented. Do you add a comment in > each such call saying "this zeroing is superfluous"? Probably not. > > > > > (As an aside, I find the remark in the man page that "It's impossible to > > distinguish truncation by the result of the call" extremely misleading at > > best, since truncation can easily be distinguished by inspecting the last > > output character.) > > Again, strncpy(3)'s truncation is impossible to detect. 
What you can > detect is that your construct that resembles strlcpy(3) truncates, which > is a different thing. > > Thanks, > Alex > > > > > Thank you, > > Matthew House > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 19:33 ` Alejandro Colomar 2023-11-08 19:40 ` Alejandro Colomar @ 2023-11-09 3:13 ` Matthew House 2023-11-09 10:26 ` Jonny Grant ` (2 more replies) 2023-11-10 10:40 ` strncpy clarify result may not be null terminated Stefan Puiu 2 siblings, 3 replies; 138+ messages in thread From: Matthew House @ 2023-11-09 3:13 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, linux-man On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote: > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote: > > Man pages aren't read only by people writing new code, but also by people > > reading and modifying existing code. And despite your preferences regarding > > which functions ought to be used to produce strings, it's a widespread (and > > correct) practice to produce a string from the character sequence created > > by strncpy(3). There are two ways of doing this, either by setting the last > > character of the destination buffer to null if you want to produce a > > truncated string, or by testing the last character against zero if you want > > to detect truncation and raise an error. > > It is not strncpy(3) who truncated, but the programmer by adding a NULL > in buff[BUFSIZ - 1]. In the following snippet, strncpy(3) will not > truncate: > > char cs[3]; > > strncpy(cs, "foo", 3); > > And yet your code doing if (cs[2] != '\0') { goto error; } would think > it did. That's because you deformed strncpy(3) to implement a poor > man's strlcpy(3). > > char cs[3]; > > strncpy(cs, "foo", 3); > cs[2] = '\0'; // The truncation is here, not in strncpy(3). That's indeed a self-consistent interpretation of strncpy(3)'s function, but I don't think it's borne out by its formal definition, which I was basing my reasoning on. 
The current Linux man page for strncpy(3) says,

    These functions copy the string pointed to by src into a null-padded
    character sequence at the fixed-width buffer pointed to by dst.  If
    the destination buffer, limited by its size, isn't large enough to
    hold the copy, the resulting character sequence is truncated.

Notice how it "copies the string": as your string_copying(7) says, a string includes both a character sequence and a final null byte. So I'd ordinarily read this definition as saying that strncpy(3) tries to copy src up to and including the null byte, but produces a truncated copy of the whole string if the destination buffer is too small. Thus, even if the destination buffer contains all of the non-null characters in the original string, the copy has still been "truncated" in this sense, because the terminating null byte was dropped.

The ISO C definition, and by extension, the POSIX definition, make this interpretation even more explicit:

    The strncpy function copies not more than n characters (characters
    that follow a null character are not copied) from the array pointed
    to by s2 to the array pointed to by s1.

That is, the terminating null byte is part of the copy, but not anything after the terminating null byte.

So one can interpret strncpy(3) as copying a prefix of a character sequence into a buffer (and zero-filling the remainder), in which case you're correct that truncation cannot be detected. But the function is formally defined as copying a prefix of a string into a buffer (and zero-filling the remainder), in which case the string has been truncated if the buffer doesn't end in a null byte afterward. It's just that one may not care about the terminating null byte being truncated if the user of the result just wants the initial character sequence.
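To make the two readings concrete, here is a small sketch (the helper name strncpy_kept_terminator() is made up for illustration). Because strncpy(3) zero-fills whatever it doesn't copy, the last byte of the buffer is null exactly when the source string, terminator included, fit into the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst[0..dsize) with strncpy(3) and
 * report whether the terminating null byte made it into the buffer.
 *
 * Under the "copies a string" reading, false means the string was
 * truncated, even when every non-null character was copied.  Under the
 * "copies a character sequence" reading, false only says that dst is
 * not a string; it may still hold the complete character sequence. */
static bool
strncpy_kept_terminator(char *dst, const char *src, size_t dsize)
{
    assert(dsize > 0);
    strncpy(dst, src, dsize);
    return dst[dsize - 1] == '\0';
}
```

For the `char cs[3]` and "foo" case discussed above, the helper returns false although all three characters are present in cs; with a 4-byte buffer it returns true and the buffer holds an ordinary string.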
> > I'm not aware of any alternative to a strncpy(3)-based snippet for > > producing a possibly-truncated copy of a string, except for your preferred > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a > > The Linux kernel has strscpy(3), which is also good, but is not > available to user space. > > > brand-new glibc (nor, by extension, any applications or libraries that want > > libbsd has provided strlcpy(3) since basically forever. It is a very > portable library. You don't need a brand-new glibc for having > strlcpy(3). > > <https://libbsd.freedesktop.org/wiki/> That's a nice library that I didn't know about! Unfortunately, I don't think it's a very viable option for the long tail of small libraries I've referred to, which generally don't have any sub-dependencies of their own, apart from those provided by the platform. Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their configure scripts from whatever build system you're using (in such a way that libbsd can locate libmd), ensuring they're safe for cross-compilation if that's a goal, ensuring you bundle them in a way that respects their license terms, and ensuring that any user of your library links to the two dependencies and doesn't duplicate them. At that point, rolling your own strlcpy(3) equivalent definitely sounds like less mental load, at least to me. > > functions); snprintf(3), which has the insidious flaw of not supporting > > more than INT_MAX characters on pain of UB, and also produces a warning if > > the compiler notices the possible truncation; or strlen(3) + min() + > > memcpy(3) + manually adding a null terminator, which is certainly more > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if > > that poses a performance problem, but similarly opens up room for > > off-by-one errors. > > More than the performance problem, I'm more worried about the > maintainability of strncpy(3). 
When 20 years from now, a programmer > reading a piece of code full of strncpy(3) wants to migrate to a sane > function like strlcpy(3) or strcpy(3), the programmer needs to > understand if the zeroing was purposeful or just accidental. Because > by using strlcpy(3), it may start leaking some trailing data if the > trailing of the buffer is meaningful to some program. I didn't see this as an issue in practice when I was reviewing all those existing usages of strncpy(3). The vast majority were used in the midst of simple string manipulation, where the destination buffer starts as uninitialized or zeroed out, and ultimately gets passed into a user expecting an ordinary null-terminated string. (One exception was a few functions that used strncpy(dst, "", len) to zero out the buffer, which is thankfully pretty obvious. Another exception was the functions that actually used strncpy(3) to produce a null-padded character sequence, e.g., when writing a value into a section of a binary. But in general, I found that it's usually not difficult to tell when a usage is being clever enough that the null padding might be significant.) In fact, the greater confusion came from the surprisingly common practice of using strncpy(3) like it's memcpy(3), by giving it the known length of the source string, or of some prefix computed through strchr(3) or similar. This is often then followed up by strncat(3) or similar, indicating that the writer clearly expects the full length to have non-null characters. But if the length computation is separated far enough from the actual call to strncpy(3), then it can become unclear whether the source is actually expected to have any interior null bytes before the computed length. (So if a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest that ordinary memcpy(3) be one of them.) 
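As a sketch of that last suggestion (copy_prefix() is a hypothetical name, not an existing function): when the length of the prefix is already known, e.g. from strchr(3), memcpy(3) plus an explicit terminator says what is meant, and there is no question of whether a null byte is expected inside the copied range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the part of src that precedes the first
 * occurrence of delim (or all of src if delim is absent) into
 * dst[0..dsize) as a string.  Returns 0 on success, or -1 if the
 * prefix plus its terminator does not fit; dst is untouched then. */
static int
copy_prefix(char *dst, size_t dsize, const char *src, char delim)
{
    const char *end;
    size_t len;

    assert(dst != NULL && src != NULL);
    end = strchr(src, delim);
    len = (end != NULL) ? (size_t)(end - src) : strlen(src);

    if (len >= dsize)
        return -1;          /* no room for len bytes plus the NUL */
    memcpy(dst, src, len);  /* len is known, so no scanning for NUL */
    dst[len] = '\0';
    return 0;
}
```

For example, copy_prefix(key, sizeof(key), "name=value", '=') stores "name" in key. This version reports an error instead of truncating; that is a design choice of the sketch, not something the thread prescribes.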
> > For the sake of reference, I looked into a few big C and C++ projects to > > see how often a strncpy(3)-based snippet was used to produce a truncated > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note > > that I haven't filtered out vendored dependencies, so there's a little bit > > of double-counting.) It seems like most codebases that don't ban strncpy(3) > > use a derived snippet somewhere or another. Also, I found 3 instances in > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by > > checking the last character. > > I know. I've been rewriting the code handling strings in shadow-utils > for the last year, and ther was a lot of it. I fixed several small bugs > in the process, so I recommend avoiding it. I can't tell you about your own experience, but in mine, the root cause of most string-handling bugs has been excessive cleverness in using the standard string functions, rather than the behavior of the functions themselves. So one worry of mine is that if strncpy(3) ends up being deprecated or whatever, then authors of portable libraries will start writing lots of custom memcpy(3)-based replacements to their strncpy(3)- based snippets, and more lines of code will introduce more opportunities for cleverness. (This is also why I was confused by your support for strcpy(3) on the grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only protect us from overruns, not from all the "small bugs" that might ensue from people becoming more clever with sizing the destination buffer with strcpy(3). 
Also, if it were truly a panacea, then we'd hardly have to worry about the problems of strncpy(3) at all, since it would detect any misuse of the function.) Probably the only way to solve the cleverness issue for good is to have an immediately-available, foolproof, performant set of string functions that are extremely straightforward to understand and use, flexible enough for any use case, and generally agreed to be the first choice for string manipulation. Unfortunately, probably the closest match to those criteria, especially the availability criterion, is snprintf(3), which has the flaws of using int instead of size_t for most sizes, not being very performant, and not being async-signal-safe. Alas, it will likely remain a dream, given all the wars over which safer string functions have the best API. But at least strlcpy(3) has a pretty sound interface, if other platforms ever get around to including it by default. > > the code to understand the concept behind how these two snippets work, that > > the only difference between the strncpy(3)'s special "character sequence" > > and an ordinary C string is an additional null terminator at the end of the > > destination buffer. > > This is part of string_copying(7): > > DESCRIPTION > Terms (and abbreviations) > string (str) > is a sequence of zero or more non‐null characters followed by a > null byte. > > character sequence > is a sequence of zero or more non‐null characters. A program > should never use a character sequence where a string is required. > However, with appropriate care, a string can be used in the place > of a character sequence. > > I think that is very explicit in the difference. strncpy(3) refers to > that page for understanding the differences, so I think it is > documented. > > strncpy(3): > CAVEATS > The name of these functions is confusing. These functions produce a > null‐padded character sequence, not a string (see string_copying(7)). 
My point isn't that the difference is undocumented, but that the typical man page reader isn't reading the man pages for their own sake, but because they're looking at some code, and they want to Know What It's Doing as soon as possible. If they're getting directed around elsewhere with weird warnings about "not a string" ("what's it going on about, I thought it was null-padded?"), then I worry there's a good chance that they'll instead bounce off the man page and try figuring it out some other way. And even if they do follow the reference, then they might have difficulty understanding the implications, since many people don't think of things in terms of formal definitions.

> > reasonable to highlight precisely why strncpy(3)'s output isn't a string
>
> How about this?:
>
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index d4c2ce83d..c80c8b640 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -108,7 +108,10 @@ .SH HISTORY
>  .SH CAVEATS
>  The name of these functions is confusing.
>  These functions produce a null-padded character sequence,
> -not a string (see
> +not a string.
> +While strings have a terminating NUL byte,
> +character sequences do not have any terminating byte
> +(see
>  .BR string_copying (7)).
>  .P
>  It's impossible to distinguish truncation by the result of the call,

Yes, I'd be perfectly happy with something like that. That way, the scariness is far more immediate ("the output might not be terminated!?"), and thus more accessible to the typical reader.

> > (viz., the lack of a null terminator), instead of trying to insist that its
> > output is worlds apart from anything string-related, especially given the
> > volume of existing correct code that belies that notion.
>
> It is not correct code. That code is doing extra work which confuses
> maintainers. It is a lot like writing dead code, since you're writing
> zeros that nobody is reading, which confuses maintainers.
I am really not a fan of conflating the notion of "code that is difficult to maintain" with "code that doesn't perform the task it is intended to perform". When I think about incorrect code, I think about things like setenv(3) that are just waiting to cause trouble in popular libraries built and deployed today. Meanwhile, "confusing maintainers" is a very subjective notion specific to both the code and the maintainers: if someone sees some code allocating a fresh buffer, strncpy(3)ing a string into it, slapping a terminator on the end, and finally passing the result into something clearly expecting a string, then why would they be guaranteed to be sweating bullets over whatever happened to the rest of the fresh buffer? Especially given how widespread the strncpy(3) + extra null terminator pattern already is. Instead, it's code making use of strncpy(3) in a particularly clever way that I'd find confusing, and in those cases, I lay the blame squarely on the cleverness rather than the function itself.

> Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no,
> it's not correct code. It's rather dangerous code that just happens to
> not be vulnerable most of the time.

So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be immune to off-by-one bugs? Or will the vast majority of current strncpy(3) users be willing to either restrict their platform support or add two extra dependencies to their build process just to have strlcpy(3)? I'd hardly be inclined to think that off-by-one bugs are a particular specialty of strncpy(3).
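For reference, the fresh-buffer pattern described above looks roughly like the following sketch. The function name dup_truncated() is invented for illustration; strndup(3) provides similar behavior where it is available. The sizes are where an off-by-one would creep in: the buffer holds max_len + 1 bytes, strncpy(3) is given max_len, and the terminator goes at index max_len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly allocated string holding at
 * most max_len characters of src, or NULL if allocation fails.  The
 * caller frees the result. */
static char *
dup_truncated(const char *src, size_t max_len)
{
    char *buf;

    assert(src != NULL);
    buf = malloc(max_len + 1);    /* + 1 for the terminator */
    if (buf == NULL)
        return NULL;

    strncpy(buf, src, max_len);   /* may leave buf unterminated... */
    buf[max_len] = '\0';          /* ...so terminate it explicitly */
    return buf;
}
```

A call such as dup_truncated("hello world", 5) yields "hello", and a shorter source such as "hi" is copied whole, with the remainder of the buffer zero-filled by strncpy(3).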
> > Or, to answer your question, "It's appropriate to keep using strncpy(3) in
> > existing code where it's currently used as part of creating a truncated
> > string, and it's not especially inappropriate to use strncpy(3) in new code
> > as part of creating a truncated string, if the code must support platforms
> > without strlcpy(3) or similar, and if the resulting snippets are few enough
> > and well-commented enough that they create less mental load than creating
> > and maintaining a custom helper function."
>
> strncpy(3) calls are never well documented. Do you add a comment in
> each such call saying "this zeroing is superfluous"? Probably not.

By that standard, every call to a function that takes an output pointer and returns the number of elements written (say, readlink(2)) would need a comment saying "the remaining elements in this array now have undefined values". I don't think it's controversial that in many situations, we tacitly understand that we simply don't care about the remainder of a buffer after a certain point. In the case of producing a string, that point is going to be the null terminator, in the absence of on-site documentation to the contrary; I'd label anything else as overly clever.

Meanwhile, "never" would be a strong word to describe how often strncpy(3)'s lack of null termination is documented at the call site; 30 of the 339 call sites I mentioned have an associated comment regarding null termination. (ICU seems to be the best library comment-wise, but even it doesn't place them consistently.) It's obviously far from routine in existing code, but it's not something that never happens.

Thank you,
Matthew House
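For readers who haven't met it, the "strncpy(3) + extra null terminator" pattern argued over in this message looks roughly like the sketch below. The helper name copy_truncating() is made up for illustration and isn't taken from any of the projects surveyed in the thread; it assumes a non-empty destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread): copy src into the dsize-byte
 * buffer dst, truncating if needed, and always leave a valid string behind.
 */
static void
copy_truncating(char *dst, const char *src, size_t dsize)
{
    assert(dst != NULL && src != NULL && dsize > 0);

    /*
     * Copies at most dsize bytes and pads with null bytes if src is
     * shorter.  If src has dsize or more characters, no terminator is
     * written at all.
     */
    strncpy(dst, src, dsize);

    /* The explicit terminator is what turns the result into a string. */
    dst[dsize - 1] = '\0';
}
```

Whether the null padding written in the short-source case is harmless or misleading dead work is exactly the point the two sides disagree on.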
* Re: strncpy clarify result may not be null terminated 2023-11-09 3:13 ` Matthew House @ 2023-11-09 10:26 ` Jonny Grant 2023-11-09 10:31 ` Jonny Grant 2023-11-09 12:23 ` Alejandro Colomar 2 siblings, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-09 10:26 UTC (permalink / raw) To: Matthew House; +Cc: Alejandro Colomar, linux-man On Thu, 9 Nov 2023 at 03:13, Matthew House <mattlloydhouse@gmail.com> wrote: > > On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote: > > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote: > > > Man pages aren't read only by people writing new code, but also by people > > > reading and modifying existing code. And despite your preferences regarding > > > which functions ought to be used to produce strings, it's a widespread (and > > > correct) practice to produce a string from the character sequence created > > > by strncpy(3). There are two ways of doing this, either by setting the last > > > character of the destination buffer to null if you want to produce a > > > truncated string, or by testing the last character against zero if you want > > > to detect truncation and raise an error. > > > > It is not strncpy(3) who truncated, but the programmer by adding a NULL > > in buff[BUFSIZ - 1]. In the following snippet, strncpy(3) will not > > truncate: > > > > char cs[3]; > > > > strncpy(cs, "foo", 3); > > > > And yet your code doing if (cs[2] != '\0') { goto error; } would think > > it did. That's because you deformed strncpy(3) to implement a poor > > man's strlcpy(3). > > > > char cs[3]; > > > > strncpy(cs, "foo", 3); > > cs[2] = '\0'; // The truncation is here, not in strncpy(3). > > That's indeed a self-consistent interpretation of strncpy(3)'s function, > but I don't think it's borne out by its formal definition, which I was > basing my reasoning on. 
The current Linux man page for strncpy(3) says, > > These functions copy the string pointed to by src into a null-padded > character sequence at the fixed-width buffer pointed to by dst. If the > destination buffer, limited by its size, isn't large enough to hold the > copy, the resulting character sequence is truncated. > > Notice how it "copies the string": as your string_copying(7) says, a string > includes both a character sequence and a final null byte. So I'd ordinarily > read this definition as saying that strncpy(3) tries to copy src up to and > including the null byte, but produces a truncated copy of the whole string > if the destination buffer is too small. Thus, even if the destination > buffer contains all non-null characters in the original string, then the > copy has still been "truncated" in this sense. > > The ISO C definition, and by extension, the POSIX definition, make this > interpretation even more explicit: > > The strncpy function copies not more than n characters (characters that > follow a null character are not copied) from the array pointed to by s2 > to the array pointed to by s1. > > That is, the terminating null byte is part of the copy, but not anything > after the terminating null byte. > > So one can interpret strncpy(3) as copying a prefix of a character sequence > into a buffer (and zero-filling the remainder), in which case you're > correct that truncation cannot be detected. But the function is formally > defined as copying a prefix of a string into a buffer (and zero-filling the > remainder), in which case the string has been truncated if the buffer > doesn't end in a null byte afterward. It's just that one may not care about > the terminating null byte being truncated if the user of the result just > wants the initial character sequence.
> > > > I'm not aware of any alternative to a strncpy(3)-based snippet for > > > producing a possibly-truncated copy of a string, except for your preferred > > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a > > > > The Linux kernel has strscpy(3), which is also good, but is not > > available to user space. > > > > > brand-new glibc (nor, by extension, any applications or libraries that want > > > > libbsd has provided strlcpy(3) since basically forever. It is a very > > portable library. You don't need a brand-new glibc for having > > strlcpy(3). > > > > <https://libbsd.freedesktop.org/wiki/> > > That's a nice library that I didn't know about! Unfortunately, I don't > think it's a very viable option for the long tail of small libraries I've > referred to, which generally don't have any sub-dependencies of their own, > apart from those provided by the platform. > > Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their > configure scripts from whatever build system you're using (in such a way > that libbsd can locate libmd), ensuring they're safe for cross-compilation > if that's a goal, ensuring you bundle them in a way that respects their > license terms, and ensuring that any user of your library links to the two > dependencies and doesn't duplicate them. At that point, rolling your own > strlcpy(3) equivalent definitely sounds like less mental load, at least to > me. > > > > functions); snprintf(3), which has the insidious flaw of not supporting > > > more than INT_MAX characters on pain of UB, and also produces a warning if > > > the compiler notices the possible truncation; or strlen(3) + min() + > > > memcpy(3) + manually adding a null terminator, which is certainly more > > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if > > > that poses a performance problem, but similarly opens up room for > > > off-by-one errors. 
> > > > More than the performance problem, I'm more worried about the > > maintainability of strncpy(3). When 20 years from now, a programmer > > reading a piece of code full of strncpy(3) wants to migrate to a sane > > function like strlcpy(3) or strcpy(3), the programmer needs to > > understand if the zeroing was purposeful or just accidental. Because > > by using strlcpy(3), it may start leaking some trailing data if the > > trailing of the buffer is meaningful to some program. > > I didn't see this as an issue in practice when I was reviewing all those > existing usages of strncpy(3). The vast majority were used in the midst of > simple string manipulation, where the destination buffer starts as > uninitialized or zeroed out, and ultimately gets passed into a user > expecting an ordinary null-terminated string. > > (One exception was a few functions that used strncpy(dst, "", len) to zero > out the buffer, which is thankfully pretty obvious. Another exception was > the functions that actually used strncpy(3) to produce a null-padded > character sequence, e.g., when writing a value into a section of a binary. > But in general, I found that it's usually not difficult to tell when a > usage is being clever enough that the null padding might be significant.) > > In fact, the greater confusion came from the surprisingly common practice > of using strncpy(3) like it's memcpy(3), by giving it the known length of > the source string, or of some prefix computed through strchr(3) or similar. > This is often then followed up by strncat(3) or similar, indicating that > the writer clearly expects the full length to have non-null characters. But > if the length computation is separated far enough from the actual call to > strncpy(3), then it can become unclear whether the source is actually > expected to have any interior null bytes before the computed length. 
(So if > a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest > that ordinary memcpy(3) be one of them.) > > > > For the sake of reference, I looked into a few big C and C++ projects to > > > see how often a strncpy(3)-based snippet was used to produce a truncated > > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite > > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in > > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, > > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note > > > that I haven't filtered out vendored dependencies, so there's a little bit > > > of double-counting.) It seems like most codebases that don't ban strncpy(3) > > > use a derived snippet somewhere or another. Also, I found 3 instances in > > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by > > > checking the last character. > > > > I know. I've been rewriting the code handling strings in shadow-utils > > for the last year, and ther was a lot of it. I fixed several small bugs > > in the process, so I recommend avoiding it. > > I can't tell you about your own experience, but in mine, the root cause of > most string-handling bugs has been excessive cleverness in using the > standard string functions, rather than the behavior of the functions > themselves. So one worry of mine is that if strncpy(3) ends up being > deprecated or whatever, then authors of portable libraries will start > writing lots of custom memcpy(3)-based replacements to their strncpy(3)- > based snippets, and more lines of code will introduce more opportunities > for cleverness. > > (This is also why I was confused by your support for strcpy(3) on the > grounds that _FORTIFY_SOURCE exists. 
Sure, it's better than strncpy(3) in > that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only > protect us from overruns, not from all the "small bugs" that might ensue > from people becoming more clever with sizing the destination buffer with > strcpy(3). Also, if it were truly a panacea, then we'd hardly have to worry > about the problems of strncpy(3) at all, since it would detect any misuse > of the function.) Matthew, thank you for sharing your information. https://www.gnu.org/software/libc/manual/html_node/Source-Fortification.html I do find _FORTIFY_SOURCE useful in a developer build for testing: it raises SIGABRT and we can get a useful core dump. Without that macro, the program would likely still crash or corrupt memory. However, in my experience with safety-critical applications, we really need to avoid crashes, so we'd write user-space functions that do the same sanity checks as fortify does and then propagate the error back to the application, which reports and logs the failure. > > Probably the only way to solve the cleverness issue for good is to have an > immediately-available, foolproof, performant set of string functions that > are extremely straightforward to understand and use, flexible enough for > any use case, and generally agreed to be the first choice for string > manipulation. What's the best standardized function for C string copying in your opinion? They all seem to have drawbacks: strlcpy truncates (I'd rather it rejected the copy if the buffer were too small; truncation could cause issues if it changed the meaning of the string, e.g. if it was a file path). Other alternative functions aren't widely in use. Kind regards, Jonny
* Re: strncpy clarify result may not be null terminated 2023-11-09 3:13 ` Matthew House 2023-11-09 10:26 ` Jonny Grant @ 2023-11-09 10:31 ` Jonny Grant 2023-11-09 11:38 ` Alejandro Colomar 2023-11-09 12:23 ` Alejandro Colomar 2 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-09 10:31 UTC (permalink / raw) To: Matthew House; +Cc: Alejandro Colomar, linux-man, GNU C Library With glibc added
* Re: strncpy clarify result may not be null terminated
From: Alejandro Colomar @ 2023-11-09 11:38 UTC
To: Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

Hi Jonny,

On Thu, Nov 09, 2023 at 10:31:49AM +0000, Jonny Grant wrote:
> > Probably the only way to solve the cleverness issue for good is to have an
> > immediately-available, foolproof, performant set of string functions that
> > are extremely straightforward to understand and use, flexible enough for
> > any use case, and generally agreed to be the first choice for string
> > manipulation.
>
> What's the best standardized function for C string copying in your

strlcpy(3) will soon be standard. POSIX.1-202x (Issue 8) will add it, which is why it's been added recently to glibc. Hopefully, ISO C3x will follow (yeah, it's not like tomorrow).

> opinion? They all seem to have drawbacks, strlcpy truncates (I'd
> rather it rejected if it didn't have enough buffer - could cause
> issues if the meaning of the string changed due to truncation, eg if
> it was a file path). Other alternative functions aren't widely in use.

If you are consistent in checking the return value of strlcpy(3) and reporting an error, it's the best standard alternative nowadays. snprintf(3), except for using int instead of size_t, has an equivalent API, and is in C99, in case that means something.

If you would want to write something based on Michael Kerrisk's article, you could do this:

    ssize_t
    strxcpy(char *restrict dst, char *restrict src, size_t dsize)
    {
        if (strlen(src) < dsize)
            return -1;

        strcpy(dst, src);
    }

You may also want to calculate 'dsize' automagically, to avoid human error, in case it's an array, so you could write a macro on top of it:

    #define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))

These are just small wrappers over standard functions, so you shouldn't have problems adding them to your project.

This is my long term plan for shadow-utils, indeed. I'm first transforming strncpy(3) calls into strlcpy(3) to remove the superfluous padding, and later will use this strxcpy() to remove the truncated strings to avoid misinterpretation.

Cheers,
Alex

>
> Kind regards, Jonny

--
<https://www.alejandro-colomar.es/>
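The snprintf(3) alternative mentioned in this message can be sketched as follows. This is only an illustration under stated assumptions: the wrapper name copy_snprintf() and its 0/-1 return convention are invented here, and the int return type of snprintf(3) means sources longer than INT_MAX are out of scope.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical wrapper (not from the thread): copy src into dst using
 * snprintf(3).  Returns 0 if the whole string fit, -1 if it was truncated
 * or snprintf(3) reported an error.
 */
static int
copy_snprintf(char *dst, size_t dsize, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL);

    /* snprintf(3) returns the length the full output would have had. */
    n = snprintf(dst, dsize, "%s", src);
    if (n < 0 || (size_t) n >= dsize)
        return -1;

    return 0;
}
```

As with strlcpy(3), the destination still holds a terminated (but truncated) string when -1 is returned, so the caller decides whether truncation is an error.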
* Re: strncpy clarify result may not be null terminated
From: Alejandro Colomar @ 2023-11-09 12:43 UTC
To: Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On Thu, Nov 09, 2023 at 12:38:37PM +0100, Alejandro Colomar wrote:
> If you would want to write something based on Michael Kerrisk's article,
> you could do this:
>
>     ssize_t
>     strxcpy(char *restrict dst, char *restrict src, size_t dsize)
>     {
>         if (strlen(src) < dsize)

Heh, here's my off-by-one bug of the day. Good thing is I can fix it in a single place; unlike calling strncpy(3) all the time. This should have been <=.

Cheers,
Alex

>             return -1;
>
>         strcpy(dst, src);
>     }
>
> You may also want to calculate 'dsize' automagically, to avoid human
> error, in case it's an array, so you could write a macro on top of it:
>
>     #define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
>
> These are just small wrappers over standard functions, so you shouldn't
> have problems adding them to your project.
>
> This is my long term plan for shadow-utils, indeed. I'm first
> transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
> padding, and later will use this strxcpy() to remove the truncated
> strings to avoid misinterpretation.

--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
From: Xi Ruoyao @ 2023-11-09 12:51 UTC
To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On Thu, 2023-11-09 at 12:38 +0100, Alejandro Colomar wrote:
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.
> snprintf(3), except for using int instead of size_t, has an equivalent
> API, and is in C99, in case that means something.

Yes, you can always create your own wrapper instead of demanding a standard function which must be implemented by every libc.

> If you would want to write something based on Michael Kerrisk's article,
> you could do this:

> ssize_t
> strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> {
>     if (strlen(src) < dsize)
>         return -1;
>
>     strcpy(dst, src);
> }

I'd like to add __attribute__ ((warn_unused_result)) for this wrapper as well.

--
Xi Ruoyao <xry111@xry111.site>
School of Aerospace Science and Technology, Xidian University
* Re: strncpy clarify result may not be null terminated
From: Alejandro Colomar @ 2023-11-09 14:01 UTC
To: Xi Ruoyao; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On Thu, Nov 09, 2023 at 08:51:34PM +0800, Xi Ruoyao wrote:
> On Thu, 2023-11-09 at 12:38 +0100, Alejandro Colomar wrote:
> > If you are consistent in checking the return value of strlcpy(3) and
> > reporting an error, it's the best standard alternative nowadays.
> > snprintf(3), except for using int instead of size_t, has an equivalent
> > API, and is in C99, in case that means something.
>
> Yes, you can always create your own wrapper instead of demanding a
> standard function which must be implemented by every libc.
>
> > If you would want to write something based on Michael Kerrisk's article,
> > you could do this:
>
> > ssize_t
> > strxcpy(char *restrict dst, char *restrict src, size_t dsize)
> > {
> >     if (strlen(src) < dsize)
> >         return -1;
> >
> >     strcpy(dst, src);
> > }
>
> I'd like to add __attribute__ ((warn_unused_result)) for this wrapper as
> well.

Indeed. Thanks!

>
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University

--
<https://www.alejandro-colomar.es/>
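Putting the pieces of this sub-thread together, a compilable version of the rejecting copy might look like the sketch below. Several details are assumptions rather than anything the posters specified: the copy is rejected when the string plus its terminator does not fit (strlen(src) >= dsize), the success return value is taken to be the copied length, src is made const, and ARRAY_SIZE() is defined locally because the thread uses it without showing a definition. The warn_unused_result attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Assumed definition; only valid for true arrays, not pointers. */
#define ARRAY_SIZE(a)  (sizeof(a) / sizeof((a)[0]))

/*
 * Sketch of the strxcpy() idea: copy src into dst only if it fits
 * completely.  Returns the string length on success (an assumption),
 * or -1 without touching dst if src needs more than dsize bytes.
 */
__attribute__((warn_unused_result))
static ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t len;

    assert(dst != NULL && src != NULL);

    len = strlen(src);
    if (len >= dsize)        /* no room for len characters + terminator */
        return -1;

    strcpy(dst, src);
    return (ssize_t) len;
}

/* Derive the size from the destination array to avoid human error. */
#define STRXCPY(dst, src)  strxcpy(dst, src, ARRAY_SIZE(dst))
```

A caller would write something like `if (STRXCPY(path, input) == -1) { /* report the error */ }`; because of the attribute, silently ignoring the result draws a compiler warning.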
* Re: strncpy clarify result may not be null terminated
From: Paul Eggert @ 2023-11-09 18:11 UTC
To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man, GNU C Library

On 2023-11-09 03:38, Alejandro Colomar wrote:
> If you are consistent in checking the return value of strlcpy(3) and
> reporting an error, it's the best standard alternative nowadays.

Not necessarily. strlcpy is subject to denial-of-service attacks if the attacker has control of the source string and can attack by using long source strings. strncpy, as bad as it is, does not have this problem.

Instead of this:

    if (strlcpy (dst, src, dstsize) == dstsize)
      return failure;

applications that want to copy a string into a small nonempty fixed-size buffer, failing if the string doesn't fit, should do something like this:

    if (strncpy (dst, src, dstsize)[dstsize - 1])
      return failure;

This avoids the denial-of-service attack and is portable all the way back to K&R C.

It's unfortunate that strlcpy was misdesigned but here we are.
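The strncpy(3)-based check from this message can be wrapped up as in the sketch below. The function name copy_or_fail() and the 0/-1 return convention are illustrative assumptions; the point being shown is that strncpy(3) never reads more than dstsize bytes of the source, however long the source is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper around the idiom above.  Returns 0 if src, including
 * its terminator, fit into the dstsize-byte buffer dst, and -1 otherwise.
 * On failure dst holds dstsize non-null bytes and is NOT a string.
 */
static int
copy_or_fail(char *dst, const char *src, size_t dstsize)
{
    assert(dst != NULL && src != NULL && dstsize > 0);

    /*
     * strncpy(3) returns dst and reads at most dstsize bytes of src, so
     * the cost is bounded by the buffer size, not by the source length.
     */
    if (strncpy(dst, src, dstsize)[dstsize - 1] != '\0')
        return -1;

    return 0;
}
```

Because the read is bounded, the helper is well defined even when the first dstsize bytes of the source contain no terminator at all.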
* Re: strncpy clarify result may not be null terminated
  2023-11-09 18:11 ` Paul Eggert
@ 2023-11-09 23:48 ` Alejandro Colomar
  2023-11-10 5:36 ` Paul Eggert
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-09 23:48 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On Thu, Nov 09, 2023 at 10:11:10AM -0800, Paul Eggert wrote:
> On 2023-11-09 03:38, Alejandro Colomar wrote:
> > If you are consistent in checking the return value of strlcpy(3) and
> > reporting an error, it's the best standard alternative nowadays.
>
> Not necessarily. strlcpy is subject to denial-of-service attacks if the
> attacker has control of the source string and can attack by using long
> source strings. strncpy, as bad as it is, does not have this problem.

Interesting thing.  I'd then just use strlen(3)+strcpy(3), avoiding
strncpy(3).

>
> Instead of this:
>
>     if (strlcpy (dst, src, dstsize) >= dstsize)
>       return failure;
>
> applications that want to copy a string into a small nonempty
> fixed-size buffer, failing if the string doesn't fit, should do something
> like this:
>
>     if (strncpy (dst, src, dstsize)[dstsize - 1])
>       return failure;
>
> This avoids the denial-of-service attack and is portable all the way back to
> K&R C.
>
> It's unfortunate that strlcpy was misdesigned but here we are.
>

--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated 2023-11-09 23:48 ` Alejandro Colomar @ 2023-11-10 5:36 ` Paul Eggert 2023-11-10 11:05 ` Alejandro Colomar 2023-11-10 11:36 ` Jonny Grant 0 siblings, 2 replies; 138+ messages in thread From: Paul Eggert @ 2023-11-10 5:36 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library On 2023-11-09 15:48, Alejandro Colomar wrote: > I'd then just use strlen(3)+strcpy(3), avoiding > strncpy(3). But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead. The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer. Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements". ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated
  2023-11-10 5:36 ` Paul Eggert
@ 2023-11-10 11:05 ` Alejandro Colomar
  2023-11-10 11:47 ` Alejandro Colomar
  2023-11-10 17:58 ` Paul Eggert
  2023-11-10 11:36 ` Jonny Grant
  1 sibling, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 11:05 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Thu, Nov 09, 2023 at 09:36:43PM -0800, Paul Eggert wrote:
> On 2023-11-09 15:48, Alejandro Colomar wrote:
> > I'd then just use strlen(3)+strcpy(3), avoiding
> > strncpy(3).

Heh, brain fart on my side.

>
> But that is vulnerable to the same denial-of-service attack that strlcpy is
> vulnerable to. You'd need strnlen+strcpy instead.
>
> The strncpy approach I suggested is simpler, and (though this doesn't matter

Yeah, although you can always wrap strnlen(3)+memcpy(3) in a strxcpy()
inline function and have it even simpler.

Rewriting the strxcpy() wrapper I wrote the other day to not be
vulnerable to DoS, and hoping I get it right today.

    [[nodiscard]]
    inline ssize_t
    strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
    {
        size_t  slen;

        slen = strnlen(src, dsize);
        if (slen >= dsize)
            return -1;

        memcpy(dst, src, slen + 1);

        return slen;
    }

Hopefully, it won't be so bad in terms of performance.  And it is still
protected by fortification of memcpy(3).  And thanks to [[nodiscard]],
it should be hard to misuse.

> much in practice) is typically significantly faster than strnlen+strcpy in
> the typical case where the destination is a small fixed-size buffer.
>
> Although strncpy is not a good design, it's often simpler or faster or safer
> than later "improvements".

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>
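A compilable sketch of the wrapper above, for readers who want to try
it. Some details here are assumptions rather than part of the message:
the function is static inline so that a single translation unit links
without a separate external definition, [[nodiscard]] is dropped so that
pre-C23 compilers accept it, and the comparison is written with ==
(strnlen(3) never returns more than dsize, so >= and == behave the
same). The struct record type and record_set_name() are hypothetical and
only show a caller handling the error.

```c
#define _POSIX_C_SOURCE 200809L	/* strnlen(3), ssize_t */
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Copy or fail: returns strlen(src), or -1 if src does not fit. */
static inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen;

	slen = strnlen(src, dsize);	/* never scans past dsize bytes */
	if (slen == dsize)
		return -1;		/* dst is left untouched */

	memcpy(dst, src, slen + 1);	/* the +1 copies the terminator */

	return slen;
}

/* Hypothetical record with a fixed-width field. */
struct record {
	char  name[16];
};

/* Hypothetical caller: report failure instead of truncating silently. */
static int
record_set_name(struct record *r, const char *name)
{
	assert(r != NULL && name != NULL);

	return (strxcpy(r->name, name, sizeof(r->name)) == -1) ? -1 : 0;
}
```

One property of this shape is that nothing is written to dst when the
source does not fit, so a failed call leaves the previous contents in
place.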
* Re: strncpy clarify result may not be null terminated 2023-11-10 11:05 ` Alejandro Colomar @ 2023-11-10 11:47 ` Alejandro Colomar 2023-11-10 17:58 ` Paul Eggert 1 sibling, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 11:47 UTC (permalink / raw) To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library [-- Attachment #1: Type: text/plain, Size: 1666 bytes --] On Fri, Nov 10, 2023 at 12:05:31PM +0100, Alejandro Colomar wrote: > Hi Paul, > > On Thu, Nov 09, 2023 at 09:36:43PM -0800, Paul Eggert wrote: > > On 2023-11-09 15:48, Alejandro Colomar wrote: > > > I'd then just use strlen(3)+strcpy(3), avoiding > > > strncpy(3). > > Heh, brain fart on my side. > > > > > But that is vulnerable to the same denial-of-service attack that strlcpy is > > vulnerable to. You'd need strnlen+strcpy instead. > > > > The strncpy approach I suggested is simpler, and (though this doesn't matter > > Yeah, although you can always wrap strnlen(3)+memcpy(3) in a strxcpy() > inline function and have it even simpler. > > Rewriting the strxcpy() wrapper I wrote the other day to not be > vulnerable to DoS, and hoping I get it right today. > > [[nodiscard]] > inline ssize_t > strxcpy(char *restrict dst, const char *restrict src, size_t dsize) > { > size_t slen; > > slen = strnlen(src, dsize); > if (slen >= dsize) Oops: s/>=/==/ > return -1; > > memcpy(dst, src, slen + 1); > > return slen; > } > > Hopefully, it won't be so bad in terms of performance. And it is still > protected by fortification of memcpy(3). And thanks to [[nodiscard]], > it should be hard to misuse. > > > much in practice) is typically significantly faster than strnlen+strcpy in > > the typical case where the destination is a small fixed-size buffer. > > > > Although strncpy is not a good design, it's often simpler or faster or safer > > than later "improvements". 
> > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-10 11:05 ` Alejandro Colomar 2023-11-10 11:47 ` Alejandro Colomar @ 2023-11-10 17:58 ` Paul Eggert 2023-11-10 18:36 ` Alejandro Colomar 2023-11-10 19:52 ` Alejandro Colomar 1 sibling, 2 replies; 138+ messages in thread From: Paul Eggert @ 2023-11-10 17:58 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library On 2023-11-10 03:05, Alejandro Colomar wrote: > Hopefully, it won't be so bad in terms of performance. It's significantly slower than strncpy for typical use (smallish fixed-size destination buffers). So just use strncpy for that. It may be bad, but it's better than the alternatives you've mentioned. You can package strncpy inside a [[nodiscard]] inline wrapper if you like. More importantly, the manual should not push strlcpy as being superior or being in any way a "fix" for strncpy's problems. strlcpy is worse than strncpy in important ways and besides - as mentioned in the glibc manual - neither function is a good choice for string processing. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated
  2023-11-10 17:58 ` Paul Eggert
@ 2023-11-10 18:36 ` Alejandro Colomar
  2023-11-10 20:19 ` Alejandro Colomar
  2023-11-10 19:52 ` Alejandro Colomar
  1 sibling, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 18:36 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> On 2023-11-10 03:05, Alejandro Colomar wrote:
> > Hopefully, it won't be so bad in terms of performance.
>
> It's significantly slower than strncpy for typical use (smallish fixed-size
> destination buffers). So just use strncpy for that. It may be bad, but it's
> better than the alternatives you've mentioned. You can package strncpy
> inside a [[nodiscard]] inline wrapper if you like.
>
> More importantly, the manual should not push strlcpy as being superior or
> being in any way a "fix" for strncpy's problems. strlcpy is worse than
> strncpy in important ways and besides - as mentioned in the glibc manual -
> neither function is a good choice for string processing.

Hmmmm, that sounds convincing.  How about this as a starting point?

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 3cf4eb371..3aff18106 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -67,6 +67,38 @@ .SH DESCRIPTION
 }
 .EE
 .in
+.\"
+.SS Copying a string with truncation
+Although this function wasn't designed to copy a string with truncation,
+it can be used with appropriate care for that purpose.
+Such use is prone to off-by-one bugs,
+so it is recommended that you write a wrapper function
+that encloses all the danger.
+.P
+.in +4n
+.EX
+[[nodiscard]]
+inline ssize_t
+strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
+{
+    char  *p;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0')
+        return -1;
+
+    return p - dst;
+}
+.EE
+.in
+You could implement a similar function in terms of
+.BR strlen (3)
+and
+.BR memcpy (3),
+or in terms of
+.BR strlcpy (3),
+and it would be simpler,
+but this implementation is faster.
 .SH RETURN VALUE
 .TP
 .BR strncpy ()

I used stpncpy(3), assuming it will have the same performance as
strncpy(3), because it can be used to return the length.

Cheers,
Alex

--
<https://www.alejandro-colomar.es/>
* Re: strncpy clarify result may not be null terminated
  2023-11-10 18:36 ` Alejandro Colomar
@ 2023-11-10 20:19 ` Alejandro Colomar
  2023-11-10 23:44 ` Jonny Grant
  0 siblings, 1 reply; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-10 20:19 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Fri, Nov 10, 2023 at 07:36:33PM +0100, Alejandro Colomar wrote:
> Hi Paul,
>
>
> On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
> > On 2023-11-10 03:05, Alejandro Colomar wrote:
> > > Hopefully, it won't be so bad in terms of performance.
> >
> > It's significantly slower than strncpy for typical use (smallish fixed-size
> > destination buffers). So just use strncpy for that. It may be bad, but it's
> > better than the alternatives you've mentioned. You can package strncpy
> > inside a [[nodiscard]] inline wrapper if you like.
> >
> > More importantly, the manual should not push strlcpy as being superior or
> > being in any way a "fix" for strncpy's problems. strlcpy is worse than
> > strncpy in important ways and besides - as mentioned in the glibc manual -
> > neither function is a good choice for string processing.
>
> Hmmmm, that sounds convincing.  How about this as a starting point?

Something slightly better:

diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
index 3cf4eb371..8ffedae01 100644
--- a/man3/stpncpy.3
+++ b/man3/stpncpy.3
@@ -67,6 +67,88 @@ .SH DESCRIPTION
 }
 .EE
 .in
+.\"
+.SS Producing a string in a fixed-width buffer
+Programs should normally avoid arbitrary string limitations.
+However, some programs may need to write strings into fixed-width buffers.
+.P
+Although this function wasn't designed to produce a string,
+it can be used with appropriate care for that purpose.
+There are two main cases where it can be useful:
+.IP \[bu] 3
+Copying a string into a new string in a fixed-width buffer,
+preventing buffer overflow.
+.IP \[bu]
+Copying a string into a new string in a fixed-width buffer,
+with truncation.
+.P
+Using
+.BR strncpy (3)
+in any of those cases is prone to several classes of bugs,
+so it is recommended that you write a wrapper function
+that encloses all the dangers.
+.TP
+Copying a string preventing buffer overflow
+.in +4n
+.EX
+[[nodiscard]]
+inline ssize_t
+strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
+{
+    char  *p;
+
+    if (dsize == 0)
+        return -1;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0')
+        return -1;
+
+    return p - dst;
+}
+.EE
+.in
+.P
+If it returns -1,
+the contents of
+.I dst
+are undefined,
+and the program should handle the error.
+.P
+You could implement a similar function in terms of
+.BR strlen (3)
+and
+.BR memcpy (3),
+or in terms of
+.BR strlcpy (3),
+and it would be simpler,
+but this implementation is faster.
+.\"
+.TP
+Copying a string with truncation
+Truncation is almost always a bug.
+However, in the few cases where it is not a bug,
+you can use the following function.
+.in +4n
+.EX
+inline ssize_t
+strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
+{
+    char  *p;
+
+    if (dsize == 0)
+        return -1;
+
+    p = stpncpy(dst, src, dsize);
+    if (dst[dsize - 1] != '\0') {
+        dst[dsize - 1] = '\0';
+        p--;
+    }
+
+    return p - dst;
+}
+.EE
+.in
 .SH RETURN VALUE
 .TP
 .BR strncpy ()

However, note how many branches we need to make a function that handles
all corner cases.  Is it still faster than strnlen+memcpy?  stpncpy must
be heavily optimized for that.  Also, strnlen(3) might be optimized out
by the compiler in many cases, so maybe in real code it would be better
to use memcpy.

I'd very much like to see some numbers.

Thanks,
Alex

--
<https://www.alejandro-colomar.es/>
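To compare the two wrappers from the patch on the same inputs, here is a
compilable sketch. It assumes size_t for dsize, static inline linkage,
and no [[nodiscard]] (so that it builds before C23); demo() is a
hypothetical function added only to show the results.

```c
#define _POSIX_C_SOURCE 200809L	/* stpncpy(3), ssize_t */
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Copy or fail: returns strlen(src), or -1 if src does not fit. */
static inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	char  *p;

	if (dsize == 0)
		return -1;

	p = stpncpy(dst, src, dsize);	/* first null byte, or dst + dsize */
	if (dst[dsize - 1] != '\0')
		return -1;

	return p - dst;
}

/* Copy with truncation: returns the length of the string left in dst. */
static inline ssize_t
strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	char  *p;

	if (dsize == 0)
		return -1;

	p = stpncpy(dst, src, dsize);
	if (dst[dsize - 1] != '\0') {
		dst[dsize - 1] = '\0';	/* overwrite the last byte copied */
		p--;
	}

	return p - dst;
}

/* Hypothetical demonstration of both wrappers on an 8-byte buffer. */
static void
demo(void)
{
	char  buf[8];

	/* Fits: both return the length and leave a string in buf. */
	assert(strxcpy(buf, "hello", sizeof(buf)) == 5);
	assert(strtcpy(buf, "hello", sizeof(buf)) == 5);

	/* Too long: strxcpy() reports failure, strtcpy() truncates. */
	assert(strxcpy(buf, "12345678", sizeof(buf)) == -1);
	assert(strtcpy(buf, "12345678", sizeof(buf)) == 7);
	assert(strcmp(buf, "1234567") == 0);
}
```

After a failed strxcpy() call, buf holds eight bytes and no terminator,
so it must not be used as a string until it is overwritten.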
* Re: strncpy clarify result may not be null terminated
  2023-11-10 20:19 ` Alejandro Colomar
@ 2023-11-10 23:44 ` Jonny Grant
  0 siblings, 0 replies; 138+ messages in thread
From: Jonny Grant @ 2023-11-10 23:44 UTC (permalink / raw)
  To: Alejandro Colomar, Paul Eggert; +Cc: Matthew House, linux-man, GNU C Library

On 10/11/2023 20:19, Alejandro Colomar wrote:
> Hi Paul,
>
> On Fri, Nov 10, 2023 at 07:36:33PM +0100, Alejandro Colomar wrote:
>> Hi Paul,
>>
>>
>> On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote:
>>> On 2023-11-10 03:05, Alejandro Colomar wrote:
>>>> Hopefully, it won't be so bad in terms of performance.
>>>
>>> It's significantly slower than strncpy for typical use (smallish fixed-size
>>> destination buffers). So just use strncpy for that. It may be bad, but it's
>>> better than the alternatives you've mentioned. You can package strncpy
>>> inside a [[nodiscard]] inline wrapper if you like.
>>>
>>> More importantly, the manual should not push strlcpy as being superior or
>>> being in any way a "fix" for strncpy's problems. strlcpy is worse than
>>> strncpy in important ways and besides - as mentioned in the glibc manual -
>>> neither function is a good choice for string processing.
>>
>> Hmmmm, that sounds convincing.  How about this as a starting point?
>
> Something slightly better:
>
> diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> index 3cf4eb371..8ffedae01 100644
> --- a/man3/stpncpy.3
> +++ b/man3/stpncpy.3
> @@ -67,6 +67,88 @@ .SH DESCRIPTION
>  }
>  .EE
>  .in
> +.\"
> +.SS Producing a string in a fixed-width buffer
> +Programs should normally avoid arbitrary string limitations.
> +However, some programs may need to write strings into fixed-width buffers.
> +.P
> +Although this function wasn't designed to produce a string,
> +it can be used with appropriate care for that purpose.
> +There are two main cases where it can be useful:
> +.IP \[bu] 3
> +Copying a string into a new string in a fixed-width buffer,
> +preventing buffer overflow.
> +.IP \[bu]
> +Copying a string into a new string in a fixed-width buffer,
> +with truncation.
> +.P
> +Using
> +.BR strncpy (3)
> +in any of those cases is prone to several classes of bugs,
> +so it is recommended that you write a wrapper function
> +that encloses all the dangers.

Some feedback about the last line: "that covers all the risks" is clearer.

> +.TP
> +Copying a string preventing buffer overflow
> +.in +4n
> +.EX
> +[[nodiscard]]
> +inline ssize_t
> +strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
> +{
> +    char  *p;
> +
> +    if (dsize == 0)
> +        return -1;
> +
> +    p = stpncpy(dst, src, dsize);
> +    if (dst[dsize - 1] != '\0')
> +        return -1;
> +
> +    return p - dst;
> +}
> +.EE
> +.in
> +.P
> +If it returns -1,
> +the contents of
> +.I dst
> +are undefined,
> +and the program should handle the error.
> +.P
> +You could implement a similar function in terms of
> +.BR strlen (3)
> +and
> +.BR memcpy (3),
> +or in terms of
> +.BR strlcpy (3),
> +and it would be simpler,
> +but this implementation is faster.

I suggest adding a little more information; you could append "because it
accesses less memory".

> +.\"
> +.TP
> +Copying a string with truncation
> +Truncation is almost always a bug.
> +However, in the few cases where it is not a bug,
> +you can use the following function.
> +.in +4n
> +.EX
> +inline ssize_t
> +strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
> +{
> +    char  *p;
> +
> +    if (dsize == 0)
> +        return -1;
> +
> +    p = stpncpy(dst, src, dsize);
> +    if (dst[dsize - 1] != '\0') {
> +        dst[dsize - 1] = '\0';
> +        p--;
> +    }
> +
> +    return p - dst;
> +}
> +.EE
> +.in
>  .SH RETURN VALUE
>  .TP
>  .BR strncpy ()
>
>
> However, note how many branches we need to make a function that handles
> all corner cases.  Is it still faster than strnlen+memcpy?  stpncpy must
> be heavily optimized for that.  Also, strnlen(3) might be optimized out
> by the compiler in many cases, so maybe in real code it would be better
> to use memcpy.
>
> I'd very much like to see some numbers.

A benchmark would show the performance. It can't take many lines of code
in a loop to measure this.

strnlen_s is in Annex K of the C standard, but strnlen hasn't made it in
yet, not even in C23.

Kind regards
Jonny
* Re: strncpy clarify result may not be null terminated 2023-11-10 17:58 ` Paul Eggert 2023-11-10 18:36 ` Alejandro Colomar @ 2023-11-10 19:52 ` Alejandro Colomar 2023-11-10 22:14 ` Paul Eggert 1 sibling, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 19:52 UTC (permalink / raw) To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library [-- Attachment #1: Type: text/plain, Size: 1010 bytes --] On Fri, Nov 10, 2023 at 09:58:42AM -0800, Paul Eggert wrote: > On 2023-11-10 03:05, Alejandro Colomar wrote: > > Hopefully, it won't be so bad in terms of performance. > > It's significantly slower than strncpy for typical use (smallish fixed-size > destination buffers). So just use strncpy for that. It may be bad, but it's Do you have any numbers? I'm curious to see strnlen+memcpy vs stpncpy for buffers of some typical sizes (say 80 and BUFSIZ) under amd64 and arm64 (two typical archs). Are we talking of 1%, 10%, or 100%? > better than the alternatives you've mentioned. You can package strncpy > inside a [[nodiscard]] inline wrapper if you like. > > More importantly, the manual should not push strlcpy as being superior or > being in any way a "fix" for strncpy's problems. strlcpy is worse than > strncpy in important ways and besides - as mentioned in the glibc manual - > neither function is a good choice for string processing. -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated
  2023-11-10 19:52 ` Alejandro Colomar
@ 2023-11-10 22:14 ` Paul Eggert
  2023-11-11 21:13 ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Paul Eggert @ 2023-11-10 22:14 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

On 2023-11-10 11:52, Alejandro Colomar wrote:

> Do you have any numbers?

It depends on size of course. With programs like 'tar' (one of the few
programs that actually needs something like strncpy) the destination
buffer is usually fairly small (32 bytes or less) though some of them
are 100 bytes. I used 16 bytes in the following shell transcript:

$ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo; echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done

strnlen+strcpy:

real 0m0.411s
user 0m0.411s
sys 0m0.000s

strnlen+memcpy:

real 0m0.392s
user 0m0.388s
sys 0m0.004s

strncpy:

real 0m0.300s
user 0m0.300s
sys 0m0.000s

stpncpy:

real 0m0.326s
user 0m0.326s
sys 0m0.000s

strlcpy:

real 0m0.623s
user 0m0.623s
sys 0m0.000s

... where a.out was generated by compiling the attached program with
gcc -O2 on Ubuntu 23.10 64-bit on a Xeon W-1350.

I wouldn't take these numbers all that seriously, as microbenchmarks
like these are not that informative these days. Still, for a typical
case one should not assume strncpy must be slower merely because it has
more work to do; quite the contrary.

[-- Attachment #2: strncpy-bench.c --]
[-- Type: text/x-csrc, Size: 1090 bytes --]

#include <stdlib.h>
#include <string.h>


int
main (int argc, char **argv)
{
  if (argc != 5)
    return 2;
  long bufsize = atol (argv[1]);
  char *buf = malloc (bufsize);
  long n = atol (argv[2]);
  char const *a = argv[3];
  if (strcmp (argv[4], "strnlen+strcpy") == 0)
    {
      for (long i = 0; i < n; i++)
        {
          if (strnlen (a, bufsize) == bufsize)
            return 1;
          strcpy (buf, a);
        }
    }
  else if (strcmp (argv[4], "strnlen+memcpy") == 0)
    {
      for (long i = 0; i < n; i++)
        {
          size_t alen = strnlen (a, bufsize);
          if (alen == bufsize)
            return 1;
          memcpy (buf, a, alen + 1);
        }
    }
  else if (strcmp (argv[4], "strncpy") == 0)
    {
      for (long i = 0; i < n; i++)
        if (strncpy (buf, a, bufsize)[bufsize - 1])
          return 1;
    }
  else if (strcmp (argv[4], "stpncpy") == 0)
    {
      for (long i = 0; i < n; i++)
        if (stpncpy (buf, a, bufsize) == buf + bufsize)
          return 1;
    }
  else if (strcmp (argv[4], "strlcpy") == 0)
    {
      for (long i = 0; i < n; i++)
        if (strlcpy (buf, a, bufsize) == bufsize)
          return 1;
    }
  else
    return 2;
}
* Re: strncpy clarify result may not be null terminated
  2023-11-10 22:14 ` Paul Eggert
@ 2023-11-11 21:13 ` Alejandro Colomar
  2023-11-11 22:20 ` Paul Eggert
  2023-11-12 9:52 ` Jonny Grant
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-11 21:13 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library

Hi Paul,

On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote:
> On 2023-11-10 11:52, Alejandro Colomar wrote:
>
> > Do you have any numbers?
>
> It depends on size of course. With programs like 'tar' (one of the few
> programs that actually needs something like strncpy) the destination buffer
> is usually fairly small (32 bytes or less) though some of them are 100
> bytes. I used 16 bytes in the following shell transcript:
>
> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo;
> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done
>
> strnlen+strcpy:
>
> real 0m0.411s
> user 0m0.411s
> sys 0m0.000s
>
> strnlen+memcpy:
>
> real 0m0.392s
> user 0m0.388s
> sys 0m0.004s
>
> strncpy:
>
> real 0m0.300s
> user 0m0.300s
> sys 0m0.000s
>
> stpncpy:
>
> real 0m0.326s
> user 0m0.326s
> sys 0m0.000s
>
> strlcpy:
>
> real 0m0.623s
> user 0m0.623s
> sys 0m0.000s
>
>
> ... where a.out was generated by compiling the attached program with gcc -O2
> on Ubuntu 23.10 64-bit on a Xeon W-1350.
>
> I wouldn't take these numbers all that seriously, as microbenchmarks like
> these are not that informative these days. Still, for a typical case one
> should not assume strncpy must be slower merely because it has more work to
> do; quite the contrary.

Thanks for the benchmark!  Yeah, I won't take it as the last word, but
it shows the growth order (and its cause) of the different alternatives.

I'd like to point out some curious things about it:

- strnlen+strcpy is slower than strnlen+memcpy.

  The compiler has all the information necessary here, so I don't see
  why it's not optimizing the strcpy(3) into a simple memcpy(3).
  AFAICS, it's a missed optimization.  Even with -O3, it misses the
  optimization.

- strncpy is slower than stpncpy on my computer.

  stpncpy is in fact the fastest call on my computer.

  Was strncpy(3) optimized in a recent version of glibc that you have?
  I'm using Debian Sid on an underclocked i9-13900T.  Or is it maybe
  just luck?  I'm curious.

  $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
        echo; echo $i:;
        time ./a.out 16 100000000 abcdefghijk $i;
    done;

  strnlen+strcpy:

  real 0m0.188s
  user 0m0.184s
  sys 0m0.004s

  strnlen+memcpy:

  real 0m0.148s
  user 0m0.148s
  sys 0m0.000s

  strncpy:

  real 0m0.157s
  user 0m0.157s
  sys 0m0.000s

  stpncpy:

  real 0m0.135s
  user 0m0.135s
  sys 0m0.000s

  memccpy:

  real 0m0.208s
  user 0m0.208s
  sys 0m0.000s

  strlcpy:

  real 0m0.322s
  user 0m0.322s
  sys 0m0.000s

- strlcpy(3) is very heavy.  Much more than I expected.  See some tests
  with larger strings.  The main growth of strlcpy(3) comes from slen.

  $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
        echo; echo $i:;
        time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
    done;

  strnlen+strcpy:

  real 0m0.242s
  user 0m0.242s
  sys 0m0.000s

  strnlen+memcpy:

  real 0m0.190s
  user 0m0.186s
  sys 0m0.004s

  strncpy:

  real 0m0.174s
  user 0m0.173s
  sys 0m0.000s

  stpncpy:

  real 0m0.170s
  user 0m0.166s
  sys 0m0.004s

  memccpy:

  real 0m0.253s
  user 0m0.249s
  sys 0m0.004s

  strlcpy:

  real 0m1.385s
  user 0m1.385s
  sys 0m0.000s

- strncpy(3) also gets heavy compared to strnlen+memcpy.
  Considering how small the difference with memcpy is for small
  strings, I wouldn't recommend it instead of memcpy, except for
  micro-optimizations.  The main growth of strncpy(3) comes from dsize.

  $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
        echo; echo $i:;
        time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
    done;

  strnlen+strcpy:

  real 0m0.234s
  user 0m0.233s
  sys 0m0.001s

  strnlen+memcpy:

  real 0m0.192s
  user 0m0.192s
  sys 0m0.000s

  strncpy:

  real 0m0.268s
  user 0m0.268s
  sys 0m0.000s

  stpncpy:

  real 0m0.267s
  user 0m0.267s
  sys 0m0.000s

  memccpy:

  real 0m0.257s
  user 0m0.256s
  sys 0m0.001s

  strlcpy:

  real 0m1.574s
  user 0m1.574s
  sys 0m0.000s

  $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do
        echo; echo $i:;
        time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i;
    done;

  strnlen+strcpy:

  real 0m0.227s
  user 0m0.227s
  sys 0m0.000s

  strnlen+memcpy:

  real 0m0.190s
  user 0m0.190s
  sys 0m0.000s

  strncpy:

  real 0m1.400s
  user 0m1.399s
  sys 0m0.000s

  stpncpy:

  real 0m1.398s
  user 0m1.398s
  sys 0m0.000s

  memccpy:

  real 0m0.256s
  user 0m0.256s
  sys 0m0.000s

  strlcpy:

  real 0m1.184s
  user 0m1.184s
  sys 0m0.000s

- strnlen(3)+memcpy(3) becomes the fastest once dsize reaches a few
  hundred bytes, and is only about 10% slower than the fastest for the
  smaller buffers in these runs.

  It is also the most semantically correct (together with
  strnlen+strcpy), avoiding unnecessary work (the null padding).  This
  should get the main backing from the manual pages.

  However, it can be useful to document typical alternatives to prevent
  mistakes from users, especially since some micro-optimizations may
  favor uses of strncpy(3).

Cheers,
Alex

> #include <stdlib.h>
> #include <string.h>
>
>
> int
> main (int argc, char **argv)
> {
>   if (argc != 5)
>     return 2;
>   long bufsize = atol (argv[1]);
>   char *buf = malloc (bufsize);
>   long n = atol (argv[2]);
>   char const *a = argv[3];
>   if (strcmp (argv[4], "strnlen+strcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
>         {
>           if (strnlen (a, bufsize) == bufsize)
>             return 1;
>           strcpy (buf, a);
>         }
>     }
>   else if (strcmp (argv[4], "strnlen+memcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
>         {
>           size_t alen = strnlen (a, bufsize);
>           if (alen == bufsize)
>             return 1;
>           memcpy (buf, a, alen + 1);
>         }
>     }
>   else if (strcmp (argv[4], "strncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
>         if (strncpy (buf, a, bufsize)[bufsize - 1])
>           return 1;
>     }
>   else if (strcmp (argv[4], "stpncpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
>         if (stpncpy (buf, a, bufsize) == buf + bufsize)
>           return 1;
>     }

I've added the following one for completeness.  Especially now that
it'll be in C2x.

  else if (strcmp (argv[4], "memccpy") == 0)
    {
      for (long i = 0; i < n; i++)
        if (memccpy (buf, a, 0, bufsize) == NULL)
          return 1;
    }

>   else if (strcmp (argv[4], "strlcpy") == 0)
>     {
>       for (long i = 0; i < n; i++)
>         if (strlcpy (buf, a, bufsize) == bufsize)

This should have been >= bufsize, right?

>           return 1;
>     }
>   else
>     return 2;
> }

--
<https://www.alejandro-colomar.es/>
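The memccpy(3) variant added to the benchmark can also be wrapped as a
copy-or-fail function. A minimal sketch, assuming the same return
convention as strxcpy() earlier in the thread; the name
strxcpy_memccpy() is hypothetical and was not proposed by anyone here.

```c
#define _XOPEN_SOURCE 700	/* memccpy(3), ssize_t */
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper (name is an assumption): returns strlen(src),
 * or -1 if src does not fit in dst[dsize].
 */
static inline ssize_t
strxcpy_memccpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	char  *end;

	assert(dst != NULL && src != NULL);

	/*
	 * memccpy(3) copies until it has copied a null byte or dsize
	 * bytes, whichever comes first.  It returns a pointer to the
	 * byte after the copied null byte, or NULL if it found none.
	 */
	end = memccpy(dst, src, '\0', dsize);
	if (end == NULL)
		return -1;

	return end - dst - 1;
}
```

Unlike the strnlen(3)+memcpy(3) wrapper, a failed call here has already
written dsize bytes to dst without a terminator, so the caller has to
treat the buffer contents as invalid.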
* Re: strncpy clarify result may not be null terminated 2023-11-11 21:13 ` Alejandro Colomar @ 2023-11-11 22:20 ` Paul Eggert 2023-11-12 9:52 ` Jonny Grant 1 sibling, 0 replies; 138+ messages in thread From: Paul Eggert @ 2023-11-11 22:20 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, Matthew House, linux-man, GNU C Library On 2023-11-11 13:13, Alejandro Colomar wrote: > Was strncpy(3) optimized in a recent version of glibc that you have? Ubuntu 23.10 currently uses glibc 2.38-1ubuntu6. Fortification is on by default, so __builtin___strncpy_chk is involved. Again, I wouldn't take these numbers too seriously. It's just a microbenchmark. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-11 21:13 ` Alejandro Colomar 2023-11-11 22:20 ` Paul Eggert @ 2023-11-12 9:52 ` Jonny Grant 2023-11-12 10:59 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-12 9:52 UTC (permalink / raw) To: Alejandro Colomar, Paul Eggert; +Cc: Matthew House, linux-man, GNU C Library On 11/11/2023 21:13, Alejandro Colomar wrote: > Hi Paul, > > On Fri, Nov 10, 2023 at 02:14:13PM -0800, Paul Eggert wrote: >> On 2023-11-10 11:52, Alejandro Colomar wrote: >> >>> Do you have any numbers? >> >> It depends on size of course. With programs like 'tar' (one of the few >> programs that actually needs something like strncpy) the destination buffer >> is usually fairly small (32 bytes or less) though some of them are 100 >> bytes. I used 16 bytes in the following shell transcript: >> >> $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy strlcpy; do echo; >> echo $i:; time ./a.out 16 100000000 abcdefghijk $i; done >> >> strnlen+strcpy: >> >> real 0m0.411s >> user 0m0.411s >> sys 0m0.000s >> >> strnlen+memcpy: >> >> real 0m0.392s >> user 0m0.388s >> sys 0m0.004s >> >> strncpy: >> >> real 0m0.300s >> user 0m0.300s >> sys 0m0.000s >> >> stpncpy: >> >> real 0m0.326s >> user 0m0.326s >> sys 0m0.000s >> >> strlcpy: >> >> real 0m0.623s >> user 0m0.623s >> sys 0m0.000s >> >> >> ... where a.out was generated by compiling the attached program with gcc -O2 >> on Ubuntu 23.10 64-bit on a Xeon W-1350. >> >> I wouldn't take these numbers all that seriously, as microbenchmarks like >> these are not that informative these days. Still, for a typical case one >> should not assume strncpy must be slower merely because it has more work to >> do; quite the contrary. > > Thanks for the benchmarck! Yeah, I won't take it as the last word, but > it shows the growth order (and its cause) of the different alternatives. 
> > I'd like to point out some curious things about it: > > - strnlen+strcpy is slower than strnlen+memcpy. > > The compiler has all the information necessary here, so I don't see > why it's not optimizing out the strcpy(3) into a simple memcpy(3). > AFAICS, it's a missed optimization. Even with -O3, it misses the > optimization. > > - strncpy is slower than stpncpy in my computer. > > stpncpy is in fact the fastest call in my computer. > > Was strncpy(3) optimized in a recent version of glibc that you have? > I'm using Debian Sid on an underclocked i9-13900T. Or is it maybe > just luck? I'm curious. > > $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do > echo; echo $i:; > time ./a.out 16 100000000 abcdefghijk $i; > done; > > strnlen+strcpy: > > real 0m0.188s > user 0m0.184s > sys 0m0.004s > > strnlen+memcpy: > > real 0m0.148s > user 0m0.148s > sys 0m0.000s > > strncpy: > > real 0m0.157s > user 0m0.157s > sys 0m0.000s > > stpncpy: > > real 0m0.135s > user 0m0.135s > sys 0m0.000s > > memccpy: > > real 0m0.208s > user 0m0.208s > sys 0m0.000s > > strlcpy: > > real 0m0.322s > user 0m0.322s > sys 0m0.000s > > - strlcpy(3) is very heavy. Much more than I expected. See some tests > with larger strings. The main growth of strlcpy(3) comes from slen. > > $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do > echo; echo $i:; > time ./a.out 64 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i; > done; > > strnlen+strcpy: > > real 0m0.242s > user 0m0.242s > sys 0m0.000s > > strnlen+memcpy: > > real 0m0.190s > user 0m0.186s > sys 0m0.004s > > strncpy: > > real 0m0.174s > user 0m0.173s > sys 0m0.000s > > stpncpy: > > real 0m0.170s > user 0m0.166s > sys 0m0.004s > > memccpy: > > real 0m0.253s > user 0m0.249s > sys 0m0.004s > > strlcpy: > > real 0m1.385s > user 0m1.385s > sys 0m0.000s > > - strncpy(3) also gets heavy compared to strnlen+memcpy. 
> Considering how small the difference with memcpy is for small > strings, I wouldn't recommend it instead of memcpy, except for > micro-optimizations. The main growth of strncpy(3) comes from dsize. > > $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do > echo; echo $i:; > time ./a.out 256 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i; > done; > > strnlen+strcpy: > > real 0m0.234s > user 0m0.233s > sys 0m0.001s > > strnlen+memcpy: > > real 0m0.192s > user 0m0.192s > sys 0m0.000s > > strncpy: > > real 0m0.268s > user 0m0.268s > sys 0m0.000s > > stpncpy: > > real 0m0.267s > user 0m0.267s > sys 0m0.000s > > memccpy: > > real 0m0.257s > user 0m0.256s > sys 0m0.001s > > strlcpy: > > real 0m1.574s > user 0m1.574s > sys 0m0.000s > > $ for i in strnlen+strcpy strnlen+memcpy strncpy stpncpy memccpy strlcpy; do > echo; echo $i:; > time ./a.out 4096 100000000 aaaabbbbaaaaccccaaaabbbbaaaadddd $i; > done; > > strnlen+strcpy: > > real 0m0.227s > user 0m0.227s > sys 0m0.000s > > strnlen+memcpy: > > real 0m0.190s > user 0m0.190s > sys 0m0.000s > > strncpy: > > real 0m1.400s > user 0m1.399s > sys 0m0.000s > > stpncpy: > > real 0m1.398s > user 0m1.398s > sys 0m0.000s > > memccpy: > > real 0m0.256s > user 0m0.256s > sys 0m0.000s > > strlcpy: > > real 0m1.184s > user 0m1.184s > sys 0m0.000s > > > - strnlen(3)+memcpy(3) becomes the fastest when dsize grows a bit over > a few hundred bytes, and is only a few 10%'s slower than the fastest > for smaller buffers. > > It is also the most semantically correct (together with > strnlen+strcpy), avoiding unnecessary dead code (padding). This > should get the main backing from the manual pages. > > However, it can be useful to document typical alternatives to prevent > mistakes from users. Especially, since some micro-optimizations may > favor uses of strncpy(3). 
> > Cheers, > Alex > >> #include <stdlib.h> >> #include <string.h> >> >> >> int >> main (int argc, char **argv) >> { >> if (argc != 5) >> return 2; >> long bufsize = atol (argv[1]); >> char *buf = malloc (bufsize); >> long n = atol (argv[2]); >> char const *a = argv[3]; >> if (strcmp (argv[4], "strnlen+strcpy") == 0) >> { >> for (long i = 0; i < n; i++) >> { >> if (strnlen (a, bufsize) == bufsize) >> return 1; >> strcpy (buf, a); >> } >> } >> else if (strcmp (argv[4], "strnlen+memcpy") == 0) >> { >> for (long i = 0; i < n; i++) >> { >> size_t alen = strnlen (a, bufsize); >> if (alen == bufsize) >> return 1; >> memcpy (buf, a, alen + 1); >> } >> } >> else if (strcmp (argv[4], "strncpy") == 0) >> { >> for (long i = 0; i < n; i++) >> if (strncpy (buf, a, bufsize)[bufsize - 1]) >> return 1; >> } >> else if (strcmp (argv[4], "stpncpy") == 0) >> { >> for (long i = 0; i < n; i++) >> if (stpncpy (buf, a, bufsize) == buf + bufsize) >> return 1; >> } > > I've added the following one for completeness. Especially now that > it'll be in C2x. > > else if (strcmp (argv[4], "memccpy") == 0) > { > for (long i = 0; i < n; i++) > if (memccpy (buf, a, 0, bufsize) == NULL) > return 1; > } > >> else if (strcmp (argv[4], "strlcpy") == 0) >> { >> for (long i = 0; i < n; i++) >> if (strlcpy (buf, a, bufsize) == bufsize) > > This should have been >= bufsize, right? > >> return 1; >> } >> else >> return 2; >> } > > Maybe we're gonna need a bigger benchmark. Probably there existing studies. Or could patch something like SQLite Benchmark to utilise each string function just for measurements. Hopefully it moves around at least 2GB of strings to give some meaningful comparison timings. As Paul mentioned, strlcpy is a poor choice for processing strings. Could rely on their guidance as they already measured. 
https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html Maybe the strlcpy API is easier, safer for programmers; but the compiler can't figure out that the programmer already knew src string length. So the strlcpy does a strlen() and wastes time reading over memory. If the src length is known, can just memcpy. When I've benchmarked things, reducing the memory accesses for read, write boosted performance, also looked at the cycles taken, of course cache and alignment all play a part too. Maybe could suggest in your man page programmers should keep track of the src size ? - to save the cost of the strlen(). At least the strlen functions are optimized: glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time. glibc/strlen.c searches 4 bytes at a time. glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? memcpy (dest, src, size); dest[size - 1] = '\0'; Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-12 9:52 ` Jonny Grant @ 2023-11-12 10:59 ` Alejandro Colomar 2023-11-12 20:49 ` Paul Eggert 2023-11-17 21:57 ` Jonny Grant 0 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 10:59 UTC (permalink / raw) To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man, GNU C Library [-- Attachment #1: Type: text/plain, Size: 4572 bytes --] Hi Jonny, On Sun, Nov 12, 2023 at 09:52:20AM +0000, Jonny Grant wrote: [... some micro-benchmarks...] > > Maybe we're gonna need a bigger benchmark. Not really. > > Probably there existing studies. Or could patch something like SQLite > Benchmark to utilise each string function just for measurements. > Hopefully it moves around at least 2GB of strings to give some > meaningful comparison timings. I wasn't so interested in the small differences between functions. What this micro-benchmark showed clearly, without needing much more info to be conclusive, is the first order of growth of each of the functions: - strlcpy(3)'s first order growth corresponds to strlen(src). That's due to returning strlen(src), which proves to be a poor API. - strncpy(3)'s first order growth corresponds to sizeof(dst). That's of course due to the zeroing. If sizeof(dst) is kept very small, you could live with it. When the size grows to more or less 4 KiB, this drag becomes meaningful. - strnlen(3)+*cpy() first order growth corresponds to strnlen(src, sizeof(dst)), which is the fastest order of growth you can get from a truncating string-copying function (except if you keep track of your slen manually and call directly memcpy(3)). Of course, first order of growth ignores second order of growth and so on, which for small inputs can be important. That is, O(x^3) is bigger than O(x^2), but x^3 + x^2 can be smaller than 5*x^2 for small x. > > As Paul mentioned, strlcpy is a poor choice for processing strings. > Could rely on their guidance as they already measured. 
> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html Indeed. I've added important notices in BUGS about it, and recommended against it. > > Maybe the strlcpy API is easier, safer for programmers; but the > compiler can't figure out that the programmer already knew src string > length. So the strlcpy does a strlen() and wastes time reading over > memory. If the src length is known, can just memcpy. I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't suffer its problems. It should be even safer and easier to use, and its first order of growth is better. I'll send a patch for review in a moment. > When I've benchmarked things, reducing the memory accesses for read, > write boosted performance, also looked at the cycles taken, of course > cache and alignment all play a part too. If one wants to micro-optimize for their use case, it's none of my business. I provide a function that should be safe and relatively fast for all use cases, which libc doesn't. > Maybe could suggest in your man page programmers should keep track of > the src size ? - to save the cost of the strlen(). No. Optimizations are not my business. Writing good APIs should make these optimizations low value so that they aren't done, except for the most performance-critical programs. The problem comes when libc doesn't provide anything usable, and the user has no guidance on where to start. Then, programmers start being clever, usually too clever. That's why I think the man-pages should go ahead and write wrapper functions such as strtcpy() and stpecpy() around libc functions; these wrappers should provide a fast and safe starting point for most programs. It's true that memcpy(3) is the fastest function one can use, but it requires the programmer to be rather careful with the lengths of the strings. I don't think keeping track of all those little details is what the common programmer should do. 
> > At least the strlen functions are optimized: > glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time. > glibc/strlen.c searches 4 bytes at a time. > > glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? > > memcpy (dest, src, size); > dest[size - 1] = '\0'; -1's in the source code make up for off-by-one bugs. APIs should be written so that common use doesn't involve manually writing -1 if possible. I acknowledge the performance benefits of this construction, and have used it myself in NGINX code, but I also find it very dangerous, which is why I recommend using a wrapper over it: char * ustr2stp(char *restrict dst, const char *restrict src, size_t len) { char *p; p = mempcpy(dst, src, len); *p = '\0'; return p; } Cheers, Alex > > Kind regards, Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
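A minimal sketch of how a wrapper like the one quoted above might be used, assuming the source is a fixed-width field that may lack a terminating null byte. The names `ustr2stp_sketch` and `field_to_string` are hypothetical, and the sketch uses memcpy(3) instead of the GNU-specific mempcpy(3) so that it builds without feature-test macros; it is not the code from string_copying(7).

```c
#include <stddef.h>
#include <string.h>

/* Portable equivalent of the wrapper quoted above (assumption: same
   contract).  Copies LEN bytes, terminates the result, and returns a
   pointer to the terminating null byte.  DST needs LEN + 1 bytes.  */
char *
ustr2stp_sketch(char *restrict dst, const char *restrict src, size_t len)
{
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst + len;
}

/* Hypothetical helper: turn a fixed-width, possibly unterminated field
   into a string and return its length.  DST needs FSIZE + 1 bytes.  */
size_t
field_to_string(char *restrict dst, const char *restrict field, size_t fsize)
{
    size_t len = strnlen(field, fsize);   /* never reads past the field */

    return (size_t) (ustr2stp_sketch(dst, field, len) - dst);
}
```

The caller never writes a "- 1": the only size arithmetic is the "+ 1" when sizing the destination, which is the property the message above argues for.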
* Re: strncpy clarify result may not be null terminated 2023-11-12 10:59 ` Alejandro Colomar @ 2023-11-12 20:49 ` Paul Eggert 2023-11-12 21:00 ` Alejandro Colomar 2023-11-13 23:46 ` Jonny Grant 2023-11-17 21:57 ` Jonny Grant 1 sibling, 2 replies; 138+ messages in thread From: Paul Eggert @ 2023-11-12 20:49 UTC (permalink / raw) To: Alejandro Colomar, Jonny Grant; +Cc: Matthew House, linux-man [dropping libc-alpha since this is only about the man pages] On 2023-11-12 02:59, Alejandro Colomar wrote: > I think the man-pages should go > ahead and write wrapper functions such as strtcpy() and stpecpy() > around libc functions; these wrappers should provide a fast and safe > starting point for most programs. It's OK for man pages to give these in EXAMPLES sections. However, the man pages currently go too far in this direction. Currently, if I type "man stpecpy", I get a man page with a synopsis and it looks to me like glibc supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do that, as stpecpy is merely a man-pages invention: although the source code for stpecpy is in the EXAMPLES section of string_copying(7), you can't use stpecpy in an app without copy-and-pasting the man page's source into your code. It's not just stpecpy. For example, there is no ustr2stp function in glibc, but "man ustr2stp" acts as if there is one. The man pages should describe the library that exists, not the library that some of us would rather have. > It's true that memcpy(3) is the fastest function one can use, but it > requires the programmer to be rather careful with the lengths of the > strings. I don't think keeping track of all those little details is > what the common programmer should do. Unfortunately, C is not designed for string use that's that convenient. If you want safe and efficient use of possibly-long C strings, keeping track of lengths is generally the best way to do it. 
>> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? >> >> memcpy (dest, src, size); >> dest[size - 1] = '\0'; > > -1's in the source code make up for off-by-one bugs. The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is defined to null-terminate the result if sz!=0, so that particular "-1" isn't a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't clear to me.) That "last byte, twice" question is: why is the last argument to memcpy "size" and not "size - 1" which would be equally correct? The answer is performance: memcpy often works faster when copying a number of bytes that is a multiple of a smallish power of two, and "size" is more likely than "size - 1" to be such a multiple. ^ permalink raw reply [flat|nested] 138+ messages in thread
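The "last byte, twice" pattern explained above can be sketched as follows. This is an illustration only, built around the two lines quoted in the thread; `strlcpy_sketch` is a hypothetical name, and the surrounding control flow is an assumption rather than the actual glibc source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy-like function, for illustration only.
   Returns strlen(SRC); the caller detects truncation with
   "result >= size".  */
size_t
strlcpy_sketch(char *restrict dst, const char *restrict src, size_t size)
{
    size_t slen = strlen(src);       /* always scans all of SRC */

    if (slen < size) {
        memcpy(dst, src, slen + 1);  /* fits, terminator included */
    } else if (size != 0) {
        memcpy(dst, src, size);      /* copies one byte too many ... */
        dst[size - 1] = '\0';        /* ... which is overwritten here */
    }
    return slen;
}
```

Passing `size` rather than `size - 1` to memcpy(3) is harmless because the source is known to hold at least `size` non-null bytes on that branch; the extra byte is simply replaced by the terminator.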
* Re: strncpy clarify result may not be null terminated 2023-11-12 20:49 ` Paul Eggert @ 2023-11-12 21:00 ` Alejandro Colomar 2023-11-12 21:45 ` Alejandro Colomar 2023-11-13 23:46 ` Jonny Grant 1 sibling, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 21:00 UTC (permalink / raw) To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man [-- Attachment #1: Type: text/plain, Size: 3346 bytes --] On Sun, Nov 12, 2023 at 12:49:44PM -0800, Paul Eggert wrote: > [dropping libc-alpha since this is only about the man pages] > > On 2023-11-12 02:59, Alejandro Colomar wrote: > > > I think the man-pages should go > > ahead and write wrapper functions such as strtcpy() and stpecpy() > > around libc functions; these wrappers should provide a fast and safe > > starting point for most programs. > > It's OK for man pages to give these in EXAMPLES sections. However, the man > pages currently go too far in this direction. Currently, if I type "man > stpecpy", I get a man page with a synopsis and it looks to me like glibc > supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do > that, as stpecpy is merely a man-pages invention: although the source code > for stpecpy is in the EXAMPLES section of string_copying(7), you can't use > stpecpy in an app without copy-and-pasting the man page's source into your > code. > > It's not just stpecpy. For example, there is no ustr2stp function in glibc, > but "man ustr2stp" acts as if there is one. Yeah, I've thought of removing those links. Will do it. > > The man pages should describe the library that exists, not the library that > some of us would rather have. > > > > It's true that memcpy(3) is the fastest function one can use, but it > > requires the programmer to be rather careful with the lengths of the > > strings. I don't think keeping track of all those little details is > > what the common programmer should do. > > Unfortunately, C is not designed for string use that's that convenient. 
If > you want safe and efficient use of possibly-long C strings, keeping track of > lengths is generally the best way to do it. > > > > > glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? > > > > > > memcpy (dest, src, size); > > > dest[size - 1] = '\0'; > > > > -1's in the source code make up for off-by-one bugs. > > The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is > defined to null-terminate the result if sz!=0, so that particular "-1" isn't > a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't > clear to me.) I didn't mean this code has a bug. I meant that writing this code all the time is prone to bugs, because one may forget the -1 in some of the cases. And yes, the strlcpy(3) spec is buggy in that it forces a pattern that is prone to off-by-one bugs: to check for truncation, one must use '>=', which one may mistype as '>' (or even '=='). It would have been much better to return -1 on truncation, to have a simple == -1 check as most libc functions. Any function that requires writing hundreds of 'size - 1', or hundreds of '>=' should at least be wrapped. If that use is the only intended use of the function (as is of snprintf(3) and strlcpy(3)), it's a bad API. Cheers, Alex > > That "last byte, twice" question is: why is the last argument to memcpy > "size" and not "size - 1" which would be equally correct? The answer is > performance: memcpy often works faster when copying a number of bytes that > is a multiple of a smallish power of two, and "size" is more likely than > "size - 1" to be such a multiple. > -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
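The return convention suggested above (a plain -1 on truncation instead of a length that must be compared with '>=') might look like the sketch below. `copy_trunc` is a hypothetical name; this is not the strtcpy() from string_copying(7), only an illustration built on strnlen(3) and memcpy(3), and it assumes a POSIX system for ssize_t.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical truncating copy.  Returns the length of the string
   written to DST, or -1 if SRC did not fit (DST then holds a
   truncated, still null-terminated string).  If DSIZE is 0, nothing
   is written and -1 is returned.  */
ssize_t
copy_trunc(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t slen, dlen;

    if (dsize == 0)
        return -1;

    slen = strnlen(src, dsize);              /* reads at most DSIZE bytes */
    dlen = (slen == dsize) ? dsize - 1 : slen;

    memcpy(dst, src, dlen);
    dst[dlen] = '\0';

    return (slen == dsize) ? -1 : (ssize_t) dlen;
}
```

With this shape the only "size - 1" lives inside the wrapper, and callers check `== -1` as with most libc functions. Unlike strlcpy(3), the cost is bounded by the destination size, not by strlen(src).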
* Re: strncpy clarify result may not be null terminated 2023-11-12 21:00 ` Alejandro Colomar @ 2023-11-12 21:45 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 21:45 UTC (permalink / raw) To: Paul Eggert; +Cc: Jonny Grant, Matthew House, linux-man [-- Attachment #1: Type: text/plain, Size: 4011 bytes --] On Sun, Nov 12, 2023 at 10:00:06PM +0100, Alejandro Colomar wrote: > On Sun, Nov 12, 2023 at 12:49:44PM -0800, Paul Eggert wrote: > > [dropping libc-alpha since this is only about the man pages] > > > > On 2023-11-12 02:59, Alejandro Colomar wrote: > > > > > I think the man-pages should go > > > ahead and write wrapper functions such as strtcpy() and stpecpy() > > > around libc functions; these wrappers should provide a fast and safe > > > starting point for most programs. > > > > It's OK for man pages to give these in EXAMPLES sections. However, the man > > pages currently go too far in this direction. Currently, if I type "man > > stpecpy", I get a man page with a synopsis and it looks to me like glibc > > supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do > > that, as stpecpy is merely a man-pages invention: although the source code > > for stpecpy is in the EXAMPLES section of string_copying(7), you can't use > > stpecpy in an app without copy-and-pasting the man page's source into your > > code. > > > > It's not just stpecpy. For example, there is no ustr2stp function in glibc, > > but "man ustr2stp" acts as if there is one. > > Yeah, I've thought of removing those links. Will do it. > > > > > The man pages should describe the library that exists, not the library that > > some of us would rather have. > > > > > > > It's true that memcpy(3) is the fastest function one can use, but it > > > requires the programmer to be rather careful with the lengths of the > > > strings. I don't think keeping track of all those little details is > > > what the common programmer should do. 
> > > > Unfortunately, C is not designed for string use that's that convenient. If > > you want safe and efficient use of possibly-long C strings, keeping track of > > lengths is generally the best way to do it. > > > > > > > > glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? > > > > > > > > memcpy (dest, src, size); > > > > dest[size - 1] = '\0'; > > > > > > -1's in the source code make up for off-by-one bugs. > > > > The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is > > defined to null-terminate the result if sz!=0, so that particular "-1" isn't > > a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't > > clear to me.) > > I didn't mean this code has a bug. I meant that writing this code all > the time is prone to bugs, because one may forget the -1 in some of the > cases. Ahh, I hadn't noticed that was part of the implementation of strlcpy(3). I though it was some pattern showing how to use memcpy(3) to copy strings. I was saying that such a pattern would be a bad thing to write all the time. But yeah, inside strlcpy(3) it's fine, and I don't think strlcpy(3) is bad in that regard. The only problem I see in strlcpy(3) is the return value. > > And yes, the strlcpy(3) spec is buggy in that it forces a pattern that > is prone to off-by-one bugs: to check for truncation, one must use '>=', > which one may mistype as '>' (or even '=='). It would have been much > better to return -1 on truncation, to have a simple == -1 check as most > libc functions. > > Any function that requires writing hundreds of 'size - 1', or hundreds > of '>=' should at least be wrapped. If that use is the only intended > use of the function (as is of snprintf(3) and strlcpy(3)), it's a bad > API. > > Cheers, > Alex > > > > > That "last byte, twice" question is: why is the last argument to memcpy > > "size" and not "size - 1" which would be equally correct? 
The answer is > > performance: memcpy often works faster when copying a number of bytes that > > is a multiple of a smallish power of two, and "size" is more likely than > > "size - 1" to be such a multiple. > > > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-12 20:49 ` Paul Eggert 2023-11-12 21:00 ` Alejandro Colomar @ 2023-11-13 23:46 ` Jonny Grant 1 sibling, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-13 23:46 UTC (permalink / raw) To: Paul Eggert, Alejandro Colomar; +Cc: Matthew House, linux-man On 12/11/2023 20:49, Paul Eggert wrote: > [dropping libc-alpha since this is only about the man pages] > > On 2023-11-12 02:59, Alejandro Colomar wrote: > >> I think the man-pages should go >> ahead and write wrapper functions such as strtcpy() and stpecpy() >> aound libc functions; these wrappers should provide a fast and safe >> starting point for most programs. > > It's OK for man pages to give these in EXAMPLES sections. However, the man pages currently go too far in this direction. Currently, if I type "man stpecpy", I get a man page with a synopsis and it looks to me like glibc supports stpecpy(3) just like it supports stpcpy(3). But glibc doesn't do that, as stpecpy is merely a man-pages invention: although the source code for stpecpy is in the EXAMPLES section of string_copying(7), you can't use stpecpy in an app without copy-and-pasting the man page's source into your code. > > It's not just stepecpy. For example, there is no ustr2stp function in glibc, but "man ustr2stp" acts as if there is one. > > The man pages should describe the library that exists, not the library that some of us would rather have. > > >> It's true that memcpy(3) is the fastest function one can use, but it >> requires the programmer to be rather careful with the lengths of the >> strings. I don't think keeping track of all those little details is >> what the common programmer should do. > > Unfortunately, C is not designed for string use that's that convenient. If you want safe and efficient use of possibly-long C strings, keeping track of lengths is generally the best way to do it. 
> > >>> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? >>> >>> memcpy (dest, src, size); >>> dest[size - 1] = '\0'; >> >> -1's in the source code make up for off-by-one bugs. > > The "dest[size - 1] = '\0';" is there because strlcpy(dst, src, sz) is defined to null-terminate the result if sz!=0, so that particular "-1" isn't a bug. (Perhaps you meant that the strlcpy spec itself is buggy? It wasn't clear to me.) > > That "last byte, twice" question is: why is the last argument to memcpy "size" and not "size - 1" which would be equally correct? The answer is performance: memcpy often works faster when copying a number of bytes that is a multiple of a smallish power of two, and "size" is more likely than "size - 1" to be such a multiple. > Thank you for your reply. I see what you mean, many programmers consider sizes and would make their dest buffer say 32 bytes, so when this truncation occurs it makes sense to make the most of that to copy quickly, even if that means writing the null terminator on top of the last written byte. Probably someone measured strlcpy with these truncation calls and saw a lot of convenient power of 2 sizes coming through, when truncating strings in this way. Personally, I'm not sure if it is much use when strings are truncated, as strlcpy detects, an API like this could just return an error and not partially copy. Then the programmer would have a chance to realloc() and copy the full string. The strlcpy API returns src_length, even when it's truncated and didn't write src_length+1 bytes to dest, how misleading. Shame strlcpy can't be [[deprecated]]. I'm sure everyone may have read these posts before about strlcpy, just sharing while I remember: Ulrich Drepper frowned upon strlcpy: https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053.html "This is horribly inefficient BSD crap. Using these function only leads to other errors. 
Correct string handling means that you always know how long your strings are and therefore you can you memcpy (instead of strcpy). Beside, those who are using strcat or variants deserved to be punished." The rest of the thread is also interesting. Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
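The "always know how long your strings are" approach, combined with the idea raised above of returning an error instead of partially copying, can be sketched like this. The `struct strbuf` type and its functions are hypothetical names invented for the illustration, and the fixed 64-byte capacity is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the length travels with the bytes,
   so no call ever has to rescan the string to find its end.  */
struct strbuf {
    char   data[64];
    size_t len;          /* bytes in use, not counting the terminator */
};

void
strbuf_init(struct strbuf *sb)
{
    sb->len = 0;
    sb->data[0] = '\0';
}

/* Append SLEN bytes of SRC.  Refuses (returns false, buffer unchanged)
   rather than truncating when the result would not fit.  */
bool
strbuf_append(struct strbuf *sb, const char *src, size_t slen)
{
    if (slen >= sizeof(sb->data) - sb->len)   /* keep room for '\0' */
        return false;

    memcpy(sb->data + sb->len, src, slen);
    sb->len += slen;
    sb->data[sb->len] = '\0';
    return true;
}
```

On failure the caller still has the intact buffer and the needed length, so it can grow the storage and retry; the price is that every caller must carry the length around, which is the burden discussed earlier in the thread.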
* Re: strncpy clarify result may not be null terminated 2023-11-12 10:59 ` Alejandro Colomar 2023-11-12 20:49 ` Paul Eggert @ 2023-11-17 21:57 ` Jonny Grant 2023-11-18 10:12 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-17 21:57 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man On 12/11/2023 10:59, Alejandro Colomar wrote: > Hi Jonny, > > On Sun, Nov 12, 2023 at 09:52:20AM +0000, Jonny Grant wrote: > [... some micro-benchmarks...] > >> >> Maybe we're gonna need a bigger benchmark. > > Not really. > >> >> Probably there existing studies. Or could patch something like SQLite >> Benchmark to utilise each string function just for measurements. >> Hopefully it moves around at least 2GB of strings to give some >> meaningful comparison timings. > > I wasn't so interested in the small differences between functions. > What this micro-benchmark showed clearly, without needing much more info > to be conclusive, is the first order of growth of each of the functions: > > - strlcpy(3)'s first order growth corresponds to strlen(src). That's > due to returning strlen(src), which proves to be a poor API. > > - strncpy(3)'s first order growth corresponds to sizeof(dst). That's > of course due to the zeroing. If sizeof(dst) is kept very small, you > could live with it. When the size grows to more or less 4 KiB, this > drag becomes meaningful. > > - strnlen(3)+*cpy() first order growth corresponds to > strnlen(src, sizeof(dst)), which is the fastest order of growth > you can get from a truncating string-copying function (except if you > keep track of your slen manually and call directly memcpy(3)). That's a really good point, keeping track of the length (and buffer size) and then just using memcpy. The copy time should be closer to the number of bytes read and written. > > Of course, first order of growth ignores second order of growth and so > on, which for small inputs can be important. 
That is, O(x^3) is bigger > than O(x^2), but x3 + x2 can be smaller than 5*x2 for small x. > >> >> As Paul mentioned, strlcpy is a poor choice for processing strings.\ >> Could rely on their guidance as they already measured. >> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html > > Indeed. I've added important notices in BUGS about it, and recommended > against Saw glibc have (11) functions listed as a poor choice for string processing > >> >> Maybe the strlcpy API is easier, safer for programmers; but the >> compiler can't figure out that the programmer already knew src string >> length. So the strlcpy does a strlen() and wastes time reading over >> memory. If the src length is known, can just memcpy. > > I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't > suffer its problems. It should be even safer and easier to use, and its > first order of growth is better. I'll send a patch for review in a > moment. I did take a look at strtcpy but it calls strnlen(), reading over memory. > >> When I've benchmarked things, reducing the memory accesses for read, >> write boosted performance, also looked at the cycles taken, of course >> cache and alignment all play a part too. > > If one wants to micro-optimize for their use case, its none of my > business. I provide a function that should be safe and relatively fast > for all use cases, which libc doesn't. > >> Maybe could suggest in your man page programmers should keep track of >> the src size ? - to save the cost of the strlen(). > > No. Optimizations are not my business. Writing good APIs should make > these optimizations low value so that they aren't done, except for the > most performance-critical programs. > > The problem comes when libc doesn't provide anything usable, and the > user has no guidance on where to start. Then, programmers start being > clever, usually too clever. 
That's why I think the man-pages should go > ahead and write wrapper functions such as strtcpy() and stpecpy() > aound libc functions; these wrappers should provide a fast and safe > starting point for most programs. > > It's true that memcpy(3) is the fastest function one can use, but it > requires the programmer to be rather careful with the lengths of the > strings. I don't think keeping track of all those little details is > what the common programmer should do. That's true, high-performance users probably create their own bespoke solutions. strtcpy probably takes the src size? > >> >> At least the strlen functions are optimized: >> glibc/strnlen.c calls memchr() searching for '\0' memchr searches 4 bytes at a time. >> glibc/strlen.c searches 4 bytes at a time. >> >> glibc/strlcpy.c __strlcpy() is there a reason when truncating it overwrites the last byte, twice? >> >> memcpy (dest, src, size); >> dest[size - 1] = '\0'; > > -1's in the source code make up for off-by-one bugs. APIs should be > written so that common use doesn't involve manually writing -1 if > possible. What way do you feel they should be doing it? > > I acknowledge the performance benefits of this construction, and have > used it myself in NGINX code, but I also find it very dangerous, which > is why I recommend using a wrapper over it: > > char * > ustr2stp(char *restrict dst, const char *restrict src, size_t len) > { > char *p; > > p = mempcpy(dst, src, len); > *p = '\0'; > > return p; > } > > Cheers, > Alex > >> >> Kind regards, Jonny > Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-17 21:57 ` Jonny Grant @ 2023-11-18 10:12 ` Alejandro Colomar 2023-11-18 23:03 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-18 10:12 UTC (permalink / raw) To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man [-- Attachment #1: Type: text/plain, Size: 5478 bytes --] Hi Jonny, On Fri, Nov 17, 2023 at 09:57:39PM +0000, Jonny Grant wrote: > > - strlcpy(3)'s first order growth corresponds to strlen(src). That's > > due to returning strlen(src), which proves to be a poor API. > > > > - strncpy(3)'s first order growth corresponds to sizeof(dst). That's > > of course due to the zeroing. If sizeof(dst) is kept very small, you > > could live with it. When the size grows to more or less 4 KiB, this > > drag becomes meaningful. > > > > - strnlen(3)+*cpy() first order growth corresponds to > > strnlen(src, sizeof(dst)), which is the fastest order of growth > > you can get from a truncating string-copying function (except if you > > keep track of your slen manually and call directly memcpy(3)). > > That's a really good point, keeping track of the length (and buffer size) and then just using memcpy. > The copy time should be closer to the number of bytes read and written. Actually, the performance of memcpy(3) should also be on the order of strnlen(src, sizeof(dst)), so it should always take similar times compared to strnlen(3)+*cpy(). It is only that it will always be slightly faster due to avoiding a second read, but it will only be a %. Nothing like 10x, which can easily happen with strlcpy(3) or strncpy(3). > > Of course, first order of growth ignores second order of growth and so > > on, which for small inputs can be important. That is, O(x^3) is bigger > > than O(x^2), but x3 + x2 can be smaller than 5*x2 for small x. 
> >
> >>
> >> As Paul mentioned, strlcpy is a poor choice for processing strings.\
> >> Could rely on their guidance as they already measured.
> >> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html
> >
> > Indeed. I've added important notices in BUGS about it, and recommended
> > against
>
> Saw glibc have (11) functions listed as a poor choice for string processing

They list many functions as poor choices for string processing. The
problem is that they list those functions for string processing. I went
a bit further and de-listed some: We don't list strncpy(3) or strncat(3)
as functions that process strings, but rather as something else. And
they are actually good functions for processing that something else.

The problem with strlcpy(3) is that it's a function that is designed to
process strings, and being bad at processing strings makes it a bad
function, period.

> >> Maybe the strlcpy API is easier, safer for programmers; but the
> >> compiler can't figure out that the programmer already knew src string
> >> length. So the strlcpy does a strlen() and wastes time reading over
> >> memory. If the src length is known, can just memcpy.
> >
> > I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't
> > suffer its problems. It should be even safer and easier to use, and its
> > first order of growth is better. I'll send a patch for review in a
> > moment.
>
> I did take a look at strtcpy but it calls strnlen(), reading over memory.

That's just a few % slower than memcpy(3). Don't expect memcpy(3) to be
much faster than this. strtcpy() reads twice and writes once; memcpy(3)
reads once and writes once. So you can expect memcpy(3) to be constantly
33% faster (very roughly).

If you implement your own strtcpy() in assembly, maybe you can get
something that's in the single-digit % slower than memcpy(3), similar to
strcpy(3).
> >> When I've benchmarked things, reducing the memory accesses for read, > >> write boosted performance, also looked at the cycles taken, of course > >> cache and alignment all play a part too. > > > > If one wants to micro-optimize for their use case, its none of my > > business. I provide a function that should be safe and relatively fast > > for all use cases, which libc doesn't. > > > >> Maybe could suggest in your man page programmers should keep track of > >> the src size ? - to save the cost of the strlen(). > > > > No. Optimizations are not my business. Writing good APIs should make > > these optimizations low value so that they aren't done, except for the > > most performance-critical programs. > > > > The problem comes when libc doesn't provide anything usable, and the > > user has no guidance on where to start. Then, programmers start being > > clever, usually too clever. That's why I think the man-pages should go > > ahead and write wrapper functions such as strtcpy() and stpecpy() > > aound libc functions; these wrappers should provide a fast and safe > > starting point for most programs. > > > > It's true that memcpy(3) is the fastest function one can use, but it > > requires the programmer to be rather careful with the lengths of the > > strings. I don't think keeping track of all those little details is > > what the common programmer should do. > > That's true, high-performance users probably create their own bespoke solutions. > strtcpy probably takes the src size? No. strtcpy() takes the dst size. ssize_t strtcpy(char dst[restrict dsize], const char *restrict src, size_t dsize); This function doesn't care about the src size. It requires that it's either a string, or a character array larger than dst. In both cases, it means that the internal calculation of slen = strnlen(src, dsize) will never overrun the buffer, while costing only a small time. 
-- <https://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 138+ messages in thread
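The strnlen(3)+memcpy(3) construction discussed in this message can be sketched roughly as follows. This is only an illustration: the name my_strtcpy() is hypothetical, it is not the implementation from string_copying(7), and returning -1 on truncation (or when dsize is 0) is an assumption made for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen(3) and ssize_t */
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: copy the string src into dst[dsize], truncating
 * if needed.  Returns the length of the resulting string, or -1 if the
 * copy was truncated (or if dsize is 0, in which case nothing is written).
 */
ssize_t
my_strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen, dlen;

	if (dsize == 0)
		return -1;

	/* Reads at most dsize bytes of src, however long src is. */
	slen = strnlen(src, dsize);

	/* If no null byte was seen within dsize bytes, keep dsize - 1 bytes. */
	dlen = (slen == dsize) ? dsize - 1 : slen;

	memcpy(dst, src, dlen);
	dst[dlen] = '\0';

	return (slen == dsize) ? -1 : (ssize_t) dlen;
}
```

The cost is bounded by strnlen(src, dsize) regardless of how long src is, which is the order-of-growth property argued for above. The source is read twice (once by strnlen(3), once by memcpy(3)) and the destination is written once.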
* Re: strncpy clarify result may not be null terminated 2023-11-18 10:12 ` Alejandro Colomar @ 2023-11-18 23:03 ` Jonny Grant 0 siblings, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-18 23:03 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man On 18/11/2023 10:12, Alejandro Colomar wrote: > Hi Jonny, > > On Fri, Nov 17, 2023 at 09:57:39PM +0000, Jonny Grant wrote: >>> - strlcpy(3)'s first order growth corresponds to strlen(src). That's >>> due to returning strlen(src), which proves to be a poor API. >>> >>> - strncpy(3)'s first order growth corresponds to sizeof(dst). That's >>> of course due to the zeroing. If sizeof(dst) is kept very small, you >>> could live with it. When the size grows to more or less 4 KiB, this >>> drag becomes meaningful. >>> >>> - strnlen(3)+*cpy() first order growth corresponds to >>> strnlen(src, sizeof(dst)), which is the fastest order of growth >>> you can get from a truncating string-copying function (except if you >>> keep track of your slen manually and call directly memcpy(3)). >> >> That's a really good point, keeping track of the length (and buffer size) and then just using memcpy. >> The copy time should be closer to the number of bytes read and written. > > Actually, the performance of memcpy(3) should also be on the order of > strnlen(src, sizeof(dst)), so it should always take similar times > compared to strnlen(3)+*cpy(). It is only that it will always be > slightly faster due to avoiding a second read, but it will only be a %. > Nothing like 10x, which can easily happen with strlcpy(3) or strncpy(3). > >>> Of course, first order of growth ignores second order of growth and so >>> on, which for small inputs can be important. That is, O(x^3) is bigger >>> than O(x^2), but x3 + x2 can be smaller than 5*x2 for small x. >>> >>>> >>>> As Paul mentioned, strlcpy is a poor choice for processing strings.\ >>>> Could rely on their guidance as they already measured. 
>>>> https://www.gnu.org/software/libc/manual/html_node/Truncating-Strings.html >>> >>> Indeed. I've added important notices in BUGS about it, and recommended >>> against >> >> Saw glibc have (11) functions listed as a poor choice for string processing > > They list many functions as poor choices for string processing. The > problem is that they list those functions for string processing. I went > a bit further and de-listed some: We don't list strncpy(3) or strncat(3) > as functions that process strings, but rather as something else. And > they are actually good functions for processing that something else. > > The problem with strlcpy(3) is that it's a function that is designed to > process strings, and being bad at processing strings makes it a bad > function period. > >>>> Maybe the strlcpy API is easier, safer for programmers; but the >>>> compiler can't figure out that the programmer already knew src string >>>> length. So the strlcpy does a strlen() and wastes time reading over >>>> memory. If the src length is known, can just memcpy. >>> >>> I've written strtcpy(3) as an alternative to strlcpy(3) that doesn't >>> suffer its problems. It should be even safer and easier to use, and its >>> first order of growth is better. I'll send a patch for review in a >>> moment. >> >> I did take a look at strtcpy but it calls strnlen(), reading over memory. > > That's just a few % slower than memcpy(3). Don't expect memcpy(3) to be > much faster than this. strtcpy() reads twice writes once; memcpy(3) > reads once writes once. So you can expect memcpy(3) to be constantly > 33% faster (very roughly). Probably there are benchmarks, measurements comparing those functions which use strnlen() to those that just do memcpy()? Would be interesting to hear what the time is to do those reads & writes, or just do writes. > If you implement you own strtcpy() in assembly, maybe you can get > something that's in the single-digit % slower than memcpy(3), similar to > strcpy(3). 
> >>>> When I've benchmarked things, reducing the memory accesses for read, >>>> write boosted performance, also looked at the cycles taken, of course >>>> cache and alignment all play a part too. >>> >>> If one wants to micro-optimize for their use case, its none of my >>> business. I provide a function that should be safe and relatively fast >>> for all use cases, which libc doesn't. >>> >>>> Maybe could suggest in your man page programmers should keep track of >>>> the src size ? - to save the cost of the strlen(). >>> >>> No. Optimizations are not my business. Writing good APIs should make >>> these optimizations low value so that they aren't done, except for the >>> most performance-critical programs. >>> >>> The problem comes when libc doesn't provide anything usable, and the >>> user has no guidance on where to start. Then, programmers start being >>> clever, usually too clever. That's why I think the man-pages should go >>> ahead and write wrapper functions such as strtcpy() and stpecpy() >>> aound libc functions; these wrappers should provide a fast and safe >>> starting point for most programs. >>> >>> It's true that memcpy(3) is the fastest function one can use, but it >>> requires the programmer to be rather careful with the lengths of the >>> strings. I don't think keeping track of all those little details is >>> what the common programmer should do. >> >> That's true, high-performance users probably create their own bespoke solutions. >> strtcpy probably takes the src size? > > No. strtcpy() takes the dst size. > > ssize_t > strtcpy(char dst[restrict dsize], const char *restrict src, size_t dsize); > > This function doesn't care about the src size. It requires that it's > either a string, or a character array larger than dst. In both cases, > it means that the internal calculation of slen = strnlen(src, dsize) > will never overrun the buffer, while costing only a small time. 
Ok I see, I would rather use something that allowed the src_len to be specified, to save that strnlen() cost. Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-10 5:36 ` Paul Eggert 2023-11-10 11:05 ` Alejandro Colomar @ 2023-11-10 11:36 ` Jonny Grant 2023-11-10 13:15 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-10 11:36 UTC (permalink / raw) To: Paul Eggert, Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library

On 10/11/2023 05:36, Paul Eggert wrote:
> On 2023-11-09 15:48, Alejandro Colomar wrote:
>> I'd then just use strlen(3)+strcpy(3), avoiding
>> strncpy(3).
>
> But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead.
>
> The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer.
>
> Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements".

As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids these pitfalls:

1) cost of any initial strnlen() reading memory to determine input src size
2) accepts a src_max_size to actually try to copy from src
3) does not truncate by writing anything to the buffer if there isn't enough space in the dest_max_size to fit src_max_size
4) check for NULL pointers
5) probably other things I've overlooked

Something like this API:
int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t * dest_written);
These sizes include any terminating NUL byte.

It returns 0 on success, or an error code like EINVAL, or ERANGE if it would truncate.

All comments welcome.
Kind regards, Jonny
^ permalink raw reply [flat|nested] 138+ messages in thread
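One possible reading of the proposed my_str_copy() interface is sketched below. The exact semantics are assumptions made for illustration: both sizes count the terminating null byte, EINVAL is returned for null pointers or when no null byte is found within src_max_size bytes, and the destination is left untouched on any error. Note that the sketch still performs one bounded scan of src with memchr(3), since the position of the null byte has to be found somehow; a variant taking the exact source length would not need it.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the proposed API; semantics are assumed, not specified.
 * Returns 0 on success, EINVAL on bad arguments, or ERANGE if the
 * string (including its null byte) does not fit in dest.
 */
int
my_str_copy(char *dest, const char *src, size_t dest_max_size,
            size_t src_max_size, size_t *dest_written)
{
	const char  *nul;
	size_t      n;

	if (dest == NULL || src == NULL || dest_written == NULL)
		return EINVAL;

	/* Bounded search for the terminator within src_max_size bytes. */
	nul = memchr(src, '\0', src_max_size);
	if (nul == NULL)
		return EINVAL;

	n = (size_t) (nul - src) + 1;   /* bytes to copy, null byte included */
	if (n > dest_max_size)
		return ERANGE;          /* refuse rather than truncate */

	memcpy(dest, src, n);
	*dest_written = n;
	return 0;
}
```

Because the function refuses to copy instead of truncating, the caller never ends up with a string whose meaning has silently changed, which was the concern raised about file paths.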
* Re: strncpy clarify result may not be null terminated 2023-11-10 11:36 ` Jonny Grant @ 2023-11-10 13:15 ` Alejandro Colomar 2023-11-18 23:40 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 13:15 UTC (permalink / raw) To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man, GNU C Library [-- Attachment #1: Type: text/plain, Size: 3255 bytes --] Hi Jonny, On Fri, Nov 10, 2023 at 11:36:20AM +0000, Jonny Grant wrote: > > > On 10/11/2023 05:36, Paul Eggert wrote: > > On 2023-11-09 15:48, Alejandro Colomar wrote: > >> I'd then just use strlen(3)+strcpy(3), avoiding > >> strncpy(3). > > > > But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead. > > > > The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer. > > > > Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements". > > As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls: > > 1) cost of any initial strnlen() reading memory to determine input src size > 2) accepts a src_max_size to actually try to copy from src > 3) does not truncate by writing anything to the buffer if there isn't enough space in the dest_max_size to fit src_max_size > 4) check for NULL pointers > 5) probably other thing I've overlooked > > Something like this API: > int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t * dest_written); > These sizes are including any NUL terminating byte. > > 0 on success, or an an error code like EINVAL, or ERANGE if would truncate - Linux kernel's strscpy() returns -E2BIG if it would truncate. You may want to follow suit if you want such an errno(3) code. 
However, I think it's simpler to return the "standard" user-space error return value: -1 If you'd need to distinguish error reasons, you could distinguish error codes, but for a string-copying function I think it's not so useful. - Why specify the src buffer size? If you're copying strings, then you know it'll be null-terminated, so strnlen(3) will not overrun. If you're not copying strings, then you'll need a different function that reads from a non-string. The only standard such function is strncat(3), which reads from a fixed-width null-padded buffer, and writes to a string. You may want to write a function similar to strncat(3) that doesn't catenate, if you want to just copy; I call that function zustr2stp(), and you can find an implementation in string_copying(7). - You can reuse the return value for the dest_written value with ssize_t. Just return -1 on error and the string length on success. That's how most libc functions behave. - Regarding NULL checks, it depends on how you program. I wouldn't add them, but if you want to avoid crashes at all costs, it may be necessary for you. You could do a wrapper over strxcpy(): inline ssize_t strxcpy0(char *restrict dst, const char *restrict src, size_t dsize) { if (dst == NULL || src == NULL) return -1; return strxcpy(dst, src, dsize); } I used 0 in the name to mark that this function checks for null pointers. Cheers, Alex > > All comments welcome. > > Kind regards, Jonny -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
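The wrapper above calls strxcpy(), which is defined elsewhere in the thread. A self-contained sketch of the pair could look like this; the body of strxcpy() here is an assumption based on the behaviour described in this message (return -1 instead of truncating, the string length on success), and it uses strnlen(3) so that an overlong source is not scanned to its end.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen(3) and ssize_t */
#include <string.h>
#include <sys/types.h>

/* Copy src into dst[dsize] only if it fits; never truncates. */
static inline ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
	size_t  slen;

	slen = strnlen(src, dsize);
	if (slen == dsize)
		return -1;      /* no room for the string plus its null byte */

	memcpy(dst, src, slen + 1);
	return (ssize_t) slen;
}

/* Same, but reports null pointers as an error instead of crashing. */
static inline ssize_t
strxcpy0(char *restrict dst, const char *restrict src, size_t dsize)
{
	if (dst == NULL || src == NULL)
		return -1;

	return strxcpy(dst, src, dsize);
}
```

A caller only has to test for -1; on success the return value doubles as the length of the copied string, so no separate dest_written output parameter is needed.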
* Re: strncpy clarify result may not be null terminated 2023-11-10 13:15 ` Alejandro Colomar @ 2023-11-18 23:40 ` Jonny Grant 2023-11-20 11:56 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-18 23:40 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man On 10/11/2023 13:15, Alejandro Colomar wrote: > Hi Jonny, > > On Fri, Nov 10, 2023 at 11:36:20AM +0000, Jonny Grant wrote: >> >> >> On 10/11/2023 05:36, Paul Eggert wrote: >>> On 2023-11-09 15:48, Alejandro Colomar wrote: >>>> I'd then just use strlen(3)+strcpy(3), avoiding >>>> strncpy(3). >>> >>> But that is vulnerable to the same denial-of-service attack that strlcpy is vulnerable to. You'd need strnlen+strcpy instead. >>> >>> The strncpy approach I suggested is simpler, and (though this doesn't matter much in practice) is typically significantly faster than strnlen+strcpy in the typical case where the destination is a small fixed-size buffer. >>> >>> Although strncpy is not a good design, it's often simpler or faster or safer than later "improvements". >> >> As you say, it is a known API. I recall looking for a standardized bounded string copy a few years ago that avoids pitfalls: >> >> 1) cost of any initial strnlen() reading memory to determine input src size >> 2) accepts a src_max_size to actually try to copy from src >> 3) does not truncate by writing anything to the buffer if there isn't enough space in the dest_max_size to fit src_max_size >> 4) check for NULL pointers >> 5) probably other thing I've overlooked >> >> Something like this API: >> int my_str_copy(char *dest, const char *src, size_t dest_max_size, size_t src_max_size, size_t * dest_written); >> These sizes are including any NUL terminating byte. >> >> 0 on success, or an an error code like EINVAL, or ERANGE if would truncate > > - Linux kernel's strscpy() returns -E2BIG if it would truncate. You > may want to follow suit if you want such an errno(3) code. 
That is good, E2BIG if the dest_max_size can't accommodate src_max_size > > However, I think it's simpler to return the "standard" user-space > error return value: -1> > If you'd need to distinguish error reasons, you could distinguish > error codes, but for a string-copying function I think it's not so > useful. In the past I've used different values, eg -1 .. -5 as there are 5 different errors detected by this function I made a test version of, so application just needs to check for 0 for success. (The different error returns are useful when the issue is logged, to see where the error was detected in the function.) > - Why specify the src buffer size? If you're copying strings, then you > know it'll be null-terminated, so strnlen(3) will not overrun. The application should know the src buffer size, given that it allocated the buffer. That saves the performance cost of strnlen(). > If > you're not copying strings, then you'll need a different function > that reads from a non-string. The only standard such function is > strncat(3), which reads from a fixed-width null-padded buffer, and > writes to a string. You may want to write a function similar to > strncat(3) that doesn't catenate, if you want to just copy; I call > that function zustr2stp(), and you can find an implementation in > string_copying(7). > > - You can reuse the return value for the dest_written value with > ssize_t. Just return -1 on error and the string length on success. > That's how most libc functions behave. Sounds good. > > - Regarding NULL checks, it depends on how you program. I wouldn't add > them, but if you want to avoid crashes at all costs, it may be > necessary for you. You could do a wrapper over strxcpy(): > > > inline ssize_t > strxcpy0(char *restrict dst, const char *restrict src, size_t dsize) > { > if (dst == NULL || src == NULL) > return -1; > > return strxcpy(dst, src, dsize); > } > > I used 0 in the name to mark that this function checks for null > pointers. 
> > Cheers, > Alex > >> >> All comments welcome. >> >> Kind regards, Jonny > Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-18 23:40 ` Jonny Grant @ 2023-11-20 11:56 ` Jonny Grant 2023-11-20 15:12 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-20 11:56 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man

BTW, GCC has a useful warning for truncation that may help code bases that use strncpy. You've probably seen this and the article; just sharing for completeness.

warning: ‘__builtin_strncpy’ output truncated before terminating nul copying XYZ bytes from a string of the same length [-Wstringop-truncation]

Martin's article from 2018
https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8#forming_truncated_strings_with_snprintf
^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-20 11:56 ` Jonny Grant @ 2023-11-20 15:12 ` Alejandro Colomar 2023-11-20 23:08 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-20 15:12 UTC (permalink / raw) To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man

Hi Jonny,

On Mon, Nov 20, 2023 at 11:56:40AM +0000, Jonny Grant wrote:
> BTW, GCC has a useful warning for truncation that may help code bases that use strncpy, you've probably seen this and the article, just sharing for completeness.

It's actually the opposite. GCC's warnings about strncpy(3) are
nefarious, as they fire on valid uses of strncpy(3) for writing a
null-padded character sequence (the use for which strncpy(3) was
designed), recommending the bogus use as a function for copying
truncated strings.

>
> warning: ‘__builtin_strncpy’ output truncated before terminating nul copying XYZ bytes from a string of the same length [-Wstringop-truncation]
>
>
> Martin's article from 2019
> https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8#forming_truncated_strings_with_snprintf

I discussed this with Martin, IIRC, and he told me they had to
decide which use of strncpy(3) to support, with the side effect that
other uses would be warned about, and they chose the one that I think is
bogus.

Cheers,
Alex

-- <https://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-20 15:12 ` Alejandro Colomar @ 2023-11-20 23:08 ` Jonny Grant 2023-11-20 23:42 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-20 23:08 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Paul Eggert, Matthew House, linux-man On 20/11/2023 15:12, Alejandro Colomar wrote: > Hi Jonny, > > On Mon, Nov 20, 2023 at 11:56:40AM +0000, Jonny Grant wrote: >> BTW, GCC has a useful warning for truncation that may help code bases that use strncpy, you've probably seen this and the article, just sharing for completeness. > > It's actually the opposite. GCC's warnings about strncpy(3) are > nefarious, as it warns in valid uses of strncpy(3) for writing a > null-padded character sequence (the use for which strncpy(3) was > designed), recommending the bogus use as a function for copying > truncated strings. You're right, I can see this warning is issued for valid uses of strncpy(3) to copy a sequence of characters, (without even a single NUL pad). It does not warn when the byte sequence count includes a NUL byte. >> >> warning: ‘__builtin_strncpy’ output truncated before terminating nul copying XYZ bytes from a string of the same length [-Wstringop-truncation] >> >> >> Martin's article from 2019 >> https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8#forming_truncated_strings_with_snprintf > > I discussed with Martin about this, IIRC, and he told me they had to > decide which use of strncpy(3) to support, with the side effect that > other uses would be warned about, and they chose the one that I think is > bogus. Fair enough. While I remember, the strlcpy discussion has been going on for over 20 years. https://sourceware.org/legacy-ml/libc-alpha/2000-08/msg00053.html https://news.ycombinator.com/item?id=6940601 Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-20 23:08 ` Jonny Grant @ 2023-11-20 23:42 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-20 23:42 UTC (permalink / raw) To: Jonny Grant; +Cc: Paul Eggert, Matthew House, linux-man

Hi Jonny,

On Mon, Nov 20, 2023 at 11:08:58PM +0000, Jonny Grant wrote:
> > I discussed with Martin about this, IIRC, and he told me they had to
> > decide which use of strncpy(3) to support, with the side effect that
> > other uses would be warned about, and they chose the one that I think is
> > bogus.
>
> Fair enough.

To be fair to Martin and GCC, the uses of strncpy(3) that I consider
correct are so trivial that those warnings are unnecessary, since one
should always use sizeof(dst) in the call, which can be done by a
wrapper macro

	#define STRNCPY(dst, src)  strncpy(dst, src, nitems(dst))

which is precisely what I did in shadow-utils. With this, the chances
of getting the size wrong are 0, so I'd just turn off those warnings.

Since strncpy(3) should always be used for writing to a fixed-size
array, the destination is likely to be an actual array, of which you can
take the size with nitems(). At least in shadow-utils, all calls have
been replaced by that macro. I'm curious if all uses are similarly
trivial in tar(1).

So if this warning helps those who misuse strncpy(3) to at least misuse
it safely, then it's a partially-good thing.

Cheers,
Alex

-- <https://www.alejandro-colomar.es/>
^ permalink raw reply [flat|nested] 138+ messages in thread
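A minimal sketch of how such a macro might be used for its intended purpose, filling a fixed-width null-padded field, is shown below. The nitems() definition and struct record are assumptions added for illustration; they are not taken from shadow-utils.

```c
#include <string.h>

/* Number of elements in an array; must not be applied to a pointer. */
#ifndef nitems
#define nitems(a)          (sizeof(a) / sizeof((a)[0]))
#endif

/* Fill the fixed-width array dst with a null-padded copy of src. */
#define STRNCPY(dst, src)  strncpy(dst, src, nitems(dst))

/* Hypothetical fixed-width record: name is null-padded, not a string. */
struct record {
	char  name[8];
};
```

After STRNCPY(r.name, "abc"), the field holds 'a', 'b', 'c' followed by five null bytes. A source of eight or more characters fills the field completely and leaves no null byte, so r.name must be read with a bounded function and never passed to code that expects a string.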
* Re: strncpy clarify result may not be null terminated 2023-11-09 11:38 ` Alejandro Colomar ` (2 preceding siblings ...) 2023-11-09 18:11 ` Paul Eggert @ 2023-11-10 11:23 ` Jonny Grant 3 siblings, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-10 11:23 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library On 09/11/2023 11:38, Alejandro Colomar wrote: > Hi Jonny, > > On Thu, Nov 09, 2023 at 10:31:49AM +0000, Jonny Grant wrote: >>> Probably the only way to solve the cleverness issue for good is to have an >>> immediately-available, foolproof, performant set of string functions that >>> are extremely straightforward to understand and use, flexible enough for >>> any use case, and generally agreed to be the first choice for string >>> manipulation. >> >> What's the best standardized function for C string copying in your > > strlcpy(3) will soon be standard. POSIX.1-202x (Issue 8) will add it, > which is why it's been added recently to glibc. Hopefully, ISO C3x will > follow (yeah, it's not like tomorrow). > >> opinion? They all seem to have drawbacks, strlcpy truncates (I'd >> rather it rejected if it didn't have enough buffer - could cause >> issues if the meaning of the string changed due to truncation, eg if >> it was a file path). Other alternative functions aren't widely in use. > > If you are consistent in checking the return value of strlcpy(3) and > reporting an error, it's the best standard alternative nowadays. > snprintf(3), except for using int instead of size_t, has an equivalent > API, and is in C99, in case that means something. 
>
> If you would want to write something based on Michael Kerrisk's article,
> you could do this:
>
> 	ssize_t
> 	strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
> 	{
> 		if (strlen(src) >= dsize)
> 			return -1;
>
> 		strcpy(dst, src);
> 		return strlen(dst);
> 	}
>
> You may also want to calculate 'dsize' automagically, to avoid human
> error, in case it's an array, so you could write a macro on top of it:
>
> 	#define STRXCPY(dst, src) strxcpy(dst, src, ARRAY_SIZE(dst))
>
> These are just small wrappers over standard functions, so you shouldn't
> have problems adding them to your project.
>
> This is my long term plan for shadow-utils, indeed. I'm first
> transforming strncpy(3) calls into strlcpy(3) to remove the superfluous
> padding, and later will use this strxcpy() to remove the truncated
> strings to avoid misinterpretation.
>
> Cheers,
> Alex
>
>>
>> Kind regards, Jonny
>

Yes, I like to look for a libc library function before writing my own wrapper, but I would consider something like strxcpy.

snprintf will truncate if there is not enough space, but will then return the number of bytes that would have been written (excluding the terminating null byte) had there not been truncation. So one could use snprintf on an array buffer on the stack, and then, if truncation occurred, discard the buffer and return an error; otherwise carry on using the string (that wasn't truncated).

Re strlcpy, I see the BSD man page gives some examples of how to check for truncation by strlcpy. Perhaps examples could be added to the Linux man page.
https://man.freebsd.org/cgi/man.cgi?query=strlcat&sektion=3

Kind regards, Jonny
^ permalink raw reply [flat|nested] 138+ messages in thread
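The snprintf(3) approach described in this message can be sketched as follows. The helper name copy_or_reject() is hypothetical; the only library behaviour relied on is that snprintf(3) returns the number of characters that would have been written, not counting the terminating null byte.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format src into dst[dsize] with snprintf(3) and
 * treat truncation as an error.  Returns 0 on success, -1 on error or
 * truncation.  On truncation dst holds a truncated string that the
 * caller is expected to discard.
 */
static int
copy_or_reject(char *dst, size_t dsize, const char *src)
{
	int  n;

	n = snprintf(dst, dsize, "%s", src);
	if (n < 0)
		return -1;      /* output error */
	if ((size_t) n >= dsize)
		return -1;      /* n + 1 bytes were needed, so it was truncated */

	return 0;
}
```

The comparison is n >= dsize rather than n > dsize because the return value excludes the null byte: a result equal to dsize means the last character was dropped to make room for the terminator.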
* Re: strncpy clarify result may not be null terminated 2023-11-09 3:13 ` Matthew House 2023-11-09 10:26 ` Jonny Grant 2023-11-09 10:31 ` Jonny Grant @ 2023-11-09 12:23 ` Alejandro Colomar 2023-11-09 12:35 ` Alejandro Colomar ` (2 more replies) 2 siblings, 3 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 12:23 UTC (permalink / raw) To: Matthew House; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 20957 bytes --] Hi Matthew, On Wed, Nov 08, 2023 at 10:13:39PM -0500, Matthew House wrote: > On Wed, Nov 8, 2023 at 2:33 PM Alejandro Colomar <alx@kernel.org> wrote: > > On Tue, Nov 07, 2023 at 09:12:37PM -0500, Matthew House wrote: > > > Man pages aren't read only by people writing new code, but also by people > > > reading and modifying existing code. And despite your preferences regarding > > > which functions ought to be used to produce strings, it's a widespread (and > > > correct) practice to produce a string from the character sequence created > > > by strncpy(3). There are two ways of doing this, either by setting the last > > > character of the destination buffer to null if you want to produce a > > > truncated string, or by testing the last character against zero if you want > > > to detect truncation and raise an error. > > > > It is not strncpy(3) who truncated, but the programmer by adding a NULL > > in buff[BUFSIZ - 1]. In the following snippet, strncpy(3) will not > > truncate: > > > > char cs[3]; > > > > strncpy(cs, "foo", 3); > > > > And yet your code doing if (cs[2] != '\0') { goto error; } would think > > it did. That's because you deformed strncpy(3) to implement a poor > > man's strlcpy(3). > > > > char cs[3]; > > > > strncpy(cs, "foo", 3); > > cs[2] = '\0'; // The truncation is here, not in strncpy(3). > > That's indeed a self-consistent interpretation of strncpy(3)'s function, > but I don't think it's borne out by its formal definition, which I was > basing my reasoning on. 
The current Linux man page for strncpy(3) says, > > These functions copy the string pointed to by src into a null-padded > character sequence at the fixed-width buffer pointed to by dst. If the > destination buffer, limited by its size, isn't large enough to hold the > copy, the resulting character sequence is truncated. > > Notice how it "copies the string": as your string_copying(7) says, a string > includes both a character sequence and a final null byte. So I'd ordinarily > read this definition as saying that strncpy(3) tries to copy src up to and > including the null byte, but produces a truncated copy of the whole string > if the destination buffer is too small. Thus, even if the destination > buffer contains all non-null characters in the original string, then the > copy has still been "truncated" in this sense. Yes, that was an inconsistency in my definition. Thanks to DJ's suggestion ("copies bytes from the string"), that has been fixed. Maybe it would be even better to say "copies characters from the string". > > The ISO C definition, and by extension, the POSIX definition, make this > interpretation even more explicit: > > The strncpy function copies not more than n characters (characters that > follow a null character are not copied) from the array pointed to by s2 > to the array pointed to by s1. > > That is, the terminating null byte is part of the copy, but not anything > after the terminating null byte. > > So one can interpret strncpy(3) as copying a prefix of a character sequence > into a buffer (and zero-filling the remainder), in which case you're > correct that truncation cannot be detected. But the function is formally > defined as copying a prefix of a string into a buffer (and zero-filling the > remainder), in which case the string has been truncated if the buffer > doesn't end in a null byte afterward.
It's just that one may not care about > the terminating null byte being truncated if the user of the result just > wants the initial character sequence. Yes, with the ISO C definition of strncpy(3), you can detect truncation. The problem is that while my definition of it is complete, the definition by ISO C makes it an incomplete function (to complete its functionality in copying strings, you need to add an explicit '\0' after the call). So I prefer mine, and for self-consistency, it can't report truncation. > > > > I'm not aware of any alternative to a strncpy(3)-based snippet for > > > producing a possibly-truncated copy of a string, except for your preferred > > > strlcpy(3) or stpecpy(3), which aren't available to anyone without a > > > > The Linux kernel has strscpy(3), which is also good, but is not > > available to user space. > > > > > brand-new glibc (nor, by extension, any applications or libraries that want > > > > libbsd has provided strlcpy(3) since basically forever. It is a very > > portable library. You don't need a brand-new glibc for having > > strlcpy(3). > > > > <https://libbsd.freedesktop.org/wiki/> > > That's a nice library that I didn't know about! Unfortunately, I don't > think it's a very viable option for the long tail of small libraries I've > referred to, which generally don't have any sub-dependencies of their own, > apart from those provided by the platform. > > Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their > configure scripts from whatever build system you're using (in such a way > that libbsd can locate libmd), ensuring they're safe for cross-compilation > if that's a goal, ensuring you bundle them in a way that respects their > license terms, and ensuring that any user of your library links to the two > dependencies and doesn't duplicate them. At that point, rolling your own > strlcpy(3) equivalent definitely sounds like less mental load, at least to > me.
Yes, if you had 0 deps, it might be simpler to add your implementation. Although it's a tricky function to implement, so I'd be careful. If you need to roll your own, I would go for a simpler function; maybe a wrapper over strlen(3)+strcpy(3). > > > > functions); snprintf(3), which has the insidious flaw of not supporting > > > more than INT_MAX characters on pain of UB, and also produces a warning if > > > the compiler notices the possible truncation; or strlen(3) + min() + > > > memcpy(3) + manually adding a null terminator, which is certainly more > > > explicit in its intent, and avoids strncpy(3)'s zero-filling behavior if > > > that poses a performance problem, but similarly opens up room for > > > off-by-one errors. > > > > More than the performance problem, I'm more worried about the > > maintainability of strncpy(3). When 20 years from now, a programmer > > reading a piece of code full of strncpy(3) wants to migrate to a sane > > function like strlcpy(3) or strcpy(3), the programmer needs to > > understand if the zeroing was purposeful or just accidental. Because > > by using strlcpy(3), it may start leaking some trailing data if the > > trailing of the buffer is meaningful to some program. > > I didn't see this as an issue in practice when I was reviewing all those > existing usages of strncpy(3). The vast majority were used in the midst of > simple string manipulation, where the destination buffer starts as > uninitialized or zeroed out, and ultimately gets passed into a user > expecting an ordinary null-terminated string. > > (One exception was a few functions that used strncpy(dst, "", len) to zero Holy crap! Didn't these programmers know bzero(3) or memset(3)? :D > out the buffer, which is thankfully pretty obvious. Another exception was > the functions that actually used strncpy(3) to produce a null-padded > character sequence, e.g., when writing a value into a section of a binary. 
> But in general, I found that it's usually not difficult to tell when a > usage is being clever enough that the null padding might be significant.) > > In fact, the greater confusion came from the surprisingly common practice > of using strncpy(3) like it's memcpy(3), by giving it the known length of It gets better! :D > the source string, or of some prefix computed through strchr(3) or similar. > This is often then followed up by strncat(3) or similar, indicating that > the writer clearly expects the full length to have non-null characters. But > if the length computation is separated far enough from the actual call to > strncpy(3), then it can become unclear whether the source is actually > expected to have any interior null bytes before the computed length. (So if > a list of alternatives to strncpy(3) is ever drawn up, then I'd suggest > that ordinary memcpy(3) be one of them.) string_copying(7) was initially devised as a page indicating alternatives to strncpy(3), depending on the purpose of the code. memcpy(3) is not mentioned (except in SEE ALSO), but mempcpy(3) is, which is essentially the same (but with a more useful return value). > > > For the sake of reference, I looked into a few big C and C++ projects to > > > see how often a strncpy(3)-based snippet was used to produce a truncated > > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite > > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in > > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, > > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note > > > that I haven't filtered out vendored dependencies, so there's a little bit > > > of double-counting.) It seems like most codebases that don't ban strncpy(3) > > > use a derived snippet somewhere or another. Also, I found 3 instances in > > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by > > > checking the last character. > > > > I know. 
I've been rewriting the code handling strings in shadow-utils > > for the last year, and ther was a lot of it. I fixed several small bugs > > in the process, so I recommend avoiding it. > > I can't tell you about your own experience, but in mine, the root cause of > most string-handling bugs has been excessive cleverness in using the > standard string functions, rather than the behavior of the functions > themselves. So one worry of mine is that if strncpy(3) ends up being > deprecated or whatever, then authors of portable libraries will start > writing lots of custom memcpy(3)-based replacements to their strncpy(3)- > based snippets, and more lines of code will introduce more opportunities > for cleverness. Don't worry. strncpy(3) won't be deprecated, thanks to tar(1). ;) > > (This is also why I was confused by your support for strcpy(3) on the > grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in > that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only > protect us from overruns, not from all the "small bugs" that might ensue > from people becoming more clever with sizing the destination buffer with > strcpy(3). I don't think strcpy(3) is as propense as strncpy(3) to ask programmers to be clever about it. In the case of strncpy(3) it's due to it being an incomplete string-copying function. strcpy(3) is complete. > Also, if it were truly a panacea, then we'd hardly have to worry > about the problems of strncpy(3) at all, since it would detect any misuse > of the function.) Fortification detects overruns in writes, which is how it protects strcpy(3). However, fortification can't protect against overruns in reads, which is what strncpy(3) causes due to missing null terminators. strncpy(3) also causes off-by-one bugs (I'll detail below), which strcpy(3) doesn't (and strlcpy(3) doesn't either). 
> > Probably the only way to solve the cleverness issue for good is to have an > immediately-available, foolproof, performant set of string functions that > are extremely straightforward to understand and use, flexible enough for > any use case, and generally agreed to be the first choice for string > manipulation. > > Unfortunately, probably the closest match to those criteria, especially the > availability criterion, is snprintf(3), which has the flaws of using int > instead of size_t for most sizes, not being very performant, and not being > async-signal-safe. Alas, it will likely remain a dream, given all the wars > over which safer string functions have the best API. But at least > strlcpy(3) has a pretty sound interface, if other platforms ever get around > to including it by default. strlcpy(3) will be in POSIX.1-202x (Issue 8), so it's a matter of time that it'll be widespread. > > > > the code to understand the concept behind how these two snippets work, that > > > the only difference between the strncpy(3)'s special "character sequence" > > > and an ordinary C string is an additional null terminator at the end of the > > > destination buffer. > > > > This is part of string_copying(7): > > > > DESCRIPTION > > Terms (and abbreviations) > > string (str) > > is a sequence of zero or more non‐null characters followed by a > > null byte. > > > > character sequence > > is a sequence of zero or more non‐null characters. A program > > should never use a character sequence where a string is required. > > However, with appropriate care, a string can be used in the place > > of a character sequence. > > > > I think that is very explicit in the difference. strncpy(3) refers to > > that page for understanding the differences, so I think it is > > documented. > > > > strncpy(3): > > CAVEATS > > The name of these functions is confusing. These functions produce a > > null‐padded character sequence, not a string (see string_copying(7)). 
> > My point is isn't that the difference is undocumented, but that the typical > man page reader isn't reading the man pages for their own sake, but because > they're looking at some code, and they want to Know What It's Doing as soon > as possible. We could maybe add a list of ways people have tried to be clever with strncpy(3) in the past and failed, and then explain why those uses are broken. This could be in a BUGS section. > If they're getting directed around elsewhere with weird > warnings about "not a string" ("what's it going on about, I thought it was > null-padded?"), then I worry there's a good chance that they'll instead > bounce off the man page and try figuring it out some other way. And even if > they do follow the reference, then they might have difficulty understanding > the implications, since many people don't think of things in terms of > formal definitions. > > > > reasonable to highlight precisely why strncpy(3)'s output isn't a string > > > > How about this?: > > > > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3 > > index d4c2ce83d..c80c8b640 100644 > > --- a/man3/stpncpy.3 > > +++ b/man3/stpncpy.3 > > @@ -108,7 +108,10 @@ .SH HISTORY > > .SH CAVEATS > > The name of these functions is confusing. > > These functions produce a null-padded character sequence, > > -not a string (see > > +not a string. > > +While strings have a terminating NUL byte, > > +character sequences do not have any terminating byte > > +(see > > .BR string_copying (7)). > > .P > > It's impossible to distinguish truncation by the result of the call, > > Yes, I'd be perfectly happy with something like that. That way, the > scariness is far more immediate ("the output might not be terminated!?"), > and thus more accessible to the typical reader. Ok; I'll add that. 
> > > > (viz., the lack of a null terminator), instead of trying to insist that its > > > output is worlds apart from anything string-related, especially given the > > > volume of existing correct code that belies that notion. > > > > It is not correct code. That code is doing extra work which confuses > > maintainers. It is a lot like writing dead code, since you're writing > > zeros that nobody is reading, which confuses maintainers. > > I am really not a fan of conflating the notions of "code that is difficult > to maintain" with "code that doesn't perform the task it is intended to > perform". When I think about incorrect code, I think about things like > setenv(3) that are just waiting to cause trouble in popular libraries built > and deployed today. > > Meanwhile, "confusing maintainers" is a very subjective notion specific to > the both the code and the maintainers: if someone sees some code allocating > a fresh buffer, strncpy(3)ing a string into it, slapping a terminator on > the end, and finally passing the result into something clearly expecting a > string, then why would they be guaranteed to be sweating bullets over > whatever happened to rest of the fresh buffer? Especially given how > widespread the strncpy(3) + extra null terminator pattern already is. > > Instead, it's code making use of strncpy(3) in a particularly clever way > that I'd find confusing, and in those cases, I lie the blame squarely on > the cleverness rather than the function itself. I blame the definition of the function of ISO C. Why? Because by being an incomplete string-copying function, it forces the programmer to be clever about it. You can't just use strncpy(3) and that's all; you need to do something else, and then you do clever stuff, which ends up badly. > > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, > > it's not correct code. It's rather dangerous code that just happens to > > not be vulnerable most of the time. 
> > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be > immune to off-by-one bugs? Slightly. Here's the typical use of strlen(3)+strcpy(3): if (strlen(src) >= dsize) goto error; strcpy(dst, src); There's no +1 or -1 in that code, so it's hard to make an off-by-one mistake. Okay, you may have seen that it has a '>=', which one could accidentally replace by a '>', causing an off-by-one. I'd wrap that thing in a strxcpy() wrapper so you avoid repetition. > Or will the vast majority of current strncpy(3) > users be willing to either restrict their platform support or add two extra > dependencies to their build process just to have strlcpy(3)? I'd hardly be > inclined to think that off-by-one bugs are a particular specialty of > strncpy(3). They are. Here's the typical use of strncpy(3) as a replacement: strncpy(dst, src, dsize); if (dst[dsize - 1] != '\0') goto error; dst[dsize - 1] = '\0'; There are many more moving parts, so more chances to make mistakes. And you see it forces the programmer to write explicitly -1 twice. I've seen code that forgets to do the -1, and also code that uses -1 in the strncpy(3) call (which makes it impossible to detect truncation). > > > > Or, to answer your question, "It's appropriate to keep using strncpy(3) in > > > existing code where it's currently used as part of creating a truncated > > > string, and it's not especially inappropriate to use strncpy(3) in new code > > > as part of creating a truncated string, if the code must support platforms > > > without strlcpy(3) or similar, and if the resulting snippets are few enough > > > and well-commented enough that they create less mental load than creating > > > and maintaining a custom helper function." > > > > strncpy(3) calls are never well documented. Do you add a comment in > > each such call saying "this zeroing is superfluous"? Probably not. 
> > By that standard, every call to a function that takes an output pointer and > returns the number of elements written (say, readlink(2)) would need a > comment saying "the remaining elements in this array now have undefined > values". No, because it does precisely what is intended. It is when you add dead code when you need to justify it. > I don't think it's controversial that in many situations, we > tacitly understand that we simply don't care about the remainder of a While the analysis isn't very hard, it takes some time, examining all surrounding code to make sure nothing cares about the trailing bytes. When you have a hundred such calls, you need to make sure nobody was too clever around any of them. > buffer after a certain point. In the case of producing a string, that point > is going to be the null terminator, in the absence of on-site documentation > to the contrary; I'd label anything else as overly clever. But again, strncpy(3) forces you to be clever. > > Meanwhile, "never" would be a strong word to describe the rate that > strncpy(3)'s lack of null termination is documented at the call site; 30 of > the 339 call sites I mentioned have an associated comment regarding null Hmm, I should have said rarely. Cheers, Alex > termination. (ICU seems to be the best library comment-wise, but even it > doesn't place them consistently.) It's obviously far from routine in > existing code, but it's not something that never happens. > > Thank you, > Matthew House -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
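For readers following the argument, the strncpy(3)-based idiom debated in the message above can be written out as a single helper. This is an illustrative sketch only, not a recommendation and not code from the thread: the name strncpy_terminated() is hypothetical, and the function assumes dsize is greater than zero. It shows where the two "- 1" expressions come from and why the terminator is a separate step from the strncpy(3) call.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst with strncpy(3), then turn the
 * null-padded character sequence into a string.  Returns true if src fit
 * entirely, false if it had to be cut short.  Assumes dsize > 0.
 */
bool
strncpy_terminated(char *dst, const char *src, size_t dsize)
{
    bool  fit;

    /* Must pass dsize, not dsize - 1, or truncation can't be detected. */
    strncpy(dst, src, dsize);

    /* If src (with its NUL) fit, the last byte is padding or the NUL. */
    fit = (dst[dsize - 1] == '\0');

    /* The truncation into a string happens here, not in strncpy(3). */
    dst[dsize - 1] = '\0';

    return fit;
}
```

Compared with a strlen(3)+strcpy(3) wrapper, this version always writes dsize bytes and depends on getting both index expressions right, which is the maintenance concern raised in the message.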
* Re: strncpy clarify result may not be null terminated 2023-11-09 12:23 ` Alejandro Colomar @ 2023-11-09 12:35 ` Alejandro Colomar 2023-11-10 7:06 ` Oskari Pirhonen 2023-11-10 16:06 ` Matthew House 2 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-09 12:35 UTC (permalink / raw) To: Matthew House; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 2310 bytes --]

Hi Matthew,

On Thu, Nov 09, 2023 at 01:23:14PM +0100, Alejandro Colomar wrote:
> > > > reasonable to highlight precisely why strncpy(3)'s output isn't a string
> > >
> > > How about this?:
> > >
> > > diff --git a/man3/stpncpy.3 b/man3/stpncpy.3
> > > index d4c2ce83d..c80c8b640 100644
> > > --- a/man3/stpncpy.3
> > > +++ b/man3/stpncpy.3
> > > @@ -108,7 +108,10 @@ .SH HISTORY
> > >  .SH CAVEATS
> > >  The name of these functions is confusing.
> > >  These functions produce a null-padded character sequence,
> > > -not a string (see
> > > +not a string.
> > > +While strings have a terminating NUL byte,
> > > +character sequences do not have any terminating byte
> > > +(see
> > >  .BR string_copying (7)).
> > >  .P
> > >  It's impossible to distinguish truncation by the result of the call,
> >
> > Yes, I'd be perfectly happy with something like that. That way, the
> > scariness is far more immediate ("the output might not be terminated!?"),
> > and thus more accessible to the typical reader.
>
> Ok; I'll add that.

I think DJ's suggestion of providing an example shows this without needing a wordy explanation:

<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=f502d3290c9f6f13f870cc041f553073af434949>

Here's the page now:

CAVEATS
    The name of these functions is confusing. These functions produce a
    null‐padded character sequence, not a string (see string_copying(7)).
    For example:

        strncpy(buf, "1", 5);       // { '1',  0,   0,   0,   0  }
        strncpy(buf, "1234", 5);    // { '1', '2', '3', '4',  0  }
        strncpy(buf, "12345", 5);   // { '1', '2', '3', '4', '5' }
        strncpy(buf, "123456", 5);  // { '1', '2', '3', '4', '5' }

    It’s impossible to distinguish truncation by the result of the call,
    from a character sequence that just fits the destination buffer;
    truncation should be detected by comparing the length of the input
    string with the size of the destination buffer.

I think this is quite clear regarding what this function does and doesn't do. I'll leave it like that, I think.

Cheers,
Alex

> --
> <https://www.alejandro-colomar.es/>

--
<https://www.alejandro-colomar.es/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply [flat|nested] 138+ messages in thread
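The four cases in the CAVEATS example above can be checked mechanically. The following is a small sketch, not part of the man page: the helper names field_is_string() and copy5_is_string() are hypothetical. It copies into a 5-byte field exactly as the example does and reports whether the result happens to contain a terminating null byte, i.e., whether it can safely be used as a string.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the fixed-width field contains a NUL. */
bool
field_is_string(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/*
 * Hypothetical helper: copy src into a 5-byte field with strncpy(3), as
 * in the man page example, and report whether the field ended up holding
 * a string (true) or only an unterminated character sequence (false).
 */
bool
copy5_is_string(const char *src)
{
    char  buf[5];

    strncpy(buf, src, sizeof(buf));

    return field_is_string(buf, sizeof(buf));
}
```

With these helpers, "1" and "1234" produce a field that is also a string, while "12345" and "123456" do not; the last two fields are byte-for-byte identical, which is why the result alone cannot tell an exact fit from a truncation.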
* Re: strncpy clarify result may not be null terminated 2023-11-09 12:23 ` Alejandro Colomar 2023-11-09 12:35 ` Alejandro Colomar @ 2023-11-10 7:06 ` Oskari Pirhonen 2023-11-10 11:18 ` Alejandro Colomar 2023-11-10 16:06 ` Matthew House 2 siblings, 1 reply; 138+ messages in thread From: Oskari Pirhonen @ 2023-11-10 7:06 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Matthew House, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 5095 bytes --] On Thu, Nov 09, 2023 at 13:23:14 +0100, Alejandro Colomar wrote: ... snip ... > > > > For the sake of reference, I looked into a few big C and C++ projects to > > > > see how often a strncpy(3)-based snippet was used to produce a truncated > > > > copy. I found 18 instances in glibc 2.38, 2 in util-linux 2.39.2 (in spite > > > > of its custom xstrncpy() function), 61 in GNU binutils 2.41, 43 in > > > > GDB 13.2, 1 in LLVM 17.0.4, 7 in CPython 3.12.0, 99 in OpenJDK 22+22, > > > > 10 in .NET Runtime 7.0.13, 3 in V8 12.1.82, and 86 in Firefox 120.0. (Note > > > > that I haven't filtered out vendored dependencies, so there's a little bit > > > > of double-counting.) It seems like most codebases that don't ban strncpy(3) > > > > use a derived snippet somewhere or another. Also, I found 3 instances in > > > > glibc 2.38 and 5 instances in Firefox 120.0 of detecting truncation by > > > > checking the last character. > > > > > > I know. I've been rewriting the code handling strings in shadow-utils > > > for the last year, and ther was a lot of it. I fixed several small bugs > > > in the process, so I recommend avoiding it. > > > > I can't tell you about your own experience, but in mine, the root cause of > > most string-handling bugs has been excessive cleverness in using the > > standard string functions, rather than the behavior of the functions > > themselves. 
So one worry of mine is that if strncpy(3) ends up being > > deprecated or whatever, then authors of portable libraries will start > > writing lots of custom memcpy(3)-based replacements to their strncpy(3)- > > based snippets, and more lines of code will introduce more opportunities > > for cleverness. > > Don't worry. strncpy(3) won't be deprecated, thanks to tar(1). ;) > Just please don't tar and feather [1] the people who use it ;) ... snip ... > > > > the code to understand the concept behind how these two snippets work, that > > > > the only difference between the strncpy(3)'s special "character sequence" > > > > and an ordinary C string is an additional null terminator at the end of the > > > > destination buffer. > > > > > > This is part of string_copying(7): > > > > > > DESCRIPTION > > > Terms (and abbreviations) > > > string (str) > > > is a sequence of zero or more non‐null characters followed by a > > > null byte. > > > > > > character sequence > > > is a sequence of zero or more non‐null characters. A program > > > should never use a character sequence where a string is required. > > > However, with appropriate care, a string can be used in the place > > > of a character sequence. > > > > > > I think that is very explicit in the difference. strncpy(3) refers to > > > that page for understanding the differences, so I think it is > > > documented. > > > > > > strncpy(3): > > > CAVEATS > > > The name of these functions is confusing. These functions produce a > > > null‐padded character sequence, not a string (see string_copying(7)). > > > > My point is isn't that the difference is undocumented, but that the typical > > man page reader isn't reading the man pages for their own sake, but because > > they're looking at some code, and they want to Know What It's Doing as soon > > as possible. > > We could maybe add a list of ways people have tried to be clever with > strncpy(3) in the past and failed, and then explain why those uses are > broken. 
This could be in a BUGS section. > This would be a very fun read. ... snip ... > > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, > > > it's not correct code. It's rather dangerous code that just happens to > > > not be vulnerable most of the time. > > > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be > > immune to off-by-one bugs? > > Slightly. Here's the typical use of strlen(3)+strcpy(3): > > if (strlen(src) >= dsize) > goto error; > strcpy(dst, src); > > There's no +1 or -1 in that code, so it's hard to make an off-by-one > mistake. Okay, you may have seen that it has a '>=', which one could > accidentally replace by a '>', causing an off-by-one. I'd wrap that > thing in a strxcpy() wrapper so you avoid repetition. > Might I go so far as to recommend strnlen(3) instead of strlen(3)? That way, instead of blindly looking for a null terminator, you stop after a predetermined max length. Especially nice for untrusted input where you can't make assumptions on the "fitness for a purpose" of what's being fed in. if (src == NULL || strnlen(src, dsize) == dsize) goto error; strcpy(dst, src); This, of course, assumes you have POSIX at your disposal. I'm writing this before going to bed. I did briefly sanity check it with a simple test prog, but it would be quite ironic if I missed something wouldn't it... - Oskari [1]: https://en.wikipedia.org/wiki/Tarring_and_feathering [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
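The strnlen(3) check suggested in the message above could be wrapped into the strxcpy() helper that the thread keeps referring to. This is a minimal sketch under stated assumptions: strxcpy() is not a libc function, the return convention (string length on success, -1 on failure) is an assumption made here, and the NULL check is left out. Because strnlen(3) never reads more than dsize bytes of src, the length test does not run past the bound even for untrusted input.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) and ssize_t */

#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper (not a libc function): copy the string src into
 * dst, which has room for dsize bytes.  Returns the length of the copied
 * string, or -1 without touching dst if src does not fit.
 */
ssize_t
strxcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t  len;

    /* Stops after dsize bytes; len == dsize means no NUL was found. */
    len = strnlen(src, dsize);
    if (len == dsize)
        return -1;

    /* len < dsize, so src[len] is the NUL and len + 1 bytes fit. */
    memcpy(dst, src, len + 1);

    return (ssize_t) len;
}
```

Since the length is already known after the check, memcpy(3) is used here in place of strcpy(3); either would produce the same result.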
* Re: strncpy clarify result may not be null terminated 2023-11-10 7:06 ` Oskari Pirhonen @ 2023-11-10 11:18 ` Alejandro Colomar 2023-11-11 7:55 ` Oskari Pirhonen 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 11:18 UTC (permalink / raw) To: Matthew House, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 2970 bytes --] Hi Oskari, On Fri, Nov 10, 2023 at 01:06:44AM -0600, Oskari Pirhonen wrote: > On Thu, Nov 09, 2023 at 13:23:14 +0100, Alejandro Colomar wrote: > > Don't worry. strncpy(3) won't be deprecated, thanks to tar(1). ;) > > > > Just please don't tar and feather [1] the people who use it ;) Hmmm, it just caught me after a year fixing broken strncpy(3) calls. I was a bit unfair. I'm sorry if I wasn't so nice. Hopefully, we've all learnt something about string-copying functions. :) > > We could maybe add a list of ways people have tried to be clever with > > strncpy(3) in the past and failed, and then explain why those uses are > > broken. This could be in a BUGS section. > > > > This would be a very fun read. I'll write it then! :D > > ... snip ... > > > > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, > > > > it's not correct code. It's rather dangerous code that just happens to > > > > not be vulnerable most of the time. > > > > > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be > > > immune to off-by-one bugs? > > > > Slightly. Here's the typical use of strlen(3)+strcpy(3): > > > > if (strlen(src) >= dsize) > > goto error; > > strcpy(dst, src); > > > > There's no +1 or -1 in that code, so it's hard to make an off-by-one > > mistake. Okay, you may have seen that it has a '>=', which one could > > accidentally replace by a '>', causing an off-by-one. I'd wrap that > > thing in a strxcpy() wrapper so you avoid repetition. > > > > Might I go so far as to recommend strnlen(3) instead of strlen(3)? 
That > way, instead of blindly looking for a null terminator, you stop after a > predetermined max length. Especially nice for untrusted input where you > can't make assumptions on the "fitness for a purpose" of what's being > fed in. > > if (src == NULL || strnlen(src, dsize) == dsize) > goto error; > strcpy(dst, src); A NULL check shouldn't be necessary (no other copying functions have, and that's not a big deal with them, although I have mixed feelings about things like memcpy(dst, NULL, 0)). About strnlen(3), you're right, and Paul also pointed that out. See the other mail I sent to the list with an inline implementation of strxcpy() using strnlen(3). > > This, of course, assumes you have POSIX at your disposal. I always assume this. If not, please ask your vendor to provide a POSIX layer. Or at least the parts of POSIX that can be implemented in a free-standing implementation. Or stop using that vendor. > > I'm writing this before going to bed. I did briefly sanity check it with > a simple test prog, but it would be quite ironic if I missed something > wouldn't it... Looks good at first glance. :) Cheers, Alex > > - Oskari > > [1]: https://en.wikipedia.org/wiki/Tarring_and_feathering -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-10 11:18 ` Alejandro Colomar @ 2023-11-11 7:55 ` Oskari Pirhonen 0 siblings, 0 replies; 138+ messages in thread From: Oskari Pirhonen @ 2023-11-11 7:55 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Matthew House, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 3350 bytes --] On Fri, Nov 10, 2023 at 12:18:56 +0100, Alejandro Colomar wrote: > Hi Oskari, > > On Fri, Nov 10, 2023 at 01:06:44AM -0600, Oskari Pirhonen wrote: > > On Thu, Nov 09, 2023 at 13:23:14 +0100, Alejandro Colomar wrote: > > > Don't worry. strncpy(3) won't be deprecated, thanks to tar(1). ;) > > > > > > > Just please don't tar and feather [1] the people who use it ;) > > Hmmm, it just caught me after a year fixing broken strncpy(3) calls. I > was a bit unfair. I'm sorry if I wasn't so nice. Hopefully, we've all > learnt something about string-copying functions. :) > Indeed we have. This whole thread became much more informative than I could've anticipated. And we also got a better wording for strncpy(3) too :) > > > We could maybe add a list of ways people have tried to be clever with > > > strncpy(3) in the past and failed, and then explain why those uses are > > > broken. This could be in a BUGS section. > > > > > > > This would be a very fun read. > > I'll write it then! :D > > > > > ... snip ... > > > > > > > Also, I've seen a lot of off-by-one bugs in calls to strncpy(3), so no, > > > > > it's not correct code. It's rather dangerous code that just happens to > > > > > not be vulnerable most of the time. > > > > > > > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be > > > > immune to off-by-one bugs? > > > > > > Slightly. Here's the typical use of strlen(3)+strcpy(3): > > > > > > if (strlen(src) >= dsize) > > > goto error; > > > strcpy(dst, src); > > > > > > There's no +1 or -1 in that code, so it's hard to make an off-by-one > > > mistake. 
Okay, you may have seen that it has a '>=', which one could > > > accidentally replace by a '>', causing an off-by-one. I'd wrap that > > > thing in a strxcpy() wrapper so you avoid repetition. > > > > > > > Might I go so far as to recommend strnlen(3) instead of strlen(3)? That > > way, instead of blindly looking for a null terminator, you stop after a > > predetermined max length. Especially nice for untrusted input where you > > can't make assumptions on the "fitness for a purpose" of what's being > > fed in. > > > > if (src == NULL || strnlen(src, dsize) == dsize) > > goto error; > > strcpy(dst, src); > > A NULL check shouldn't be necessary (no other copying functions have, > and that's not a big deal with them, although I have mixed feelings > about things like memcpy(dst, NULL, 0)). > > About strnlen(3), you're right, and Paul also pointed that out. See the > other mail I sent to the list with an inline implementation of strxcpy() > using strnlen(3). > Yep. I saw it just before replying to this message. > > > > This, of course, assumes you have POSIX at your disposal. > > I always assume this. If not, please ask your vendor to provide a POSIX > layer. Or at least the parts of POSIX that can be implemented in a > free-standing implementation. Or stop using that vendor. > > > > > I'm writing this before going to bed. I did briefly sanity check it with > > a simple test prog, but it would be quite ironic if I missed something > > wouldn't it... > > Looks good at first glance. :) > Dev 1: It passes all tests. Dev 2: Ship it. Users: *proceed to break it anyway* - Oskari [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 228 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
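A minimal sketch of the strnlen(3)-based check discussed above, wrapped in a function so the comparison is written only once. The name strxcpy() is borrowed from the thread, but the signature and return convention here are assumptions for illustration; this is not the inline implementation that was posted to the list.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <string.h>
#include <sys/types.h>           /* ssize_t */

/*
 * Hypothetical strxcpy(): copy the string src into dst[dsize], or fail.
 * Returns the length of the copied string, or -1 if src plus its
 * terminator does not fit.  dst is left untouched on failure.
 */
static ssize_t
strxcpy(char *dst, const char *src, size_t dsize)
{
	size_t  len;

	/* Reads at most dsize bytes of src; a result of dsize means
	   that no terminator was found within the limit. */
	len = strnlen(src, dsize);
	if (len == dsize)
		return -1;

	/* len < dsize here, so len + 1 bytes (with the NUL) fit. */
	memcpy(dst, src, len + 1);
	return len;
}
```

A caller would then write something like "if (strxcpy(dst, src, sizeof(dst)) == -1) goto error;", with no +1 or -1 at the call site.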
* Re: strncpy clarify result may not be null terminated 2023-11-09 12:23 ` Alejandro Colomar 2023-11-09 12:35 ` Alejandro Colomar 2023-11-10 7:06 ` Oskari Pirhonen @ 2023-11-10 16:06 ` Matthew House 2023-11-10 17:48 ` Alejandro Colomar 2023-11-11 20:55 ` Jonny Grant 2 siblings, 2 replies; 138+ messages in thread From: Matthew House @ 2023-11-10 16:06 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, linux-man On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@kernel.org> wrote: > > So one can interpret strncpy(3) as copying a prefix of a character sequence > > into a buffer (and zero-filling the remainder), in which case you're > > correct that truncation cannot be detected. But the function is formally > > defined as copying a prefix of a string into a buffer (and zero-filling the > > remainder), in which case the string has been truncated if the buffer > > doesn't end in a null byte afterward. It's just that one may not care about > > the terminating null byte being truncated if the user of the result just > > wants the initial character sequence. > > Yes, with the ISO C definition of strncpy(3), you can detect truncation. > The problem is that while my definition of it is complete, the > definition by ISO C makes it an incomplete function (to complete its > functionality in copying strings, you need to add an explicit '\0' > after the call). So I prefer mine, and for self-consistency, it can't > report truncation. Personally, I'm a pragmatist, and I like to see it as kind of a duality: it can be used as part of a routine that copies part of a string and reports truncation, and it can also be used as a complete routine that copies part of a character sequence but can't report truncation. That reflects how it's used in practice. And it would hardly be the first such duality in C, either, given things like the fundamental practice of manipulating arbitrary objects as if they're character arrays.
(Some of these other dualities are similarly infamous in their room for error, e.g., forgetting to multiply by the element size when calling malloc(3), which I have often been guilty of myself. And still, a worrying amount of code neglects to test for multiplication overflow when doing this, even when the length comes from an untrusted source. Yet somehow I haven't seen any calls for a mallocarray(3) function to replace it. Ditto with memset(3), which can and has caused actual hard-to-notice bugs due to the first few elements looking correct even if the provided length is too short.) But you're entitled to your opinion on how it ought to be best represented in the man page, as long as the immediate shortcoming of the function w.r.t producing strings is made very clear, even to readers who aren't in the habit of contemplating formal definitions. I'm satisfied by your patch in that regard. > > That's a nice library that I didn't know about! Unfortunately, I don't > > think it's a very viable option for the long tail of small libraries I've > > referred to, which generally don't have any sub-dependencies of their own, > > apart from those provided by the platform. > > > > Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their > > configure scripts from whatever build system you're using (in such a way > > that libbsd can locate libmd), ensuring they're safe for cross-compilation > > if that's a goal, ensuring you bundle them in a way that respects their > > license terms, and ensuring that any user of your library links to the two > > dependencies and doesn't duplicate them. At that point, rolling your own > > strlcpy(3) equivalent definitely sounds like less mental load, at least to > > me. > > Yes, if you had 0 deps, it might be simpler to add your implementation. > Although it's a tricky function to implement, so I'd be careful. If you > need to roll your own, I would go for a simpler function; maybe a > wrapper over strlen(3)+strcpy(3). 
Such a wrapper would indeed be useful for detecting truncation, but a full strlcpy(3) equivalent would be necessary for permitting the truncation and continuing, which is the behavior of the majority of existing strncpy(3)- based code. I don't deny that this truncation behavior is often done dubiously and rarely receives enough scrutiny, but a significant chunk of the uses really are just building an informative string which won't cause any harm if truncated, and installing additional control flow to handle truncation errors in places where there currently isn't any can introduce its own bugs. > > I didn't see this as an issue in practice when I was reviewing all those > > existing usages of strncpy(3). The vast majority were used in the midst of > > simple string manipulation, where the destination buffer starts as > > uninitialized or zeroed out, and ultimately gets passed into a user > > expecting an ordinary null-terminated string. > > > > (One exception was a few functions that used strncpy(dst, "", len) to zero > > Holy crap! Didn't these programmers know bzero(3) or memset(3)? :D > > > out the buffer, which is thankfully pretty obvious. Another exception was > > the functions that actually used strncpy(3) to produce a null-padded > > character sequence, e.g., when writing a value into a section of a binary. > > But in general, I found that it's usually not difficult to tell when a > > usage is being clever enough that the null padding might be significant.) > > > > In fact, the greater confusion came from the surprisingly common practice > > of using strncpy(3) like it's memcpy(3), by giving it the known length of > > It gets better! 
:D In all these cases, I think the function naming really is having somewhat of a psychological effect: the authors are wrangling with strthis(3) and strthat(3) for dozens of lines, so they'd find it scary to start mixing it up with mem*(3) functions ("I'm working with C strings, not with byte arrays!"), or perhaps they don't even consider it. They'd rather remain with strncpy(3), even when it means they have to manually follow it with a null terminator or another string. But I'm no psychoanalyst, so take that with a big grain of salt. (Meanwhile, in my own code, I try to work with pointer-and-length arrays whenever possible instead of fooling around with null terminators and all their off-by-one fun, so I've become leery of using any str*(3) functions apart from strlen(3) and strnlen(3).) > > (This is also why I was confused by your support for strcpy(3) on the > > grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in > > that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only > > protect us from overruns, not from all the "small bugs" that might ensue > > from people becoming more clever with sizing the destination buffer with > > strcpy(3). > > I don't think strcpy(3) is as prone as strncpy(3) to ask programmers > to be clever about it. In the case of strncpy(3) it's due to it being > an incomplete string-copying function. strcpy(3) is complete. > > > Also, if it were truly a panacea, then we'd hardly have to worry > > about the problems of strncpy(3) at all, since it would detect any misuse > > of the function.) > > Fortification detects overruns in writes, which is how it protects > strcpy(3). However, fortification can't protect against overruns in > reads, which is what strncpy(3) causes due to missing null terminators. > strncpy(3) also causes off-by-one bugs (I'll detail below), which > strcpy(3) doesn't (and strlcpy(3) doesn't either). Ah, thank you, I wasn't aware of that limitation in _FORTIFY_SOURCE.
But I think my notion of problematic cleverness is somewhat different than yours. When I think of code being excessively clever, I specifically think of places where it relies on a certain property of the program state, but it's unclear how that property is upheld at that point in the program. This cleverness primarily appears in two different forms, in my experience. In one form, snippet A is immediately followed by snippet B, but B depends on some non-obvious property set up by A, and the code has no comments or other documentation to this effect. In the other (more common) form, snippet A sets up an obvious property that snippet B depends on, but the two snippets are miles apart in the code, and it's difficult to see the connection between the two. (The latter can be exacerbated by intervening control flow.) In this sense, cleverness is mostly orthogonal to the 'completeness' of a particular function interface. A non-clever use of strncpy(3) would be calling it and then immediately appending or testing for a null terminator; then, we have two lines forming a functionally complete whole. A clever use of strncpy(3) (of the second form) would be setting or testing the null terminator way earlier or way later in the code, both of which were unfortunately frequent in my review, though still a minority of uses. Another clever use, of the first form, would be appending a null terminator, using the output in a way that looks like we just want a string, but then secretly depending on the buffer being null-padded to the full length. This seems to be a particular concern of yours, but in practice, I haven't been able to find a single instance of this, except possibly in GNU binutils which already clearly exudes evil from every line. On the other hand, I also see strcpy(3) as no less prone to overly clever usage, despite being 'complete' in its own definition. 
The problem is that it's generally not a complete operation in the context of its typical use cases, which only have a finite destination buffer and need to ensure that the entire source string will fit. The author has a choice to make in deciding how to make this guarantee, and some of these choices can be arbitrarily clever. In particular, since the author doesn't strictly need to know the exact size of the source string or destination buffer at the time they call the function, they can make those sizes as nebulous and indirect as possible. For example, a non-clever use of strcpy(3) would be immediately preceding it by either an "if (strlen(src) >= dsize)" check, or an allocation of strlen(src) + 1 bytes, which I think we both agree is the ideal scenario; the code makes the guarantee and then immediately acts on it. But a clever use would be exporting this length check to all the function's callers, or only calling strlen(3) on some precursor(s) of the source string and then deriving its full length with a tricky and error-prone formula, or simply not testing the length of the source string at all, but sizing the destination buffer based on the general vibes of the interface. In fact, we can once again look at how code abuses strcpy(3) in practice: - Of sizing the destination buffer in some far-off corner of the file, I found 4 instances in GNU binutils. Similarly, of sizing the source string in a far-off corner and not checking it, I found 6 instances in llvm-nm. - Of sizing the destination buffer with an involved calculation and then trusting the result, I found 15 instances in GNU binutils, 1 in GDB, 1 in CPython, 3 in Firefox, and 4 in .NET Runtime. - Of accepting an arbitrary destination buffer size without clearly bounding it below by the source string's length, I found 24 instances in GNU binutils; I believe at least 2 can cause UB with certain configurations and inputs. 
(I gave up trying to enumerate these in the other codebases, since it's generally not clear at all whether a minimum size is understood to be implied by the interface.) - Of not checking the source string's length nor otherwise clearly bounding it above, I found 37 instances in GNU binutils, 3 in CPython, 14 in Firefox, 3 in .NET Runtime, and 6 in OpenJDK; I believe at least 19 can cause UB. - Of obvious off-by-one errors that will trivially result in UB, I found 2 instances in GNU binutils, 6 in CPython, 3 in Firefox, and 1 in OpenJDK. - Finally, of a non-obvious but critical side effect (i.e., unintentionally clever code of the first form), I found just 1 instance in Firefox, where a certain error branch just happens to be reachable only when the buffer is large enough for the error message to fit. And these aren't even counting its cousins strcat(3) and sprintf(3)! So I hope you'll forgive me if I have a hard time believing that authors are less likely to be overly clever with strcpy(3) than with strncpy(3), purely on account of the former's interface being more 'complete'. > > Probably the only way to solve the cleverness issue for good is to have an > > immediately-available, foolproof, performant set of string functions that > > are extremely straightforward to understand and use, flexible enough for > > any use case, and generally agreed to be the first choice for string > > manipulation. > > > > Unfortunately, probably the closest match to those criteria, especially the > > availability criterion, is snprintf(3), which has the flaws of using int > > instead of size_t for most sizes, not being very performant, and not being > > async-signal-safe. Alas, it will likely remain a dream, given all the wars > > over which safer string functions have the best API. But at least > > strlcpy(3) has a pretty sound interface, if other platforms ever get around > > to including it by default. 
> > strlcpy(3) will be in POSIX.1-202x (Issue 8), so it's a matter of time > that it'll be widespread. I noticed that, but I've always been a pessimist regarding the timelines of cool new things being rolled out. It will take some months to years before Issue 8 is released, months to years for all the relevant platforms to get the memo and implement it, many years for the knowledge to trickle down to the everyday library authors, and many more years for old versions of platforms to reach the end of their support periods. And I don't want to be one of those people advertising stuff that's perpetually 'just around the corner'. (For that matter, I wonder how many decades it will be before I see widespread use of posix_close(2) in a serious codebase, if ever.) > > My point isn't that the difference is undocumented, but that the typical > > man page reader isn't reading the man pages for their own sake, but because > > they're looking at some code, and they want to Know What It's Doing as soon > > as possible. > > We could maybe add a list of ways people have tried to be clever with > strncpy(3) in the past and failed, and then explain why those uses are > broken. This could be in a BUGS section. I'd be interested in your experiences of people "trying to be clever" per your perspective; as I mentioned, in my earlier review of actual strncpy(3) usage, the only cleverness that occurs in non-negligible amounts has been either in the midst of using it in its 'intended' role for producing a null-padded character sequence (I'm referring to binutils here), or messing around with which part of the code is responsible for appending the terminator. > > Instead, it's code making use of strncpy(3) in a particularly clever way > > that I'd find confusing, and in those cases, I lay the blame squarely on > > the cleverness rather than the function itself. > > I blame the definition of the function in ISO C. Why?
Because by being > an incomplete string-copying function, it forces the programmer to be > clever about it. You can't just use strncpy(3) and that's all; you need > to do something else, and then you do clever stuff, which ends up badly. It forces the programmer to perform an extra step, but it doesn't force the programmer to be clever in performing that extra step. As I have described above, strcpy(3) also needs an extra step that the programmer can be inordinately clever with, regardless of being a complete string-copying function. So I don't see strncpy(3) as being uniquely evil here. > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be > > immune to off-by-one bugs? > > Slightly. Here's the typical use of strlen(3)+strcpy(3): > > if (strlen(src) >= dsize) > goto error; > strcpy(dst, src); > > There's no +1 or -1 in that code, so it's hard to make an off-by-one > mistake. Okay, you may have seen that it has a '>=', which one could > accidentally replace by a '>', causing an off-by-one. I'd wrap that > thing in a strxcpy() wrapper so you avoid repetition. As I learned, the typical use of strcpy(3) (at least 80% of uses in my estimation) is actually copying a string into a new buffer, not an existing buffer. And that does need a +1 to calculate a size to pass to the allocation function, and usually a lot more +s if it's going to be concatenating further strings. (Did you know that it's not an uncommon practice to use "char value[1];" for a variable-length string at the end of a struct, then depend on that 1 byte being included in the size of the struct when allocating it?) Meanwhile, code does manage to make that off-by-one error between >= and > in practice regardless.
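To make the "copy into a new buffer" case concrete, here is a rough sketch of where the +1 ends up when two strings are concatenated into a freshly allocated buffer. The helper name concat_dup() is hypothetical and not taken from any of the codebases mentioned; the overflow test is an added precaution rather than something those codebases are claimed to do.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of a followed
 * by b, or NULL on allocation failure or size overflow.
 */
static char *
concat_dup(const char *a, const char *b)
{
	size_t  alen = strlen(a);
	size_t  blen = strlen(b);
	char    *p;

	/* Reject sizes where alen + blen + 1 would wrap around. */
	if (alen > SIZE_MAX - blen - 1)
		return NULL;

	/* The +1 is the room for the terminator. */
	p = malloc(alen + blen + 1);
	if (p == NULL)
		return NULL;

	memcpy(p, a, alen);
	memcpy(p + alen, b, blen + 1);  /* copies b's terminator too */
	return p;
}
```

Forgetting either +1 gives exactly the kind of off-by-one described above: the first one under-allocates by a byte, the second one leaves the result unterminated.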
Relatedly, as I also learned from all the manual strdup(3)-like snippets that use a custom allocator, the typical library author is deathly allergic to writing a custom wrapper over anything that isn't an allocation function; they'll repeat the entirety of the logic inline as many times as it takes. So I don't buy that most people would be replacing numerous calls to strncpy(3) with calls to a unified wrapper function that can be inspected and fixed all in one place, as you seem to suggest in your later email. > > Or will the vast majority of current strncpy(3) > > users be willing to either restrict their platform support or add two extra > > dependencies to their build process just to have strlcpy(3)? I'd hardly be > > inclined to think that off-by-one bugs are a particular specialty of > > strncpy(3). > > They are. Here's the typical use of strncpy(3) as a replacement: > > strncpy(dst, src, dsize); > if (dst[dsize - 1] != '\0') > goto error; > dst[dsize - 1] = '\0'; > > There are many more moving parts, so more chances to make mistakes. > And you see it forces the programmer to write explicitly -1 twice. I've > seen code that forgets to do the -1, and also code that uses -1 in the > strncpy(3) call (which makes it impossible to detect truncation). That "dst[dsize - 1] = '\0';" line is extraneous, and none of the existing truncation-detecting uses of strncpy(3) I saw have its equivalent; after all, we just checked that character with the if statement, there's no need to set it again. Without that line, there are only two lines of logic, and a single -1, matching the single +1 needed by the typical use of strcpy(3). Also, the typical use of strncpy(3) by far is to allow a truncated string rather than raising an error on truncation, and in that use case, it makes no difference whether or not the size inside the strncpy(3) call has a -1. 
The memcpy(3) replacement for truncation needs an additional min() ternary or macro, and it still needs a manual null terminator that can have the exact same off-by-one error. > > By that standard, every call to a function that takes an output pointer and > > returns the number of elements written (say, readlink(2)) would need a > > comment saying "the remaining elements in this array now have undefined > > values". > > No, because it does precisely what is intended. It is when you add dead > code when you need to justify it. Again, that seems like an odd standard to apply only to strncpy(3)'s destination buffer. For instance, suppose that an API accepts an input struct with optional fields. It's a common practice to zero out every field with memset(3) or = {0}, then fill in the input fields that are actually used, regardless of whether the API is specified as actively ignoring the remaining fields. Certainly, it can be quite a task to figure out whether the fields are actually read, if the API is poorly specified; without going through its entire implementation, any of those "unused" fields could be copied around or compared before being discarded, making it dangerous to leave them uninitialized. But need we add a comment to every one of those memset(3) calls, "I'm unsure whether this zeroing is significant at all"? Perhaps such a comment might be helpful, if there really is reason to suspect that the API is nefarious, but I've hardly ever seen stuff like that in practice. (Or, for a silly reductio ad absurdum, if some code calls malloc(3), then continues with some cleanup functions if it returns NULL, then would that code have to justify why malloc(3) set an errno value that seemingly never gets read? Those cleanup functions could be doing something clever by reading errno on entry, after all!) 
> > I don't think it's controversial that in many situations, we > > tacitly understand that we simply don't care about the remainder of a > > While the analysis isn't very hard, it takes some time, examining all > surrounding code to make sure nothing cares about the trailing bytes. > When you have a hundred such calls, you need to make sure nobody was too > clever around any of them. Sure, there's a hypothetical concern that some later consumer might notice the zeroing and act on it. But strncpy(3) is hardly the only thing in the typical codebase that produces an unnecessarily-zeroed buffer. Authors often use calloc(3) or memset(3) for peace of mind and no other purpose, or, especially in C++, zero out any local buffers in a class constructor to avoid the specter of uninitialized memory. And of course, lots of code repeatedly reuses the same buffer for different strings, handing out pointers to it, and callers could just as easily leak the left-over data after the null terminator. Verifying that an alleged string buffer truly is only used as a string is just a fact of life when refactoring unfamiliar code in C. > > buffer after a certain point. In the case of producing a string, that point > > is going to be the null terminator, in the absence of on-site documentation > > to the contrary; I'd label anything else as overly clever. > > But again, strncpy(3) forces you to be clever. It forces you to do extra work, the same way strcpy(3) forces you to do extra work. And it allows you to be clever, the same way strcpy(3) allows you to be clever. But at least it bounds the extent of your cleverness in that it forces you to remember the size of your destination buffer. I'd much rather review a hundred typical calls to strncpy(3) than a hundred typical calls to strcpy(3) any day of the week. Thank you, Matthew House ^ permalink raw reply [flat|nested] 138+ messages in thread
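The two strncpy(3) idioms contrasted in this message (detecting truncation versus permitting it) can be sketched as follows. The function names copy_detect() and copy_trunc() are hypothetical, chosen only for illustration, and both assume dsize > 0.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: copy src into dst[dsize] and report whether it fit.
 * On failure dst holds dsize non-null bytes and is NOT a string.
 */
static bool
copy_detect(char *dst, const char *src, size_t dsize)
{
	strncpy(dst, src, dsize);

	/* strncpy(3) null-pads short sources, so the last byte is
	   non-null only if src had no terminator in its first dsize
	   bytes. */
	return dst[dsize - 1] == '\0';
}

/*
 * Hypothetical: copy src into dst[dsize], silently truncating.
 * dst is always a string afterwards.
 */
static void
copy_trunc(char *dst, const char *src, size_t dsize)
{
	strncpy(dst, src, dsize - 1);
	dst[dsize - 1] = '\0';
}
```

Each variant has a single line after the call and a single -1 that matters; dropping that line, or moving it far away from the call, is where the uses described above start to go wrong.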
* Re: strncpy clarify result may not be null terminated 2023-11-10 16:06 ` Matthew House @ 2023-11-10 17:48 ` Alejandro Colomar 2023-11-13 15:01 ` Matthew House 2023-11-11 20:55 ` Jonny Grant 1 sibling, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 17:48 UTC (permalink / raw) To: Matthew House; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 27929 bytes --] Hi Matthew, On Fri, Nov 10, 2023 at 11:06:00AM -0500, Matthew House wrote: > On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@kernel.org> wrote: > > > So one can interpret strncpy(3) as copying a prefix of a character sequence > > > into a buffer (and zero-filling the remainder), in which case you're > > > correct that truncation cannot be detected. But the function is fomally > > > defined as copying a prefix of a string into a buffer (and zero-filling the > > > remainder), in which case the string has been truncated if the buffer > > > doesn't end in a null byte afterward. It's just that one may not care about > > > the terminating null byte being truncated if the user of the result just > > > wants the initial character sequence. > > > > Yes, with the ISO C definition of strncpy(3), you can detect truncation. > > The problem is that while my definition of it is complete, the > > definition by ISO C makes it an incomplete function (to complete its > > functionallity in copying strings, you need to add an explicit '\0' > > after the call). So I prefer mine, and for self-consistency, it can't > > report truncation. > > Personally, I'm a pragmatist, and I like to see it as kind of a duality: it > can be used as part of a routine that copies part of a string and reports > truncation, and it can also be used as a complete routine that copies part > of a character sequence but can't report truncation. That reflects how it's > used in practice. 
And it would hardly be the first such duality in C, > either, given things like the fundamental practice of manipulating > arbitrary objects as if they're character arrays. > > (Some of these other dualities are similarly infamous in their room for > error, e.g., forgetting to multiply by the element size when calling > malloc(3), which I have often been guilty of myself. And still, a worrying > amount of code neglects to test for multiplication overflow when doing > this, even when the length comes from an untrusted source. Yet somehow I > haven't seen any calls for a mallocarray(3) function to replace it. Ditto Funnily enough, I have, often. Here's something I wrote about the malloc(3) family recently: <https://software.codidact.com/posts/285898/288023#answer-288023> Pretty early in that text I recommend writing your own mallocarray(3), even if libc doesn't provide it. In shadow-utils, I replaced all of the allocation calls by safer wrappers: macros that make it really hard to make mistakes, which themselves wrap *array() functions, that wrap malloc(3) basic functions. <https://github.com/shadow-maint/shadow/blob/master/lib/alloc.h> I'll fight that battle when I'm done with str*() ones. ;) > with memset(3), which can and has caused actual hard-to-notice bugs due to > the first few elements looking correct even if the provided length is too > short.) Heh, and my other one battle for standardizing bzero(3) again. You're perfectly right in that memset(3) is dangerous (well, compilers have improved in their warnings, and nowadays it isn't so bad, but still unnecessary risk). I am of the opinion that you should only use bzero(3) unless you really want to set the bytes to something else. That something else is usually UINT8_MAX (and that's already rare), and seldom something else. glibc developers reading this might recall my suggestions to reinstate bzero(3) in its right. 
Such is my preference to this function, that I removed some deprecation messages about it from the manual, relegating it to the minimum necessary to document in HISTORY that POSIX did remove it. > > But you're entitled to your opinion on how it ought to be best represented > in the man page, as long as the immediate shortcoming of the function w.r.t > producing strings is made very clear, even to readers who aren't in the > habit of contemplating formal definitions. I'm satisfied by your patch in > that regard. Thanks. :) > > > > That's a nice library that I didn't know about! Unfortunately, I don't > > > think it's a very viable option for the long tail of small libraries I've > > > referred to, which generally don't have any sub-dependencies of their own, > > > apart from those provided by the platform. > > > > > > Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their > > > configure scripts from whatever build system you're using (in such a way > > > that libbsd can locate libmd), ensuring they're safe for cross-compilation > > > if that's a goal, ensuring you bundle them in a way that respects their > > > license terms, and ensuring that any user of your library links to the two > > > dependencies and doesn't duplicate them. At that point, rolling your own > > > strlcpy(3) equivalent definitely sounds like less mental load, at least to > > > me. > > > > Yes, if you had 0 deps, it might be simpler to add your implementation. > > Although it's a tricky function to implement, so I'd be careful. If you > > need to roll your own, I would go for a simpler function; maybe a > > wrapper over strlen(3)+strcpy(3). > > Such a wrapper would indeed be useful for detecting truncation, but a full > strlcpy(3) equivalent would be necessary for permitting the truncation and > continuing, which is the behavior of the majority of existing strncpy(3)- > based code. 
Yes, in string_copying(3) I document strlcpy(3) as the function you should use for such a use case. Still, I need to revise that page after this discussion; I think we clarified many things, and that page should reflect them. > > I don't deny that this truncation behavior is often done dubiously and > rarely receives enough scrutiny, but a significant chunk of the uses really > are just building an informative string which won't cause any harm if > truncated, and installing additional control flow to handle truncation > errors in places where there currently isn't any can introduce its own > bugs. Yes. And in fact, in shadow-utils I'm taking so slow because I want to avoid a big-bang change that could introduce more errors than it fixes. So I'm first removing the superfluous zeroing of strncpy(3) by using strlcpy(3), while keeping truncation, and only when I'm done with that I'll check if truncation poses any risks and should be fixed; but fixing too much can break stuff. Granted. > > > > I didn't see this as an issue in practice when I was reviewing all those > > > existing usages of strncpy(3). The vast majority were used in the midst of > > > simple string manipulation, where the destination buffer starts as > > > uninitialized or zeroed out, and ultimately gets passed into a user > > > expecting an ordinary null-terminated string. > > > > > > (One exception was a few functions that used strncpy(dst, "", len) to zero > > > > Holy crap! Didn't these programmers know bzero(3) or memset(3)? :D > > > > > out the buffer, which is thankfully pretty obvious. Another exception was > > > the functions that actually used strncpy(3) to produce a null-padded > > > character sequence, e.g., when writing a value into a section of a binary. > > > But in general, I found that it's usually not difficult to tell when a > > > usage is being clever enough that the null padding might be significant.) 
> > > > > > In fact, the greater confusion came from the surprisingly common practice > > > of using strncpy(3) like it's memcpy(3), by giving it the known length of > > > > It gets better! :D > > In all these cases, I think the function naming really is having somewhat > of a psychological effect: the authors are wrangling with strthis(3) and > strthat(3) for dozens of lines, so they'd find it scary to start mixing it > up with mem*(3) functions ("I'm working with C strings, not with byte > arrays!"), or perhaps they don't even consider it. They'd rather remain > with strncpy(3), even when it means they have to manually append it with a > null terminator or another string. But I'm no psychoanalyst, so take that > with a big grain of salt. > > (Meanwhile, in my own code, I try to work with pointer-and-length arrays > whenever possible instead of fooling around with null terminators and all > their off-by-one fun, so I've become leery of using any str*(3) functions > apart from strlen(3) and strnlen(3).) > > > > (This is also why I was confused by your support for strcpy(3) on the > > > grounds that _FORTIFY_SOURCE exists. Sure, it's better than strncpy(3) in > > > that its behavior isn't nearly so subtle, but _FORTIFY_SOURCE can only > > > protect us from overruns, not from all the "small bugs" that might ensue > > > from people becoming more clever with sizing the destination buffer with > > > strcpy(3). > > > > I don't think strcpy(3) is as propense as strncpy(3) to ask programmers > > to be clever about it. In the case of strncpy(3) it's due to it being > > an incomplete string-copying function. strcpy(3) is complete. > > > > > Also, if it were truly a panacea, then we'd hardly have to worry > > > about the problems of strncpy(3) at all, since it would detect any misuse > > > of the function.) > > > > Fortification detects overruns in writes, which is how it protects > > strcpy(3). 
However, fortification can't protect against overruns in > > reads, which is what strncpy(3) causes due to missing null terminators. > > strncpy(3) also causes off-by-one bugs (I'll detail below), which > > strcpy(3) doesn't (and strlcpy(3) doesn't either). > > Ah, thank you, I wasn't aware of that limitation in _FORTIFY_SOURCE. > > But I think my notion of problematic cleverness is somewhat different than > yours. When I think of code being excessively clever, I specifically think > of places where it relies on a certain property of the program state, but > it's unclear how that property is upheld at that point in the program. > > This cleverness primarily appears in two different forms, in my experience. > In one form, snippet A is immediately followed by snippet B, but B depends > on some non-obvious property set up by A, and the code has no comments or > other documentation to this effect. In the other (more common) form, > snippet A sets up an obvious property that snippet B depends on, but the > two snippets are miles apart in the code, and it's difficult to see the > connection between the two. (The latter can be exacerbated by intervening > control flow.) > > In this sense, cleverness is mostly orthogonal to the 'completeness' of a > particular function interface. A non-clever use of strncpy(3) would be > calling it and then immediately appending or testing for a null terminator; > then, we have two lines forming a functionally complete whole. A clever > use of strncpy(3) (of the second form) would be setting or testing the null > terminator way earlier or way later in the code, both of which were > unfortunately frequent in my review, though still a minority of uses. > > Another clever use, of the first form, would be appending a null > terminator, using the output in a way that looks like we just want a > string, but then secretly depending on the buffer being null-padded to the > full length. 
This seems to be a particular concern of yours, but in > practice, I haven't been able to find a single instance of this, except > possibly in GNU binutils which already clearly exudes evil from every line. > > On the other hand, I also see strcpy(3) as no less prone to overly clever > usage, despite being 'complete' in its own definition. The problem is that > it's generally not a complete operation in the context of its typical use > cases, which only have a finite destination buffer and need to ensure that > the entire source string will fit. The author has a choice to make in > deciding how to make this guarantee, and some of these choices can be > arbitrarily clever. In particular, since the author doesn't strictly need > to know the exact size of the source string or destination buffer at the > time they call the function, they can make those sizes as nebulous and > indirect as possible. > > For example, a non-clever use of strcpy(3) would be immediately preceding > it by either an "if (strlen(src) >= dsize)" check, or an allocation of > strlen(src) + 1 bytes, which I think we both agree is the ideal scenario; > the code makes the guarantee and then immediately acts on it. But a clever > use would be exporting this length check to all the function's callers, or > only calling strlen(3) on some precursor(s) of the source string and then > deriving its full length with a tricky and error-prone formula, or simply > not testing the length of the source string at all, but sizing the > destination buffer based on the general vibes of the interface. > > In fact, we can once again look at how code abuses strcpy(3) in practice: > - Of sizing the destination buffer in some far-off corner of the file, I > found 4 instances in GNU binutils. Similarly, of sizing the source string > in a far-off corner and not checking it, I found 6 instances in llvm-nm. 
> - Of sizing the destination buffer with an involved calculation and then > trusting the result, I found 15 instances in GNU binutils, 1 in GDB, 1 in > CPython, 3 in Firefox, and 4 in .NET Runtime. > - Of accepting an arbitrary destination buffer size without clearly > bounding it below by the source string's length, I found 24 instances in > GNU binutils; I believe at least 2 can cause UB with certain > configurations and inputs. (I gave up trying to enumerate these in the > other codebases, since it's generally not clear at all whether a minimum > size is understood to be implied by the interface.) > - Of not checking the source string's length nor otherwise clearly bounding > it above, I found 37 instances in GNU binutils, 3 in CPython, 14 in > Firefox, 3 in .NET Runtime, and 6 in OpenJDK; I believe at least 19 can > cause UB. > - Of obvious off-by-one errors that will trivially result in UB, I found 2 > instances in GNU binutils, 6 in CPython, 3 in Firefox, and 1 in OpenJDK. > - Finally, of a non-obvious but critical side effect (i.e., unintentionally > clever code of the first form), I found just 1 instance in Firefox, where > a certain error branch just happens to be reachable only when the buffer > is large enough for the error message to fit. > And these aren't even counting its cousins strcat(3) and sprintf(3)! > > So I hope you'll forgive me if I have a hard time believing that authors > are less likely to be overly clever with strcpy(3) than with strncpy(3), > purely on account of the former's interface being more 'complete'. > > > > Probably the only way to solve the cleverness issue for good is to have an > > > immediately-available, foolproof, performant set of string functions that > > > are extremely straightforward to understand and use, flexible enough for > > > any use case, and generally agreed to be the first choice for string > > > manipulation. 
> > > > > > Unfortunately, probably the closest match to those criteria, especially the > > > availability criterion, is snprintf(3), which has the flaws of using int > > > instead of size_t for most sizes, not being very performant, and not being > > > async-signal-safe. Alas, it will likely remain a dream, given all the wars > > > over which safer string functions have the best API. But at least > > > strlcpy(3) has a pretty sound interface, if other platforms ever get around > > > to including it by default. > > > > strlcpy(3) will be in POSIX.1-202x (Issue 8), so it's a matter of time > > that it'll be widespread. > > I noticed that, but I've always been a pessimist regarding the timelines of > cool new things being rolled out. It will take some months to years before > Issue 8 is released, months to years for all the relevant platforms to get > the memo and implement it, many years for the knowledge to trickle down to > the everyday library authors, and many more years for old versions of > platforms to reach the end of their support periods. And I don't want to > be one of those people advertising stuff that's perpetually 'just around > the corner'. (For that matter, I wonder how many decades it will be before > I see widespread use of posix_close(2) in a serious codebase, if ever.) > > > > My point is isn't that the difference is undocumented, but that the typical > > > man page reader isn't reading the man pages for their own sake, but because > > > they're looking at some code, and they want to Know What It's Doing as soon > > > as possible. > > > > We could maybe add a list of ways people have tried to be clever with > > strncpy(3) in the past and failed, and then explain why those uses are > > broken. This could be in a BUGS section. 
> > I'd be interested in your experiences of people "trying to be clever" per > your perspective; as I mentioned, in my earlier review of actual strncpy(3) > usage, the only cleverness that occurs in non-negligible amounts has been > either in the midst of using it in its 'intended' role for producing a > null-padded character sequence (I'm referring to binutils here), or messing > around with which part of the code is responsible for appending the > terminator. > > > > Instead, it's code making use of strncpy(3) in a particularly clever way > > > that I'd find confusing, and in those cases, I lay the blame squarely on > > > the cleverness rather than the function itself. > > > > I blame the definition of the function in ISO C. Why? Because by being > > an incomplete string-copying function, it forces the programmer to be > > clever about it. You can't just use strncpy(3) and that's all; you need > > to do something else, and then you do clever stuff, which ends up badly. > > It forces the programmer to perform an extra step, but it doesn't force the > programmer to be clever in performing that extra step. As I have described > above, strcpy(3) also needs an extra step that the programmer can be > inordinately clever with, regardless of being a complete string-copying > function. So I don't see strncpy(3) as being uniquely evil here. > > > > So will all the custom strlen(3)+memcpy(3)-based replacements suddenly be > > > immune to off-by-one bugs? > > > > Slightly. Here's the typical use of strlen(3)+strcpy(3): > > > > if (strlen(src) >= dsize) > > goto error; > > strcpy(dst, src); > > > > There's no +1 or -1 in that code, so it's hard to make an off-by-one > > mistake. Okay, you may have seen that it has a '>=', which one could > > accidentally replace with a '>', causing an off-by-one. I'd wrap that > > thing in a strxcpy() wrapper so you avoid repetition.
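For illustration, the strlen(3)+strcpy(3) check quoted above could be wrapped roughly as follows. This is only a sketch: the name strxcpy() comes from the thread, but the signature and the 0/-1 return convention are assumptions, not an existing API.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper (signature assumed for this sketch).
 * Copies the string src into dst, which has room for dsize bytes,
 * only if the whole string fits including its null terminator.
 * Returns 0 on success, or -1 (leaving dst untouched) if it does not fit.
 */
int strxcpy(char *dst, const char *src, size_t dsize)
{
    if (strlen(src) >= dsize)   /* '>=' leaves room for the terminator */
        return -1;
    strcpy(dst, src);
    return 0;
}
```

Keeping the '>=' comparison in a single place means there is one spot to inspect for the off-by-one discussed above, rather than one per call site.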
> > As I learned, the typical use of strcpy(3) (at least 80% of uses in my > estimation) is actually copying a string into a new buffer, not an existing > buffer. And that does need a +1 to calculate a size to pass to the > allocation function, and usually a lot more +s if it's going to be If you strcpy(3) to a new buffer, you'd usually strdup(3), no? Unless it's part of a larger object. > concatenating further strings. (Did you know that it's not an uncommon > practice to use "char value[1];" for a variable-length string at the end of > a struct, then depend on that 1 byte being included in the size of the > struct when allocating it?) Not exactly that, but I've seen things like that, yeah. I wish I didn't. > Meanwhile, code does manage to make that off-by- > one error between >= and > in practice regardless. I made that error yesterday, so yes. :) > > Relatedly, as I also learned from all the manual strdup(3)-like snippets > that use a custom allocator, the typical library author is deathly allergic > to writing a custom wrapper over anything that isn't an allocation > function; they'll repeat the entirety of the logic inline as many times as > it takes. So I don't buy that most people would be replacing numerous calls > to strncpy(3) with calls to a unified wrapper function that can be > inspected and fixed all in one place, as you seem to suggest in your later > email. I try to avoid cowboy programmers, but we know it's impossible. I just do what I can. But cowboy programmers will nevertheless continue to exist and negate reality. <https://github.com/nginx/unit/issues/795> <https://github.com/nginx/unit/issues/804> <https://github.com/nginx/unit/issues/923> The responses from a programmer from nginx are gems, doubting that UB is a problem, or even suggesting implementing a cosmetic patch instead of fixing an API. You can read those links if you want some fun. 
> > > > Or will the vast majority of current strncpy(3) > > > users be willing to either restrict their platform support or add two extra > > > dependencies to their build process just to have strlcpy(3)? I'd hardly be > > > inclined to think that off-by-one bugs are a particular specialty of > > > strncpy(3). > > > > They are. Here's the typical use of strncpy(3) as a replacement: > > > > strncpy(dst, src, dsize); > > if (dst[dsize - 1] != '\0') > > goto error; > > dst[dsize - 1] = '\0'; > > > > There are many more moving parts, so more chances to make mistakes. > > And you see it forces the programmer to explicitly write -1 twice. I've > > seen code that forgets to do the -1, and also code that uses -1 in the > > strncpy(3) call (which makes it impossible to detect truncation). > > That "dst[dsize - 1] = '\0';" line is extraneous, and none of the existing > truncation-detecting uses of strncpy(3) I saw have its equivalent; after > all, we just checked that character with the if statement, there's no need > to set it again. Without that line, there are only two lines of logic, and > a single -1, matching the single +1 needed by the typical use of strcpy(3). Hmm, you're right. I took an actual typical use of strncpy(3) as you could find them in shadow-utils, that is, without the truncation check, and added the truncation check myself without removing the zeroing. You can remove that line. And yes, that makes it a single off-by-one chance, just as with strlen(3). So, as long as you wrap this in an inline function, it should be as safe. Except that you still do the superfluous zeroing that I find confusing. But if you go and write a decent wrapper around strncpy(3), I would see it as decent code. > > Also, the typical use of strncpy(3) by far is to allow a truncated string > rather than raising an error on truncation, and in that use case, it makes > no difference whether or not the size inside the strncpy(3) call has a -1. True; that's a benign off-by-one cancer.
But still a cancer. > The memcpy(3) replacement for truncation needs an additional min() ternary > or macro, and it still needs a manual null terminator that can have the > exact same off-by-one error. > > > > By that standard, every call to a function that takes an output pointer and > > > returns the number of elements written (say, readlink(2)) would need a > > > comment saying "the remaining elements in this array now have undefined > > > values". > > > > No, because it does precisely what is intended. It is when you add dead > > code when you need to justify it. > > Again, that seems like an odd standard to apply only to strncpy(3)'s > destination buffer. For instance, suppose that an API accepts an input > struct with optional fields. It's a common practice to zero out every field > with memset(3) or = {0}, then fill in the input fields that are actually > used, regardless of whether the API is specified as actively ignoring the > remaining fields. > > Certainly, it can be quite a task to figure out whether the fields are > actually read, if the API is poorly specified; without going through its > entire implementation, any of those "unused" fields could be copied around > or compared before being discarded, making it dangerous to leave them > uninitialized. But need we add a comment to every one of those memset(3) > calls, "I'm unsure whether this zeroing is significant at all"? Perhaps > such a comment might be helpful, if there really is reason to suspect that > the API is nefarious, but I've hardly ever seen stuff like that in > practice. Maybe it's because in the code I've worked with, there were actual calls to strncpy(3) where the zeroing matters, and they're disguised between other strncpy(3) calls, which make it all a funny amusement park. If you _only_ use strings, and wrap strncpy(3) in a wrapper that protects against off-by-ones, it would be acceptable, I must say. It's just that I don't find that code when I see strncpy(3) calls. 
Maybe I don't look at the right code bases. > (Or, for a silly reductio ad absurdum, if some code calls malloc(3), then > continues with some cleanup functions if it returns NULL, then would that > code have to justify why malloc(3) set an errno value that seemingly > never gets read? Those cleanup functions could be doing something clever by > reading errno on entry, after all!) > > > > I don't think it's controversial that in many situations, we > > > tacitly understand that we simply don't care about the remainder of a > > > > While the analysis isn't very hard, it takes some time, examining all > > surrounding code to make sure nothing cares about the trailing bytes. > > When you have a hundred such calls, you need to make sure nobody was too > > clever around any of them. > > Sure, there's a hypothetical concern that some later consumer might notice > the zeroing and act on it. But strncpy(3) is hardly the only thing in the > typical codebase that produces an unnecessarily-zeroed buffer. Authors > often use calloc(3) or memset(3) for peace of mind and no other purpose, Those are as nefarious IMO. They remove the ability of a static analyzer of detecting uninitialized uses. I.e., if you zero-initialize all of your code, -Wuninitialized and -Wmaybe-uninitialized (and -fanalyzer also plays a role there) get completely useless, and your program still will behave wrongly if you miss one of those cases; it's just that the compiler won't help you fix them. > or, especially in C++, zero out any local buffers in a class constructor to > avoid the specter of uninitialized memory. > > And of course, lots of code repeatedly reuses the same buffer for different > strings, handing out pointers to it, and callers could just as easily leak > the left-over data after the null terminator. Verifying that an alleged > string buffer truly is only used as a string is just a fact of life when > refactoring unfamiliar code in C. > > > > buffer after a certain point. 
In the case of producing a string, that point > > > is going to be the null terminator, in the absence of on-site documentation > > > to the contrary; I'd label anything else as overly clever. > > > > But again, strncpy(3) forces you to be clever. > > It forces you to do extra work, the same way strcpy(3) forces you to do > extra work. strncpy(3) still requires you to know your buffer sizes. So any dangers of strcpy(3) in that regard should be shared by strncpy(3). No? Cheers, Alex > And it allows you to be clever, the same way strcpy(3) allows > you to be clever. But at least it bounds the extent of your cleverness in > that it forces you to remember the size of your destination buffer. I'd > much rather review a hundred typical calls to strncpy(3) than a hundred > typical calls to strcpy(3) any day of the week. > > Thank you, > Matthew House -- <https://www.alejandro-colomar.es/> ^ permalink raw reply [flat|nested] 138+ messages in thread
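As a rough sketch of the "decent wrapper around strncpy(3)" idea from this exchange, the truncation check with its single -1 could be kept in one place as below. The name strncpy_checked() and its return convention are hypothetical, and terminating the buffer on truncation is a design choice made for this sketch, not something the thread prescribes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper (name and return values assumed for this sketch).
 * Copies src into dst (capacity dsize bytes) with strncpy(3).
 * Returns 0 if the whole string fit, -1 if it was truncated or dsize is 0.
 * When dsize > 0, dst always holds a null-terminated string on return.
 */
int strncpy_checked(char *dst, const char *src, size_t dsize)
{
    if (dsize == 0)
        return -1;

    strncpy(dst, src, dsize);

    /* If the last byte is not null, src had no terminator within dsize bytes. */
    if (dst[dsize - 1] != '\0') {
        dst[dsize - 1] = '\0';  /* keep the result a valid string for the caller */
        return -1;
    }
    return 0;
}
```

A caller that wants the error-only behaviour from the thread can simply ignore the buffer contents when -1 is returned; a caller that accepts truncation can ignore the return value.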
* Re: strncpy clarify result may not be null terminated 2023-11-10 17:48 ` Alejandro Colomar @ 2023-11-13 15:01 ` Matthew House 0 siblings, 0 replies; 138+ messages in thread From: Matthew House @ 2023-11-13 15:01 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, linux-man On Fri, Nov 10, 2023 at 12:48 PM Alejandro Colomar <alx@kernel.org> wrote: > On Fri, Nov 10, 2023 at 11:06:00AM -0500, Matthew House wrote: > > As I learned, the typical use of strcpy(3) (at least 80% of uses in my > > estimation) is actually copying a string into a new buffer, not an existing > > buffer. And that does need a +1 to calculate a size to pass to the > > allocation function, and usually a lot more +s if it's going to be > > If you strcpy(3) to a new buffer, you'd usually strdup(3), no? Unless > it's part of a larger object. Indeed, it's part of a larger object much more often than not. Empirically, the idiomatic pattern in the strcpy(3)-using codebases I checked is stuff like: char *key = ..., *value = ...; size_t dsize = strlen(key) + 1; if (value) dsize += 2 + strlen(value); char *dst = malloc(dsize); if (!dst) return NULL; strcpy(dst, key); if (value != NULL) { strcat(dst, ", "); strcat(dst, value); } return dst; (Or similar with sprintf(3), when the sequence is fixed at compile time. Combinations of strcat(3) and sprintf(3) are also common.) And even in the case of only copying a single string, strdup(3) is not an option for any codebase using functions other than malloc(3)/free(3) for allocation; they either have to write a custom wrapper (very rare in practice), or just allocate strlen(src) + 1 bytes inline and strcpy(3) it, as the limiting case of the general strcpy(3)/strcat(3) pattern. Also, strdup(3) isn't in current ISO C (yes, I know it's in C23, but I'm still a pessimist), so it isn't directly portable to Windows' CRT without conditionally #defining it as an alias of _strdup(), which probably scares off a few potential users. 
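For readers who want to compile the key/value pattern described in this message, it can be written out as a complete function along these lines. The function name join_key_value() is made up for this sketch; the body follows the strlen(3)/malloc(3)/strcpy(3)/strcat(3) sequence from the message.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (name assumed for this sketch).
 * Returns a newly allocated "key" or "key, value" string, or NULL if
 * allocation fails. The caller is responsible for free(3)ing the result.
 */
char *join_key_value(const char *key, const char *value)
{
    size_t dsize = strlen(key) + 1;      /* +1 for the null terminator */

    if (value != NULL)
        dsize += 2 + strlen(value);      /* +2 for the ", " separator */

    char *dst = malloc(dsize);
    if (dst == NULL)
        return NULL;

    strcpy(dst, key);
    if (value != NULL) {
        strcat(dst, ", ");
        strcat(dst, value);
    }
    return dst;
}
```

The size calculation and the copying steps have to agree term by term (the +1 and the +2 in particular), which is where the off-by-one risk mentioned in the thread comes from.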
> > Relatedly, as I also learned from all the manual strdup(3)-like snippets > > that use a custom allocator, the typical library author is deathly allergic > > to writing a custom wrapper over anything that isn't an allocation > > function; they'll repeat the entirety of the logic inline as many times as > > it takes. So I don't buy that most people would be replacing numerous calls > > to strncpy(3) with calls to a unified wrapper function that can be > > inspected and fixed all in one place, as you seem to suggest in your later > > email. > > I try to avoid cowboy programmers, but we know it's impossible. I just > do what I can. But cowboy programmers will nevertheless continue to > exist and negate reality. > > <https://github.com/nginx/unit/issues/795> > <https://github.com/nginx/unit/issues/804> > <https://github.com/nginx/unit/issues/923> > > The responses from a programmer from nginx are gems, doubting that UB is > a problem, or even suggesting implementing a cosmetic patch instead of > fixing an API. You can read those links if you want some fun. I don't deny that 'cowboy programmers' who disregard the formal rules in favor of their own mental models, then blame the compiler devs, standards authors, et al. if they ever get bitten, can be a real problem in the C community, and targeting their specific preferences isn't always practical. (Some of them still do have valid points, though; it's not an axiom that all instances of UB or unspecified behavior currently in the standards are necessarily a net good, as a few of the cowboys' opponents seem to overzealously imply.) But I also don't think that the very common preference for repeatedly inlining code over writing a custom wrapper can simply be brushed off as solely being held by such careless programmers. I can think of at least a couple scenarios where it can make some sense even for careful programmers. 
First, many teams writing libraries don't have much in the way of coherent top-down control over the general layout of the codebase: every programmer works primarily on their own functionality, while trying not to trample over the work of their peers. So it can be especially difficult to set up a central file of utility functions and keep them fully stable and available. Instead, if a programmer just sticks purely to the platform-provided functions, they have the assurance of fully-consistent behavior, at the cost of the mental overhead of correctly writing patterns on top of them. Second, some code is optimized for being very literally reused, by directly transplanting functions from one project to another. For instance, CPython has a few files transplanted from other FOSS libraries in this way, used as fallbacks for mostly-but-not-entirely-portable APIs. But if such code referred to project-specific wrappers, then all the wrappers would have to be copied as well to get everything to work; thus, it's again valuable to stick to common platform APIs. More generally, if strncpy(3), short of being deprecated, became (e.g.) strongly discouraged and heavily linted against in clang-tidy and the big IDEs, to the point that library authors are pushed to get rid of it one way or another, then I'd expect to see many more inlined memcpy(3)-based replacements than foolproof wrappers. And even if some of them can be blamed on 'cowboy programmers', inlined patterns represent enough of the general codebase that we'd all have to read it anyway, which is not something I'd prefer over working through strncpy(3)'s faults. > > Also, the typical use of strncpy(3) by far is to allow a truncated string > > rather than raising an error on truncation, and in that use case, it makes > > no difference whether or not the size inside the strncpy(3) call has a -1. > > True; that's a benign off-by-one cancer. But still a cancer. I don't see it that way.
Both versions make some amount of logical sense. With a -1 inside the strncpy(3) call, you're taking a raw prefix of a string, then appending a null terminator to the prefix in case it doesn't have one. Without a -1 inside a strncpy(3) call, you're again taking a raw prefix, then truncating again by one more byte to ensure that a null terminator is present. The only real question is the size that it really ought to be truncated to (assuming that truncation makes sense in the first place), but usually that's just "whatever size fills as much of the buffer as possible". > > Certainly, it can be quite a task to figure out whether the fields are > > actually read, if the API is poorly specified; without going through its > > entire implementation, any of those "unused" fields could be copied around > > or compared before being discarded, making it dangerous to leave them > > uninitialized. But need we add a comment to every one of those memset(3) > > calls, "I'm unsure whether this zeroing is significant at all"? Perhaps > > such a comment might be helpful, if there really is reason to suspect that > > the API is nefarious, but I've hardly ever seen stuff like that in > > practice. > > Maybe it's because in the code I've worked with, there were actual calls > to strncpy(3) where the zeroing matters, and they're disguised between > other strncpy(3) calls, which make it all a funny amusement park. > > If you _only_ use strings, and wrap strncpy(3) in a wrapper that > protects against off-by-ones, it would be acceptable, I must say. It's > just that I don't find that code when I see strncpy(3) calls. Maybe I > don't look at the right code bases. My condolences! But yeah, basically all codebases I've ever looked at, including the ones I reviewed for typical strncpy(3) usage, really do tend to use plain, ordinary C strings all the way; some even have comments reminding not to depend on strncpy(3)'s zero-padding, on account of a few misbehaving implementations without it. 
I recall seeing one library a while back that zero-padded all strings up to a multiple of 8 bytes for SIMD purposes, but IIRC, that one used entirely custom functions for string manipulation, and limited use of standard functions to reading the strings. > > If forces you to do extra work, the same way strcpy(3) forces you to do > > extra work. > > strncpy(3) still requires you to know your buffer sizes. So any dangers > of strcpy(3) in that regard should be shared by strncpy(3). No? What I was trying to say with my whole anti-strcpy(3) diatribe is, it's a very good thing that strncpy(3) requires you to know your buffer size! strcpy(3), strcat(3), and sprintf(3) share the danger that you can use them *even without knowing your buffer size* and putting in the extra work. Thus, library authors can and have frequently written clever things like void write_string_to_buffer(char *buf, const char *key, int value) { sprintf(buf, "%s, %d\n", key, value); } where the required buffer size is known neither to the caller nor the callee; callers just all coincidentally happen to use a large-enough buffer, even though the requirement isn't documented anywhere. And with enough callers, it becomes very likely to mess this up somewhere and actually expose a buffer overwrite, as I mentioned a few times in my list. Meanwhile, with strncpy(3), which requires the destination size to be set in stone, the only dangers are memcpy(3)-like uses where it turns out the source string isn't always long enough; truncating uses where truncation is logically inappropriate, or where the string is truncated too far; truncation-detecting uses where some source strings are needlessly rejected; cleverness in deciding when to append the null terminator; normal off-by-one errors; and, of course, your fear of secret reliance on the zero padding. Most of these are strictly local dangers, that can be diagnosed mainly by looking at the call to strncpy(3) and the immediate use of the destination buffer. 
The only exceptions are certain memcpy(3)-like uses, which can rely on the code that's creating the source string to make it long enough, and secret reliance on zero padding, which appears rare to me in practice. But strcpy(3)'s biggest and most frequent danger is a global danger that necessarily forces you to scour the codebase to track down all the callers and make sure that the source ultimately fits in the destination. And many codebases even consider this perfectly legitimate, e.g., by having some common INTERNAL_BUFFER_SIZE that they implicitly expect the source string to adhere to! That's why I say that strcpy(3)'s dangers are not really shared by strncpy(3). Thank you, Matthew House ^ permalink raw reply [flat|nested] 138+ messages in thread
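To make the contrast with the write_string_to_buffer() example in this message concrete, here is a sketch of a variant in which the buffer size is an explicit part of the interface. The extra bufsize parameter and the 0/-1 return convention are assumptions added for illustration; snprintf(3) itself is standard C99.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Sketch of a bounded variant (the bufsize parameter and return values
 * are assumptions, not part of the original example).
 * Formats "key, value\n" into buf, which has room for bufsize bytes.
 * Returns 0 if the whole output fit, -1 on truncation or formatting error.
 */
int write_string_to_buffer(char *buf, size_t bufsize,
                           const char *key, int value)
{
    int n = snprintf(buf, bufsize, "%s, %d\n", key, value);

    /* snprintf returns the length it wanted to write, excluding the
     * terminator, so n >= bufsize means the output was cut short. */
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return 0;
}
```

With this shape, the size requirement is local to each call site: a reviewer can check the call and the buffer declaration together instead of tracking down every caller.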
* Re: strncpy clarify result may not be null terminated 2023-11-10 16:06 ` Matthew House 2023-11-10 17:48 ` Alejandro Colomar @ 2023-11-11 20:55 ` Jonny Grant 2023-11-11 21:15 ` Jonny Grant 1 sibling, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-11 20:55 UTC (permalink / raw) To: Matthew House, Alejandro Colomar; +Cc: linux-man On 10/11/2023 16:06, Matthew House wrote: > On Thu, Nov 9, 2023 at 7:23 AM Alejandro Colomar <alx@kernel.org> wrote: >>> So one can interpret strncpy(3) as copying a prefix of a character sequence >>> into a buffer (and zero-filling the remainder), in which case you're >>> correct that truncation cannot be detected. But the function is formally >>> defined as copying a prefix of a string into a buffer (and zero-filling the >>> remainder), in which case the string has been truncated if the buffer >>> doesn't end in a null byte afterward. It's just that one may not care about >>> the terminating null byte being truncated if the user of the result just >>> wants the initial character sequence. >> >> Yes, with the ISO C definition of strncpy(3), you can detect truncation. >> The problem is that while my definition of it is complete, the >> definition by ISO C makes it an incomplete function (to complete its >> functionality in copying strings, you need to add an explicit '\0' >> after the call). So I prefer mine, and for self-consistency, it can't >> report truncation. > > Personally, I'm a pragmatist, and I like to see it as kind of a duality: it > can be used as part of a routine that copies part of a string and reports > truncation, and it can also be used as a complete routine that copies part > of a character sequence but can't report truncation. That reflects how it's > used in practice. And it would hardly be the first such duality in C, > either, given things like the fundamental practice of manipulating > arbitrary objects as if they're character arrays.
> > (Some of these other dualities are similarly infamous in their room for > error, e.g., forgetting to multiply by the element size when calling > malloc(3), which I have often been guilty of myself. And still, a worrying > amount of code neglects to test for multiplication overflow when doing > this, even when the length comes from an untrusted source. Yet somehow I > haven't seen any calls for a mallocarray(3) function to replace it. Ditto > with memset(3), which can and has caused actual hard-to-notice bugs due to > the first few elements looking correct even if the provided length is too > short.) > > But you're entitled to your opinion on how it ought to be best represented > in the man page, as long as the immediate shortcoming of the function w.r.t > producing strings is made very clear, even to readers who aren't in the > habit of contemplating formal definitions. I'm satisfied by your patch in > that regard. > >>> That's a nice library that I didn't know about! Unfortunately, I don't >>> think it's a very viable option for the long tail of small libraries I've >>> referred to, which generally don't have any sub-dependencies of their own, >>> apart from those provided by the platform. >>> >>> Going from 0 to 2 dependencies (libbsd and libmd) requires invoking their >>> configure scripts from whatever build system you're using (in such a way >>> that libbsd can locate libmd), ensuring they're safe for cross-compilation >>> if that's a goal, ensuring you bundle them in a way that respects their >>> license terms, and ensuring that any user of your library links to the two >>> dependencies and doesn't duplicate them. At that point, rolling your own >>> strlcpy(3) equivalent definitely sounds like less mental load, at least to >>> me. >> >> Yes, if you had 0 deps, it might be simpler to add your implementation. >> Although it's a tricky function to implement, so I'd be careful. 
If you >> need to roll your own, I would go for a simpler function; maybe a >> wrapper over strlen(3)+strcpy(3). > > Such a wrapper would indeed be useful for detecting truncation, but a full > strlcpy(3) equivalent would be necessary for permitting the truncation and > continuing, which is the behavior of the majority of existing strncpy(3)- > based code. > > I don't deny that this truncation behavior is often done dubiously and > rarely receives enough scrutiny, but a significant chunk of the uses really > are just building an informative string which won't cause any harm if > truncated, and installing additional control flow to handle truncation > errors in places where there currently isn't any can introduce its own > bugs. Truncation seems risky, I can't think of many nice use-cases of truncation. Say it's a file path, truncation means the file path isn't accurate any more. Maybe a song title for a music player could be ok truncated, so just display first x characters of the song title etc. Doesn't feel great though. Maybe strings beyond an expected size, as a safety check. So a song title longer than the 255 bytes that the format allows could be truncated. (probably a missing NUL in the file, or a corrupt file) >>> I didn't see this as an issue in practice when I was reviewing all those >>> existing usages of strncpy(3). The vast majority were used in the midst of >>> simple string manipulation, where the destination buffer starts as >>> uninitialized or zeroed out, and ultimately gets passed into a user >>> expecting an ordinary null-terminated string. >>> >>> (One exception was a few functions that used strncpy(dst, "", len) to zero >> >> Holy crap! Didn't these programmers know bzero(3) or memset(3)? :D Perhaps that strncpy might get optimized out, if the memory modified isn't read again after memset(). So may need explicit_memset() for this situation. >>> out the buffer, which is thankfully pretty obvious. 
Another exception was >>> the functions that actually used strncpy(3) to produce a null-padded >>> character sequence, e.g., when writing a value into a section of a binary. >>> But in general, I found that it's usually not difficult to tell when a >>> usage is being clever enough that the null padding might be significant.) >>> >>> In fact, the greater confusion came from the surprisingly common practice >>> of using strncpy(3) like it's memcpy(3), by giving it the known length of >> >> It gets better! :D > > In all these cases, I think the function naming really is having somewhat > of a psychological effect: the authors are wrangling with strthis(3) and > strthat(3) for dozens of lines, so they'd find it scary to start mixing it > up with mem*(3) functions ("I'm working with C strings, not with byte > arrays!"), or perhaps they don't even consider it. They'd rather remain > with strncpy(3), even when it means they have to manually append it with a > null terminator or another string. But I'm no psychoanalyst, so take that > with a big grain of salt. > > (Meanwhile, in my own code, I try to work with pointer-and-length arrays > whenever possible instead of fooling around with null terminators and all > their off-by-one fun, so I've become leery of using any str*(3) functions > apart from strlen(3) and strnlen(3).) Do you mean passing a size_t around for the length of the src string? That saves needing to read memory counting bytes, which is a performance boost on big strings. Accessing memory to read or write unnecessarily is a performance drag. I saw you mention off by 1 errors, I recall seeing some old code bases decades ago where they used to allocate an extra 2 bytes, just to avoid crashes in their buggy code, pretty bad stuff. Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated
  2023-11-11 20:55 ` Jonny Grant
@ 2023-11-11 21:15   ` Jonny Grant
  2023-11-11 22:36     ` Alejandro Colomar
  0 siblings, 1 reply; 138+ messages in thread
From: Jonny Grant @ 2023-11-11 21:15 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: linux-man

Alejandro

I was reading again
https://man7.org/linux/man-pages/man7/string_copying.7.html

Sharing some comments. I realise this is not the latest man page; if you have a new one online I could read that. I was reading man-pages 6.04, so perhaps some of these are already updated.

A) Could simplify and remove the "This function" and "These functions" that start each function description.

B) "RETURN VALUE" has the text before each function, rather than after as would be the convention from "DESCRIPTION". I suggest moving the return value text after each function name.

Could make it like https://man7.org/linux/man-pages/man3/string.3.html

C) In the examples, it's good that stpecpy() checks for NULL pointers; the others don't yet, though.

D) strlcpy says
"These functions force a SIGSEGV if the src pointer is not a string."
How does it determine the pointer isn't a string?

E) Are functions like ustpcpy() standardized by POSIX, or in use in a libc?

F)
       char *stpncpy(char dst[restrict .sz], const char *restrict src,
                      size_t sz);
I know the 'restrict' keyword, but haven't seen this way of specifying the size of the 'dst' array by using the parameter 'sz'. Is this in wide use in APIs? I remember C11 let us specify char ptr[static 1] to say the pointer must point to at least 1 element in this example.

Saw a few pages started to write out functions like
size_t strnlen(const char s[.maxlen], size_t maxlen);

Is this just for documentation? Usually it would be: const char s[static maxlen]

G) "Because these functions ask for the length, and a string is by
       nature composed of a character sequence of the same length plus a
       terminating null byte, a string is also accepted as input."

I suggest adjusting the order so it doesn't start with a fragment:

"A string is also accepted as input, because these functions ask
       for the length, and a string is by nature composed of a character
       sequence of the same length plus a terminating null byte."

Could simplify and remove "by nature".

Unrelated, in the strncpy man page I noticed this.

SEE ALSO
Could this refer to strcpy(3) and string(3) at the bottom?
https://man7.org/linux/man-pages/man3/strncpy.3.html

With kind regards
Jonny

^ permalink raw reply	[flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated
  2023-11-11 21:15 ` Jonny Grant
@ 2023-11-11 22:36   ` Alejandro Colomar
  2023-11-11 23:19     ` Alejandro Colomar
  2023-11-17 21:46     ` Jonny Grant
  0 siblings, 2 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-11 22:36 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

Hi Jonny,

On Sat, Nov 11, 2023 at 09:15:12PM +0000, Jonny Grant wrote:
> Alejandro
>
> I was reading again
> https://man7.org/linux/man-pages/man7/string_copying.7.html
>
> Sharing some comments, I realise not latest man page, if you have a new one online I could read that. I was reading man-pages 6.04, perhaps some already updated.

You can check this one:

<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7>
also available here:
<https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7>

And of course, you can install them from source, or read them from the
repository itself.

> A) Could simplify and remove the "This function" and "These functions" that start each function description.

Fixed; thanks.

<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=53ea8765ed7f9733abf96e86df89619dc3d203ef>

>
> B) "RETURN VALUE" has the text before each function, rather than after as would be the convention from "DESCRIPTION", I suggest to move the return value text after each function name.

Fixed; thanks.

<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=76316bd6f98c58d70c2330f7d2a945aac7c76dd8>

>
> Could make it like https://man7.org/linux/man-pages/man3/string.3.html
>
> C) In the examples, it's good stpecpy() checks for NULL pointers, the others don't yet though.

The reason is interesting.  I also designed a similar function based on
snprintf(3), which can be chained with this one.  Since that one can
return NULL, and to reduce the number of times one needs to check for
errors, I added the NULL check.

	alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpeprintf .
	./lib/stpeprintf.h:inline char *
	stpeprintf(char *dst, char *end, const char *restrict fmt, ...)
	{
		char     *p;
		va_list  ap;

		va_start(ap, fmt);
		p = vstpeprintf(dst, end, fmt, ap);
		va_end(ap);

		return p;
	}
	alx@debian:~/src/shadow/shadow/master$ grepc -tfd vstpeprintf .
	./lib/stpeprintf.h:inline char *
	vstpeprintf(char *dst, char *end, const char *restrict fmt, va_list ap)
	{
		int        len;
		ptrdiff_t  size;

		if (dst == end)
			return end;
		if (dst == NULL)
			return NULL;

		size = end - dst;
		len = vsnprintf(dst, size, fmt, ap);

		if (len == -1)
			return NULL;
		if (len >= size)
			return end;

		return dst + len;
	}
	alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpecpy .
	./lib/stpecpy.h:inline char *
	stpecpy(char *dst, char *end, const char *restrict src)
	{
		bool    trunc;
		char    *p;
		size_t  dsize, dlen, slen;

		if (dst == end)
			return end;
		if (dst == NULL)
			return NULL;

		dsize = end - dst;
		slen = strnlen(src, dsize);
		trunc = (slen == dsize);
		dlen = slen - trunc;

		p = mempcpy(dst, src, dlen);
		*p = '\0';

		return p + trunc;
	}

Then you can use them like this:

	end = buf + sizeof(buf);
	p = buf;
	p = stpecpy(p, end, "Hello ");
	p = stpeprintf(p, end, "%d realms", 9);
	p = stpecpy(p, end, "!");
	if (p == end) {
		p--;
		goto toolong;
	}
	len = p - buf;
	puts(buf);

Regarding other string-copying functions, NULL is not inherent to them,
so I'm not sure if they should have explicit NULL checks.  Why would
these functions receive a null pointer?  The main possibility is that
the programmer forgot to check some malloc(3) call, which should receive
a different treatment from a failed copy, normally.

> D) strlcpy says
> "These functions force a SIGSEGV if the src pointer is not a string."
> How does it determine the pointer isn't a string?

By calling strlen(src).  If it isn't a string, it'll continue reading,
and likely crash due to an unbounded read.  However, the SIGSEGV isn't
guaranteed, since it may find a 0 well before crashing, so I removed
that text.  It is a feature and a bug of these functions: they can find
programming errors where one passes a character sequence where a string
is expected, and crash the program to noisily report the programmer
error.  But that also makes it very slow, as Paul said.

>
> E) Are these functions mentioned like ustpcpy() standardized by POSIX? or in use in a libc?

No.  They are my inventions, like stpecpy().  It seems I forgot to add a
"This function is not provided by any library" in some of them.

Fixed; thanks.
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=9848ac50ceb6cc4d786b3899ee4626959e5f1d81>

>
> F)
>        char *stpncpy(char dst[restrict .sz], const char *restrict src,
>                       size_t sz);
> I know the 'restrict' keyword, but haven't seen this way it attempts to specify the size of the 'dst' array by using the parameter 'sz' is this in wide use in APIs? I remember C11 let us specify char ptr[static 1] to say the pointer must be at least 1 element in this example

It continues meaning the same thing.  If you use array notation, the
restrict must be placed inside the brackets.  The following two snippets
are equivalent C code:

	void foo(int *p, int *restrict x);
	void foo(int *p, int x[restrict 7]);

Since I didn't use 'static', to ISO C the array notation is ignored.
GCC, however, will be reasonable and understand it.  To GCC, there's not
much difference between the following:

	[[gnu::nonnull]]
	void bar(int x[7]);
	void bar(int x[static 7]);

And of course, you can combine static and restrict:

	void baz(int *p, int x[static restrict 7]);

>
> Saw a few pages started to write out functions like
> size_t strnlen(const char s[.maxlen], size_t maxlen);
>
> Is this just for documentation? usually it would be: const char s[static maxlen]

I don't like static for array parameters.  Specifying a size for a
parameter should similarly signify to the compiler that it should expect
no less than N elements.  This is how GCC behaves.

And static has another implication: nonnull.  IMO, nonnull is tangential
to array size, and should be specified separately with its own attribute
or qualifier.  I'd like to be able to specify the following different
cases:

	void f1(int [10]);           // NULL, or array of size >= 10
	void f2(int [_Nonnull 10]);  // Array of size >= 10

With static, I can only do the second.  Quite unreasonable.

Regarding the '.', consider the following two snippets:

	int size;  // This is the size of s[size].
	void g1(char s[size], size_t size);

You could be tricked into thinking that the size of s[] is the second
parameter to the function, but it's the global variable size.

	void g2(char s[size], size_t size);

Here, since there's no global size, the code won't even compile.
There's no way to use a parameter that comes later as a size, conforming
to ISO C.  We were discussing this [.identifier] syntax in linux-man@
and gcc@, as a possible extension.  We haven't yet decided on it, but
I'm previewing it as a documentation extension for now.  The rationale
for the syntax comes from similarity with designated initializers for
structures.

>
> G) "Because these functions ask for the length, and a string is by
>        nature composed of a character sequence of the same length plus a
>        terminating null byte, a string is also accepted as input."
>
> I suggest to adjust the order so it doesn't start with a fragment:
>
> "A string is also accepted as input, because these functions ask
>        for the length, and a string is by nature composed of a character
>        sequence of the same length plus a terminating null byte."
>
> Could simplify and remove "by nature".

Yep; thanks.
<https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=78b2ff8c6f25654648f0fa06c310b87a7e49128e>

>
> Unrelated man page strncpy, noticed this.
>
> SEE ALSO
> Could this refer to strcpy(3) and string(3) at the bottom?
> https://man7.org/linux/man-pages/man3/strncpy.3.html

I removed it on purpose, because I intended to put some distance between
strncpy(3), and strings and string-copying functions like strcpy(3).

That's why I point to string_copying(7), where readers should be
educated of all of the differences.  Then, string_copying(7) has a more
complete SEE ALSO, because it has already detailed all the different
functions, and the reader is ready to read the individual pages.

Kind regards,
Alex

>
> With kind regards
> Jonny
>

-- 
<https://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 138+ messages in thread
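As a small illustration of the array-parameter notation discussed in point F above, the sketch below declares one function three times, using pointer notation, array notation with restrict, and array notation with static and restrict. The name sum7() is hypothetical. Since an array parameter is adjusted to a pointer, all three declarations should be accepted by a conforming compiler as referring to the same function; only the 'static' form adds the promise that at least 7 elements are accessible.

```c
#include <assert.h>

/* Three compatible declarations of the same (hypothetical) function. */
int sum7(const int *restrict x);
int sum7(const int x[restrict 7]);
int sum7(const int x[static restrict 7]);

/* Definition: with 'static 7', callers must pass at least 7 elements. */
int
sum7(const int x[static restrict 7])
{
	int  sum = 0;

	for (int i = 0; i < 7; i++)
		sum += x[i];

	return sum;
}
```

Whether a compiler diagnoses a too-small or null argument here depends on the compiler and its warning options; that part is not guaranteed by ISO C.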
* Re: strncpy clarify result may not be null terminated
  2023-11-11 22:36 ` Alejandro Colomar
@ 2023-11-11 23:19   ` Alejandro Colomar
  2023-11-17 21:46   ` Jonny Grant
  1 sibling, 0 replies; 138+ messages in thread
From: Alejandro Colomar @ 2023-11-11 23:19 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man

On Sat, Nov 11, 2023 at 11:36:09PM +0100, Alejandro Colomar wrote:
[...]
> Then you can use them like this:
>
> 	end = buf + sizeof(buf);
> 	p = buf;
> 	p = stpecpy(p, end, "Hello ");
> 	p = stpeprintf(p, end, "%d realms", 9);
> 	p = stpecpy(p, end, "!");

Oops, missing NULL check:

	if (p == NULL)
		goto fail;

> 	if (p == end) {
> 		p--;
> 		goto toolong;
> 	}
> 	len = p - buf;
> 	puts(buf);
[...]

-- 
<https://www.alejandro-colomar.es/>

^ permalink raw reply	[flat|nested] 138+ messages in thread
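To make the chained usage with the added NULL check concrete, here is a self-contained sketch. The stpecpy() below follows the logic quoted in the thread, but with memcpy(3) swapped in for the GNU-specific mempcpy(3); join3() is a hypothetical caller invented for this example, and stpeprintf() is left out to keep it short.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same logic as the stpecpy() quoted in the thread; memcpy(3) is an
 * assumption of this sketch, replacing mempcpy(3). */
char *
stpecpy(char *dst, char *end, const char *restrict src)
{
	bool    trunc;
	size_t  dsize, dlen, slen;

	if (dst == end)
		return end;
	if (dst == NULL)
		return NULL;

	dsize = end - dst;
	slen = strnlen(src, dsize);
	trunc = (slen == dsize);
	dlen = slen - trunc;

	memcpy(dst, src, dlen);
	dst[dlen] = '\0';

	return dst + dlen + trunc;
}

/* Hypothetical caller: concatenate three strings into buf.
 * Returns 0 on success, -1 on a NULL result or on truncation.
 * Errors are checked once, after the whole chain. */
int
join3(char *buf, size_t bufsize, const char *a, const char *b, const char *c)
{
	char  *p, *end;

	end = buf + bufsize;
	p = buf;
	p = stpecpy(p, end, a);
	p = stpecpy(p, end, b);
	p = stpecpy(p, end, c);
	if (p == NULL)
		return -1;
	if (p == end)
		return -1;  /* truncated; buf holds a terminated prefix */

	return 0;
}
```

The point of the design is that once a call returns end (or NULL), every later call in the chain passes that value through unchanged, so a single check at the end covers all of them.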
* Re: strncpy clarify result may not be null terminated 2023-11-11 22:36 ` Alejandro Colomar 2023-11-11 23:19 ` Alejandro Colomar @ 2023-11-17 21:46 ` Jonny Grant 2023-11-18 9:37 ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar 2023-11-18 9:44 ` NULL safety " Alejandro Colomar 1 sibling, 2 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-17 21:46 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man Thank you for your swift replies Alejandro and incorporating changes. On 11/11/2023 22:36, Alejandro Colomar wrote: > Hi Jonny, > > On Sat, Nov 11, 2023 at 09:15:12PM +0000, Jonny Grant wrote: >> Alejandro >> >> I was reading again >> https://man7.org/linux/man-pages/man7/string_copying.7.html >> >> Sharing some comments, I realise not latest man page, if you have a new one online I could read that. I was reading man-pages 6.04, perhaps some already updated. > > You can check this one: > > <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7> > also available here: > <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7> > > And of course, you can install them from source, or read them from the > repository itself. That's good if you have your online PDF version of unreleased versions I could read through. >> A) Could simplify and remove the "This function" and "These functions" that start each function description. > > Fixed; thanks. > > <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=53ea8765ed7f9733abf96e86df89619dc3d203ef> > >> >> B) "RETURN VALUE" has the text before each function, rather than after as would be the convention from "DESCRIPTION", I suggest to move the return value text after each function name. > > Fixed; thanks. 
> > <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=76316bd6f98c58d70c2330f7d2a945aac7c76dd8> > >> >> Could make it like https://man7.org/linux/man-pages/man3/string.3.html >> >> C) In the examples, it's good stpecpy() checks for NULL pointers, the other's don't yet though. > > The reason is interesting. I also designed a similar function based on > snprintf(3), which can be chained with this one. Since that one can > return NULL, and to reduce the number of times one needs to check for > errors, I added the NULL check. That's good, any API that allocates memory could in theory return NULL, like strdup() too. > alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpeprintf . > ./lib/stpeprintf.h:inline char * > stpeprintf(char *dst, char *end, const char *restrict fmt, ...) > { > char *p; > va_list ap; > > va_start(ap, fmt); > p = vstpeprintf(dst, end, fmt, ap); > va_end(ap); > > return p; > } > alx@debian:~/src/shadow/shadow/master$ grepc -tfd vstpeprintf . > ./lib/stpeprintf.h:inline char * > vstpeprintf(char *dst, char *end, const char *restrict fmt, va_list ap) > { > int len; > ptrdiff_t size; > > if (dst == end) > return end; > if (dst == NULL) > return NULL; > > size = end - dst; > len = vsnprintf(dst, size, fmt, ap); > > if (len == -1) > return NULL; > if (len >= size) > return end; > > return dst + len; > } > alx@debian:~/src/shadow/shadow/master$ grepc -tfd stpecpy . 
> ./lib/stpecpy.h:inline char * > stpecpy(char *dst, char *end, const char *restrict src) > { > bool trunc; > char *p; > size_t dsize, dlen, slen; > > if (dst == end) > return end; > if (dst == NULL) > return NULL; > > dsize = end - dst; > slen = strnlen(src, dsize); > trunc = (slen == dsize); > dlen = slen - trunc; > > p = mempcpy(dst, src, dlen); > *p = '\0'; > > return p + trunc; > } > > > Then you can use them like this: > > > end = buf + sizeof(buf); > p = buf; > p = stpecpy(p, end, "Hello "); > p = stpeprintf(p, end, "%d realms", 9); > p = stpecpy(p, end, "!"); > if (p == end) { > p--; > goto toolong; > } > len = p - buf; > puts(buf); > > > Regarding other string-copying functions, NULL is not inherent to them, > so I'm not sure if they should have explicit NULL checks. Why would > these functions receive a null pointer? The main possibility is that > the programmer forgot to check some malloc(3) call, which should receive > a different treatment from a failed copy, normally. Perhaps it's just my point of view. In safety critical software I always do my best to ensure no code calls an API with the null pointer constant - when it's expecting a valid pointer. Given that the null pointer constant is defined in the C standard, even if APIs have undefined behaviour if they require a pointer but are passed a NULL. So the converse is I make APIs check for NULL (if they require a valid pointer) and reject with an error. Covers all bases (there can be corrupt data files occurring that we can't anticipate), so issues can be logged, and no core dump. I'd rather display a "USB device error 51" message on a UI than suffer a core dump which turns off a piece of safety critical equipment or sends it into a restart death loop. I recall you mentioned [[gnu::nonnull]] aka __attribute__((nonnull)) which is an optimizer hint the API will always be called with a valid pointer. There is also returns_nonnull. 
The difficulty is the optimizer will remove any NULL pointer constant checks within those APIs (if there were any). The side effect is a useful compiler warning, if the compiler figures out someone is passing NULL. So in a safety critical system we must wrap all such APIs, to put back in the null pointer constant checks. > >> D) strlcpy says >> "These functions force a SIGSEGV if the src pointer is not a string." >> How does it determine the pointer isn't a string? > > By calling strlen(src). If it isn't a string, it'll continue reading, > and likely crash due to an unbound read. However, the SIGSEGV isn't > guaranteed, since it may find a 0 well before crashing, so I removed > that text. It is a feature and a bug of these functions: they can find > programming errors where one passes a character sequence where a string > is expected, and crash the program to nosily report the programmer > error. But that also makes it very slow, as Paul said. Ok I see what you mean. It's good you took out that line, I recall there was even a raise(SIGSEGV) in the implementation in a previous version of the man page. I wish programmers would keep track of the length of their strings if they need performance, with the pointer to avoid all these strlen(). So then we'd only need to use strnlen() to sanity check buffers given by external libraries. There are so may variations on this idea to avoid C-string with NUL terminator. Using a 'struct sbuf' to contain the string buffer https://man.freebsd.org/cgi/man.cgi?query=sbuf&apropos=0&sektion=0&manpath=FreeBSD+8.2-RELEASE&format=html C++ has all it's STL containers like std::string. Other APIs prefer start_ptr, end_ptr (the one after the last character), probably they should also keep the current allocated buffer size, or always do a realloc() when appending. Others may think differently, that's fine, not all uses of C are the same target. >> >> E) Are these functions mentioned like ustpcpy() standardized by POSIX? or in use in a libc? 
> > No. They are my inventions, like stpecpy(). It seems I forgot to add a > "This function is not provided by any library" in some of them. > > Fixed; thanks. > <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=9848ac50ceb6cc4d786b3899ee4626959e5f1d81> > >> >> F) >> char *stpncpy(char dst[restrict .sz], const char *restrict src, >> size_t sz); >> I know the 'restrict' keyword, but haven't seen this way it attempts to specify the size of the 'dst' array by using the parameter 'sz' is this in wide use in APIs? I remember C11 let us specify char ptr[static 1] to say the pointer must be at least 1 element in this example > > It continues meaning the same thing. If you use array notation, the > restrict must be placed inside the brackets. The following two snippets > are equivalent C code: > > void foo(int *p, int *restrict x); > void foo(int *p, int x[restrict 7]); > > Since I didn't use 'static', to ISO C the array notation is ignored. > GCC, however, will be reasonable and understand it. To GCC, there's not > much difference between the following: > > [[gnu::nonnull]] > void bar(int x[7]); > void bar(int x[static 7]); > > And of course, you can combine static and restrict: > > void baz(int *p, int x[static restrict 7]); > >> >> Saw a few pages started to write out functions like >> size_t strnlen(const char s[.maxlen], size_t maxlen); >> >> Is this just for documentation? usually it would be: const char s[static maxlen] > > I don't like static for array parameters. Specifying a size for a > parameter should similarly signify to the compiler that it should expect > no less than N elements. This is how GCC behaves. > > And static has another implication: nonnull. IMO, nonnull is tangential > to array size, and should be specified separately with its own attribute > or qualifier. 
I'd like to be able to specify the following different > cases: > > void f1(int [10]); // NULL, or array of size >= 10 > void f2(int [_Nonnull 10]); // Array of size >= 10 > > With static, I can only do the second. Quite unreasonable. > > > Regarding the '.', consider the following two snippets: > > int size; // This is the size of s[size]. > void g1(char s[size], size_t size); > > You could be tricked into thinking that the size of s[] is the second > parameter to the function, but it's the global variable size. > > void g2(char s[size], size_t size); > > Here, since there's no global size, the code won't even compile. > There's no way to use a parameter that comes later as a size, conforming > to ISO C. We were discussing this [.identifier] syntax in linux-man@ > and gcc@, as a possible extension. We haven't yet decided on it, but > I'm previewing it as a documentation extension for now. The rationale > for the syntax comes from similarity with designated initializers for > structures. That would be good if it got in ISO C. >> G) "Because these functions ask for the length, and a string is by >> nature composed of a character sequence of the same length plus a >> terminating null byte, a string is also accepted as input." >> >> I suggest to adjust the order so it doesn't start with a fragment: >> >> "A string is also accepted as input, because these functions ask >> for the length, and a string is by nature composed of a character >> sequence of the same length plus a terminating null byte." >> >> Could simplify and remove "by nature". > > Yep; thanks. > <https://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git/commit/?h=contrib&id=78b2ff8c6f25654648f0fa06c310b87a7e49128e> > >> >> Unrelated man page strncpy, noticed this. >> >> SEE ALSO >> Could this refer to strcpy(3) and string(3) at the bottom? 
>> https://man7.org/linux/man-pages/man3/strncpy.3.html > > I removed it on purpose, because I intended to put some distance between > strncpy(3), and strings and string-copying functions like strcpy(3). > > That's why I point to string_copying(7), where readers should be > educated of all of the differences. Then, string_copying(7) has a more > complete SEE ALSO, because it has already detailed all the different > functions, and the reader is ready to read the individual pages. > > Kind regards, > Alex Fair enough. We've all shared a lot going over strnlen and other points! Man pages are all better as a result of all your efforts. Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
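Editorial illustration of the point debated in the message above: strncpy(3) produces a null-padded character sequence, which is a string only when the source fits with room for the terminator, and strnlen(3) can be used to sanity check a buffer of known size. This is a minimal sketch, not code from the thread or from the man pages; the helper names has_terminator(), copy_is_string() and strncpy_demo() are hypothetical and are not provided by any library.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen(3) */

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if buf[0..size) contains a null byte,
   that is, if the buffer holds a string rather than a bare
   character sequence.  strnlen(3) never reads past size bytes. */
bool
has_terminator(const char *buf, size_t size)
{
	return strnlen(buf, size) < size;
}

/* Hypothetical helper: copy src into dst[0..size) with strncpy(3)
   and report whether the result happens to be a string. */
bool
copy_is_string(char *dst, size_t size, const char *src)
{
	strncpy(dst, src, size);  /* truncates silently, pads with null bytes */
	return has_terminator(dst, size);
}

/* Exercises both cases: a source that does not fit, and one that does. */
void
strncpy_demo(void)
{
	char small[4];
	char large[8];

	/* "abcdef" does not fit: all 4 bytes are used, no terminator. */
	assert(!copy_is_string(small, sizeof(small), "abcdef"));
	assert(memcmp(small, "abcd", sizeof(small)) == 0);

	/* "abc" fits: the remaining bytes are padded with null bytes. */
	assert(copy_is_string(large, sizeof(large), "abc"));
	assert(strcmp(large, "abc") == 0);
	assert(large[sizeof(large) - 1] == '\0');
}
```

Calling strncpy_demo() from a main() runs the checks. A caller that needs a string after truncation has to add the terminator itself or use a different function, which is the distinction string_copying(7) draws.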
* PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-17 21:46 ` Jonny Grant @ 2023-11-18 9:37 ` Alejandro Colomar 2023-11-19 0:22 ` Deri 2023-11-18 9:44 ` NULL safety " Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-18 9:37 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Deri James [-- Attachment #1: Type: text/plain, Size: 1976 bytes --] On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote: > Thank you for your swift replies Alejandro and incorporating changes. :-) > >> I was reading again > >> https://man7.org/linux/man-pages/man7/string_copying.7.html > >> > >> Sharing some comments, I realise not latest man page, if you have a new one online I could read that. I was reading man-pages 6.04, perhaps some already updated. > > > > You can check this one: > > > > <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf#string_copying_7> > > also available here: > > <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages-6.05.01.pdf#string_copying_7> > > > > And of course, you can install them from source, or read them from the > > repository itself. > > That's good if you have your online PDF version of unreleased versions I could read through. I have that as a goal, but need some help. The thing is: we have <./scripts/LinuxManBook/>, which contains a Perl script and some helper stuff for it. It was contributed by gropdf(1)'s maintainer Deri James. Currently, that script does a lot of magic which produces the book from all of the pages. I'd like to be able to split the script into several smaller scripts that can be run on each page, and then another script that merges all of them into the single PDF file. 
That would be something I can merge into the Makefiles so that we can run a `make build-pdf` and if I touch a single page, it would only update the relevant part, reusing as much as possible from previous runs. Since I don't understand Perl, and don't know much of gropdf(1) either, I need help. Maybe Deri or Branden can help with that. If anyone else understands it and can also help, that's very welcome too! Then I could install a hook in my server that runs $ make build-pdf docdir=/srv/www/... Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-18 9:37 ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar @ 2023-11-19 0:22 ` Deri 2023-11-19 1:19 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Deri @ 2023-11-19 0:22 UTC (permalink / raw) To: Jonny Grant, Alejandro Colomar; +Cc: linux-man On Saturday, 18 November 2023 09:37:17 GMT Alejandro Colomar wrote: > On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote: > > Thank you for your swift replies Alejandro and incorporating changes. > : > :-) > : > > >> I was reading again > > >> https://man7.org/linux/man-pages/man7/string_copying.7.html > > >> > > >> Sharing some comments, I realise not latest man page, if you have a new > > >> one online I could read that. I was reading man-pages 6.04, perhaps > > >> some already updated.> > > > > You can check this one: > > > > > > <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/ma > > > n-pages-6.05.01.pdf#string_copying_7> also available here: > > > <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/book/man-pages > > > -6.05.01.pdf#string_copying_7> > > > > > > And of course, you can install them from source, or read them from the > > > repository itself. > > > > That's good if you have your online PDF version of unreleased versions I > > could read through. > I have that as a goal, but need some help. The thing is: we have > <./scripts/LinuxManBook/>, which contains a Perl script and some helper > stuff for it. It was contributed by gropdf(1)'s maintainer Deri James. > > Currently, that script does a lot of magic which produces the book from > all of the pages. > > I'd like to be able to split the script into several smaller scripts > that can be run on each page, and then another script that merges all of > them into the single PDF file. 
That would be something I can merge into > the Makefiles so that we can run a `make build-pdf` and if I touch a > single page, it would only update the relevant part, reusing as much as > possible from previous runs. Hi Alex, I assume you are thinking this will make production more efficient (quicker). The time saved would be absolutely minimal. It is obvious that to produce a pdf containing all the man pages then all the man pages have to be consumed by groff, not just the page which has changed. On my system this takes about 18 seconds to produce the 2800+ pages of the book. Of this, a quarter of a second is consumed by the "magic" part of the script, the rest of the 18 seconds is consumed by calls to groff and gropdf. So any splitting of the perl script is only going to have an effect on the quarter of a second! I don't understand why the perl script can't be included in your make file as part of build-pdf target. Presumably it would be dependent on running after the scripts which add the revision label and date to each man page. > Since I don't understand Perl, and don't know much of gropdf(1) either, > I need help. > > Maybe Deri or Branden can help with that. If anyone else understands it > and can also help, that's very welcome too! You are probably better placed to add the necessaries to your makefile. You would then just need to remember to make build-pdf any time you alter one of the source man pages. Since you are manually running my script to produce the pdf, it should not be difficult to automate it in a makefile. > Then I could install a hook in my server that runs > > $ make build-pdf docdir=/srv/www/... And wait 18s each time the hook is actioned!! Or, set the build to place the generated pdf somewhere in /srv/www/... and include the build in your normal workflow when a man page is changed. Cheers Deri > Cheers, > Alex ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-19 0:22 ` Deri @ 2023-11-19 1:19 ` Alejandro Colomar 2023-11-19 9:29 ` Alejandro Colomar 2023-11-19 16:21 ` Deri 0 siblings, 2 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-19 1:19 UTC (permalink / raw) To: Deri; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 3480 bytes --] Hi Deri! On Sun, Nov 19, 2023 at 12:22:56AM +0000, Deri wrote: > Hi Alex, > > I assume you are thinking this will make production more efficient (quicker). Not necessarily. The main reason is that I want to be able to inspect and understand every little step of the groff pipeline. See for example how I build a pdf from a single page: $ touch man2/membarrier.2 $ make build-pdf PRECONV .tmp/man/man2/membarrier.2.tbl TBL .tmp/man/man2/membarrier.2.eqn EQN .tmp/man/man2/membarrier.2.pdf.troff TROFF .tmp/man/man2/membarrier.2.pdf.set GROPDF .tmp/man/man2/membarrier.2.pdf That helps debug the pipeline, and also learn about it. If that helps parallelize some tasks, then that'll be welcome. > The time saved would be absolutely minimal. It is obvious that to produce a > pdf containing all the man pages then all the man pages have to be consumed by > groff, not just the page which has changed. But do you need to run the entire pipeline, or can you reuse most of it? I can process in parallel much faster, with `make -jN ...`. I guess the .pdf.troff files can be reused; maybe even the .pdf.set ones? Could you change the script at least to produce intermediary files as in the pipeline shown above? As many as possible would be excellent. > On my system this takes about 18 > seconds to produce the 2800+ pages of the book. Of this, a quarter of a second > is consumed by the "magic" part of the script, the rest of the 18 seconds is > consumed by calls to groff and gropdf. But how much of that work needs to be on a single process? I bought a new CPU with 24 cores. 
Gotta use them all :D > So any splitting of the perl script is > only going to have an effect on the quarter of a second! > > I don't understand why the perl script can't be included in your make file as > part of build-pdf target. It can. I just prefer to be strict about the Makefile having "one rule per each file", while currently the script generates 4 files (T, two .Z's, and the .pdf). > Presumably it would be dependent on running after > the scripts which add the revision label and date to each man page. I only set the revision and date on dist tarballs. For the git HEAD book, I'd keep the (unreleased) version and (date). So, no worries there. > > > Since I don't understand Perl, and don't know much of gropdf(1) either, > > I need help. > > > > Maybe Deri or Branden can help with that. If anyone else understands it > > and can also help, that's very welcome too! > > You are probably better placed to add the necessaries to your makefile. You > would then just need to remember to make build-pdf any time you alter one of > the source man pages. Since you are manually running my script to produce the > pdf, it should not be difficult to automate it in a makefile. > > > Then I could install a hook in my server that runs > > > > $ make build-pdf docdir=/srv/www/... > > And wait 18s each time the hook is actioned!! Or, set the build to place the > generated pdf somewhere in /srv/www/... and include the build in your normal > workflow when a man page is changed. Hmm. I still hope some of it can be parallelized, but 18s could be reasonable, if the server does that in the background after pushing. My old raspberry pi would burn, but the new computer should handle that just fine. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-19 1:19 ` Alejandro Colomar @ 2023-11-19 9:29 ` Alejandro Colomar 2023-11-19 16:21 ` Deri 1 sibling, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-19 9:29 UTC (permalink / raw) To: Deri; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 4024 bytes --] On Sun, Nov 19, 2023 at 02:19:43AM +0100, Alejandro Colomar wrote: > Hi Deri! > > On Sun, Nov 19, 2023 at 12:22:56AM +0000, Deri wrote: > > Hi Alex, > > > > I assume you are thinking this will make production more efficient (quicker). > > Not necessarily. The main reason is that I want to be able to inspect > and understand every little step of the groff pipeline. See for example > how I build a pdf from a single page: > > $ touch man2/membarrier.2 > $ make build-pdf > PRECONV .tmp/man/man2/membarrier.2.tbl > TBL .tmp/man/man2/membarrier.2.eqn > EQN .tmp/man/man2/membarrier.2.pdf.troff > TROFF .tmp/man/man2/membarrier.2.pdf.set > GROPDF .tmp/man/man2/membarrier.2.pdf > > That helps debug the pipeline, and also learn about it. > > If that helps parallelize some tasks, then that'll be welcome. > > > The time saved would be absolutely minimal. It is obvious that to produce a > > pdf containing all the man pages then all the man pages have to be consumed by > > groff, not just the page which has changed. > > But do you need to run the entire pipeline, or can you reuse most of it? > I can process in parallel much faster, with `make -jN ...`. I guess > the .pdf.troff files can be reused; maybe even the .pdf.set ones? > > Could you change the script at least to produce intermediary files as in > the pipeline shown above? As many as possible would be excellent. And if you could then split the Perl script so that it is composed of several subscripts called by the main script, and each subscript produces exactly one file, that'd be great. 
I could call each of those smaller scripts in a Makefile rule. > > > On my system this takes about 18 > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a second > > is consumed by the "magic" part of the script, the rest of the 18 seconds is > > consumed by calls to groff and gropdf. > > But how much of that work needs to be on a single process? I bought a > new CPU with 24 cores. Gotta use them all :D > > > So any splitting of the perl script is > > only going to have an effect on the quarter of a second! > > > > I don't understand why the perl script can't be included in your make file as > > part of build-pdf target. > > It can. I just prefer to be strict about the Makefile having "one rule > per each file", while currently the script generates 4 files (T, two > .Z's, and the .pdf). > > > Presumably it would be dependent on running after > > the scripts which add the revision label and date to each man page. > > I only set the revision and date on dist tarballs. For the git HEAD > book, I'd keep the (unreleased) version and (date). So, no worries > there. > > > > > > Since I don't understand Perl, and don't know much of gropdf(1) either, > > > I need help. > > > > > > Maybe Deri or Branden can help with that. If anyone else understands it > > > and can also help, that's very welcome too! > > > > You are probably better placed to add the necessaries to your makefile. You > > would then just need to remember to make build-pdf any time you alter one of > > the source man pages. Since you are manually running my script to produce the > > pdf, it should not be difficult to automate it in a makefile. > > > > > Then I could install a hook in my server that runs > > > > > > $ make build-pdf docdir=/srv/www/... > > > > And wait 18s each time the hook is actioned!! Or, set the build to place the > > generated pdf somewhere in /srv/www/... and include the build in your normal > > workflow when a man page is changed. > > Hmm. 
I still hope some of it can be parallelized, but 18s could be > reasonable, if the server does that in the background after pushing. > My old raspberry pi would burn, but the new computer should handle that > just fine. > > Cheers, > Alex > > -- > <https://www.alejandro-colomar.es/> -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-19 1:19 ` Alejandro Colomar 2023-11-19 9:29 ` Alejandro Colomar @ 2023-11-19 16:21 ` Deri 2023-11-19 20:58 ` Alejandro Colomar 1 sibling, 1 reply; 138+ messages in thread From: Deri @ 2023-11-19 16:21 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Jonny Grant, linux-man On Sunday, 19 November 2023 01:19:43 GMT Alejandro Colomar wrote: > Hi Deri! > > On Sun, Nov 19, 2023 at 12:22:56AM +0000, Deri wrote: > > Hi Alex, > > > > I assume you are thinking this will make production more efficient > > (quicker). > Not necessarily. The main reason is that I want to be able to inspect > and understand every little step of the groff pipeline. See for example > how I build a pdf from a single page: > > $ touch man2/membarrier.2 > $ make build-pdf > PRECONV .tmp/man/man2/membarrier.2.tbl > TBL .tmp/man/man2/membarrier.2.eqn > EQN .tmp/man/man2/membarrier.2.pdf.troff > TROFF .tmp/man/man2/membarrier.2.pdf.set > GROPDF .tmp/man/man2/membarrier.2.pdf > > That helps debug the pipeline, and also learn about it. > > If that helps parallelize some tasks, then that'll be welcome. Hi Alex, Doing it that way actually stops the jobs being run in parallel! Each step completes before the next step starts, whereas if you let groff build the pipeline all the processes are run in parallel. Using separate steps may be desirable for "understanding every little step of the groff pipeline", (and may aid debugging an issue), but once such knowledge is obtained it is probably better to leave the pipelining to groff, in a production environment. > > The time saved would be absolutely minimal. It is obvious that to produce > > a > > pdf containing all the man pages then all the man pages have to be > > consumed by groff, not just the page which has changed. > > But do you need to run the entire pipeline, or can you reuse most of it? > I can process in parallel much faster, with `make -jN ...`. 
I guess > the .pdf.troff files can be reused; maybe even the .pdf.set ones? > > Could you change the script at least to produce intermediary files as in > the pipeline shown above? As many as possible would be excellent. Perhaps it would help if I explain the stages of my script. First a look at what the script needs to do to produce a pdf of all man pages. There are too many files to produce a single command line with all the filenames of each man, groff has no mechanism for passing a list of filenames, so first job is to concatenate all the separate files into one input file for groff. And while we are doing that, add the "magic sauce" which makes all the pdf links in the book and sorts out the aliases which point to another man page. After this is done there is a single troff file, called LMB.man, which is the file groff is going to process. In the script you should see something like this:- my $temp='LMB.man'; [...] my $format='pdf'; my $paper=$fpaper ||'; my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P- p$paper -rC1 -rCHECKSTYLE=3"; my $front='LMBfront.t'; my $frontdit='LMBfront.set'; my $mandit='LinuxManBook.set'; my $book="LinuxManBook.$format"; system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit"); system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 | LC_ALL=C grep '^\\. *ds' | groff -T$format $cmdstring - $temp -Z > $mandit"); system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit - p$paper > $book"); (This includes changes by Brian Inglish ts). If you remove the lines which call system you will end up with just the single file LMB.man (in about a quarter of a second). You can treat this file just the same as your single page example if you want to. 
The first system call creates the title page from the troff source file LMBfront.t and produces LMBfront.set, this can be added to your makefile as an entirely separate rule depending on whether the .set file needs to be built. The second and third system calls are the calls to groff which could be put into your makefile or split into separate stages to avoid parallelism. The second system call produces LinuxManBook.set and the third system call combines this with LMBfront.set to produce the pdf. The "./" in the third system call is because I gave you a pre-release gropdf, you may be using the released 1.23.0 gropdf now. > > On my system this takes about 18 > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a > > second is consumed by the "magic" part of the script, the rest of the 18 > > seconds is consumed by calls to groff and gropdf. > > But how much of that work needs to be on a single process? I bought a > new CPU with 24 cores. Gotta use them all :D I realise you are having difficulty in letting go of your idea of re-using previous work, rather than starting afresh each time. Imagine a single word change in one man page causes it to grow from 2 pages to 3, so all links to pages after this changed entry would be one page adrift. This is why very little previous work is useful, and why the whole book has to be dealt with as a single process. If each entry was processed separately, as you would like to use all your shiny new cores, how would the process dealing with accept(2) know which page socket(2) would be on when it adds it as a link in the text? I hope you can see that at some point it has to be treated as a homogeneous whole in order to calculate correct links between entries. > > So any splitting of the perl script is > > only going to have an effect on the quarter of a second! > > > > I don't understand why the perl script can't be included in your make file > > as part of build-pdf target. > > It can. 
I just prefer to be strict about the Makefile having "one rule > per each file", while currently the script generates 4 files (T, two > .Z's, and the .pdf). I explained above how to separate things so that the script only generates LMB.man and the system calls move to the makefile. > > Presumably it would be dependent on running after > > the scripts which add the revision label and date to each man page. > > I only set the revision and date on dist tarballs. For the git HEAD > book, I'd keep the (unreleased) version and (date). So, no worries > there. Given that you seem to intend to offer these interim books as a download, it would make sense if they included either a date or git commit ID to differentiate them, if someone queries something it would be useful to know exactly what they were looking at. Cheers Deri > > > Since I don't understand Perl, and don't know much of gropdf(1) either, > > > I need help. > > > > > > Maybe Deri or Branden can help with that. If anyone else understands it > > > and can also help, that's very welcome too! > > > > You are probably better placed to add the necessaries to your makefile. > > You > > would then just need to remember to make build-pdf any time you alter one > > of the source man pages. Since you are manually running my script to > > produce the pdf, it should not be difficult to automate it in a makefile. > > > > > Then I could install a hook in my server that runs > > > > > > $ make build-pdf docdir=/srv/www/... > > > > And wait 18s each time the hook is actioned!! Or, set the build to place > > the generated pdf somewhere in /srv/www/... and include the build in your > > normal workflow when a man page is changed. > > Hmm. I still hope some of it can be parallelized, but 18s could be > reasonable, if the server does that in the background after pushing. > My old raspberry pi would burn, but the new computer should handle that > just fine. I'm confused. 
The 18s is how long it takes to generate the book, so if the book is built in response to an access to a particular url the http server can't start "pushing" for the 18s, then add on the transfer time for the pdf and I suspect you will have a lot of aborted transfers. Additionally, the script, and any makefile equivalent you write, is not designed for concurrent invocation, so if two people visit the same url within the 18 second window neither user will receive a valid pdf. I advise the build becomes part of your workflow after making changes, and then place the pdf in a location where it can be served by the http server. Your model of slicing and dicing man pages to be processed individually is doable using a website to serve the individual pages, see:- http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept This is running on a 1" cube no more powerful than a raspberry pi 3. The difference is that the "magic sauce" added to each man page sets the links to external http calls back to itself to produce another man page, rather than internal links to another part of the pdf. You can get an index of all the man pages, on the (very old) system, here. http://chuzzlewit.co.uk/ Cheers Deri > Cheers, > Alex ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-19 16:21 ` Deri @ 2023-11-19 20:58 ` Alejandro Colomar 2023-11-20 0:46 ` G. Branden Robinson 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-19 20:58 UTC (permalink / raw) To: Deri; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 11139 bytes --] On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote: > > $ touch man2/membarrier.2 > > $ make build-pdf > > PRECONV .tmp/man/man2/membarrier.2.tbl > > TBL .tmp/man/man2/membarrier.2.eqn > > EQN .tmp/man/man2/membarrier.2.pdf.troff > > TROFF .tmp/man/man2/membarrier.2.pdf.set > > GROPDF .tmp/man/man2/membarrier.2.pdf > > > > That helps debug the pipeline, and also learn about it. > > > > If that helps parallelize some tasks, then that'll be welcome. > > Hi Alex, Hi Deri, > Doing it that way actually stops the jobs being run in parallel! Each step Hmm, kind of makes sense. > completes before the next step starts, whereas if you let groff build the > pipeline all the processes are run in parallel. Using separate steps may be > desirable for "understanding every little step of the groff pipeline", (and Still a useful thing for our build system. > may aid debugging an issue), but once such knowledge is obtained it is > probably better to leave the pipelining to groff, in a production environment. Unless performance is really a problem, I prefer the understanding and debugging aid. It'll help not only me, but others who see the project and would like to learn how all this magic works. > > > The time saved would be absolutely minimal. It is obvious that to produce > > > a > > > pdf containing all the man pages then all the man pages have to be > > > consumed by groff, not just the page which has changed. > > > > But do you need to run the entire pipeline, or can you reuse most of it? > > I can process in parallel much faster, with `make -jN ...`. 
I guess > > the .pdf.troff files can be reused; maybe even the .pdf.set ones? > > > > Could you change the script at least to produce intermediary files as in > > the pipeline shown above? As many as possible would be excellent. > > Perhaps it would help if I explain the stages of my script. First a look at > what the script needs to do to produce a pdf of all man pages. There are too > many files to produce a single command line with all the filenames of each > man, groff has no mechanism for passing a list of filenames, so first job is You can always `find ... | xargs cat | troff /dev/stdin` > to concatenate all the separate files into one input file for groff. And while > we are doing that, add the "magic sauce" which makes all the pdf links in the > book and sorts out the aliases which point to another man page. Yep, I think I partially understood that part of the script today. It's what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces and passes to groff, right? > After this is done there is a single troff file, called LMB.man, which is the That's what's currently called LinuxManBook.Z, right? > file groff is going to process. In the script you should see something like > this:- > > my $temp='LMB.man'; I don't. Maybe you have a slightly different version of it? > [...] > > my $format='pdf'; > my $paper=$fpaper ||'; > my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P- > p$paper -rC1 -rCHECKSTYLE=3"; > my $front='LMBfront.t'; > my $frontdit='LMBfront.set'; > my $mandit='LinuxManBook.set'; > my $book="LinuxManBook.$format"; > > system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit"); This creates the front page .set file > system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 | > LC_ALL=C grep '^\\. *ds' | This creates the bookmarks, right? > groff -T$format $cmdstring - $temp -Z > $mandit"); And this is the main .set file. 
> system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit - > p$paper > $book"); And finally we have the book. > > (This includes changes by Brian Inglish ts). If you remove the lines which > call system you will end up with just the single file LMB.man (in about a > quarter of a second). You can treat this file just the same as your single > page example if you want to. > > The first system call creates the title page from the troff source file > LMBfront.t and produces LMBfront.set, this can be added to your makefile as an > entirely separate rule depending on whether the .set file needs to be built. > > The second and third system calls are the calls to groff which could be put > into your makefile or split into separate stages to avoid parallelism. > > The second system call produces LinuxManBook.set and the third system combines > this with LMBfront.set to produce the pdf. > > The "./" in the third system call is because I gave you a pre-release gropdf, > you may be using the released 1.23.0 gropdf now. > > > > On my system this takes about 18 > > > seconds to produce the 2800+ pages of the book. Of this, a quarter of a > > > second is consumed by the "magic" part of the script, the rest of the 18 > > > seconds is consumed by calls to groff and gropdf. > > > > But how much of that work needs to be on a single process? I bought a > > new CPU with 24 cores. Gotta use them all :D > > I realise you are having difficulty in letting go of your idea of re-using > previous work, rather than starting afresh each time. Imagine a single word > change in one man page causes it to grow from 2 pages to 3, so all links to > pages after this changed entry would be one page adrift. This is why very > little previous work is useful, and why the whole book has to be dealt with as > a single process. Does such a change need re-running troff(1)? Or is gropdf(1) enough? 
If troff(1) My problem is probably that I don't know what's done by `gropdf`, and what's done by `troff -Tpdf`. I was hoping that `troff -Tpdf` still didn't need to know about the entire book, and that only gropdf(1) would need that. > If each entry was processed separately, as you would like to > use all your shiny new cores, how would the process dealing with accept(2) > know which page socket(2) would be on when it adds it as a link in the text. I > hope you can see that at some point it has to be treated as a homogenous whole > in order calculate correct links between entries. > > > > So any splitting of the perl script is > > > only going to have an effect on the quarter of a second! > > > > > > I don't understand why the perl script can't be included in your make file > > > as part of build-pdf target. > > > > It can. I just prefer to be strict about the Makefile having "one rule > > per each file", while currently the script generates 4 files (T, two > > .Z's, and the .pdf). > > Explained how to separate above so that the script only generates LMB.man and > the system calls moved to the makefile. Thanks! > > > Presumably it would be dependent on running after > > > the scripts which add the revision label and date to each man page. > > > > I only set the revision and date on dist tarballs. For the git HEAD > > book, I'd keep the (unreleased) version and (date). So, no worries > > there. > > Given that you seem to intend to offer these interim books as a download, it > would make sense if they included either a date or git commit ID to > differenciate them, if someone queries something it would be useful to know > exactly what they were looking at. The books for releases are available at <https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf> (replace the version numbers for other versions, or navigate the dirs) I need to document that in the README of the project. 
For git HEAD, I plan to have something like <https://www.alejandro-colomar.es/share/dist/man-pages/git/man-pages-HEAD.pdf> It's mainly intended for easily checking what git HEAD looks like, and discarding it later. If the audience asks for version numbers, though, I could provide `git describe` versions and dates in the pages. > Cheers > > Deri > > > > > Since I don't understand Perl, and don't know much of gropdf(1) either, > > > > I need help. > > > > > > > > Maybe Deri or Branden can help with that. If anyone else understands it > > > > and can also help, that's very welcome too! > > > > > > You are probably better placed to add the necessaries to your makefile. > > > You > > > would then just need to remember to make build-pdf any time you alter one > > > of the source man pages. Since you are manually running my script to > > > produce the pdf, it should not be difficult to automate it in a makefile. > > > > > > > Then I could install a hook in my server that runs > > > > > > > > $ make build-pdf docdir=/srv/www/... > > > > > > And wait 18s each time the hook is actioned!! Or, set the build to place > > > the generated pdf somewhere in /srv/www/... and include the build in your > > > normal workflow when a man page is changed. > > > > Hmm. I still hope some of it can be parallelized, but 18s could be > > reasonable, if the server does that in the background after pushing. > > My old raspberry pi would burn, but the new computer should handle that > > just fine. > > I'm confused. The 18s is how long it takes to generate the book, so if the > book is built in response to an access to a particular url the http server > can't start "pushing" for the 18s, then add on the transfer time for the pdf > and I suspect you will have a lot of aborted transfers.
Additionally, the > script, and any makefile equivalent you write, is not designed for concurrent > invocation, so if two people visit the same url within the 18 second window > neither user will receive a valid pdf. No, my intention is that whenever I `git push` via SSH, the receiving server runs `make build-book-pdf` after receiving the changes. That is run after the git SSH connection has closed, so I wouldn't notice. HTTP connections wouldn't trigger anything in my server, except Nginx serving the file, of course. > I advise the build becomes part of your workflow after making changes, and > then place the pdf in a location where it can be served by the http server. > > Your model of slicing and dicing man pages to be processed individually is > doable using a website to serve the individual pages, see:- > > http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept > > This is running on a 1" cube no more powerful than a raspberry pi 3. The > difference is that the "magic sauce" added to each man page sets the links to > external http calls back to itself to produce another man page, rather than > internal links to another part of the pdf. You can get an index of all the man > pages, on the (very old) system, here. > > http://chuzzlewit.co.uk/ Yep, I've seen that server :) Long term I also intend to provide one-page PDFs and HTML files of the pages. Although I prefer pre-generating them, instead of on-demand. Maybe a git hook, or maybe a cron job that re-generates them once a day or so. Cheers, Alex > > Cheers > > Deri -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-19 20:58 ` Alejandro Colomar @ 2023-11-20 0:46 ` G. Branden Robinson 2023-11-20 9:43 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: G. Branden Robinson @ 2023-11-20 0:46 UTC (permalink / raw) To: Alejandro Colomar, Deri; +Cc: Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 3642 bytes --] Hi Alex and Deri, I'm going to address just a few small parts of this message... At 2023-11-19T21:58:03+0100, Alejandro Colomar wrote: > You can always `find ... | xargs cat | troff /dev/stdin` ...not if you need to preprocess any of the input. With tbl(1), for instance. > My problem is probably that I don't know what's done by `gropdf`, and > what's done by `troff -Tpdf`. I was hoping that `troff -Tpdf` still > didn't need to know about the entire book, and that only gropdf(1) > would need that. This stuff is documented in groff's Texinfo manual, and in the groff(1) and roff(7) man pages. Here's an excerpt of the last. Using roff When you read a man page, often a roff is the program rendering it. Some roff implementations provide wrapper programs that make it easy to use the roff system from the shell’s command line. These can be specific to a macro package, like mmroff(1), or more general. groff(1) provides command‐line options sparing the user from constructing the long, order‐dependent pipelines familiar to AT&T troff users. Further, a heuristic program, grog(1), is available to infer from a document’s contents which groff arguments should be used to process it. The roff pipeline A typical roff document is prepared by running one or more processors in series, followed by a a formatter program and then an output driver (or “device postprocessor”). Commonly, these programs are structured into a pipeline; that is, each is run in sequence such that the output of one is taken as the input to the next, without passing through secondary storage. 
(On non‐Unix systems, pipelines may have to be simulated with temporary files.) $ preproc1 < input‐file | preproc2 | ... | troff [option] ... \ | output‐driver Once all preprocessors have run, they deliver pure roff language input to the formatter, which in turn generates a document in a page description language that is then interpreted by a postprocessor for viewing, printing, or further processing. gropdf(1) is the output driver for the PDF "device". So "groff -T pdf input.tr" and "troff -T pdf input.tr | gropdf" are equivalent. (Yes, you still need the `-T pdf` arguments, even to troff proper. roff(7) again: Concepts [...] When a device‐independent roff formatter starts up, it obtains information about the device for which it is preparing output from the latter’s description file (see groff_font(5)). An essential property is the length of the output line, such as “6.5 inches”. ) > > Your model of slicing and dicing man pages to be processed > > individually is doable using a website to serve the individual > > pages, see:- > > > > http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept > > > > This is running on a 1" cube no more powerful than a raspberry pi 3. > > The difference is that the "magic sauce" added to each man page sets > > the links to external http calls back to itself to produce another > > man page, rather than internal links to another part of the pdf. You > > can get an index of all the man pages, on the (very old) system, > > here. > > > > http://chuzzlewit.co.uk/ > > Yep, I've seen that server :) Is it just me, or are the fonts not getting embedded in the PDFs generated by chuzzlewit? They look fine on my desktop machine but pretty bad on my Android tablet. Regards, Branden [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) 2023-11-20 0:46 ` G. Branden Robinson @ 2023-11-20 9:43 ` Alejandro Colomar 0 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-20 9:43 UTC (permalink / raw) To: G. Branden Robinson; +Cc: Deri, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 4450 bytes --] Hi Branden, On Sun, Nov 19, 2023 at 06:46:29PM -0600, G. Branden Robinson wrote: > Hi Alex and Deri, > > I'm going to address just a few small parts of this message... > > At 2023-11-19T21:58:03+0100, Alejandro Colomar wrote: > > You can always `find ... | xargs cat | troff /dev/stdin` > > ...not if you need to preprocess any of the input. With tbl(1), for > instance. What I mean is that I can preprocess individually: find ... | while read f; do eqn $f > $f.troff; done And only process together in a single invocation what _needs_ to be done in a single invocation: find ... | xargs cat | gropdf /dev/stdin I guess that preprocessors can be run per-file. I know that gropdf(1) must be run with the entire book as input. But I don't know if `troff -Tpdf` needs to see the entire book at once, or if it can process each file separately. In my laptop, the pipeline for building the Linux Man Book takes 23.3 s. I've split the processing of the book so that I produce every intermediary file in the pipeline (except pic(1), which I think we don't need). From that, I've seen the times it takes for each program to do its job (and importantly, the overall time wasn't slower; it took again 23.3 s): preconv(1) takes 0.04 s; tbl(1) takes 0.06 s; eqn(1) takes 0.05 s; troff(1) takes 2.8 s; and gropdf(1) takes 17.6 s. The time taken by gropdf(1) is mandatory, since it can't process the individual files separately. But if we can reduce the time taken by all other programs close to 0, it would be good. It depends on which programs need to see the entire book, and which can process each file separately. 
Nevertheless, I think it's interesting to process the book per-file, as much as possible, even if the overall time won't change significantly. It is a good documentation of what needs to be processed together and what not, when building a PDF document with groff. > > My problem is probably that I don't know what's done by `gropdf`, and > > what's done by `troff -Tpdf`. I was hoping that `troff -Tpdf` still > > didn't need to know about the entire book, and that only gropdf(1) > > would need that. > > This stuff is documented in groff's Texinfo manual, and in the groff(1) > and roff(7) man pages. > > Here's an excerpt of the last. > > Using roff > When you read a man page, often a roff is the program rendering > it. Some roff implementations provide wrapper programs that make > it easy to use the roff system from the shell’s command line. > These can be specific to a macro package, like mmroff(1), or more > general. groff(1) provides command‐line options sparing the user > from constructing the long, order‐dependent pipelines familiar to > AT&T troff users. Further, a heuristic program, grog(1), is > available to infer from a document’s contents which groff > arguments should be used to process it. > > The roff pipeline > A typical roff document is prepared by running one or more > processors in series, followed by a a formatter program and then > an output driver (or “device postprocessor”). Commonly, these > programs are structured into a pipeline; that is, each is run in > sequence such that the output of one is taken as the input to the > next, without passing through secondary storage. (On non‐Unix > systems, pipelines may have to be simulated with temporary > files.) > > $ preproc1 < input‐file | preproc2 | ... | troff [option] ... 
\ > | output‐driver > > Once all preprocessors have run, they deliver pure roff language > input to the formatter, which in turn generates a document in a > page description language that is then interpreted by a > postprocessor for viewing, printing, or further processing. > > gropdf(1) is the output driver for the PDF "device". So "groff -T pdf > input.tr" and "troff -T pdf input.tr | gropdf" are equivalent. > > (Yes, you still need the `-T pdf` arguments, even to troff proper. This doesn't answer my question. For generating a book, does troff(1) need to see the entire book, or is it enough if gropdf(1) does? My guess is that troff(1) also needs to see the entire book, but I don't know for sure. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* NULL safety (was: strncpy clarify result may not be null terminated) 2023-11-17 21:46 ` Jonny Grant 2023-11-18 9:37 ` PDF book of unreleased pages (was: strncpy clarify result may not be null terminated) Alejandro Colomar @ 2023-11-18 9:44 ` Alejandro Colomar 2023-11-18 23:21 ` NULL safety Jonny Grant 1 sibling, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-18 9:44 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 2306 bytes --] Hi Jonny, On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote: > > Regarding other string-copying functions, NULL is not inherent to them, > > so I'm not sure if they should have explicit NULL checks. Why would > > these functions receive a null pointer? The main possibility is that > > the programmer forgot to check some malloc(3) call, which should receive > > a different treatment from a failed copy, normally. > > Perhaps it's just my point of view. In safety critical software I always do my best to ensure no code calls an API with the null pointer constant - when it's expecting a valid pointer. Given that the null pointer constant is defined in the C standard, even if APIs have undefined behaviour if they require a pointer but are passed a NULL. So the converse is I make APIs check for NULL (if they require a valid pointer) and reject with an error. Covers all bases (there can be corrupt data files occurring that we can't anticipate), so issues can be logged, and no core dump. I'd rather display a "USB device error 51" message on a UI than suffer a core dump which turns off a piece of safety critical equipment or sends it into a restart death loop. > > I recall you mentioned [[gnu::nonnull]] aka __attribute__((nonnull)) which is an optimizer hint the API will always be called with a valid pointer. There is also returns_nonnull. > > The difficulty is the optimizer will remove any NULL pointer constant checks within those APIs (if there were any). 
The side effect is a useful compiler warning, if the compiler figures out someone is passing NULL. > > So in a safety critical system we must wrap all such APIs, to put back in the null pointer constant checks. There's Clang's qualifier _Nonnull, which is not a hint to the optimizer. It is an attempt to have null correctness similar to how we have const correctness. It still has little support, even from Clang itself. It has an important problem: it applies to the pointer, not to the pointee, but pointer qualifiers are discarded easily. A better design would make it a pointee qualifier. Hopefully, this will some day be there to end all NULL discussions. Until then, yeah, NULL is a dangerous part of the language. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: NULL safety 2023-11-18 9:44 ` NULL safety " Alejandro Colomar @ 2023-11-18 23:21 ` Jonny Grant 2023-11-24 22:25 ` Alejandro Colomar 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-18 23:21 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man On 18/11/2023 09:44, Alejandro Colomar wrote: > Hi Jonny, > > On Fri, Nov 17, 2023 at 09:46:47PM +0000, Jonny Grant wrote: >>> Regarding other string-copying functions, NULL is not inherent to them, >>> so I'm not sure if they should have explicit NULL checks. Why would >>> these functions receive a null pointer? The main possibility is that >>> the programmer forgot to check some malloc(3) call, which should receive >>> a different treatment from a failed copy, normally. >> >> Perhaps it's just my point of view. In safety critical software I always do my best to ensure no code calls an API with the null pointer constant - when it's expecting a valid pointer. Given that the null pointer constant is defined in the C standard, even if APIs have undefined behaviour if they require a pointer but are passed a NULL. So the converse is I make APIs check for NULL (if they require a valid pointer) and reject with an error. Covers all bases (there can be corrupt data files occurring that we can't anticipate), so issues can be logged, and no core dump. I'd rather display a "USB device error 51" message on a UI than suffer a core dump which turns off a piece of safety critical equipment or sends it into a restart death loop. >> >> I recall you mentioned [[gnu::nonnull]] aka __attribute__((nonnull)) which is an optimizer hint the API will always be called with a valid pointer. There is also returns_nonnull. >> >> The difficulty is the optimizer will remove any NULL pointer constant checks within those APIs (if there were any). The side effect is a useful compiler warning, if the compiler figures out someone is passing NULL. 
>> >> So in a safety critical system we must wrap all such APIs, to put back in the null pointer constant checks. > > There's Clang's qualifier _Nonnull, which is not a hint to the > optimizer. It is an attempt to have null correctness similar to how we > have const correctness. It still has little support, even from Clang > itself. It has some important problem: it applies to the pointer, not > to the pointee, but pointer qualifiers are discarded easily. A better > design would make it a pointee qualifier. Hopefully, this will some day > be there to end all NULL discussions. Until then, yeah, NULL is a > dangerous part of the language. > > Cheers, > Alex > I saw Christopher Bazley was talking about this. As I understand it, _Nonnull is milder than attribute nonnull. _Nonnull probably helps with static analysis, but doesn't optimize out any code checking if(ptr == NULL) return -1; Saw this, did you get traction with your proposal? https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269?page=2 You're right NULL is a dangerous part of the language, there is part of the C spec that does state functions which don't document supporting arguments that are NULL, are undefined behaviour. It's implementation defined, and most don't check for it, which is fine, it's their choice. NULL is pretty easy to check for in a wrapper, simpler than catching use-after-free pointers at runtime, like valgrind or address sanitizer does. Paul Eggert drew my attention to this in C23: 7.1.4 Use of library functions "If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer, or a pointer to non-modifiable storage when the corresponding parameter is not const-qualified) or a type (after default argument promotion) not expected by a function with a variable number of arguments, the behavior is undefined." 
Kind regards Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
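The wrapper approach mentioned above ("NULL is pretty easy to check for in a wrapper") could look roughly like the sketch below. It is only an illustration under assumptions: checked_strlen() is a hypothetical name, not a libc or man-pages function, and returning -EINVAL is just one possible way to report the error.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper (not part of any library).  strlen(3) has
 * undefined behavior when passed a null pointer, so the check is done
 * here, before the call, and reported as an error code instead.
 */
static int
checked_strlen(const char *s, size_t *len)
{
	if (s == NULL || len == NULL)
		return -EINVAL;  /* reject the call instead of crashing */

	*len = strlen(s);
	return 0;
}
```

Note that such a wrapper only keeps its check if it is not itself declared with the GCC nonnull attribute, since, as discussed above, the optimizer may then remove the comparison.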
* Re: NULL safety 2023-11-18 23:21 ` NULL safety Jonny Grant @ 2023-11-24 22:25 ` Alejandro Colomar 2023-11-25 0:57 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-24 22:25 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man [-- Attachment #1: Type: text/plain, Size: 957 bytes --] Hi Jonny, On Sat, Nov 18, 2023 at 11:21:00PM +0000, Jonny Grant wrote: > I saw Christopher Bazley was talking about this. As I understand it, _Nonnull is milder than attribute nonnull. _Nonnull probably helps with static analysis, but doesn't optimize out any code checking if(ptr == NULL) return -1; > > Saw this, did you get traction with your proposal? > > https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269?page=2 I didn't follow up with that. I'd first like to be able to try Clang's static analyzer with _Nullable, to be able to play with it. An _Optional qualifier would only be usable by something like -fanalyzer, or Clang's analyzer, since it needs to avoid false positives that are quite complex. It's not a warning that you'd want in -Wall. And since Clang's analyzer isn't easy to use, I'm not working on that until they make it easier. Cheers, Alex -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: NULL safety 2023-11-24 22:25 ` Alejandro Colomar @ 2023-11-25 0:57 ` Jonny Grant 0 siblings, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-25 0:57 UTC (permalink / raw) To: Alejandro Colomar; +Cc: linux-man On 24/11/2023 22:25, Alejandro Colomar wrote: > Hi Jonny, > > On Sat, Nov 18, 2023 at 11:21:00PM +0000, Jonny Grant wrote: >> I saw Christopher Bazley was talking about this. As I understand it, _Nonnull is milder than attribute nonnull. _Nonnull probably helps with static analysis, but doesn't optimize out any code checking if(ptr == NULL) return -1; >> >> Saw this, did you get traction with your proposal? >> >> https://discourse.llvm.org/t/iso-c3x-proposal-nonnull-qualifier/59269?page=2 > > I didn't follow up with that. I'd first like to be able to try Clang's > static analyzer with _Nullable, to be able to play with it. An > _Optional qualifier would only be usable by something like -fanalyzer, > or Clang's analyzer, since it needs to avoid false positives that are > quite complex. It's not a warning that you'd want in -Wall. > > And since Clang's analyzer isn't easy to use, I'm not working on that > until they make it easier. Ok I see. I find GCC's -fanalyzer useful; I've not tried Clang's. I made my own compile_assert(), which may or may not be of use for the things you are working on. It works in GCC, and it reads just like regular code. I use it to check for things like NULL pointers or overflows at compile time, rather than at run time like assert(). https://github.com/jonnygrant/compile_assert There will be some false positives on complex areas of code. It's quite simple, and is just using the tooling we have with GCC to catch things at compile time that static_assert() can't. Anyway, interested to hear any feedback if you do try it. Cheers, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-08 19:33 ` Alejandro Colomar 2023-11-08 19:40 ` Alejandro Colomar 2023-11-09 3:13 ` Matthew House @ 2023-11-10 10:40 ` Stefan Puiu 2023-11-10 11:06 ` Jonny Grant 2023-11-10 11:20 ` Alejandro Colomar 2 siblings, 2 replies; 138+ messages in thread From: Stefan Puiu @ 2023-11-10 10:40 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Matthew House, Jonny Grant, linux-man Hi Alex, On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote: [.....] > strncpy(3): > CAVEATS > The name of these functions is confusing. These functions produce a > null‐padded character sequence, not a string (see string_copying(7)). I'm a bit confused by this distinction. Isn't a null-padded sequence technically also null-terminated? If there's a '\0' at the end, then it's a string, in my understanding. Or was the intention to say "a character sequence that may be null-padded", with the case in which there's no padding at all being the reason for the distinction? Thanks, Stefan. ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: strncpy clarify result may not be null terminated 2023-11-10 10:40 ` strncpy clarify result may not be null terminated Stefan Puiu @ 2023-11-10 11:06 ` Jonny Grant 2023-11-10 11:20 ` Alejandro Colomar 1 sibling, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-10 11:06 UTC (permalink / raw) To: Stefan Puiu, Alejandro Colomar; +Cc: Matthew House, linux-man, GNU C Library On 10/11/2023 10:40, Stefan Puiu wrote: > Hi Alex, > > On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote: > [.....] >> strncpy(3): >> CAVEATS >> The name of these functions is confusing. These functions produce a >> null‐padded character sequence, not a string (see string_copying(7)). > > I'm a bit confused by this distinction. Isn't a null-padded sequence > technically also null-terminated? If there's a '\0' at the end, then > it's a string, in my understanding. Or was the intention to say "a > character sequence that may be null-padded", where the case in which > there's no padding at all being the reason for the distinction? This is a null-padded sequence of characters in an array: char buf[4] = {'a', '\0', '\0', '\0'}; As I'm sure we are all well aware from this long email thread, strncpy is designed to fill fixed-size arrays, padding with NUL bytes '\0' if there is any space left. Otherwise, the array is left unpadded, and therein lies the trouble: a possibly unterminated sequence of characters. Someone thought saving the extra byte was a good idea. It would have been better if that programmer had crafted their own local function rather than put out the strncpy function, which is named similarly to strcpy(); they could have called it copy_to_array_nul_pad(). // An unterminated array: printf or strlen will carry on reading through memory until it finds a NUL byte '\0', perhaps reading outside the addressable space of the process and causing a SEGV. char buf[4] = {'a', 'b', 'c', 'd'}; Hope that helps.
Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
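The two array states described in the message above can be reproduced with a short sketch. The helper names fill_field() and field_is_string() and the 4-byte FIELD_SIZE are made up for illustration; only strncpy(3) and memchr(3) are real library calls here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { FIELD_SIZE = 4 };  /* same size as buf[4] in the message above */

/* Hypothetical helper: fill a fixed-size field the way strncpy(3) does.
   Short sources are padded with NUL bytes; long ones are cut off
   without any terminator. */
static void
fill_field(char field[FIELD_SIZE], const char *src)
{
	strncpy(field, src, FIELD_SIZE);
}

/* Hypothetical helper: true if the field happens to hold a string,
   that is, if a NUL byte occurs within its FIELD_SIZE bytes. */
static bool
field_is_string(const char field[FIELD_SIZE])
{
	return memchr(field, '\0', FIELD_SIZE) != NULL;
}
```

With "a" as the source, the field ends up as {'a', '\0', '\0', '\0'} and field_is_string() is true. With "abcd", all four bytes are taken by characters and it is false, so handing the field to strlen(3) or printf("%s") would read past its end.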
* Re: strncpy clarify result may not be null terminated 2023-11-10 10:40 ` strncpy clarify result may not be null terminated Stefan Puiu 2023-11-10 11:06 ` Jonny Grant @ 2023-11-10 11:20 ` Alejandro Colomar 1 sibling, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-10 11:20 UTC (permalink / raw) To: Stefan Puiu; +Cc: Matthew House, Jonny Grant, linux-man [-- Attachment #1: Type: text/plain, Size: 862 bytes --] Hi Stefan, On Fri, Nov 10, 2023 at 12:40:48PM +0200, Stefan Puiu wrote: > Hi Alex, > > On Wed, Nov 8, 2023 at 9:33 PM Alejandro Colomar <alx@kernel.org> wrote: > [.....] > > strncpy(3): > > CAVEATS > > The name of these functions is confusing. These functions produce a > > null‐padded character sequence, not a string (see string_copying(7)). > > I'm a bit confused by this distinction. Isn't a null-padded sequence > technically also null-terminated? If there's a '\0' at the end, then > it's a string, in my understanding. Or was the intention to say "a > character sequence that may be null-padded", where the case in which > there's no padding at all being the reason for the distinction? The latter. I'll check the wording. Thanks! Alex > > Thanks, > Stefan. -- <https://www.alejandro-colomar.es/> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* [PATCH 0/2] Expand BUGS section of string_copying(7). 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant 2023-11-04 19:33 ` Alejandro Colomar @ 2023-11-12 9:17 ` Alejandro Colomar 2023-11-12 9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar ` (5 subsequent siblings) 7 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 9:17 UTC (permalink / raw) To: linux-man; +Cc: Alejandro Colomar, libc-alpha [-- Attachment #1: Type: text/plain, Size: 462 bytes --] Hi, After Paul showing important problems of strlcpy(3) (and strlcat(3)), I've written something in string_copying(7)'s BUGS to warn against them. Cheers, Alex Alejandro Colomar (2): string_copying.7: BUGS: *cat(3) functions aren't always bad string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems man7/string_copying.7 | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant 2023-11-04 19:33 ` Alejandro Colomar 2023-11-12 9:17 ` [PATCH 0/2] Expand BUGS section of string_copying(7) Alejandro Colomar @ 2023-11-12 9:18 ` Alejandro Colomar 2023-11-12 9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar ` (4 subsequent siblings) 7 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 9:18 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab [-- Attachment #1: Type: text/plain, Size: 1736 bytes --] The compiler will sometimes optimize them to normal *cpy(3) functions, since the length of dst is usually known, if the previous *cpy(3) is visible to the compiler. And they provide for cleaner code. If you know that they'll get optimized, you could use them. Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Jonny Grant <jg@jguk.org> Cc: DJ Delorie <dj@redhat.com> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. 
Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Xi Ruoyao <xry111@xry111.site> Cc: Stefan Puiu <stefan.puiu@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- man7/string_copying.7 | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/man7/string_copying.7 b/man7/string_copying.7 index 1637ebc91..0254fbba6 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -592,8 +592,14 @@ .SH BUGS All catenation functions share the same performance problem: .UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/ Shlemiel the painter .UE . +As a mitigation, +compilers are able to transform some calls to catenation functions +into normal copy functions, +since +.I strlen(dst) +is usually a byproduct of the previous copy. .\" ----- EXAMPLES :: -------------------------------------------------/ .SH EXAMPLES The following are examples of correct use of each of these functions. .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/ -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
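To make the "Shlemiel the painter" point in the patch above concrete, here is a small sketch contrasting a strcat(3) chain with an end-pointer chain. All names are hypothetical: copy_end() is a hand-written stand-in with the semantics of stpcpy(3), used only so the example builds as plain ISO C, and the caller is assumed to provide a large enough dst.

```c
#include <string.h>

/* Hypothetical helper with the semantics of stpcpy(3):
   copy src (including its NUL) and return a pointer to that NUL. */
static char *
copy_end(char *dst, const char *src)
{
	size_t len = strlen(src);

	memcpy(dst, src, len + 1);
	return dst + len;
}

/* strcat(3) chain: each strcat() call walks dst from its start again
   to find the end, unless the compiler manages to optimize that away. */
static void
join_cat(char *dst, const char *a, const char *b, const char *c)
{
	strcpy(dst, a);
	strcat(dst, b);
	strcat(dst, c);
}

/* End-pointer chain: the end of dst is carried from one copy to the
   next, so nothing is scanned twice. */
static void
join_end(char *dst, const char *a, const char *b, const char *c)
{
	char *p = dst;

	p = copy_end(p, a);
	p = copy_end(p, b);
	copy_end(p, c);
}
```

Both functions produce the same string; the difference is only in how often the already-written part of dst is re-read. Whether a given compiler actually rewrites join_cat() into something like join_end() depends on the compiler and optimization level.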
* [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant ` (2 preceding siblings ...) 2023-11-12 9:18 ` [PATCH 1/2] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar @ 2023-11-12 9:18 ` Alejandro Colomar 2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar ` (3 subsequent siblings) 7 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 9:18 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab [-- Attachment #1: Type: text/plain, Size: 2593 bytes --] Also point to BUGS from other sections that talk about these functions. These functions are doomed due to the design decision of mirroring snprintf(3)'s return value. They must return strlen(src), which makes them terribly slow, and vulnerable to DoS if an attacker can control strlen(src). A better design would have been to return -1 when truncating. Reported-by: Paul Eggert <eggert@cs.ucla.edu> Cc: Jonny Grant <jg@jguk.org> Cc: DJ Delorie <dj@redhat.com> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. 
Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Xi Ruoyao <xry111@xry111.site> Cc: Stefan Puiu <stefan.puiu@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- man7/string_copying.7 | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/man7/string_copying.7 b/man7/string_copying.7 index 0254fbba6..cb3910db0 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -226,9 +226,9 @@ .SS Truncate or not? .IP \[bu] .BR strlcpy (3bsd) and .BR strlcat (3bsd) -are similar, but less efficient when chained. +are similar, but have important performance problems; see BUGS. .IP \[bu] .BR stpncpy (3) and .BR strncpy (3) @@ -417,8 +417,10 @@ .SS Functions the resulting string is truncated (but it is guaranteed to be null-terminated). They return the length of the total string they tried to create. .IP +Check BUGS before using these functions. +.IP .BR stpecpy (3) is a simpler alternative to these functions. .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/ .TP @@ -598,8 +600,22 @@ .SH BUGS into normal copy functions, since .I strlen(dst) is usually a byproduct of the previous copy. +.P +.BR strlcpy (3) +and +.BR strlcat (3) +need to read the entire +.I src +string, +even if the destination buffer is small. +This makes them vulnerable to Denial of Service (DoS) attacks +if an attacker can control the length of the +.I src +string. +And if not, +they're still unnecessarily slow. .\" ----- EXAMPLES :: -------------------------------------------------/ .SH EXAMPLES The following are examples of correct use of each of these functions. .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/ -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
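To make the cost described in this patch concrete, here is a minimal sketch of the strlcpy(3bsd) contract. The name sketch_strlcpy() is hypothetical, and this is not the libbsd or glibc source; it only follows the documented behaviour (always null-terminate when the size is nonzero, and return the length of src).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the strlcpy(3bsd) contract; not a real
 * library implementation. */
size_t
sketch_strlcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    size_t slen;

    assert(src != NULL);

    /* The return value must be strlen(src), so all of src is read,
     * no matter how small the destination buffer is. */
    slen = strlen(src);

    if (dsize != 0) {
        size_t n = (slen < dsize) ? slen : dsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }

    /* The caller detects truncation by checking ret >= dsize. */
    return slen;
}
```

With a 4-byte buffer and a very long src, only 3 bytes are copied, yet the strlen() call still walks the whole input. That is the behaviour the new BUGS paragraph warns about.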
* [PATCH v2 0/3] Improve string_copying(7) 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant ` (3 preceding siblings ...) 2023-11-12 9:18 ` [PATCH 2/2] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar @ 2023-11-12 11:26 ` Alejandro Colomar 2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar ` (2 subsequent siblings) 7 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw) To: linux-man, Guillem Jover Cc: Alejandro Colomar, libc-alpha, Paul Eggert, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab [-- Attachment #1: Type: text/plain, Size: 907 bytes --] Hi, v3: - Patches 1/3 and 2/3 are identical to v2, except that I CCd libbsd's maintainer (Guillem) in 2/3 so he's aware that we're documenting BUGS for strlcpy(3). Since the strlcpy(3bsd) manual page is part of libbsd, it may be interesting to also add a BUGS section in that page. - Add 3/3, which adds strtcpy(3), a function almost identical to strscpy(9), and very similar to strlcpy(3), which doesn't share its bugs. Cheers, Alex Alejandro Colomar (3): string_copying.7: BUGS: *cat(3) functions aren't always bad string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems strtcpy.3, string_copying.7: Add strtcpy(3) man3/strtcpy.3 | 1 + man7/string_copying.7 | 121 +++++++++++++++++++++++++++++++----------- 2 files changed, 92 insertions(+), 30 deletions(-) create mode 100644 man3/strtcpy.3 -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 138+ messages in thread
* [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant ` (4 preceding siblings ...) 2023-11-12 11:26 ` [PATCH v2 0/3] Improve string_copying(7) Alejandro Colomar @ 2023-11-12 11:26 ` Alejandro Colomar 2023-11-17 21:43 ` Jonny Grant 2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar 2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar 7 siblings, 1 reply; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab [-- Attachment #1: Type: text/plain, Size: 1736 bytes --] The compiler will sometimes optimize them to normal *cpy(3) functions, since the length of dst is usually known, if the previous *cpy(3) is visible to the compiler. And they provide for cleaner code. If you know that they'll get optimized, you could use them. Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Jonny Grant <jg@jguk.org> Cc: DJ Delorie <dj@redhat.com> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. 
Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Xi Ruoyao <xry111@xry111.site> Cc: Stefan Puiu <stefan.puiu@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- man7/string_copying.7 | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/man7/string_copying.7 b/man7/string_copying.7 index 1637ebc91..0254fbba6 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -592,8 +592,14 @@ .SH BUGS All catenation functions share the same performance problem: .UR https://www.joelonsoftware.com/\:2001/12/11/\:back\-to\-basics/ Shlemiel the painter .UE . +As a mitigation, +compilers are able to transform some calls to catenation functions +into normal copy functions, +since +.I strlen(dst) +is usually a byproduct of the previous copy. .\" ----- EXAMPLES :: -------------------------------------------------/ .SH EXAMPLES The following are examples of correct use of each of these functions. .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/ -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
* Re: [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad 2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar @ 2023-11-17 21:43 ` Jonny Grant 2023-11-18 0:25 ` Signing all patches and email to this list Matthew House 0 siblings, 1 reply; 138+ messages in thread From: Jonny Grant @ 2023-11-17 21:43 UTC (permalink / raw) To: Alejandro Colomar; +Cc: Paul Eggert, linux-man On 12/11/2023 11:26, Alejandro Colomar wrote: > The compiler will sometimes optimize them to normal *cpy(3) functions, > since the length of dst is usually known, if the previous *cpy(3) is > visible to the compiler. And they provide for cleaner code. If you > know that they'll get optimized, you could use them. May I ask, is there an example or document that shows this optimization by the compiler? Perhaps a godbolt link? So it's a strcat() optimized to a strcpy()? I know gcc might unroll and just include the values of the string bytes. Kind regards, Jonny ^ permalink raw reply [flat|nested] 138+ messages in thread
* Re: Signing all patches and email to this list 2023-11-17 21:43 ` Jonny Grant @ 2023-11-18 0:25 ` Matthew House 2023-11-18 23:24 ` Jonny Grant 0 siblings, 1 reply; 138+ messages in thread From: Matthew House @ 2023-11-18 0:25 UTC (permalink / raw) To: Jonny Grant; +Cc: Alejandro Colomar, Paul Eggert, linux-man On Fri, Nov 17, 2023 at 4:43 PM Jonny Grant <jg@jguk.org> wrote: > On 12/11/2023 11:26, Alejandro Colomar wrote: > > The compiler will sometimes optimize them to normal *cpy(3) functions, > > since the length of dst is usually known, if the previous *cpy(3) is > > visible to the compiler. And they provide for cleaner code. If you > > know that they'll get optimized, you could use them. > > May I ask, is there an example or document that shows this optimization by the compiler? Perhaps a godbolt link? > > So it's a strcat() optimized to a strcpy()? > > I know gcc might unroll and just include the values of the string bytes. > > Kind regards, Jonny See <https://godbolt.org/z/e34fWrTGf>. If a function computes the strlen() of the destination before calling strcat(), without modifying the destination between the two calls, GCC will replace the strcat() with a strcpy(). If a function computes the strlen() of both the source and the destination, GCC will further replace the strcat() with a memcpy(), and possibly inline the memcpy() if the size is short enough. It will also remember the increased length of the destination for any future strcat() calls, to accommodate strcpy(), strcat(), strcat(), ... chains. This is implemented in the strlen_pass::handle_builtin_strcat() function in gcc/tree-ssa-strlen.cc. Neither Clang nor MSVC appears to implement any similar optimization. Nevertheless, I would be extremely wary of recommending the bare strcpy(3), strcat(3), and sprintf(3) functions on the basis of "providing for cleaner code". 
By permitting the programmer to perform the copy with no immediate knowledge of the source and destination sizes, the functions open up a unique opportunity for squirreling away the guaranteed sizes in distant and opaque parts of the codebase. And this antipattern isn't a rare exception, but shows up in nearly every library that makes extensive use of the functions. Thank you, Matthew House ^ permalink raw reply [flat|nested] 138+ messages in thread
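The source pattern described in the message above can be sketched as follows. The helper names are hypothetical, and whether a given compiler and optimization level actually performs the transformation is not guaranteed by this sketch; it only shows the shape of the code before and after.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: appends src to dst and returns the previous
 * length of dst.  The caller must guarantee that dst has room. */
size_t
append_with_strcat(char *dst, const char *src)
{
    size_t old_len;

    assert(dst != NULL && src != NULL);

    old_len = strlen(dst);   /* the length of dst is computed here ... */
    strcat(dst, src);        /* ... so this call need not rescan dst   */
    return old_len;
}

/* The hand-written form that the optimized strcat() call is
 * equivalent to: copy directly to the known end of dst. */
size_t
append_with_strcpy(char *dst, const char *src)
{
    size_t old_len;

    assert(dst != NULL && src != NULL);

    old_len = strlen(dst);
    strcpy(dst + old_len, src);
    return old_len;
}
```

The two functions have the same observable behaviour. The point of the optimization is that a programmer can write the first form and, with a compiler that implements it, get the cost of the second.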
* Re: Signing all patches and email to this list 2023-11-18 0:25 ` Signing all patches and email to this list Matthew House @ 2023-11-18 23:24 ` Jonny Grant 0 siblings, 0 replies; 138+ messages in thread From: Jonny Grant @ 2023-11-18 23:24 UTC (permalink / raw) To: Matthew House; +Cc: Alejandro Colomar, Paul Eggert, linux-man On 18/11/2023 00:25, Matthew House wrote: > On Fri, Nov 17, 2023 at 4:43 PM Jonny Grant <jg@jguk.org> wrote: >> On 12/11/2023 11:26, Alejandro Colomar wrote: >>> The compiler will sometimes optimize them to normal *cpy(3) functions, >>> since the length of dst is usually known, if the previous *cpy(3) is >>> visible to the compiler. And they provide for cleaner code. If you >>> know that they'll get optimized, you could use them. >> >> May I ask, is there an example or document that shows this optimization by the compiler? Perhaps a godbolt link? >> >> So it's a strcat() optimized to a strcpy()? >> >> I know gcc might unroll and just include the values of the string bytes. >> >> Kind regards, Jonny > > See <https://godbolt.org/z/e34fWrTGf>. If a function computes the strlen() > of the destination before calling strcat(), without modifying its value > between the two calls, GCC will replace the strcat() with a strcpy(). If a > function computes the strlen() of both the source and the destination, GCC > will further replace the strcat() with a memcpy(), and possibly inline the > memcpy() if the size is short enough. It will also remember the increased > length of the destination for any future strcat() calls, to accomodate for > strcpy(), strcat(), strcat(), ... chains. This is implemented in the > strlen_pass::handle_builtin_strcat() function in gcc/tree-ssa-strlen.cc. > Neither Clang nor MSVC appears to implement any similar optimization. That's great it optimizes, thank you for sharing the information. 
> Nevertheless, I would be extremely wary of recommending the bare strcpy(3), > strcat(3), and sprintf(3) functions on the basis of "providing for cleaner > code". By permitting the programmer to perform the copy with no immediate > knowledge of the source and destination sizes, the functions open up a > unique opportunity for squirreling away the guaranteed sizes in distant and > opaque parts of the codebase. And this antipattern isn't a rare exception, > but shows up in nearly every library that makes extensive use of the > functions. > > Thank you, > Matthew House ^ permalink raw reply [flat|nested] 138+ messages in thread
* [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant ` (5 preceding siblings ...) 2023-11-12 11:26 ` [PATCH v2 1/3] string_copying.7: BUGS: *cat(3) functions aren't always bad Alejandro Colomar @ 2023-11-12 11:26 ` Alejandro Colomar 2023-11-12 11:27 ` [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) Alejandro Colomar 7 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 11:26 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab [-- Attachment #1: Type: text/plain, Size: 2634 bytes --] Also point to BUGS from other sections that talk about these functions. These functions are doomed due to the design decision of mirroring snprintf(3)'s return value. They must return strlen(src), which makes them terribly slow, and vulnerable to DoS if an attacker can control strlen(src). A better design would have been to return -1 when truncating. Reported-by: Paul Eggert <eggert@cs.ucla.edu> Cc: Jonny Grant <jg@jguk.org> Cc: DJ Delorie <dj@redhat.com> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. 
Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Xi Ruoyao <xry111@xry111.site> Cc: Stefan Puiu <stefan.puiu@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Guillem Jover <guillem@hadrons.org> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- man7/string_copying.7 | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/man7/string_copying.7 b/man7/string_copying.7 index 0254fbba6..cb3910db0 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -226,9 +226,9 @@ .SS Truncate or not? .IP \[bu] .BR strlcpy (3bsd) and .BR strlcat (3bsd) -are similar, but less efficient when chained. +are similar, but have important performance problems; see BUGS. .IP \[bu] .BR stpncpy (3) and .BR strncpy (3) @@ -417,8 +417,10 @@ .SS Functions the resulting string is truncated (but it is guaranteed to be null-terminated). They return the length of the total string they tried to create. .IP +Check BUGS before using these functions. +.IP .BR stpecpy (3) is a simpler alternative to these functions. .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/ .TP @@ -598,8 +600,22 @@ .SH BUGS into normal copy functions, since .I strlen(dst) is usually a byproduct of the previous copy. +.P +.BR strlcpy (3) +and +.BR strlcat (3) +need to read the entire +.I src +string, +even if the destination buffer is small. +This makes them vulnerable to Denial of Service (DoS) attacks +if an attacker can control the length of the +.I src +string. +And if not, +they're still unnecessarily slow. .\" ----- EXAMPLES :: -------------------------------------------------/ .SH EXAMPLES The following are examples of correct use of each of these functions. .\" ----- EXAMPLES :: stpcpy(3) ---------------------------------------/ -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
* [PATCH v2 3/3] strtcpy.3, string_copying.7: Add strtcpy(3) 2023-11-04 11:27 strncpy clarify result may not be null terminated Jonny Grant ` (6 preceding siblings ...) 2023-11-12 11:26 ` [PATCH v2 2/3] string_copying.7: BUGS: Document strl{cpy,cat}(3)'s performance problems Alejandro Colomar @ 2023-11-12 11:27 ` Alejandro Colomar 7 siblings, 0 replies; 138+ messages in thread From: Alejandro Colomar @ 2023-11-12 11:27 UTC (permalink / raw) To: linux-man Cc: Alejandro Colomar, libc-alpha, Guillem Jover, Paul Eggert, Jonny Grant, DJ Delorie, Matthew House, Oskari Pirhonen, Thorsten Kukuk, Adhemerval Zanella Netto, Zack Weinberg, G. Branden Robinson, Carlos O'Donell, Xi Ruoyao, Stefan Puiu, Andreas Schwab [-- Attachment #1: Type: text/plain, Size: 7496 bytes --] Add this new truncating string-copying function. It is intended to fully replace strlcpy(3), which has important bugs (documented in the preceding commit). It is almost identical to the Linux kernel's strscpy(9), so reduce the documentation of strscpy(9) in this page to the minimum, giving preference to strtcpy(3). Provide a reference implementation, since no libc provides it. Providing an easy, safe, and relatively fast truncating string-copying function should prevent users from rolling their own, in which they might introduce bugs accidentally. We already made enough mistakes while discussing these functions, so it's certainly not something that should be written often. Cc: Paul Eggert <eggert@cs.ucla.edu> Cc: Jonny Grant <jg@jguk.org> Cc: DJ Delorie <dj@redhat.com> Cc: Matthew House <mattlloydhouse@gmail.com> Cc: Oskari Pirhonen <xxc3ncoredxx@gmail.com> Cc: Thorsten Kukuk <kukuk@suse.com> Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> Cc: Zack Weinberg <zack@owlfolio.org> Cc: "G. 
Branden Robinson" <g.branden.robinson@gmail.com> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Xi Ruoyao <xry111@xry111.site> Cc: Stefan Puiu <stefan.puiu@gmail.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Guillem Jover <guillem@hadrons.org> Signed-off-by: Alejandro Colomar <alx@kernel.org> --- man3/strtcpy.3 | 1 + man7/string_copying.7 | 97 ++++++++++++++++++++++++++++++------------- 2 files changed, 69 insertions(+), 29 deletions(-) create mode 100644 man3/strtcpy.3 diff --git a/man3/strtcpy.3 b/man3/strtcpy.3 new file mode 100644 index 000000000..beb850746 --- /dev/null +++ b/man3/strtcpy.3 @@ -0,0 +1 @@ +.so man7/string_copying.7 diff --git a/man7/string_copying.7 b/man7/string_copying.7 index cb3910db0..4f609e480 100644 --- a/man7/string_copying.7 +++ b/man7/string_copying.7 @@ -6,8 +6,9 @@ .\" ----- NAME :: -----------------------------------------------------/ .SH NAME stpcpy, strcpy, strcat, +strtcpy, stpecpy, strlcpy, strlcat, stpncpy, strncpy, @@ -30,8 +31,11 @@ .SS Strings // Chain-copy a string with truncation. .BI "char *stpecpy(char *" dst ", char " end "[0], const char *restrict " src ); .P // Copy/catenate a string with truncation. +.BI "size_t strtcpy(char " dst "[restrict ." sz "], \ +const char *restrict " src , +.BI " size_t " sz ); .BI "size_t strlcpy(char " dst "[restrict ." sz "], \ const char *restrict " src , .BI " size_t " sz ); .BI "size_t strlcat(char " dst "[restrict ." sz "], \ @@ -220,10 +224,10 @@ .SS Truncate or not? .P Functions that truncate: .IP \[bu] 3 .BR stpecpy (3) -is the most efficient string copy function that performs truncation. -It only requires to check for truncation once after all chained calls. 
+.IP \[bu] +.BR strtcpy (3) .IP \[bu] .BR strlcpy (3bsd) and .BR strlcat (3bsd) @@ -326,8 +330,10 @@ .SS String vs character sequence .IP \[bu] .BR strcpy (3), .BR strcat (3) .IP \[bu] +.BR strtcpy (3) +.IP \[bu] .BR stpecpy (3) .IP \[bu] .BR strlcpy (3bsd), .BR strlcat (3bsd) @@ -390,12 +396,24 @@ .SS Functions The return value is useless. .IP .BR stpcpy (3) is a faster alternative to these functions. +.\" ----- DESCRIPTION :: Functions :: strtcpy(3) ----------------------/ +.TP +.BR strtcpy (3) +Copy the input string into a destination string. +If the destination buffer isn't large enough to hold the copy, +the resulting string is truncated +(but it is guaranteed to be null-terminated). +It returns the length of the string, +or \-1 if it truncated. +.IP +This function is not provided by any library; +see EXAMPLES for a reference implementation. .\" ----- DESCRIPTION :: Functions :: stpecpy(3) ----------------------/ .TP .BR stpecpy (3) -Copy the input string into a destination string. +Chain-copy the input string into a destination string. If the destination buffer, limited by a pointer to its end, isn't large enough to hold the copy, the resulting string is truncated @@ -419,10 +437,12 @@ .SS Functions They return the length of the total string they tried to create. .IP Check BUGS before using these functions. .IP +.BR strtcpy (3) +and .BR stpecpy (3) -is a simpler alternative to these functions. +are better alternatives to these functions. .\" ----- DESCRIPTION :: Functions :: stpncpy(3) ----------------------/ .TP .BR stpncpy (3) Copy the input string into @@ -542,8 +562,17 @@ .SH RETURN VALUE .BR ustpcpy (3) A pointer to one after the last character in the destination character sequence. .TP +.BR strtcpy (3) +The length of the string. +When truncation occurs, it returns \-1. +When +.I dsize +is +.BR 0 , +it also returns \-1. 
+.TP .BR strlcpy (3bsd) .TQ .BR strlcat (3bsd) The length of the total string that they tried to create @@ -562,25 +591,14 @@ .SH RETURN VALUE which is useless. .\" ----- NOTES :: strscpy(9) -----------------------------------------/ .SH NOTES The Linux kernel has an internal function for copying strings, -which is similar to -.BR stpecpy (3), -except that it can't be chained: -.TP -.BR strscpy (9) -Copy the input string into a destination string. -If the destination buffer, -limited by its size, -isn't large enough to hold the copy, -the resulting string is truncated -(but it is guaranteed to be null-terminated). -It returns the length of the destination string, or +.BR strscpy (9), +which is identical to +.BR strtcpy (3), +except that it returns .B \-E2BIG -on truncation. -.IP -.BR stpecpy (3) -is a simpler and faster alternative to this function. +instead of \-1. .\" ----- CAVEATS :: --------------------------------------------------/ .SH CAVEATS Don't mix chain calls to truncating and non-truncating functions. It is conceptually wrong @@ -640,8 +658,17 @@ .SH EXAMPLES strcat(buf, "!"); len = strlen(buf); puts(buf); .EE +.\" ----- EXAMPLES :: strtcpy(3) --------------------------------------/ +.TP +.BR strtcpy (3) +.EX +len = strtcpy(buf, "Hello world!", sizeof(buf)); +if (len == \-1) + goto toolong; +puts(buf); +.EE .\" ----- EXAMPLES :: stpecpy(3) --------------------------------------/ .TP .BR stpecpy (3) .EX @@ -671,17 +698,8 @@ .SH EXAMPLES if (len >= sizeof(buf)) goto toolong; puts(buf); .EE -.\" ----- EXAMPLES :: strscpy(9) --------------------------------------/ -.TP -.BR strscpy (9) -.EX -len = strscpy(buf, "Hello world!", sizeof(buf)); -if (len == \-E2BIG) - goto toolong; -puts(buf); -.EE .\" ----- EXAMPLES :: stpncpy(3) --------------------------------------/ .TP .BR stpncpy (3) .EX @@ -765,8 +783,29 @@ .SS Implementations .in +4n .EX /* This code is in the public domain. 
*/ \& +.\" ----- EXAMPLES :: Implementations :: strtcpy(3) -------------------/ +ssize_t +.IR strtcpy "(char *restrict dst, const char *restrict src, size_t dsize)" +{ + bool trunc; + char *p; + size_t dlen, slen; +\& + if (dsize == 0) + return \-1; +\& + slen = strnlen(src, dsize); + trunc = (slen == dsize); + dlen = slen \- trunc; +\& + p = mempcpy(dst, src, dlen); + *p = \[aq]\e0\[aq]; +\& + return trunc ? \-1 : slen; +} +\& .\" ----- EXAMPLES :: Implementations :: stpecpy(3) -------------------/ char * .IR stpecpy "(char *dst, char end[0], const char *restrict src)" { -- 2.42.0 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 138+ messages in thread
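The reference implementation in this patch is written as roff source, so here is a minimal compilable sketch of the same logic. The name sketch_strtcpy() is hypothetical, memcpy(3) is swapped in for the GNU-specific mempcpy(3), and the size parameter is called dsize throughout; these are choices made for this illustration, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L   /* for strnlen() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>            /* ssize_t (POSIX) */

/* Hypothetical sketch of strtcpy() as described in the patch:
 * copy src into dst, truncating if needed, always null-terminating
 * when dsize is nonzero.  Returns the length of the copied string,
 * or -1 on truncation or when dsize is 0. */
ssize_t
sketch_strtcpy(char *restrict dst, const char *restrict src, size_t dsize)
{
    bool    trunc;
    size_t  dlen, slen;

    assert(src != NULL);

    if (dsize == 0)
        return -1;

    assert(dst != NULL);

    /* Unlike strlcpy(3bsd), this reads at most dsize bytes of src. */
    slen = strnlen(src, dsize);
    trunc = (slen == dsize);
    dlen = slen - trunc;          /* leave room for the null byte */

    memcpy(dst, src, dlen);
    dst[dlen] = '\0';

    return trunc ? -1 : (ssize_t) slen;
}
```

A caller would check for a return value of -1 to detect truncation, as in the EXAMPLES hunk of the patch. The cost is bounded by the destination size rather than by the length of src.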