* strlen @ 2020-09-04 18:01 Jonny Grant 2020-09-04 19:21 ` strlen Florian Weimer 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2020-09-04 18:01 UTC (permalink / raw) To: linux-man; +Cc: Michael Kerrisk Hello https://man7.org/linux/man-pages/man3/strlen.3.html Is it possible to clarify :- * glibc will SIGSEGV if 's' is NULL * there are no ERROR returns Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2020-09-04 18:01 strlen Jonny Grant @ 2020-09-04 19:21 ` Florian Weimer 2020-09-04 23:14 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Florian Weimer @ 2020-09-04 19:21 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk * Jonny Grant: > Hello > > https://man7.org/linux/man-pages/man3/strlen.3.html > > Is it possible to clarify :- > > * glibc will SIGSEGV if 's' is NULL > * there are no ERROR returns That would be misleading. Whether strlen (NULL) is undefined also depends on the compiler. They know that the argument cannot be zero and optimize accordingly. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2020-09-04 19:21 ` strlen Florian Weimer @ 2020-09-04 23:14 ` Jonny Grant 2020-09-05 7:12 ` strlen Florian Weimer 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2020-09-04 23:14 UTC (permalink / raw) To: Florian Weimer; +Cc: linux-man, Michael Kerrisk On 04/09/2020 20:21, Florian Weimer wrote: > * Jonny Grant: > >> Hello >> >> https://man7.org/linux/man-pages/man3/strlen.3.html >> >> Is it possible to clarify :- >> >> * glibc will SIGSEGV if 's' is NULL >> * there are no ERROR returns > > That would be misleading. Whether strlen (NULL) is undefined also > depends on the compiler. They know that the argument cannot be zero > and optimize accordingly. > Hi, Do you know a compiler that has a different behaviour? I only tested gcc and clang. How would a compiler optimise? The header has non-null attribute for the compiler (ie gcc) to use to give a nice warning str.c:6:8: warning: null argument where non-null required (argument 1) [-Wnonnull] 6 | return strlen(NULL); However, if the implementation gets NULL it still crashes. I was using -fno-builtin-strlen so it was glibc, but the __builtin_strlen ha SEGV just like glibc clang is the same as gcc. So it feels like it's pretty clear, there is no NULL check, can this be documented? Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2020-09-04 23:14 ` strlen Jonny Grant @ 2020-09-05 7:12 ` Florian Weimer 2021-07-06 20:30 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Florian Weimer @ 2020-09-05 7:12 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk * Jonny Grant: > On 04/09/2020 20:21, Florian Weimer wrote: >> * Jonny Grant: >> >>> Hello >>> >>> https://man7.org/linux/man-pages/man3/strlen.3.html >>> >>> Is it possible to clarify :- >>> >>> * glibc will SIGSEGV if 's' is NULL >>> * there are no ERROR returns >> >> That would be misleading. Whether strlen (NULL) is undefined also >> depends on the compiler. They know that the argument cannot be zero >> and optimize accordingly. >> > > Hi, > > Do you know a compiler that has a different behaviour? I only tested > gcc and clang. How would a compiler optimise? Here's an example: #include <stddef.h> #include <stdio.h> #include <string.h> void f (const char *str) { strlen (str); if (str == NULL) puts ("str is NULL"); } int main (void) { f (NULL); return 0; } When built with -O2, GCC 8 prints nothing, and there is no crash. My point it's not just the C *library* that makes strlen (NULL) undefined. It's undefined according to the language, and even if we changed the glibc implementation, things would still go wrong. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2020-09-05 7:12 ` strlen Florian Weimer @ 2021-07-06 20:30 ` Jonny Grant 2021-07-06 22:11 ` strlen Florian Weimer 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-06 20:30 UTC (permalink / raw) To: Florian Weimer; +Cc: linux-man, Michael Kerrisk On 05/09/2020 08:12, Florian Weimer wrote: > * Jonny Grant: > >> On 04/09/2020 20:21, Florian Weimer wrote: >>> * Jonny Grant: >>> >>>> Hello >>>> >>>> https://man7.org/linux/man-pages/man3/strlen.3.html >>>> >>>> Is it possible to clarify :- >>>> >>>> * glibc will SIGSEGV if 's' is NULL >>>> * there are no ERROR returns >>> >>> That would be misleading. Whether strlen (NULL) is undefined also >>> depends on the compiler. They know that the argument cannot be zero >>> and optimize accordingly. >>> >> >> Hi, >> >> Do you know a compiler that has a different behaviour? I only tested >> gcc and clang. How would a compiler optimise? > > Here's an example: > > #include <stddef.h> > #include <stdio.h> > #include <string.h> > > void > f (const char *str) > { > strlen (str); > if (str == NULL) > puts ("str is NULL"); > } > > int > main (void) > { > f (NULL); > return 0; > } > > When built with -O2, GCC 8 prints nothing, and there is no crash. > > My point it's not just the C *library* that makes strlen (NULL) > undefined. It's undefined according to the language, and even if we > changed the glibc implementation, things would still go wrong. > Hi Florian, Apologies for tardy response time. The reason it does not crash appears to be because of this warning which removes the call to strlen due to the return not being checked? strlen.c:11:3: warning: statement with no effect [-Wunused-value] 11 | strlen (str); | ^~~~~~~~~~~~ https://godbolt.org/z/caoes5nGa .LC0: .string "str is NULL" f(char const*): test rdi, rdi je .L4 ret .L4: mov edi, OFFSET FLAT:.LC0 jmp puts main: sub rsp, 8 mov edi, OFFSET FLAT:.LC0 call puts xor eax, eax add rsp, 8 ret Changing to this, it does crash #include <stddef.h> #include <stdio.h> #include <string.h> void f (const char *str) { size_t len = strlen (str); printf("len %zu\n", len); if (str == NULL) puts ("str is NULL"); } int main (void) { f (NULL); return 0; } I could not see why in your original example it doesn't output as expected as the assembler does show it 'jmp puts' although seems to have optimised and added that code to 'main' as well? I had expected to see: $ ./strlen str is NULL Cheers, Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-06 20:30 ` strlen Jonny Grant @ 2021-07-06 22:11 ` Florian Weimer 2021-07-07 11:36 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Florian Weimer @ 2021-07-06 22:11 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk * Jonny Grant: > The reason it does not crash appears to be because of this warning > which removes the call to strlen due to the return not being > checked? GCC uses the information that the argument to strlen cannot be null on that particular path. > strlen.c:11:3: warning: statement with no effect [-Wunused-value] > 11 | strlen (str); > | ^~~~~~~~~~~~ > > https://godbolt.org/z/caoes5nGa That site probably uses different library headers. As posted, with Debian's GCC 8.3, I get this for the main function: main: xorl %eax, %eax ret ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-06 22:11 ` strlen Florian Weimer @ 2021-07-07 11:36 ` Jonny Grant 2021-07-07 12:22 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-07 11:36 UTC (permalink / raw) To: Florian Weimer; +Cc: linux-man, Michael Kerrisk, Jonny Grant On 06/07/2021 23:11, Florian Weimer wrote: > * Jonny Grant: > >> The reason it does not crash appears to be because of this warning >> which removes the call to strlen due to the return not being >> checked? > > GCC uses the information that the argument to strlen cannot be null on > that particular path. It's a shame GCC doesn't give a warning It may be GCC is using '__builtin_strlen' <string.h> marks the param as nonnull. However, I am surprised this does not trigger the GCC warning -Werror=nonnull /* Return the length of S. */ extern size_t strlen (const char *__s) __THROW __attribute_pure__ __nonnull ((1)); Perhaps that is just a macro that is not actually used...... If I add another function -Werror=nonnull does give a warning void test(const char * const p) __attribute__((nonnull)); https://godbolt.org/z/x37sbfWaG <source>:15:9: error: argument 1 null where non-null expected [-Werror=nonnull] 15 | test(NULL); > >> strlen.c:11:3: warning: statement with no effect [-Wunused-value] >> 11 | strlen (str); >> | ^~~~~~~~~~~~ >> >> https://godbolt.org/z/caoes5nGa > > That site probably uses different library headers. > > As posted, with Debian's GCC 8.3, I get this for the main function: > > main: > xorl %eax, %eax > ret > In that case maybe https://man7.org/linux/man-pages/man3/strlen.3.html should have a "NOTES" section stating something similar to your wording ? NOTES The behavior of strlen(NULL) is depends on different things. It may cause a SEGV exception, or the compiler may optimize and remove the code path handling NULL. Cheers, Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 11:36 ` strlen Jonny Grant @ 2021-07-07 12:22 ` Alejandro Colomar (man-pages) 2021-07-07 12:31 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-07 12:22 UTC (permalink / raw) To: Jonny Grant, Florian Weimer; +Cc: linux-man, Michael Kerrisk Hello Jonny, On 7/7/21 1:36 PM, Jonny Grant wrote: > > > On 06/07/2021 23:11, Florian Weimer wrote: >> * Jonny Grant: >> >>> The reason it does not crash appears to be because of this warning >>> which removes the call to strlen due to the return not being >>> checked? >> >> GCC uses the information that the argument to strlen cannot be null on >> that particular path. > > It's a shame GCC doesn't give a warning > > It may be GCC is using '__builtin_strlen' > > <string.h> marks the param as nonnull. However, I am surprised this does not trigger the GCC warning -Werror=nonnull [[gnu::nunnull]], aka __attribute__((nonnull)), is only triggered by compile time NULL. That is: passing NULL (either hardcoded or through a macro) If after optimizations the compiler realizes that an argument is NULL at compile time, it may (but also may not) trigger the warning. However, if you optimize too much, the expression may vanish, and the warning be silenced again. The -fanalyzer option of GCC may (but also may not) realize of more NULL cases. I managed to trigger the following, but only with '-O0 -fanalyzer': [[ <source>:17:20: error: use of NULL 's' where non-null expected [CWE-476] [-Werror=analyzer-null-argument] 17 | int x = strlen (s); | ~~~~~~~^~~ 'void f(const char*)': events 1-2 | | 16 | const char *s = NULL; | | ^ | | | | | (1) 's' is NULL | 17 | int x = strlen(s); | | ~~~~~~~~~~ | | | | | (2) argument 1 ('s') NULL where non-null expected | In file included from <source>:4: /usr/include/string.h:385:15: note: argument 1 of 'size_t strlen(const char*)' must be non-null 385 | extern size_t strlen (const char *__s) | ^~~~~~ ]] It is undefined behavior to pass NULL to a nonnull parameter, and it is a problem that the caller should check before the call. The compiler may warn you of the most obvious ones, but in the end, the responsibility is on the programmer. See <https://stackoverflow.com/questions/42035769/how-to-generate-warning-of-non-null-argument-for-my-custom-function>. > > /* Return the length of S. */ > extern size_t strlen (const char *__s) > __THROW __attribute_pure__ __nonnull ((1)); > > Perhaps that is just a macro that is not actually used...... It _is_ being used: /usr/include$ grep -rn 'define __nonnull' -B3 -A1 x86_64-linux-gnu/sys/cdefs.h-290-/* The nonull function attribute allows to mark pointer parameters which x86_64-linux-gnu/sys/cdefs.h-291- must not be NULL. */ x86_64-linux-gnu/sys/cdefs.h-292-#if __GNUC_PREREQ (3,3) x86_64-linux-gnu/sys/cdefs.h:293:# define __nonnull(params) __attribute__ ((__nonnull__ params)) x86_64-linux-gnu/sys/cdefs.h-294-#else x86_64-linux-gnu/sys/cdefs.h:295:# define __nonnull(params) x86_64-linux-gnu/sys/cdefs.h-296-#endif > > If I add another function -Werror=nonnull does give a warning > void test(const char * const p) __attribute__((nonnull)); > > https://godbolt.org/z/x37sbfWaG > > <source>:15:9: error: argument 1 null where non-null expected [-Werror=nonnull] > 15 | test(NULL); > >> >>> strlen.c:11:3: warning: statement with no effect [-Wunused-value] >>> 11 | strlen (str); >>> | ^~~~~~~~~~~~ >>> >>> https://godbolt.org/z/caoes5nGa >> >> That site probably uses different library headers. >> >> As posted, with Debian's GCC 8.3, I get this for the main function: >> >> main: >> xorl %eax, %eax >> ret >> > > > In that case maybe https://man7.org/linux/man-pages/man3/strlen.3.html should have a "NOTES" section stating something similar to your wording ? > > NOTES > > The behavior of strlen(NULL) is depends on different things. It may cause a SEGV exception, or the compiler may optimize and remove the code path handling NULL. > I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane. Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 12:22 ` strlen Alejandro Colomar (man-pages) @ 2021-07-07 12:31 ` Alejandro Colomar (man-pages) 2021-07-07 13:31 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-07 12:31 UTC (permalink / raw) To: Jonny Grant, Florian Weimer; +Cc: linux-man, Michael Kerrisk On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote: > I disagree with this. It is likely that the behavior is that, given the > current implementation of Linux/GCC/glibc. But it is undefined > behavior, and anything can happen. You should just try harder to avoid > it, and not rely on any possible outcome of it. GCC people may decide > tomorrow to change the behavior to do some more agresive optimizations, > and the documentation shouldn't preclude such a thing, as long as it's > legal according to the relevant standards, and sane. The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple. If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined. Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 12:31 ` strlen Alejandro Colomar (man-pages) @ 2021-07-07 13:31 ` Jonny Grant 2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-07 13:31 UTC (permalink / raw) To: Alejandro Colomar (man-pages), Florian Weimer; +Cc: linux-man, Michael Kerrisk On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote: > On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote: >> I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane. > > The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple. > > If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined. > > > Cheers, > > Alex > > Hi Alex, Florian Do you think this would get optimized out by GCC too? size_t safestrlen(const char * s) { if (NULL == s) return 0; else return strlen(s); } Maybe the man page could just state: NOTES The calling strlen with a NULL pointer is undefined behavior. Cheers, Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 13:31 ` strlen Jonny Grant @ 2021-07-07 16:57 ` Alejandro Colomar (man-pages) 2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages) ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-07 16:57 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer On 7/7/21 3:31 PM, Jonny Grant wrote: > > > On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote: >> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote: >>> I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane. >> >> The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple. >> >> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined. >> >> >> Cheers, >> >> Alex >> >> > > Hi Alex, Florian > > Do you think this would get optimized out by GCC too? > > size_t safestrlen(const char * s) > { > if (NULL == s) return 0; > else return strlen(s); > } This would be optimized if the compiler can determine that s == NULL or s != NULL, which tipically would be that you ask the compiler to optimize, AND the compiler can deduce at compile time its relationship with NULL; OR you ask the compiler to optimize at link time (-flto) AND the relationship of s and NULL can be deduced at link time. However, I don't see why that would be a problem. Either you can guarantee that s is not NULL, and you don't need to call this safestrlen() wrapper, or you cannot guarantee it and then you are forced to call this wrapper. The optimization, if it happens, will be good. What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above). > > > > Maybe the man page could just state: > > > NOTES > > The calling strlen with a NULL pointer is undefined behavior. Okay. I agree that should probably be documented. I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it). There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me. Thanks, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages) @ 2021-07-07 17:23 ` Alejandro Colomar (man-pages) 2021-07-07 17:33 ` strlen Alejandro Colomar (man-pages) 2021-07-08 10:07 ` strlen Jonny Grant 2021-07-09 20:44 ` strlen Jonny Grant 2 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-07 17:23 UTC (permalink / raw) To: Jonny Grant, Michael Kerrisk, Florian Weimer; +Cc: linux-man Hi Jonny, Florian, Michael, On 7/7/21 6:57 PM, Alejandro Colomar (man-pages) wrote: >> Maybe the man page could just state: >> >> >> NOTES >> >> The calling strlen with a NULL pointer is undefined behavior. > > Okay. I agree that should probably be documented. > I'm surprised it's not documented already. Not even in the glibc manual > (or I couldn't find it). > > There are a lot of functions that should get this addition, though. I'd > like to patch them all at once. I'll try to find a list of functions > documented in the man pages and that have nonnull in the > oimplementation. If I don't come back soon with a list, please ping me. Humm, the list is huge, much bigger than I expected... Just try yourselves: ~/src/gnu/glibc$ man_lsfunc ../../linux/man-pages/man3 > lsfunc; ~/src/gnu/glibc$ cat lsfunc \ | while read f; do \ grep_glibc_prototype $f \ | grep nonnull >/dev/null \ && grep_glibc_prototype $f; \ done \ > nonnull; ~/src/gnu/glibc$ wc -l nonnull; 296 nonnull grep_glibc_prototype is defined in the man-pages, in <scripts/bash_aliases>. How do you think this should be handled? Adding a line in NOTES for every such function? Adding [[gnu::nonnull]] to every such prototype in SYNOPSIS (this might be too noisy)? Else? Thanks, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages) @ 2021-07-07 17:33 ` Alejandro Colomar (man-pages) 2021-07-09 13:48 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-07 17:33 UTC (permalink / raw) To: Jonny Grant, Michael Kerrisk, Florian Weimer; +Cc: linux-man, Libc-alpha On 7/7/21 7:23 PM, Alejandro Colomar (man-pages) wrote: > How do you think this should be handled? > Adding a line in NOTES for every such function? Adding [[gnu::nonnull]] > to every such prototype in SYNOPSIS (this might be too noisy)? Else? As an example of how man pages could look like with the addition of [[gnu::nonnull]], you can have a look at this manual page of mine: [[ ... SYNOPSIS #include <alx/base/stdlib.h> [[gnu::nonnull]] [[gnu::warn_unused_result]] int alx_callocs(type **ptr, ptrdiff_t nmemb); [[gnu::malloc]] [[gnu::warn_unused_result]] void *alx_mallocarray(ptrdiff_t nmemb, ssize_t size); [[gnu::nonnull]] [[gnu::warn_unused_result]] int alx_mallocarrays(type **ptr, ptrdiff_t nmemb); [[gnu::nonnull]] [[gnu::warn_unused_result]] int alx_mallocs(type **ptr, ssize_t nmemb); [[gnu::warn_unused_result]] void *alx_reallocarrayf(void *ptr, ptrdiff_t nmemb, ssize_t nmemb); [[gnu::nonnull]] [[gnu::warn_unused_result]] int alx_reallocarrayfs(type **ptr, ptrdiff_t nmemb); [[gnu::nonnull]] [[gnu::warn_unused_result]] int alx_reallocfs(type **ptr, ssize_t nmemb); [[gnu::nonnull]] [[gnu::warn_unused_result]] int alx_reallocs(type **ptr, ssize_t nmemb); [[gnu::nonnull]] int alx_frees(type **ptr); ... ]] Source: <https://github.com/alejandro-colomar/libalx/tree/main/share/man/base/man3>. -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 17:33 ` strlen Alejandro Colomar (man-pages) @ 2021-07-09 13:48 ` Jonny Grant 0 siblings, 0 replies; 32+ messages in thread From: Jonny Grant @ 2021-07-09 13:48 UTC (permalink / raw) To: Alejandro Colomar (man-pages), Michael Kerrisk, Florian Weimer Cc: linux-man, Libc-alpha On 07/07/2021 18:33, Alejandro Colomar (man-pages) wrote: > On 7/7/21 7:23 PM, Alejandro Colomar (man-pages) wrote: >> How do you think this should be handled? >> Adding a line in NOTES for every such function? Adding [[gnu::nonnull]] to every such prototype in SYNOPSIS (this might be too noisy)? Else? > > As an example of how man pages could look like with the addition of [[gnu::nonnull]], you can have a look at this manual page of mine: > > [[ > ... > SYNOPSIS > #include <alx/base/stdlib.h> > > [[gnu::nonnull]] [[gnu::warn_unused_result]] > int alx_callocs(type **ptr, ptrdiff_t nmemb); > > [[gnu::malloc]] [[gnu::warn_unused_result]] > void *alx_mallocarray(ptrdiff_t nmemb, ssize_t size); > > [[gnu::nonnull]] [[gnu::warn_unused_result]] > int alx_mallocarrays(type **ptr, ptrdiff_t nmemb); > > [[gnu::nonnull]] [[gnu::warn_unused_result]] > int alx_mallocs(type **ptr, ssize_t nmemb); > > [[gnu::warn_unused_result]] > void *alx_reallocarrayf(void *ptr, ptrdiff_t nmemb, ssize_t nmemb); > > [[gnu::nonnull]] [[gnu::warn_unused_result]] > int alx_reallocarrayfs(type **ptr, ptrdiff_t nmemb); > > [[gnu::nonnull]] [[gnu::warn_unused_result]] > int alx_reallocfs(type **ptr, ssize_t nmemb); > > [[gnu::nonnull]] [[gnu::warn_unused_result]] > int alx_reallocs(type **ptr, ssize_t nmemb); > > [[gnu::nonnull]] > int alx_frees(type **ptr); > ... > ]] > > Source: <https://github.com/alejandro-colomar/libalx/tree/main/share/man/base/man3>. May I ask, could a note be added to the NOTES section as well? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages) 2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages) @ 2021-07-08 10:07 ` Jonny Grant 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-09 20:44 ` strlen Jonny Grant 2 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-08 10:07 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: linux-man, Michael Kerrisk, Florian Weimer, Jonny Grant Hi Alex On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote: > On 7/7/21 3:31 PM, Jonny Grant wrote: >> >> >> On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote: >>> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote: >>>> I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane. >>> >>> The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple. >>> >>> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined. >>> >>> >>> Cheers, >>> >>> Alex >>> >>> >> >> Hi Alex, Florian >> >> Do you think this would get optimized out by GCC too? >> >> size_t safestrlen(const char * s) >> { >> if (NULL == s) return 0; >> else return strlen(s); >> } > > This would be optimized if the compiler can determine that s == NULL or s != NULL, which tipically would be that you ask the compiler to optimize, AND the compiler can deduce at compile time its relationship with NULL; OR you ask the compiler to optimize at link time (-flto) AND the relationship of s and NULL can be deduced at link time. > > However, I don't see why that would be a problem. Either you can guarantee that s is not NULL, and you don't need to call this safestrlen() wrapper, or you cannot guarantee it and then you are forced to call this wrapper. The optimization, if it happens, will be good. Thank you for your reply. We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. I'd like to avoid the compiler removing certain execution paths. I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. Probably this would work: size_t __attribute__((optimize("O0"))) safestrlen(const char * s) { if (NULL == s) return 0; else return strlen(s); } I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out. > > What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above). Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet. https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10#trying_it_out I do have these other instrumentation options. -fsanitize=null,returns-nonnull-attribute,signed-integer-overflow,leak,undefined,address >> >> >> >> Maybe the man page could just state: >> >> >> NOTES >> >> The calling strlen with a NULL pointer is undefined behavior. > > Okay. I agree that should probably be documented. > I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it). > > There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me. > > Thanks, > > Alex > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-08 10:07 ` strlen Jonny Grant @ 2021-07-08 11:06 ` Alejandro Colomar (man-pages) 2021-07-08 12:13 ` strlen Xi Ruoyao ` (3 more replies) 0 siblings, 4 replies; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-08 11:06 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help On 7/8/21 12:07 PM, Jonny Grant wrote: > Thank you for your reply. > > We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. > > I'd like to avoid the compiler removing certain execution paths. > I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. > > > Probably this would work: > > size_t __attribute__((optimize("O0"))) safestrlen(const char * s) > { > if (NULL == s) return 0; > else return strlen(s); > } I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. Moreover, I recommend you to optimize as much as possible. Even though NULL is possible in your code, I guess it's unlikely. Also, calling a function safe is too generic. I'd call it with the suffix null, as it act different on null. Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. Use the POSIX type 'ssize_t'. That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. Recommended implementation (requires C99 or later due to the usage of inline): [ // compiler.h #define likely(x) __builtin_expect((x), 1) #define unlikely(x) __builtin_expect((x), 0) ] [ // strlennull.h #include <string.h> #include <sys/types.h> #include "compiler.h" [[gnu::pure]] inline ssize_t strlennull(const char *s); inline ssize_t strlennull(const char *s) { if (unlikely(!s)) return -1; return strlen(s); } ] [ // strlennull.c #include "strlennull.h" #include <sys/types.h> extern inline ssize_t strlennull(const char *s); ] BTW, if the input is so untrusted that NULL may be possible, I guess it might as well contain invalid strings (non-NUL-terminated), for which strlen(3) would behave as bad or worse. So I recommend having a look at strnlen(3) and maybe implement strnlennull() instead of strlennull(). Unless you _know_ there's a compiler bug that doesn't allow you to optimize, please optimize. Otherwise, if it's just a bit of paranoia, you could as well not optimize any code at all. Specifically not optimizing this code by the use of pragmas or attributes is *wrong*. Just revise the preprocessor output, the compiler output, and also try introducing a NULL at run time, to check that everything's fine. > > I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out. Are you sure you don't want to optimize? Are those addresses hardware I/O or interrupts? Maybe you just want membarrier(2). > >> >> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above). > > Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet. You can install gcc-10 for Bionic: apt-get install gcc-10 I recommend it. It finds bugs very deep in the code. Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) @ 2021-07-08 12:13 ` Xi Ruoyao 2021-07-08 23:49 ` strlen Segher Boessenkool ` (2 subsequent siblings) 3 siblings, 0 replies; 32+ messages in thread From: Xi Ruoyao @ 2021-07-08 12:13 UTC (permalink / raw) To: Alejandro Colomar (man-pages), Jonny Grant Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk On Thu, 2021-07-08 at 13:06 +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: > On 7/8/21 12:07 PM, Jonny Grant wrote: > > Thank you for your reply. > > > > We can't guarantee safestrlen() won't be called with NULL. So because > > strlen() itself doesn't check for NULL in C standard we'd need to call > > the wrapper so that NULL can be checked for. > > > > I'd like to avoid the compiler removing certain execution paths. > > I'd rather keep all code paths, even if they are not taken, just in > > case a NULL pointer creeps in due to an external device that is > > connected to an embedded system. If you are taking a pointer from external device "correctly", gcc won't delete your NULL checking path. For example: // defined by linker script extern volatile char *an_io_port_providing_a_pointer; int f() { char *ptr = an_io_port_providing_a_pointer; // C standard disallows to remove it if (ptr == NULL) { gracefully_report_bug("some message"); return -EINVAL; } return g(ptr); } Or // in assembly extern char *read_pointer_from_io_port(int io_port_id); int f() { char *ptr = read_pointer_from_io_port(IO_PORT_A); // C standard disallows to remove it if (ptr == NULL) { gracefully_report_bug("some message"); return -EINVAL; } return g(ptr); } OTOH, if you are taking the pointer from external input incorrectly (i. e. violating C standard and invoking some UB), even if you used some way to enforce the compiler to keep the NULL checking, it would be still unsafe. Even if you want to be "careful" (I'd rather call this "paranoid"), you can use -fno-delete-null-pointer-checks, instead of turning off all optimizations. And, GCC "optimize" attribute/pragma is somewhat buggy and only intended for debugging GCC. If you need to turn off some optmization for a function, it's better to put the function into a seperate TU and use command line option to disable the optimization. By the way, if C can't provide the safety feature you need (for example programming something launching a nuclear missile :), maybe it's better to use Ada or something. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:13 ` strlen Xi Ruoyao @ 2021-07-08 23:49 ` Segher Boessenkool 2021-07-09 13:54 ` strlen Jonny Grant 2021-07-09 10:50 ` strlen Jonny Grant [not found] ` <1627912755.3783669.1625745946723@mail.yahoo.com> 3 siblings, 1 reply; 32+ messages in thread From: Segher Boessenkool @ 2021-07-08 23:49 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: > On 7/8/21 12:07 PM, Jonny Grant wrote: > >We can't guarantee safestrlen() won't be called with NULL. So because > >strlen() itself doesn't check for NULL in C standard we'd need to call the > >wrapper so that NULL can be checked for. > >size_t __attribute__((optimize("O0"))) safestrlen(const char * s) > >{ > > if (NULL == s) return 0; > > else return strlen(s); > >} > That also allows differentiating a length of 0 (i.e., "") from an > invalid string (i.e., NULL), by returning -1 for NULL. It is incorrect to return any particular value for strlen(0); not 0, not -1, not anything. Since there *is* no string, it doesn't have a length either. So instead of making some function for this, I recommend just writing something like bla = s ? strlen(s) : 0; wherever you need it. If a function name isn't self-explanatory, and even *cannot* be, your factoring is most likely not ideal. Code is primarily there for humans to read, it should be optimised for that. Segher ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-08 23:49 ` strlen Segher Boessenkool @ 2021-07-09 13:54 ` Jonny Grant 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-09 13:54 UTC (permalink / raw) To: Segher Boessenkool, Alejandro Colomar (man-pages) Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk On 09/07/2021 00:49, Segher Boessenkool wrote: > On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: >> On 7/8/21 12:07 PM, Jonny Grant wrote: >>> We can't guarantee safestrlen() won't be called with NULL. So because >>> strlen() itself doesn't check for NULL in C standard we'd need to call the >>> wrapper so that NULL can be checked for. > >>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s) >>> { >>> if (NULL == s) return 0; >>> else return strlen(s); >>> } > >> That also allows differentiating a length of 0 (i.e., "") from an >> invalid string (i.e., NULL), by returning -1 for NULL. > > It is incorrect to return any particular value for strlen(0); not 0, not > -1, not anything. Since there *is* no string, it doesn't have a length > either. > > So instead of making some function for this, I recommend just writing > something like > > bla = s ? strlen(s) : 0; Hi Segher Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better. inline size_t safestrlen(const char * s) {return s?strlen(s) : 0} Perhaps there are too many email addresses on this cc list now. I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s. > > wherever you need it. If a function name isn't self-explanatory, and > even *cannot* be, your factoring is most likely not ideal. Code is > primarily there for humans to read, it should be optimised for that. > > > Segher > . Good point Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 13:54 ` strlen Jonny Grant @ 2021-07-09 14:17 ` Alejandro Colomar (man-pages) 2021-07-09 16:11 ` strlen Xi Ruoyao 2021-07-10 1:00 ` strlen Segher Boessenkool 0 siblings, 2 replies; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 14:17 UTC (permalink / raw) To: Jonny Grant, Segher Boessenkool Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk Hi Jonny, Segher, On 7/9/21 3:54 PM, Jonny Grant wrote: > Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better. I agree on that. > > inline size_t safestrlen(const char * s) {return s?strlen(s) : 0} > > Perhaps there are too many email addresses on this cc list now. > > I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s. Please, consider not calling some function safesomething() or similar, as it isn't 100% safe. It's like calling some thing "the new X". How will you call the next version? "the nova X"? And the next? "the supernew X"? As I said before, unsigned types are unsafe, you may want to accept it or not, but they are. Now, the day you realize that and develop an even safer function that doesn't use unsigned size_t, what will you call it? supersafestrlen()? Use names that define your functions as closely as possible. > > > On 09/07/2021 00:49, Segher Boessenkool wrote: >> wherever you need it. If a function name isn't self-explanatory, and Agree on this >> even *cannot* be, your factoring is most likely not ideal. Code is But not on this. You could call it strlennull(), that is, a strlen() that special-cases NULL. I find it a good enough name, as long as you document your function. It saves the problem of repeating yourself every time. >> primarily there for humans to read, it should be optimised for that. >> Agree on this again, but I think the following is readable: len = strlennull(maybenull); Regards, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) @ 2021-07-09 16:11 ` Xi Ruoyao 2021-07-10 1:00 ` strlen Segher Boessenkool 1 sibling, 0 replies; 32+ messages in thread From: Xi Ruoyao @ 2021-07-09 16:11 UTC (permalink / raw) To: Alejandro Colomar (man-pages), Jonny Grant, Segher Boessenkool Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk On Fri, 2021-07-09 at 16:17 +0200, Alejandro Colomar (man-pages) via Gcc-help wrote: > Hi Jonny, Segher, > > On 7/9/21 3:54 PM, Jonny Grant wrote: > > Yes, this could work. But it does rely on programmer typing it like > > that every time... Maybe an inline function better. > > I agree on that. > > > > > inline size_t safestrlen(const char * s) {return s?strlen(s) : 0} > > > > Perhaps there are too many email addresses on this cc list now. I think the discussion at least has nothing to do with linux-man or gcc- help: man pages just describe the existing API (C or POSIX or Linux specific), and GCC just compiles code and doesn't care what the API is. Neither is a place to discuss "how to design an API". And I think Jonny should discuss the API design with the users of his API (maybe his collegue or downstream developers), instead of some random guys in mail list. The users are the ones who will call his function anyway so it's better to choose an API they like. Yes, Jonny can "force" the users to do something for safety, but this decision should also be discussed with them and documented. Or they won't understand the decision, and may "invent" or "improvise" some "new wheels", breaking Jonny's design. For example, I don't like a function silently treats NULL as an empty string. I prefer a function to abort() or print a log "strlen_checked() is called with NULL, there is a bug in your code" when I (mis)use NULL. But it's just my 2 cents: if the potential users of the API agree the function to act as that, then it's good to go. -- Xi Ruoyao <xry111@mengyan1223.wang> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) 2021-07-09 16:11 ` strlen Xi Ruoyao @ 2021-07-10 1:00 ` Segher Boessenkool 1 sibling, 0 replies; 32+ messages in thread From: Segher Boessenkool @ 2021-07-10 1:00 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk Fine, I'll bite :-) On Fri, Jul 09, 2021 at 04:17:08PM +0200, Alejandro Colomar (man-pages) wrote: > On 7/9/21 3:54 PM, Jonny Grant wrote: > >Yes, this could work. But it does rely on programmer typing it like that > >every time... Maybe an inline function better. > > I agree on that. A function (or any other abstraction) can be fine for this, *iff* you can make people use it correctly. Since it is pretty much impossible to give a good succinct name to this function, I posit that is not the case. Please feel free to prove me wrong (by coming up with a decent name for it). > >I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() > >- but there isn't one such as strnlen_s. > > Please, consider not calling some function safesomething() or similar, > as it isn't 100% safe. It's like calling some thing "the new X". How > will you call the next version? "the nova X"? And the next? "the > supernew X"? > > As I said before, unsigned types are unsafe, you may want to accept it > or not, but they are. I thought Annex K was great entertainment, but calling unsigned types "unsafe" takes the cake. Unsigned types are Z/nZ with n some power of two. Signed types are not even Z (overflow is undefined). Unsigned types are useful for describing many machine things. They are useful for sizes, not only because sizes cannot be negative, but also because sizes can be bigger than the maximum positive number that can fit in the same size signed number. They are useful for bitty things, registers maybe, stuff in memory, or device I/O registers. And they are much more useful than C signed numbers for holding memory addresses, where you need that (you can do sane aritmetic on it). Using unsigned types without range checking is often wrong ("unsafe" in your words). Using signed types without range checking is just as wrong in the same cases, if not more (overflow is undefined). At least in the "unsigned" case it is *possible* its behaviour is what the programmer intended! > Agree on this again, but I think the following is readable: > > len = strlennull(maybenull); If you use it a million times, of course you can give it a short and non-sensical name, and expect the users to learn what it means. If not, it is better to be slightly more verbose, and reduce the mental load. Segher ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:13 ` strlen Xi Ruoyao 2021-07-08 23:49 ` strlen Segher Boessenkool @ 2021-07-09 10:50 ` Jonny Grant 2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages) [not found] ` <1627912755.3783669.1625745946723@mail.yahoo.com> 3 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-09 10:50 UTC (permalink / raw) To: Alejandro Colomar (man-pages) Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote: > On 7/8/21 12:07 PM, Jonny Grant wrote: >> Thank you for your reply. >> >> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. >> >> I'd like to avoid the compiler removing certain execution paths. >> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. >> >> >> Probably this would work: >> >> size_t __attribute__((optimize("O0"))) safestrlen(const char * s) >> { >> if (NULL == s) return 0; >> else return strlen(s); >> } > > I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL. > Moreover, I recommend you to optimize as much as possible. > Even though NULL is possible in your code, I guess it's unlikely. > > Also, calling a function safe is too generic. > I'd call it with the suffix null, as it act different on null. > > Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). > See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. > Use the POSIX type 'ssize_t'. > That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. > https://man7.org/linux/man-pages/man3/strlen.3.html size_t strlen(const char *s); I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement. Cheers, Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 10:50 ` strlen Jonny Grant @ 2021-07-09 11:27 ` Alejandro Colomar (man-pages) 2021-07-09 11:43 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 11:27 UTC (permalink / raw) To: Jonny Grant Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help, Segher Boessenkool, Xi Ruoyao Hi Jonny, On 7/9/21 12:50 PM, Jonny Grant wrote: > > > On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote: >> On 7/8/21 12:07 PM, Jonny Grant wrote: >>> Thank you for your reply. >>> >>> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for. >>> >>> I'd like to avoid the compiler removing certain execution paths. >>> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system. >>> >>> >>> Probably this would work: >>> >>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s) >>> { >>> if (NULL == s) return 0; >>> else return strlen(s); >>> } >> >> I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called. > > That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL. Not always. If you inline that function, that path may be removed in some calls, if the compiler knows better than you that it can. My point is that you shouldn't care; your code is completely legal, and whatever the compiler decides to do will also be legal (no undefined behavior, and no crashes). If it optimizes, it will be a good thing that you shouldn't prevent. If the compiler does otherwise, that's a bug in the compiler, and something you can't solve by writing different code or preventing optimizations, or at least there's no guarantee about it, since it's a bug. But I don't believe you'll find a bug in this case. So please, trust the compiler, at least when using perfectly defined behavior. And if you don't trust the compiler, which is perfectly reasonable, test it, but don't try to workaround bugs that don't exist. > >> Moreover, I recommend you to optimize as much as possible. >> Even though NULL is possible in your code, I guess it's unlikely. >> >> Also, calling a function safe is too generic. >> I'd call it with the suffix null, as it act different on null. >> >> Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW). >> See <https://google.github.io/styleguide/cppguide.html#Integer_Types>. >> Use the POSIX type 'ssize_t'. >> That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL. >> > > https://man7.org/linux/man-pages/man3/strlen.3.html > size_t strlen(const char *s); > > I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement. That's a historical accident. A long time ago (much before I was born, and much before the first standard, I mean in the early times of K&R C and Unix), unsigned types were used more than they should, and the first C standards (I mean ANSI/ISO standards (i.e., C89, C99, ...), not POSIX), with a lot of already existing code, didn't attempt to change the language, but to annotate common usage. In POSIX.1 there's a mix, because POSIX has the type 'ssize_t', which is not defined by the C standards. POSIX in general tends to use the signed type 'ssize_t' for its POSIX-only functions (i.e., not in the C standards). Annex K has been an attempt of Microsoft to provide safer functions, but while there are some functions there that have good intentions, most of them are just badly designed. That annex K is DOA, and will probably be marked as deprecated in C22 (currently C2x). I think that a standard should not try to design new functions, and instead just annotate common usage, as they did in the first ones. Problems like the ones Annex K suffers could have been detected early if they had been implemented as an extension to some compiler(s) decade(s) before being standardized. Therefore, if the implementation passes the test of time, you standardize it, else not, IMO. Otherwise, we have a standard that is declared deprecated in the next version of the standard, similar to what is happening with the C++ standards (which, guess what, BTW I recently read that they are undeprecating a lot of C stuff they deprecated in the first standards). Using unsigned for anything else than bitfileds and similar stuff is just *wrong*, as you can read from the Google C++ style guide I linked before. Another source you can read is this paper from Bjarne Stroustrup: <http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf> This is one of the few cases where I agree with something coming from him. I hope some day we get a ssizeof operator :-) Cheers, Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages) @ 2021-07-09 11:43 ` Alejandro Colomar (man-pages) 0 siblings, 0 replies; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 11:43 UTC (permalink / raw) To: Jonny Grant Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help, Segher Boessenkool, Xi Ruoyao On 7/9/21 1:27 PM, Alejandro Colomar (man-pages) wrote: > Annex K has been an attempt of Microsoft to provide safer functions, but > while there are some functions there that have good intentions, most of > them are just badly designed. That annex K is DOA, and will probably be > marked as deprecated in C22 (currently C2x). > > I think that a standard should not try to design new functions, and > instead just annotate common usage, as they did in the first ones. > Problems like the ones Annex K suffers could have been detected early if > they had been implemented as an extension to some compiler(s) decade(s) > before being standardized. Therefore, if the implementation passes the > test of time, you standardize it, else not, IMO. Otherwise, we have a > standard that is declared deprecated in the next version of the > standard, similar to what is happening with the C++ standards (which, > guess what, BTW I recently read that they are undeprecating a lot of C > stuff they deprecated in the first standards). But let's analyze Annex K (C11): It has been designed by Microsoft. MS's compiler (MS Visual Studio) doesn't even fully support C99 yet (and by that trend, I doubt it never will). At most it supports C89. Visual Studio has a long history of not supporting C except for those parts required to implement their C++ compiler. Would you buy a car designed by a bike manufacturer? -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <1627912755.3783669.1625745946723@mail.yahoo.com>]
[parent not found: <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com>]
* Re: strlen [not found] ` <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com> @ 2021-07-09 17:26 ` Martin Sebor 2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Martin Sebor @ 2021-07-09 17:26 UTC (permalink / raw) To: LIU Hao, johnsfine, gcc-help, jg; +Cc: linux-man, fw, mtk.manpages On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote: > 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道: >> This is not the forum for such a discussion. But I want to make >> people reading this aware that many expert C and C++ programmers >> (likely a majority) consider that advice to avoid unsigned types to be >> horrible advice. >> I advise people to avoid signed types and I do so myself. If an >> integer value won't be negative, it shouldn't be signed. That makes >> your intent clearer to anyone reading your code, and (especially in >> x86-64) lets the compiler generate smaller and faster code. > > That makes no sense. Would you prefer unsigned integers to signed ones, > for something that can only be one of {1,2,3,4}? Just because something > can't be negative, does not mean it should be unsigned. There are benefits to making that explicit by choosing an unsigned type: the result of converting a narrower unsigned type to a wider unsigned type is unchanged. The result of converting it to a wider signed type may change. This has an impact on value range propagation which in turn can influence other optimizations as well as warnings. Choosing an unsigned type has liabilities as well. Because unsigned arithmetic wraps around zero whereas signed arithmetic does not (it overflows which is undefined), its results are less constrained than those of signed arithmetic. Martin > > Conversion between `uint64_t` and `double` is much slower than `int64_t` > on some platforms, e.g. x86, so 'smaller and faster code' is also > groundless [1]. > > > A signed integer is a value that denotes the number or amount of > something, such as bytes, characters, files, seconds, kilometers, or > even dollars. An unsigned integer is merely a fixed-sized collection of > bits. That's the only fundamental difference. > > > [1] https://gcc.godbolt.org/z/6YY61rhGe > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 17:26 ` strlen Martin Sebor @ 2021-07-09 20:19 ` Alejandro Colomar (man-pages) 0 siblings, 0 replies; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-09 20:19 UTC (permalink / raw) To: Martin Sebor, LIU Hao, johnsfine, gcc-help, jg Cc: linux-man, fw, mtk.manpages On 7/9/21 7:26 PM, Martin Sebor wrote: > On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote: >> 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道: >>> This is not the forum for such a discussion. But I want to make >>> people reading this aware that many expert C and C++ programmers >>> (likely a majority) consider that advice to avoid unsigned types to >>> be horrible advice. I'm sorry if it's not the right place, I could remove the lists from the CC if it's too noisy, but I think an important point is here, and a couple of emails won't damage too much the mailing lists. On the other hand, I consider bad etiquette removing CCs from a discussion. >>> I advise people to avoid signed types and I do so myself. If an >>> integer value won't be negative, it shouldn't be signed. That makes >>> your intent clearer to anyone reading your code, and (especially in >>> x86-64) lets the compiler generate smaller and faster code. As others have showed with facts, and the Google style guide also hints, using unsigned types just removes the opportunity of the compiler to optimize on overflow, because it has to account for wrapping around. >> >> That makes no sense. Would you prefer unsigned integers to signed >> ones, for something that can only be one of {1,2,3,4}? Just because >> something can't be negative, does not mean it should be unsigned. That. The same way that because a number is never going to be greater than 100 is it any better to use [u]int256_t. Just use int, unless you have a reason not to. Please John, read the paper from Bjarne about unsigned types, it really covers a lot. If you still disagree after reading that, feel free to argument it. <http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf> > > There are benefits to making that explicit by choosing an unsigned > type: the result of converting a narrower unsigned type to a wider > unsigned type is unchanged. The result of converting it to a wider > signed type may change. This has an impact on value range propagation > which in turn can influence other optimizations as well as warnings. That problem with implicit conversions to larger types is not a problem of signed integers, not even a problem of unsigned integers. It's a problem of mixing both. If you use signed integers for everything, you won't have that problem. Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages) 2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages) 2021-07-08 10:07 ` strlen Jonny Grant @ 2021-07-09 20:44 ` Jonny Grant 2021-07-10 18:37 ` strlen Alejandro Colomar (man-pages) 2 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-09 20:44 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man, Michael Kerrisk, Florian Weimer On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote: > On 7/7/21 3:31 PM, Jonny Grant wrote: [snip] >> >> >> >> Maybe the man page could just state: >> >> >> NOTES >> >> The calling strlen with a NULL pointer is undefined behavior. > > Okay. I agree that should probably be documented. > I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it). > > There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me. > > Thanks, > > Alex > Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known. Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know. With kind regards Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-09 20:44 ` strlen Jonny Grant @ 2021-07-10 18:37 ` Alejandro Colomar (man-pages) 2021-07-10 20:49 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-10 18:37 UTC (permalink / raw) To: Jonny Grant Cc: linux-man, Michael Kerrisk, Florian Weimer, Alejandro Colomar (man-pages) Hi Jonny, On 7/9/21 10:44 PM, Jonny Grant wrote: > > > On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote: >> On 7/7/21 3:31 PM, Jonny Grant wrote: > [snip] >>> >>> >>> >>> Maybe the man page could just state: >>> >>> >>> NOTES >>> >>> The calling strlen with a NULL pointer is undefined behavior. >> >> Okay. I agree that should probably be documented. >> I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it). >> >> There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me. >> >> Thanks, >> >> Alex >> > > Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known. Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know. Agreed. I applied the following patch. Kind regards, Alex --- From a9ab4fdd530486450b84137dce1d869f6cbfcbe0 Mon Sep 17 00:00:00 2001 From: Alejandro Colomar <alx.manpages@gmail.com> Date: Sat, 10 Jul 2021 20:34:59 +0200 Subject: strlen.3, wcslen.3: Add recommendations for safer variants Reported-by: Jonny Grant <jg@jguk.org> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> --- man3/strlen.3 | 6 ++++++ man3/wcslen.3 | 9 ++++++++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/man3/strlen.3 b/man3/strlen.3 index dea4c1050..fb734db1b 100644 --- a/man3/strlen.3 +++ b/man3/strlen.3 @@ -66,6 +66,12 @@ T} Thread safety MT-Safe .sp 1 .SH CONFORMING TO POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD. +.SH NOTES +.SS strnlen(3) +If there is a known buffer size, +it is probably better to use +.BR strnlen (3), +which can prevent some cases of buffer overrun/overflow. .SH SEE ALSO .BR string (3), .BR strnlen (3), diff --git a/man3/wcslen.3 b/man3/wcslen.3 index af3fcb9ca..868f748a8 100644 --- a/man3/wcslen.3 +++ b/man3/wcslen.3 @@ -58,5 +58,12 @@ T} Thread safety MT-Safe .sp 1 .SH CONFORMING TO POSIX.1-2001, POSIX.1-2008, C99. +.SH NOTES +.SS wcsnlen(3) +If there is a known buffer size, +it is probably better to use +.BR wcsnlen (3), +which can prevent some cases of buffer overrun/overflow. .SH SEE ALSO -.BR strlen (3) +.BR strlen (3), +.BR wcsnlen (3) -- 2.32.0 -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-10 18:37 ` strlen Alejandro Colomar (man-pages) @ 2021-07-10 20:49 ` Jonny Grant 2021-07-10 21:36 ` strlen Alejandro Colomar (man-pages) 0 siblings, 1 reply; 32+ messages in thread From: Jonny Grant @ 2021-07-10 20:49 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man, Michael Kerrisk, Florian Weimer On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote: > Hi Jonny, > > On 7/9/21 10:44 PM, Jonny Grant wrote: >> >> >> On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote: >>> On 7/7/21 3:31 PM, Jonny Grant wrote: >> [snip] >>>> >>>> >>>> >>>> Maybe the man page could just state: >>>> >>>> >>>> NOTES >>>> >>>> The calling strlen with a NULL pointer is undefined behavior. >>> >>> Okay. I agree that should probably be documented. >>> I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it). >>> >>> There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me. >>> >>> Thanks, >>> >>> Alex >>> >> >> Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known. Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know. > > Agreed. > > I applied the following patch. > > Kind regards, > > Alex > > --- >>From a9ab4fdd530486450b84137dce1d869f6cbfcbe0 Mon Sep 17 00:00:00 2001 > From: Alejandro Colomar <alx.manpages@gmail.com> > Date: Sat, 10 Jul 2021 20:34:59 +0200 > Subject: strlen.3, wcslen.3: Add recommendations for safer variants > > Reported-by: Jonny Grant <jg@jguk.org> > Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com> > --- > man3/strlen.3 | 6 ++++++ > man3/wcslen.3 | 9 ++++++++- > 2 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/man3/strlen.3 b/man3/strlen.3 > index dea4c1050..fb734db1b 100644 > --- a/man3/strlen.3 > +++ b/man3/strlen.3 > @@ -66,6 +66,12 @@ T} Thread safety MT-Safe > .sp 1 > .SH CONFORMING TO > POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD. > +.SH NOTES > +.SS strnlen(3) > +If there is a known buffer size, > +it is probably better to use > +.BR strnlen (3), > +which can prevent some cases of buffer overrun/overflow. > .SH SEE ALSO > .BR string (3), > .BR strnlen (3), > diff --git a/man3/wcslen.3 b/man3/wcslen.3 > index af3fcb9ca..868f748a8 100644 > --- a/man3/wcslen.3 > +++ b/man3/wcslen.3 > @@ -58,5 +58,12 @@ T} Thread safety MT-Safe > .sp 1 > .SH CONFORMING TO > POSIX.1-2001, POSIX.1-2008, C99. > +.SH NOTES > +.SS wcsnlen(3) > +If there is a known buffer size, > +it is probably better to use > +.BR wcsnlen (3), > +which can prevent some cases of buffer overrun/overflow. > .SH SEE ALSO > -.BR strlen (3) > +.BR strlen (3), > +.BR wcsnlen (3) > Hi Alex Thank you for making the updates! As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as: "which will prevent reading beyond the end of the character buffer" Any thoughts about adding the following? NOTES Calling strlen with a NULL pointer is undefined behavior. With kind regards Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-10 20:49 ` strlen Jonny Grant @ 2021-07-10 21:36 ` Alejandro Colomar (man-pages) 2021-07-12 21:16 ` strlen Jonny Grant 0 siblings, 1 reply; 32+ messages in thread From: Alejandro Colomar (man-pages) @ 2021-07-10 21:36 UTC (permalink / raw) To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer On 7/10/21 10:49 PM, Jonny Grant wrote: > On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote: >> diff --git a/man3/wcslen.3 b/man3/wcslen.3 >> index af3fcb9ca..868f748a8 100644 >> --- a/man3/wcslen.3 >> +++ b/man3/wcslen.3 >> @@ -58,5 +58,12 @@ T} Thread safety MT-Safe >> .sp 1 >> .SH CONFORMING TO >> POSIX.1-2001, POSIX.1-2008, C99. >> +.SH NOTES >> +.SS wcsnlen(3) >> +If there is a known buffer size, >> +it is probably better to use >> +.BR wcsnlen (3), >> +which can prevent some cases of buffer overrun/overflow. >> .SH SEE ALSO >> -.BR strlen (3) >> +.BR strlen (3), >> +.BR wcsnlen (3) >> > > Hi Alex > > Thank you for making the updates! > > As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as: > > "which will prevent reading beyond the end of the character buffer" I wrote both overrun and overflow on purpose. I was thinking of something like: const char *src; char dest[5]; src = "This is a very ... long valid string"; len = strnlen(src, 5 - 1); /* strlen(s) wouldn't overrun, but the next call would overflow */ memcpy(dest, src, len); dest[len] = '\0'; But after thinking about it a second time, I think I'll remove it, and keep only overrun (and I like your text, I'll copy part of it), as the overflow is a problem of considering that the size of the buffers is going to be the same, and the solution is not to use strnlen(3), but to differentiate both sizes, which allows to detect truncation. If the input string is known but the output buffer is small, I'd call strlen(str) and then MIN(len, bufsiz) (this is kind of equivalent to what strlcpy(3bsd) does). And if the input string is untrusted, I'd call strnlen(str, strsiz) and MIN(len, bufsiz) (this is kind of what strncpy_s(3) does). > > Any thoughts about adding the following? > > NOTES > Calling strlen with a NULL pointer is undefined behavior. I'm waiting for Michael and/or Florian to comment on my proposal. I don't want to fix a page and have 296 broken... I want to fix them all at once, but am not sure which solution to apply. But yes, adding a line like that one is likely. Thanks! Alex -- Alejandro Colomar Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/ http://www.alejandro-colomar.es/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen 2021-07-10 21:36 ` strlen Alejandro Colomar (man-pages) @ 2021-07-12 21:16 ` Jonny Grant 0 siblings, 0 replies; 32+ messages in thread From: Jonny Grant @ 2021-07-12 21:16 UTC (permalink / raw) To: Alejandro Colomar (man-pages); +Cc: linux-man On 10/07/2021 22:36, Alejandro Colomar (man-pages) wrote: > On 7/10/21 10:49 PM, Jonny Grant wrote: >> On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote: >>> diff --git a/man3/wcslen.3 b/man3/wcslen.3 >>> index af3fcb9ca..868f748a8 100644 >>> --- a/man3/wcslen.3 >>> +++ b/man3/wcslen.3 >>> @@ -58,5 +58,12 @@ T} Thread safety MT-Safe >>> .sp 1 >>> .SH CONFORMING TO >>> POSIX.1-2001, POSIX.1-2008, C99. >>> +.SH NOTES >>> +.SS wcsnlen(3) >>> +If there is a known buffer size, >>> +it is probably better to use >>> +.BR wcsnlen (3), >>> +which can prevent some cases of buffer overrun/overflow. >>> .SH SEE ALSO >>> -.BR strlen (3) >>> +.BR strlen (3), >>> +.BR wcsnlen (3) >>> >> >> Hi Alex >> >> Thank you for making the updates! >> >> As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as: >> >> "which will prevent reading beyond the end of the character buffer" > > I wrote both overrun and overflow on purpose. > I was thinking of something like: > > const char *src; > char dest[5]; > > src = "This is a very ... long valid string"; > len = strnlen(src, 5 - 1); > /* strlen(s) wouldn't overrun, but the next call would overflow */ > memcpy(dest, src, len); > dest[len] = '\0'; Sounds like any overflow would be due to the application code, rather than strnlen. > > But after thinking about it a second time, I think I'll remove it, and keep only overrun (and I like your text, I'll copy part of it), as the overflow is a problem of considering that the size of the buffers is going to be the same, and the solution is not to use strnlen(3), but to differentiate both sizes, which allows to detect truncation. > > If the input string is known but the output buffer is small, I'd call strlen(str) and then MIN(len, bufsiz) (this is kind of equivalent to what strlcpy(3bsd) does). > And if the input string is untrusted, I'd call strnlen(str, strsiz) and MIN(len, bufsiz) (this is kind of what strncpy_s(3) does). > > strlcpy is considered harmful, best avoid it. It breaks ISO TR24731 "Do not unexpectedly truncate strings" by silently overwriting memory before checking there was enough space to copy all the bytes. > >> >> Any thoughts about adding the following? >> >> NOTES >> Calling strlen with a NULL pointer is undefined behavior. > > > I'm waiting for Michael and/or Florian to comment on my proposal. > > I don't want to fix a page and have 296 broken... I want to fix them all at once, but am not sure which solution to apply. > > But yes, adding a line like that one is likely. > > Thanks! > > Alex Ok, no immediate rush. Jonny ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2021-07-12 21:16 UTC | newest] Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-04 18:01 strlen Jonny Grant 2020-09-04 19:21 ` strlen Florian Weimer 2020-09-04 23:14 ` strlen Jonny Grant 2020-09-05 7:12 ` strlen Florian Weimer 2021-07-06 20:30 ` strlen Jonny Grant 2021-07-06 22:11 ` strlen Florian Weimer 2021-07-07 11:36 ` strlen Jonny Grant 2021-07-07 12:22 ` strlen Alejandro Colomar (man-pages) 2021-07-07 12:31 ` strlen Alejandro Colomar (man-pages) 2021-07-07 13:31 ` strlen Jonny Grant 2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages) 2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages) 2021-07-07 17:33 ` strlen Alejandro Colomar (man-pages) 2021-07-09 13:48 ` strlen Jonny Grant 2021-07-08 10:07 ` strlen Jonny Grant 2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages) 2021-07-08 12:13 ` strlen Xi Ruoyao 2021-07-08 23:49 ` strlen Segher Boessenkool 2021-07-09 13:54 ` strlen Jonny Grant 2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages) 2021-07-09 16:11 ` strlen Xi Ruoyao 2021-07-10 1:00 ` strlen Segher Boessenkool 2021-07-09 10:50 ` strlen Jonny Grant 2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages) 2021-07-09 11:43 ` strlen Alejandro Colomar (man-pages) [not found] ` <1627912755.3783669.1625745946723@mail.yahoo.com> [not found] ` <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com> 2021-07-09 17:26 ` strlen Martin Sebor 2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages) 2021-07-09 20:44 ` strlen Jonny Grant 2021-07-10 18:37 ` strlen Alejandro Colomar (man-pages) 2021-07-10 20:49 ` strlen Jonny Grant 2021-07-10 21:36 ` strlen Alejandro Colomar (man-pages) 2021-07-12 21:16 ` strlen Jonny Grant
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.