* strlen
@ 2020-09-04 18:01 Jonny Grant
2020-09-04 19:21 ` strlen Florian Weimer
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2020-09-04 18:01 UTC (permalink / raw)
To: linux-man; +Cc: Michael Kerrisk
Hello
https://man7.org/linux/man-pages/man3/strlen.3.html
Is it possible to clarify :-
* glibc will SIGSEGV if 's' is NULL
* there are no ERROR returns
Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2020-09-04 18:01 strlen Jonny Grant
@ 2020-09-04 19:21 ` Florian Weimer
2020-09-04 23:14 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2020-09-04 19:21 UTC (permalink / raw)
To: Jonny Grant; +Cc: linux-man, Michael Kerrisk
* Jonny Grant:
> Hello
>
> https://man7.org/linux/man-pages/man3/strlen.3.html
>
> Is it possible to clarify :-
>
> * glibc will SIGSEGV if 's' is NULL
> * there are no ERROR returns
That would be misleading. Whether strlen (NULL) is undefined also
depends on the compiler. They know that the argument cannot be zero
and optimize accordingly.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2020-09-04 19:21 ` strlen Florian Weimer
@ 2020-09-04 23:14 ` Jonny Grant
2020-09-05 7:12 ` strlen Florian Weimer
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2020-09-04 23:14 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-man, Michael Kerrisk
On 04/09/2020 20:21, Florian Weimer wrote:
> * Jonny Grant:
>
>> Hello
>>
>> https://man7.org/linux/man-pages/man3/strlen.3.html
>>
>> Is it possible to clarify :-
>>
>> * glibc will SIGSEGV if 's' is NULL
>> * there are no ERROR returns
>
> That would be misleading. Whether strlen (NULL) is undefined also
> depends on the compiler. They know that the argument cannot be zero
> and optimize accordingly.
>
Hi,
Do you know a compiler that has a different behaviour? I only tested gcc and clang. How would a compiler optimise?
The header has non-null attribute for the compiler (ie gcc) to use to give a nice warning
str.c:6:8: warning: null argument where non-null required (argument 1) [-Wnonnull]
6 | return strlen(NULL);
However, if the implementation gets NULL it still crashes.
I was using -fno-builtin-strlen so it was glibc, but the __builtin_strlen ha SEGV just like glibc
clang is the same as gcc.
So it feels like it's pretty clear, there is no NULL check, can this be documented?
Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2020-09-04 23:14 ` strlen Jonny Grant
@ 2020-09-05 7:12 ` Florian Weimer
2021-07-06 20:30 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2020-09-05 7:12 UTC (permalink / raw)
To: Jonny Grant; +Cc: linux-man, Michael Kerrisk
* Jonny Grant:
> On 04/09/2020 20:21, Florian Weimer wrote:
>> * Jonny Grant:
>>
>>> Hello
>>>
>>> https://man7.org/linux/man-pages/man3/strlen.3.html
>>>
>>> Is it possible to clarify :-
>>>
>>> * glibc will SIGSEGV if 's' is NULL
>>> * there are no ERROR returns
>>
>> That would be misleading. Whether strlen (NULL) is undefined also
>> depends on the compiler. They know that the argument cannot be zero
>> and optimize accordingly.
>>
>
> Hi,
>
> Do you know a compiler that has a different behaviour? I only tested
> gcc and clang. How would a compiler optimise?
Here's an example:
#include <stddef.h>
#include <stdio.h>
#include <string.h>
void
f (const char *str)
{
strlen (str);
if (str == NULL)
puts ("str is NULL");
}
int
main (void)
{
f (NULL);
return 0;
}
When built with -O2, GCC 8 prints nothing, and there is no crash.
My point it's not just the C *library* that makes strlen (NULL)
undefined. It's undefined according to the language, and even if we
changed the glibc implementation, things would still go wrong.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2020-09-05 7:12 ` strlen Florian Weimer
@ 2021-07-06 20:30 ` Jonny Grant
2021-07-06 22:11 ` strlen Florian Weimer
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-06 20:30 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-man, Michael Kerrisk
On 05/09/2020 08:12, Florian Weimer wrote:
> * Jonny Grant:
>
>> On 04/09/2020 20:21, Florian Weimer wrote:
>>> * Jonny Grant:
>>>
>>>> Hello
>>>>
>>>> https://man7.org/linux/man-pages/man3/strlen.3.html
>>>>
>>>> Is it possible to clarify :-
>>>>
>>>> * glibc will SIGSEGV if 's' is NULL
>>>> * there are no ERROR returns
>>>
>>> That would be misleading. Whether strlen (NULL) is undefined also
>>> depends on the compiler. They know that the argument cannot be zero
>>> and optimize accordingly.
>>>
>>
>> Hi,
>>
>> Do you know a compiler that has a different behaviour? I only tested
>> gcc and clang. How would a compiler optimise?
>
> Here's an example:
>
> #include <stddef.h>
> #include <stdio.h>
> #include <string.h>
>
> void
> f (const char *str)
> {
> strlen (str);
> if (str == NULL)
> puts ("str is NULL");
> }
>
> int
> main (void)
> {
> f (NULL);
> return 0;
> }
>
> When built with -O2, GCC 8 prints nothing, and there is no crash.
>
> My point it's not just the C *library* that makes strlen (NULL)
> undefined. It's undefined according to the language, and even if we
> changed the glibc implementation, things would still go wrong.
>
Hi Florian, Apologies for tardy response time.
The reason it does not crash appears to be because of this warning which removes the call to strlen due to the return not being checked?
strlen.c:11:3: warning: statement with no effect [-Wunused-value]
11 | strlen (str);
| ^~~~~~~~~~~~
https://godbolt.org/z/caoes5nGa
.LC0:
.string "str is NULL"
f(char const*):
test rdi, rdi
je .L4
ret
.L4:
mov edi, OFFSET FLAT:.LC0
jmp puts
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
call puts
xor eax, eax
add rsp, 8
ret
Changing to this, it does crash
#include <stddef.h>
#include <stdio.h>
#include <string.h>
void
f (const char *str)
{
size_t len = strlen (str);
printf("len %zu\n", len);
if (str == NULL)
puts ("str is NULL");
}
int
main (void)
{
f (NULL);
return 0;
}
I could not see why in your original example it doesn't output as expected as the assembler does show it 'jmp puts' although seems to have optimised and added that code to 'main' as well?
I had expected to see:
$ ./strlen
str is NULL
Cheers, Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-06 20:30 ` strlen Jonny Grant
@ 2021-07-06 22:11 ` Florian Weimer
2021-07-07 11:36 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2021-07-06 22:11 UTC (permalink / raw)
To: Jonny Grant; +Cc: linux-man, Michael Kerrisk
* Jonny Grant:
> The reason it does not crash appears to be because of this warning
> which removes the call to strlen due to the return not being
> checked?
GCC uses the information that the argument to strlen cannot be null on
that particular path.
> strlen.c:11:3: warning: statement with no effect [-Wunused-value]
> 11 | strlen (str);
> | ^~~~~~~~~~~~
>
> https://godbolt.org/z/caoes5nGa
That site probably uses different library headers.
As posted, with Debian's GCC 8.3, I get this for the main function:
main:
xorl %eax, %eax
ret
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-06 22:11 ` strlen Florian Weimer
@ 2021-07-07 11:36 ` Jonny Grant
2021-07-07 12:22 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-07 11:36 UTC (permalink / raw)
To: Florian Weimer; +Cc: linux-man, Michael Kerrisk, Jonny Grant
On 06/07/2021 23:11, Florian Weimer wrote:
> * Jonny Grant:
>
>> The reason it does not crash appears to be because of this warning
>> which removes the call to strlen due to the return not being
>> checked?
>
> GCC uses the information that the argument to strlen cannot be null on
> that particular path.
It's a shame GCC doesn't give a warning
It may be GCC is using '__builtin_strlen'
<string.h> marks the param as nonnull. However, I am surprised this does not trigger the GCC warning -Werror=nonnull
/* Return the length of S. */
extern size_t strlen (const char *__s)
__THROW __attribute_pure__ __nonnull ((1));
Perhaps that is just a macro that is not actually used......
If I add another function -Werror=nonnull does give a warning
void test(const char * const p) __attribute__((nonnull));
https://godbolt.org/z/x37sbfWaG
<source>:15:9: error: argument 1 null where non-null expected [-Werror=nonnull]
15 | test(NULL);
>
>> strlen.c:11:3: warning: statement with no effect [-Wunused-value]
>> 11 | strlen (str);
>> | ^~~~~~~~~~~~
>>
>> https://godbolt.org/z/caoes5nGa
>
> That site probably uses different library headers.
>
> As posted, with Debian's GCC 8.3, I get this for the main function:
>
> main:
> xorl %eax, %eax
> ret
>
In that case maybe https://man7.org/linux/man-pages/man3/strlen.3.html should have a "NOTES" section stating something similar to your wording ?
NOTES
The behavior of strlen(NULL) is depends on different things. It may cause a SEGV exception, or the compiler may optimize and remove the code path handling NULL.
Cheers, Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 11:36 ` strlen Jonny Grant
@ 2021-07-07 12:22 ` Alejandro Colomar (man-pages)
2021-07-07 12:31 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 12:22 UTC (permalink / raw)
To: Jonny Grant, Florian Weimer; +Cc: linux-man, Michael Kerrisk
Hello Jonny,
On 7/7/21 1:36 PM, Jonny Grant wrote:
>
>
> On 06/07/2021 23:11, Florian Weimer wrote:
>> * Jonny Grant:
>>
>>> The reason it does not crash appears to be because of this warning
>>> which removes the call to strlen due to the return not being
>>> checked?
>>
>> GCC uses the information that the argument to strlen cannot be null on
>> that particular path.
>
> It's a shame GCC doesn't give a warning
>
> It may be GCC is using '__builtin_strlen'
>
> <string.h> marks the param as nonnull. However, I am surprised this does not trigger the GCC warning -Werror=nonnull
[[gnu::nunnull]], aka __attribute__((nonnull)), is only triggered by
compile time NULL. That is: passing NULL (either hardcoded or through a
macro)
If after optimizations the compiler realizes that an argument is NULL at
compile time, it may (but also may not) trigger the warning. However,
if you optimize too much, the expression may vanish, and the warning be
silenced again. The -fanalyzer option of GCC may (but also may not)
realize of more NULL cases.
I managed to trigger the following, but only with '-O0 -fanalyzer':
[[
<source>:17:20: error: use of NULL 's' where non-null expected [CWE-476]
[-Werror=analyzer-null-argument]
17 | int x = strlen (s);
| ~~~~~~~^~~
'void f(const char*)': events 1-2
|
| 16 | const char *s = NULL;
| | ^
| | |
| | (1) 's' is NULL
| 17 | int x = strlen(s);
| | ~~~~~~~~~~
| | |
| | (2) argument 1 ('s') NULL where non-null
expected
|
In file included from <source>:4:
/usr/include/string.h:385:15: note: argument 1 of 'size_t strlen(const
char*)' must be non-null
385 | extern size_t strlen (const char *__s)
| ^~~~~~
]]
It is undefined behavior to pass NULL to a nonnull parameter, and it is
a problem that the caller should check before the call. The compiler
may warn you of the most obvious ones, but in the end, the
responsibility is on the programmer.
See
<https://stackoverflow.com/questions/42035769/how-to-generate-warning-of-non-null-argument-for-my-custom-function>.
>
> /* Return the length of S. */
> extern size_t strlen (const char *__s)
> __THROW __attribute_pure__ __nonnull ((1));
>
> Perhaps that is just a macro that is not actually used......
It _is_ being used:
/usr/include$ grep -rn 'define __nonnull' -B3 -A1
x86_64-linux-gnu/sys/cdefs.h-290-/* The nonull function attribute allows
to mark pointer parameters which
x86_64-linux-gnu/sys/cdefs.h-291- must not be NULL. */
x86_64-linux-gnu/sys/cdefs.h-292-#if __GNUC_PREREQ (3,3)
x86_64-linux-gnu/sys/cdefs.h:293:# define __nonnull(params)
__attribute__ ((__nonnull__ params))
x86_64-linux-gnu/sys/cdefs.h-294-#else
x86_64-linux-gnu/sys/cdefs.h:295:# define __nonnull(params)
x86_64-linux-gnu/sys/cdefs.h-296-#endif
>
> If I add another function -Werror=nonnull does give a warning
> void test(const char * const p) __attribute__((nonnull));
>
> https://godbolt.org/z/x37sbfWaG
>
> <source>:15:9: error: argument 1 null where non-null expected [-Werror=nonnull]
> 15 | test(NULL);
>
>>
>>> strlen.c:11:3: warning: statement with no effect [-Wunused-value]
>>> 11 | strlen (str);
>>> | ^~~~~~~~~~~~
>>>
>>> https://godbolt.org/z/caoes5nGa
>>
>> That site probably uses different library headers.
>>
>> As posted, with Debian's GCC 8.3, I get this for the main function:
>>
>> main:
>> xorl %eax, %eax
>> ret
>>
>
>
> In that case maybe https://man7.org/linux/man-pages/man3/strlen.3.html should have a "NOTES" section stating something similar to your wording ?
>
> NOTES
>
> The behavior of strlen(NULL) is depends on different things. It may cause a SEGV exception, or the compiler may optimize and remove the code path handling NULL.
>
I disagree with this. It is likely that the behavior is that, given the
current implementation of Linux/GCC/glibc. But it is undefined
behavior, and anything can happen. You should just try harder to avoid
it, and not rely on any possible outcome of it. GCC people may decide
tomorrow to change the behavior to do some more agresive optimizations,
and the documentation shouldn't preclude such a thing, as long as it's
legal according to the relevant standards, and sane.
Cheers,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 12:22 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 12:31 ` Alejandro Colomar (man-pages)
2021-07-07 13:31 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 12:31 UTC (permalink / raw)
To: Jonny Grant, Florian Weimer; +Cc: linux-man, Michael Kerrisk
On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
> I disagree with this. It is likely that the behavior is that, given the
> current implementation of Linux/GCC/glibc. But it is undefined
> behavior, and anything can happen. You should just try harder to avoid
> it, and not rely on any possible outcome of it. GCC people may decide
> tomorrow to change the behavior to do some more agresive optimizations,
> and the documentation shouldn't preclude such a thing, as long as it's
> legal according to the relevant standards, and sane.
The standard (and implementations) define a set of thing you can do in
C. Those are an equilibrium between usability and room for
optimizations. Some things must remain undefined for the language to be
more efficient and simple.
If the language, or an implementation of it, attempted to provide a
defined behavior for absolutely everything, some optimizations could not
be done, and also, it would be much harder to actually implement it (and
also document it). So for good reasons, UB (undefined behavior) remains
undefined.
Cheers,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 12:31 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 13:31 ` Jonny Grant
2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-07 13:31 UTC (permalink / raw)
To: Alejandro Colomar (man-pages), Florian Weimer; +Cc: linux-man, Michael Kerrisk
On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
>> I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane.
>
> The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple.
>
> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined.
>
>
> Cheers,
>
> Alex
>
>
Hi Alex, Florian
Do you think this would get optimized out by GCC too?
size_t safestrlen(const char * s)
{
if (NULL == s) return 0;
else return strlen(s);
}
Maybe the man page could just state:
NOTES
The calling strlen with a NULL pointer is undefined behavior.
Cheers, Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 13:31 ` strlen Jonny Grant
@ 2021-07-07 16:57 ` Alejandro Colomar (man-pages)
2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages)
` (2 more replies)
0 siblings, 3 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 16:57 UTC (permalink / raw)
To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer
On 7/7/21 3:31 PM, Jonny Grant wrote:
>
>
> On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote:
>> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
>>> I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane.
>>
>> The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple.
>>
>> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined.
>>
>>
>> Cheers,
>>
>> Alex
>>
>>
>
> Hi Alex, Florian
>
> Do you think this would get optimized out by GCC too?
>
> size_t safestrlen(const char * s)
> {
> if (NULL == s) return 0;
> else return strlen(s);
> }
This would be optimized if the compiler can determine that s == NULL or
s != NULL, which tipically would be that you ask the compiler to
optimize, AND the compiler can deduce at compile time its relationship
with NULL; OR you ask the compiler to optimize at link time (-flto) AND
the relationship of s and NULL can be deduced at link time.
However, I don't see why that would be a problem. Either you can
guarantee that s is not NULL, and you don't need to call this
safestrlen() wrapper, or you cannot guarantee it and then you are forced
to call this wrapper. The optimization, if it happens, will be good.
What I recommend you to do from time to time, to make sure you don't
miss any warnings, is compile the whole project first with '-O3' and
then with '-O0'. If you are a bit paranoic, sporadically you can try
all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I
don't think that is necessary. Of course, always use '-fanalyzer' (GCC
10 and above).
>
>
>
> Maybe the man page could just state:
>
>
> NOTES
>
> The calling strlen with a NULL pointer is undefined behavior.
Okay. I agree that should probably be documented.
I'm surprised it's not documented already. Not even in the glibc manual
(or I couldn't find it).
There are a lot of functions that should get this addition, though. I'd
like to patch them all at once. I'll try to find a list of functions
documented in the man pages and that have nonnull in the
oimplementation. If I don't come back soon with a list, please ping me.
Thanks,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 17:23 ` Alejandro Colomar (man-pages)
2021-07-07 17:33 ` strlen Alejandro Colomar (man-pages)
2021-07-08 10:07 ` strlen Jonny Grant
2021-07-09 20:44 ` strlen Jonny Grant
2 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 17:23 UTC (permalink / raw)
To: Jonny Grant, Michael Kerrisk, Florian Weimer; +Cc: linux-man
Hi Jonny, Florian, Michael,
On 7/7/21 6:57 PM, Alejandro Colomar (man-pages) wrote:
>> Maybe the man page could just state:
>>
>>
>> NOTES
>>
>> The calling strlen with a NULL pointer is undefined behavior.
>
> Okay. I agree that should probably be documented.
> I'm surprised it's not documented already. Not even in the glibc manual
> (or I couldn't find it).
>
> There are a lot of functions that should get this addition, though. I'd
> like to patch them all at once. I'll try to find a list of functions
> documented in the man pages and that have nonnull in the
> oimplementation. If I don't come back soon with a list, please ping me.
Humm, the list is huge, much bigger than I expected...
Just try yourselves:
~/src/gnu/glibc$ man_lsfunc ../../linux/man-pages/man3 > lsfunc;
~/src/gnu/glibc$ cat lsfunc \
| while read f; do \
grep_glibc_prototype $f \
| grep nonnull >/dev/null \
&& grep_glibc_prototype $f; \
done \
> nonnull;
~/src/gnu/glibc$ wc -l nonnull;
296 nonnull
grep_glibc_prototype is defined in the man-pages, in <scripts/bash_aliases>.
How do you think this should be handled?
Adding a line in NOTES for every such function? Adding [[gnu::nonnull]]
to every such prototype in SYNOPSIS (this might be too noisy)? Else?
Thanks,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 17:33 ` Alejandro Colomar (man-pages)
2021-07-09 13:48 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 17:33 UTC (permalink / raw)
To: Jonny Grant, Michael Kerrisk, Florian Weimer; +Cc: linux-man, Libc-alpha
On 7/7/21 7:23 PM, Alejandro Colomar (man-pages) wrote:
> How do you think this should be handled?
> Adding a line in NOTES for every such function? Adding [[gnu::nonnull]]
> to every such prototype in SYNOPSIS (this might be too noisy)? Else?
As an example of how man pages could look like with the addition of
[[gnu::nonnull]], you can have a look at this manual page of mine:
[[
...
SYNOPSIS
#include <alx/base/stdlib.h>
[[gnu::nonnull]] [[gnu::warn_unused_result]]
int alx_callocs(type **ptr, ptrdiff_t nmemb);
[[gnu::malloc]] [[gnu::warn_unused_result]]
void *alx_mallocarray(ptrdiff_t nmemb, ssize_t size);
[[gnu::nonnull]] [[gnu::warn_unused_result]]
int alx_mallocarrays(type **ptr, ptrdiff_t nmemb);
[[gnu::nonnull]] [[gnu::warn_unused_result]]
int alx_mallocs(type **ptr, ssize_t nmemb);
[[gnu::warn_unused_result]]
void *alx_reallocarrayf(void *ptr, ptrdiff_t nmemb, ssize_t nmemb);
[[gnu::nonnull]] [[gnu::warn_unused_result]]
int alx_reallocarrayfs(type **ptr, ptrdiff_t nmemb);
[[gnu::nonnull]] [[gnu::warn_unused_result]]
int alx_reallocfs(type **ptr, ssize_t nmemb);
[[gnu::nonnull]] [[gnu::warn_unused_result]]
int alx_reallocs(type **ptr, ssize_t nmemb);
[[gnu::nonnull]]
int alx_frees(type **ptr);
...
]]
Source:
<https://github.com/alejandro-colomar/libalx/tree/main/share/man/base/man3>.
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages)
2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-08 10:07 ` Jonny Grant
2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages)
2021-07-09 20:44 ` strlen Jonny Grant
2 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-08 10:07 UTC (permalink / raw)
To: Alejandro Colomar (man-pages)
Cc: linux-man, Michael Kerrisk, Florian Weimer, Jonny Grant
Hi Alex
On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 3:31 PM, Jonny Grant wrote:
>>
>>
>> On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote:
>>> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
>>>> I disagree with this. It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc. But it is undefined behavior, and anything can happen. You should just try harder to avoid it, and not rely on any possible outcome of it. GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane.
>>>
>>> The standard (and implementations) define a set of thing you can do in C. Those are an equilibrium between usability and room for optimizations. Some things must remain undefined for the language to be more efficient and simple.
>>>
>>> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it). So for good reasons, UB (undefined behavior) remains undefined.
>>>
>>>
>>> Cheers,
>>>
>>> Alex
>>>
>>>
>>
>> Hi Alex, Florian
>>
>> Do you think this would get optimized out by GCC too?
>>
>> size_t safestrlen(const char * s)
>> {
>> if (NULL == s) return 0;
>> else return strlen(s);
>> }
>
> This would be optimized if the compiler can determine that s == NULL or s != NULL, which tipically would be that you ask the compiler to optimize, AND the compiler can deduce at compile time its relationship with NULL; OR you ask the compiler to optimize at link time (-flto) AND the relationship of s and NULL can be deduced at link time.
>
> However, I don't see why that would be a problem. Either you can guarantee that s is not NULL, and you don't need to call this safestrlen() wrapper, or you cannot guarantee it and then you are forced to call this wrapper. The optimization, if it happens, will be good.
Thank you for your reply.
We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
I'd like to avoid the compiler removing certain execution paths.
I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
Probably this would work:
size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
{
if (NULL == s) return 0;
else return strlen(s);
}
I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out.
>
> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above).
Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet.
https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10#trying_it_out
I do have these other instrumentation options.
-fsanitize=null,returns-nonnull-attribute,signed-integer-overflow,leak,undefined,address
>>
>>
>>
>> Maybe the man page could just state:
>>
>>
>> NOTES
>>
>> The calling strlen with a NULL pointer is undefined behavior.
>
> Okay. I agree that should probably be documented.
> I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it).
>
> There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me.
>
> Thanks,
>
> Alex
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-08 10:07 ` strlen Jonny Grant
@ 2021-07-08 11:06 ` Alejandro Colomar (man-pages)
2021-07-08 12:13 ` strlen Xi Ruoyao
` (3 more replies)
0 siblings, 4 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-08 11:06 UTC (permalink / raw)
To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help
On 7/8/21 12:07 PM, Jonny Grant wrote:
> Thank you for your reply.
>
> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
>
> I'd like to avoid the compiler removing certain execution paths.
> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
>
>
> Probably this would work:
>
> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
> {
> if (NULL == s) return 0;
> else return strlen(s);
> }
I don't think you don't need that. Unless there's a bug in GCC, it
shouldn't optimize that path unless it is 100% sure that it will never
be called.
Moreover, I recommend you to optimize as much as possible.
Even though NULL is possible in your code, I guess it's unlikely.
Also, calling a function safe is too generic.
I'd call it with the suffix null, as it act different on null.
Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
Use the POSIX type 'ssize_t'.
That also allows differentiating a length of 0 (i.e., "") from an
invalid string (i.e., NULL), by returning -1 for NULL.
Recommended implementation (requires C99 or later due to the usage of
inline):
[
// compiler.h
#define likely(x) __builtin_expect((x), 1)
#define unlikely(x) __builtin_expect((x), 0)
]
[
// strlennull.h
#include <string.h>
#include <sys/types.h>
#include "compiler.h"
[[gnu::pure]]
inline ssize_t strlennull(const char *s);
inline ssize_t strlennull(const char *s)
{
if (unlikely(!s))
return -1;
return strlen(s);
}
]
[
// strlennull.c
#include "strlennull.h"
#include <sys/types.h>
extern inline ssize_t strlennull(const char *s);
]
BTW, if the input is so untrusted that NULL may be possible, I guess it
might as well contain invalid strings (non-NUL-terminated), for which
strlen(3) would behave as bad or worse. So I recommend having a look at
strnlen(3) and maybe implement strnlennull() instead of strlennull().
Unless you _know_ there's a compiler bug that doesn't allow you to
optimize, please optimize. Otherwise, if it's just a bit of paranoia,
you could as well not optimize any code at all. Specifically not
optimizing this code by the use of pragmas or attributes is *wrong*.
Just revise the preprocessor output, the compiler output, and also try
introducing a NULL at run time, to check that everything's fine.
>
> I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out.
Are you sure you don't want to optimize? Are those addresses hardware
I/O or interrupts?
Maybe you just want membarrier(2).
>
>>
>> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'. If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary. Of course, always use '-fanalyzer' (GCC 10 and above).
>
> Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet.
You can install gcc-10 for Bionic: apt-get install gcc-10
I recommend it. It finds bugs very deep in the code.
Cheers,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-08 12:13 ` Xi Ruoyao
2021-07-08 23:49 ` strlen Segher Boessenkool
` (2 subsequent siblings)
3 siblings, 0 replies; 32+ messages in thread
From: Xi Ruoyao @ 2021-07-08 12:13 UTC (permalink / raw)
To: Alejandro Colomar (man-pages), Jonny Grant
Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk
On Thu, 2021-07-08 at 13:06 +0200, Alejandro Colomar (man-pages) via
Gcc-help wrote:
> On 7/8/21 12:07 PM, Jonny Grant wrote:
> > Thank you for your reply.
> >
> > We can't guarantee safestrlen() won't be called with NULL. So because
> > strlen() itself doesn't check for NULL in C standard we'd need to call
> > the wrapper so that NULL can be checked for.
> >
> > I'd like to avoid the compiler removing certain execution paths.
> > I'd rather keep all code paths, even if they are not taken, just in
> > case a NULL pointer creeps in due to an external device that is
> > connected to an embedded system.
If you are taking a pointer from external device "correctly", gcc won't
delete your NULL checking path. For example:
// defined by linker script
extern volatile char *an_io_port_providing_a_pointer;
int f()
{
char *ptr = an_io_port_providing_a_pointer;
// C standard disallows to remove it
if (ptr == NULL) {
gracefully_report_bug("some message");
return -EINVAL;
}
return g(ptr);
}
Or
// in assembly
extern char *read_pointer_from_io_port(int io_port_id);
int f()
{
char *ptr = read_pointer_from_io_port(IO_PORT_A);
// C standard disallows to remove it
if (ptr == NULL) {
gracefully_report_bug("some message");
return -EINVAL;
}
return g(ptr);
}
OTOH, if you are taking the pointer from external input incorrectly (i.
e. violating C standard and invoking some UB), even if you used some way
to enforce the compiler to keep the NULL checking, it would be still
unsafe.
Even if you want to be "careful" (I'd rather call this "paranoid"), you
can use -fno-delete-null-pointer-checks, instead of turning off all
optimizations.
And, GCC "optimize" attribute/pragma is somewhat buggy and only intended
for debugging GCC. If you need to turn off some optmization for a
function, it's better to put the function into a seperate TU and use
command line option to disable the optimization.
By the way, if C can't provide the safety feature you need (for example
programming something launching a nuclear missile :), maybe it's better
to use Ada or something.
--
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages)
2021-07-08 12:13 ` strlen Xi Ruoyao
@ 2021-07-08 23:49 ` Segher Boessenkool
2021-07-09 13:54 ` strlen Jonny Grant
2021-07-09 10:50 ` strlen Jonny Grant
[not found] ` <1627912755.3783669.1625745946723@mail.yahoo.com>
3 siblings, 1 reply; 32+ messages in thread
From: Segher Boessenkool @ 2021-07-08 23:49 UTC (permalink / raw)
To: Alejandro Colomar (man-pages)
Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk
On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote:
> On 7/8/21 12:07 PM, Jonny Grant wrote:
> >We can't guarantee safestrlen() won't be called with NULL. So because
> >strlen() itself doesn't check for NULL in C standard we'd need to call the
> >wrapper so that NULL can be checked for.
> >size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
> >{
> > if (NULL == s) return 0;
> > else return strlen(s);
> >}
> That also allows differentiating a length of 0 (i.e., "") from an
> invalid string (i.e., NULL), by returning -1 for NULL.
It is incorrect to return any particular value for strlen(0); not 0, not
-1, not anything. Since there *is* no string, it doesn't have a length
either.
So instead of making some function for this, I recommend just writing
something like
bla = s ? strlen(s) : 0;
wherever you need it. If a function name isn't self-explanatory, and
even *cannot* be, your factoring is most likely not ideal. Code is
primarily there for humans to read, it should be optimised for that.
Segher
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages)
2021-07-08 12:13 ` strlen Xi Ruoyao
2021-07-08 23:49 ` strlen Segher Boessenkool
@ 2021-07-09 10:50 ` Jonny Grant
2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages)
[not found] ` <1627912755.3783669.1625745946723@mail.yahoo.com>
3 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 10:50 UTC (permalink / raw)
To: Alejandro Colomar (man-pages)
Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help
On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote:
> On 7/8/21 12:07 PM, Jonny Grant wrote:
>> Thank you for your reply.
>>
>> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
>>
>> I'd like to avoid the compiler removing certain execution paths.
>> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
>>
>>
>> Probably this would work:
>>
>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
>> {
>> if (NULL == s) return 0;
>> else return strlen(s);
>> }
>
> I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called.
That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL.
> Moreover, I recommend you to optimize as much as possible.
> Even though NULL is possible in your code, I guess it's unlikely.
>
> Also, calling a function safe is too generic.
> I'd call it with the suffix null, as it act different on null.
>
> Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
> See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
> Use the POSIX type 'ssize_t'.
> That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL.
>
https://man7.org/linux/man-pages/man3/strlen.3.html
size_t strlen(const char *s);
I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement.
Cheers, Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 10:50 ` strlen Jonny Grant
@ 2021-07-09 11:27 ` Alejandro Colomar (man-pages)
2021-07-09 11:43 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 11:27 UTC (permalink / raw)
To: Jonny Grant
Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help,
Segher Boessenkool, Xi Ruoyao
Hi Jonny,
On 7/9/21 12:50 PM, Jonny Grant wrote:
>
>
> On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote:
>> On 7/8/21 12:07 PM, Jonny Grant wrote:
>>> Thank you for your reply.
>>>
>>> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
>>>
>>> I'd like to avoid the compiler removing certain execution paths.
>>> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
>>>
>>>
>>> Probably this would work:
>>>
>>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
>>> {
>>> if (NULL == s) return 0;
>>> else return strlen(s);
>>> }
>>
>> I don't think you don't need that. Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called.
>
> That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL.
Not always. If you inline that function, that path may be removed in
some calls, if the compiler knows better than you that it can. My point
is that you shouldn't care; your code is completely legal, and whatever
the compiler decides to do will also be legal (no undefined behavior,
and no crashes). If it optimizes, it will be a good thing that you
shouldn't prevent.
If the compiler does otherwise, that's a bug in the compiler, and
something you can't solve by writing different code or preventing
optimizations, or at least there's no guarantee about it, since it's a
bug. But I don't believe you'll find a bug in this case. So please,
trust the compiler, at least when using perfectly defined behavior. And
if you don't trust the compiler, which is perfectly reasonable, test it,
but don't try to workaround bugs that don't exist.
>
>> Moreover, I recommend you to optimize as much as possible.
>> Even though NULL is possible in your code, I guess it's unlikely.
>>
>> Also, calling a function safe is too generic.
>> I'd call it with the suffix null, as it act different on null.
>>
>> Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
>> See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
>> Use the POSIX type 'ssize_t'.
>> That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL.
>>
>
> https://man7.org/linux/man-pages/man3/strlen.3.html
> size_t strlen(const char *s);
>
> I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement.
That's a historical accident.
A long time ago (much before I was born, and much before the first
standard, I mean in the early times of K&R C and Unix), unsigned types
were used more than they should, and the first C standards (I mean
ANSI/ISO standards (i.e., C89, C99, ...), not POSIX), with a lot of
already existing code, didn't attempt to change the language, but to
annotate common usage.
In POSIX.1 there's a mix, because POSIX has the type 'ssize_t', which is
not defined by the C standards. POSIX in general tends to use the
signed type 'ssize_t' for its POSIX-only functions (i.e., not in the C
standards).
Annex K has been an attempt of Microsoft to provide safer functions, but
while there are some functions there that have good intentions, most of
them are just badly designed. That annex K is DOA, and will probably be
marked as deprecated in C22 (currently C2x).
I think that a standard should not try to design new functions, and
instead just annotate common usage, as they did in the first ones.
Problems like the ones Annex K suffers could have been detected early if
they had been implemented as an extension to some compiler(s) decade(s)
before being standardized. Therefore, if the implementation passes the
test of time, you standardize it, else not, IMO. Otherwise, we have a
standard that is declared deprecated in the next version of the
standard, similar to what is happening with the C++ standards (which,
guess what, BTW I recently read that they are undeprecating a lot of C
stuff they deprecated in the first standards).
Using unsigned for anything else than bitfileds and similar stuff is
just *wrong*, as you can read from the Google C++ style guide I linked
before.
Another source you can read is this paper from Bjarne Stroustrup:
<http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf>
This is one of the few cases where I agree with something coming from
him. I hope some day we get a ssizeof operator :-)
Cheers,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-09 11:43 ` Alejandro Colomar (man-pages)
0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 11:43 UTC (permalink / raw)
To: Jonny Grant
Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help,
Segher Boessenkool, Xi Ruoyao
On 7/9/21 1:27 PM, Alejandro Colomar (man-pages) wrote:
> Annex K has been an attempt of Microsoft to provide safer functions, but
> while there are some functions there that have good intentions, most of
> them are just badly designed. That annex K is DOA, and will probably be
> marked as deprecated in C22 (currently C2x).
>
> I think that a standard should not try to design new functions, and
> instead just annotate common usage, as they did in the first ones.
> Problems like the ones Annex K suffers could have been detected early if
> they had been implemented as an extension to some compiler(s) decade(s)
> before being standardized. Therefore, if the implementation passes the
> test of time, you standardize it, else not, IMO. Otherwise, we have a
> standard that is declared deprecated in the next version of the
> standard, similar to what is happening with the C++ standards (which,
> guess what, BTW I recently read that they are undeprecating a lot of C
> stuff they deprecated in the first standards).
But let's analyze Annex K (C11):
It has been designed by Microsoft.
MS's compiler (MS Visual Studio) doesn't even fully support C99 yet (and
by that trend, I doubt it never will). At most it supports C89. Visual
Studio has a long history of not supporting C except for those parts
required to implement their C++ compiler. Would you buy a car designed
by a bike manufacturer?
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 17:33 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-09 13:48 ` Jonny Grant
0 siblings, 0 replies; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 13:48 UTC (permalink / raw)
To: Alejandro Colomar (man-pages), Michael Kerrisk, Florian Weimer
Cc: linux-man, Libc-alpha
On 07/07/2021 18:33, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 7:23 PM, Alejandro Colomar (man-pages) wrote:
>> How do you think this should be handled?
>> Adding a line in NOTES for every such function? Adding [[gnu::nonnull]] to every such prototype in SYNOPSIS (this might be too noisy)? Else?
>
> As an example of how man pages could look like with the addition of [[gnu::nonnull]], you can have a look at this manual page of mine:
>
> [[
> ...
> SYNOPSIS
> #include <alx/base/stdlib.h>
>
> [[gnu::nonnull]] [[gnu::warn_unused_result]]
> int alx_callocs(type **ptr, ptrdiff_t nmemb);
>
> [[gnu::malloc]] [[gnu::warn_unused_result]]
> void *alx_mallocarray(ptrdiff_t nmemb, ssize_t size);
>
> [[gnu::nonnull]] [[gnu::warn_unused_result]]
> int alx_mallocarrays(type **ptr, ptrdiff_t nmemb);
>
> [[gnu::nonnull]] [[gnu::warn_unused_result]]
> int alx_mallocs(type **ptr, ssize_t nmemb);
>
> [[gnu::warn_unused_result]]
> void *alx_reallocarrayf(void *ptr, ptrdiff_t nmemb, ssize_t nmemb);
>
> [[gnu::nonnull]] [[gnu::warn_unused_result]]
> int alx_reallocarrayfs(type **ptr, ptrdiff_t nmemb);
>
> [[gnu::nonnull]] [[gnu::warn_unused_result]]
> int alx_reallocfs(type **ptr, ssize_t nmemb);
>
> [[gnu::nonnull]] [[gnu::warn_unused_result]]
> int alx_reallocs(type **ptr, ssize_t nmemb);
>
> [[gnu::nonnull]]
> int alx_frees(type **ptr);
> ...
> ]]
>
> Source: <https://github.com/alejandro-colomar/libalx/tree/main/share/man/base/man3>.
May I ask, could a note be added to the NOTES section as well?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-08 23:49 ` strlen Segher Boessenkool
@ 2021-07-09 13:54 ` Jonny Grant
2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 13:54 UTC (permalink / raw)
To: Segher Boessenkool, Alejandro Colomar (man-pages)
Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk
On 09/07/2021 00:49, Segher Boessenkool wrote:
> On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote:
>> On 7/8/21 12:07 PM, Jonny Grant wrote:
>>> We can't guarantee safestrlen() won't be called with NULL. So because
>>> strlen() itself doesn't check for NULL in C standard we'd need to call the
>>> wrapper so that NULL can be checked for.
>
>>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
>>> {
>>> if (NULL == s) return 0;
>>> else return strlen(s);
>>> }
>
>> That also allows differentiating a length of 0 (i.e., "") from an
>> invalid string (i.e., NULL), by returning -1 for NULL.
>
> It is incorrect to return any particular value for strlen(0); not 0, not
> -1, not anything. Since there *is* no string, it doesn't have a length
> either.
>
> So instead of making some function for this, I recommend just writing
> something like
>
> bla = s ? strlen(s) : 0;
Hi Segher
Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better.
inline size_t safestrlen(const char * s) {return s?strlen(s) : 0}
Perhaps there are too many email addresses on this cc list now.
I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s.
>
> wherever you need it. If a function name isn't self-explanatory, and
> even *cannot* be, your factoring is most likely not ideal. Code is
> primarily there for humans to read, it should be optimised for that.
>
>
> Segher
> .
Good point
Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 13:54 ` strlen Jonny Grant
@ 2021-07-09 14:17 ` Alejandro Colomar (man-pages)
2021-07-09 16:11 ` strlen Xi Ruoyao
2021-07-10 1:00 ` strlen Segher Boessenkool
0 siblings, 2 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 14:17 UTC (permalink / raw)
To: Jonny Grant, Segher Boessenkool
Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk
Hi Jonny, Segher,
On 7/9/21 3:54 PM, Jonny Grant wrote:
> Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better.
I agree on that.
>
> inline size_t safestrlen(const char * s) {return s?strlen(s) : 0}
>
> Perhaps there are too many email addresses on this cc list now.
>
> I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s.
Please, consider not calling some function safesomething() or similar,
as it isn't 100% safe. It's like calling some thing "the new X". How
will you call the next version? "the nova X"? And the next? "the
supernew X"?
As I said before, unsigned types are unsafe, you may want to accept it
or not, but they are.
Now, the day you realize that and develop an even safer function that
doesn't use unsigned size_t, what will you call it? supersafestrlen()?
Use names that define your functions as closely as possible.
>
>
> On 09/07/2021 00:49, Segher Boessenkool wrote:
>> wherever you need it. If a function name isn't self-explanatory, and
Agree on this
>> even *cannot* be, your factoring is most likely not ideal. Code is
But not on this.
You could call it strlennull(), that is, a strlen() that special-cases
NULL. I find it a good enough name, as long as you document your function.
It saves the problem of repeating yourself every time.
>> primarily there for humans to read, it should be optimised for that.
>>
Agree on this again, but I think the following is readable:
len = strlennull(maybenull);
Regards,
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-09 16:11 ` Xi Ruoyao
2021-07-10 1:00 ` strlen Segher Boessenkool
1 sibling, 0 replies; 32+ messages in thread
From: Xi Ruoyao @ 2021-07-09 16:11 UTC (permalink / raw)
To: Alejandro Colomar (man-pages), Jonny Grant, Segher Boessenkool
Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk
On Fri, 2021-07-09 at 16:17 +0200, Alejandro Colomar (man-pages) via
Gcc-help wrote:
> Hi Jonny, Segher,
>
> On 7/9/21 3:54 PM, Jonny Grant wrote:
> > Yes, this could work. But it does rely on programmer typing it like
> > that every time... Maybe an inline function better.
>
> I agree on that.
>
> >
> > inline size_t safestrlen(const char * s) {return s?strlen(s) : 0}
> >
> > Perhaps there are too many email addresses on this cc list now.
I think the discussion at least has nothing to do with linux-man or gcc-
help: man pages just describe the existing API (C or POSIX or Linux
specific), and GCC just compiles code and doesn't care what the API is.
Neither is a place to discuss "how to design an API".
And I think Jonny should discuss the API design with the users of his
API (maybe his collegue or downstream developers), instead of some
random guys in mail list. The users are the ones who will call his
function anyway so it's better to choose an API they like. Yes, Jonny
can "force" the users to do something for safety, but this decision
should also be discussed with them and documented. Or they won't
understand the decision, and may "invent" or "improvise" some "new
wheels", breaking Jonny's design.
For example, I don't like a function silently treats NULL as an empty
string. I prefer a function to abort() or print a log "strlen_checked()
is called with NULL, there is a bug in your code" when I (mis)use NULL.
But it's just my 2 cents: if the potential users of the API agree the
function to act as that, then it's good to go.
--
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
[not found] ` <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com>
@ 2021-07-09 17:26 ` Martin Sebor
2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Martin Sebor @ 2021-07-09 17:26 UTC (permalink / raw)
To: LIU Hao, johnsfine, gcc-help, jg; +Cc: linux-man, fw, mtk.manpages
On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote:
> 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道:
>> This is not the forum for such a discussion. But I want to make
>> people reading this aware that many expert C and C++ programmers
>> (likely a majority) consider that advice to avoid unsigned types to be
>> horrible advice.
>> I advise people to avoid signed types and I do so myself. If an
>> integer value won't be negative, it shouldn't be signed. That makes
>> your intent clearer to anyone reading your code, and (especially in
>> x86-64) lets the compiler generate smaller and faster code.
>
> That makes no sense. Would you prefer unsigned integers to signed ones,
> for something that can only be one of {1,2,3,4}? Just because something
> can't be negative, does not mean it should be unsigned.
There are benefits to making that explicit by choosing an unsigned
type: the result of converting a narrower unsigned type to a wider
unsigned type is unchanged. The result of converting it to a wider
signed type may change. This has an impact on value range propagation
which in turn can influence other optimizations as well as warnings.
Choosing an unsigned type has liabilities as well. Because unsigned
arithmetic wraps around zero whereas signed arithmetic does not (it
overflows which is undefined), its results are less constrained than
those of signed arithmetic.
Martin
>
> Conversion between `uint64_t` and `double` is much slower than `int64_t`
> on some platforms, e.g. x86, so 'smaller and faster code' is also
> groundless [1].
>
>
> A signed integer is a value that denotes the number or amount of
> something, such as bytes, characters, files, seconds, kilometers, or
> even dollars. An unsigned integer is merely a fixed-sized collection of
> bits. That's the only fundamental difference.
>
>
> [1] https://gcc.godbolt.org/z/6YY61rhGe
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 17:26 ` strlen Martin Sebor
@ 2021-07-09 20:19 ` Alejandro Colomar (man-pages)
0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 20:19 UTC (permalink / raw)
To: Martin Sebor, LIU Hao, johnsfine, gcc-help, jg
Cc: linux-man, fw, mtk.manpages
On 7/9/21 7:26 PM, Martin Sebor wrote:
> On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote:
>> 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道:
>>> This is not the forum for such a discussion. But I want to make
>>> people reading this aware that many expert C and C++ programmers
>>> (likely a majority) consider that advice to avoid unsigned types to
>>> be horrible advice.
I'm sorry if it's not the right place, I could remove the lists
from the CC if it's too noisy, but I think an important point is
here, and a couple of emails won't damage too much the mailing lists.
On the other hand, I consider bad etiquette removing CCs from
a discussion.
>>> I advise people to avoid signed types and I do so myself. If an
>>> integer value won't be negative, it shouldn't be signed. That makes
>>> your intent clearer to anyone reading your code, and (especially in
>>> x86-64) lets the compiler generate smaller and faster code.
As others have showed with facts, and the Google style guide also hints,
using unsigned types just removes the opportunity of the compiler
to optimize on overflow, because it has to account for wrapping around.
>>
>> That makes no sense. Would you prefer unsigned integers to signed
>> ones, for something that can only be one of {1,2,3,4}? Just because
>> something can't be negative, does not mean it should be unsigned.
That.
The same way that because a number is never going to be
greater than 100 is it any better to use [u]int256_t.
Just use int, unless you have a reason not to.
Please John, read the paper from Bjarne about unsigned types,
it really covers a lot. If you still disagree after reading that,
feel free to argument it.
<http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf>
>
> There are benefits to making that explicit by choosing an unsigned
> type: the result of converting a narrower unsigned type to a wider
> unsigned type is unchanged. The result of converting it to a wider
> signed type may change. This has an impact on value range propagation
> which in turn can influence other optimizations as well as warnings.
That problem with implicit conversions to larger types is not a
problem of signed integers, not even a problem of unsigned integers.
It's a problem of mixing both.
If you use signed integers for everything, you won't have that problem.
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages)
2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages)
2021-07-08 10:07 ` strlen Jonny Grant
@ 2021-07-09 20:44 ` Jonny Grant
2021-07-10 18:37 ` strlen Alejandro Colomar (man-pages)
2 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 20:44 UTC (permalink / raw)
To: Alejandro Colomar (man-pages); +Cc: linux-man, Michael Kerrisk, Florian Weimer
On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 3:31 PM, Jonny Grant wrote:
[snip]
>>
>>
>>
>> Maybe the man page could just state:
>>
>>
>> NOTES
>>
>> The calling strlen with a NULL pointer is undefined behavior.
>
> Okay. I agree that should probably be documented.
> I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it).
>
> There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me.
>
> Thanks,
>
> Alex
>
Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known. Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know.
With kind regards
Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages)
2021-07-09 16:11 ` strlen Xi Ruoyao
@ 2021-07-10 1:00 ` Segher Boessenkool
1 sibling, 0 replies; 32+ messages in thread
From: Segher Boessenkool @ 2021-07-10 1:00 UTC (permalink / raw)
To: Alejandro Colomar (man-pages)
Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk
Fine, I'll bite :-)
On Fri, Jul 09, 2021 at 04:17:08PM +0200, Alejandro Colomar (man-pages) wrote:
> On 7/9/21 3:54 PM, Jonny Grant wrote:
> >Yes, this could work. But it does rely on programmer typing it like that
> >every time... Maybe an inline function better.
>
> I agree on that.
A function (or any other abstraction) can be fine for this, *iff* you
can make people use it correctly. Since it is pretty much impossible to
give a good succinct name to this function, I posit that is not the
case. Please feel free to prove me wrong (by coming up with a decent
name for it).
> >I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen()
> >- but there isn't one such as strnlen_s.
>
> Please, consider not calling some function safesomething() or similar,
> as it isn't 100% safe. It's like calling some thing "the new X". How
> will you call the next version? "the nova X"? And the next? "the
> supernew X"?
>
> As I said before, unsigned types are unsafe, you may want to accept it
> or not, but they are.
I thought Annex K was great entertainment, but calling unsigned types
"unsafe" takes the cake.
Unsigned types are Z/nZ with n some power of two. Signed types are not
even Z (overflow is undefined). Unsigned types are useful for
describing many machine things. They are useful for sizes, not only
because sizes cannot be negative, but also because sizes can be bigger
than the maximum positive number that can fit in the same size signed
number. They are useful for bitty things, registers maybe, stuff in
memory, or device I/O registers. And they are much more useful than C
signed numbers for holding memory addresses, where you need that (you
can do sane aritmetic on it).
Using unsigned types without range checking is often wrong ("unsafe" in
your words). Using signed types without range checking is just as wrong
in the same cases, if not more (overflow is undefined). At least in the
"unsigned" case it is *possible* its behaviour is what the programmer
intended!
> Agree on this again, but I think the following is readable:
>
> len = strlennull(maybenull);
If you use it a million times, of course you can give it a short and
non-sensical name, and expect the users to learn what it means. If not,
it is better to be slightly more verbose, and reduce the mental load.
Segher
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-09 20:44 ` strlen Jonny Grant
@ 2021-07-10 18:37 ` Alejandro Colomar (man-pages)
2021-07-10 20:49 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-10 18:37 UTC (permalink / raw)
To: Jonny Grant
Cc: linux-man, Michael Kerrisk, Florian Weimer,
Alejandro Colomar (man-pages)
Hi Jonny,
On 7/9/21 10:44 PM, Jonny Grant wrote:
>
>
> On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
>> On 7/7/21 3:31 PM, Jonny Grant wrote:
> [snip]
>>>
>>>
>>>
>>> Maybe the man page could just state:
>>>
>>>
>>> NOTES
>>>
>>> The calling strlen with a NULL pointer is undefined behavior.
>>
>> Okay. I agree that should probably be documented.
>> I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it).
>>
>> There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me.
>>
>> Thanks,
>>
>> Alex
>>
>
> Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known. Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know.
Agreed.
I applied the following patch.
Kind regards,
Alex
---
From a9ab4fdd530486450b84137dce1d869f6cbfcbe0 Mon Sep 17 00:00:00 2001
From: Alejandro Colomar <alx.manpages@gmail.com>
Date: Sat, 10 Jul 2021 20:34:59 +0200
Subject: strlen.3, wcslen.3: Add recommendations for safer variants
Reported-by: Jonny Grant <jg@jguk.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
---
man3/strlen.3 | 6 ++++++
man3/wcslen.3 | 9 ++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/man3/strlen.3 b/man3/strlen.3
index dea4c1050..fb734db1b 100644
--- a/man3/strlen.3
+++ b/man3/strlen.3
@@ -66,6 +66,12 @@ T} Thread safety MT-Safe
.sp 1
.SH CONFORMING TO
POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.
+.SH NOTES
+.SS strnlen(3)
+If there is a known buffer size,
+it is probably better to use
+.BR strnlen (3),
+which can prevent some cases of buffer overrun/overflow.
.SH SEE ALSO
.BR string (3),
.BR strnlen (3),
diff --git a/man3/wcslen.3 b/man3/wcslen.3
index af3fcb9ca..868f748a8 100644
--- a/man3/wcslen.3
+++ b/man3/wcslen.3
@@ -58,5 +58,12 @@ T} Thread safety MT-Safe
.sp 1
.SH CONFORMING TO
POSIX.1-2001, POSIX.1-2008, C99.
+.SH NOTES
+.SS wcsnlen(3)
+If there is a known buffer size,
+it is probably better to use
+.BR wcsnlen (3),
+which can prevent some cases of buffer overrun/overflow.
.SH SEE ALSO
-.BR strlen (3)
+.BR strlen (3),
+.BR wcsnlen (3)
--
2.32.0
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-10 18:37 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-10 20:49 ` Jonny Grant
2021-07-10 21:36 ` strlen Alejandro Colomar (man-pages)
0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-10 20:49 UTC (permalink / raw)
To: Alejandro Colomar (man-pages); +Cc: linux-man, Michael Kerrisk, Florian Weimer
On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote:
> Hi Jonny,
>
> On 7/9/21 10:44 PM, Jonny Grant wrote:
>>
>>
>> On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
>>> On 7/7/21 3:31 PM, Jonny Grant wrote:
>> [snip]
>>>>
>>>>
>>>>
>>>> Maybe the man page could just state:
>>>>
>>>>
>>>> NOTES
>>>>
>>>> The calling strlen with a NULL pointer is undefined behavior.
>>>
>>> Okay. I agree that should probably be documented.
>>> I'm surprised it's not documented already. Not even in the glibc manual (or I couldn't find it).
>>>
>>> There are a lot of functions that should get this addition, though. I'd like to patch them all at once. I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation. If I don't come back soon with a list, please ping me.
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>
>> Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known. Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know.
>
> Agreed.
>
> I applied the following patch.
>
> Kind regards,
>
> Alex
>
> ---
>>From a9ab4fdd530486450b84137dce1d869f6cbfcbe0 Mon Sep 17 00:00:00 2001
> From: Alejandro Colomar <alx.manpages@gmail.com>
> Date: Sat, 10 Jul 2021 20:34:59 +0200
> Subject: strlen.3, wcslen.3: Add recommendations for safer variants
>
> Reported-by: Jonny Grant <jg@jguk.org>
> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
> ---
> man3/strlen.3 | 6 ++++++
> man3/wcslen.3 | 9 ++++++++-
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/man3/strlen.3 b/man3/strlen.3
> index dea4c1050..fb734db1b 100644
> --- a/man3/strlen.3
> +++ b/man3/strlen.3
> @@ -66,6 +66,12 @@ T} Thread safety MT-Safe
> .sp 1
> .SH CONFORMING TO
> POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.
> +.SH NOTES
> +.SS strnlen(3)
> +If there is a known buffer size,
> +it is probably better to use
> +.BR strnlen (3),
> +which can prevent some cases of buffer overrun/overflow.
> .SH SEE ALSO
> .BR string (3),
> .BR strnlen (3),
> diff --git a/man3/wcslen.3 b/man3/wcslen.3
> index af3fcb9ca..868f748a8 100644
> --- a/man3/wcslen.3
> +++ b/man3/wcslen.3
> @@ -58,5 +58,12 @@ T} Thread safety MT-Safe
> .sp 1
> .SH CONFORMING TO
> POSIX.1-2001, POSIX.1-2008, C99.
> +.SH NOTES
> +.SS wcsnlen(3)
> +If there is a known buffer size,
> +it is probably better to use
> +.BR wcsnlen (3),
> +which can prevent some cases of buffer overrun/overflow.
> .SH SEE ALSO
> -.BR strlen (3)
> +.BR strlen (3),
> +.BR wcsnlen (3)
>
Hi Alex
Thank you for making the updates!
As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as:
"which will prevent reading beyond the end of the character buffer"
Any thoughts about adding the following?
NOTES
Calling strlen with a NULL pointer is undefined behavior.
With kind regards
Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-10 20:49 ` strlen Jonny Grant
@ 2021-07-10 21:36 ` Alejandro Colomar (man-pages)
2021-07-12 21:16 ` strlen Jonny Grant
0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-10 21:36 UTC (permalink / raw)
To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer
On 7/10/21 10:49 PM, Jonny Grant wrote:
> On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote:
>> diff --git a/man3/wcslen.3 b/man3/wcslen.3
>> index af3fcb9ca..868f748a8 100644
>> --- a/man3/wcslen.3
>> +++ b/man3/wcslen.3
>> @@ -58,5 +58,12 @@ T} Thread safety MT-Safe
>> .sp 1
>> .SH CONFORMING TO
>> POSIX.1-2001, POSIX.1-2008, C99.
>> +.SH NOTES
>> +.SS wcsnlen(3)
>> +If there is a known buffer size,
>> +it is probably better to use
>> +.BR wcsnlen (3),
>> +which can prevent some cases of buffer overrun/overflow.
>> .SH SEE ALSO
>> -.BR strlen (3)
>> +.BR strlen (3),
>> +.BR wcsnlen (3)
>>
>
> Hi Alex
>
> Thank you for making the updates!
>
> As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as:
>
> "which will prevent reading beyond the end of the character buffer"
I wrote both overrun and overflow on purpose.
I was thinking of something like:
const char *src;
char dest[5];
src = "This is a very ... long valid string";
len = strnlen(src, 5 - 1);
/* strlen(s) wouldn't overrun, but the next call would overflow */
memcpy(dest, src, len);
dest[len] = '\0';
But after thinking about it a second time, I think I'll remove it, and
keep only overrun (and I like your text, I'll copy part of it), as the
overflow is a problem of considering that the size of the buffers is
going to be the same, and the solution is not to use strnlen(3), but to
differentiate both sizes, which allows to detect truncation.
If the input string is known but the output buffer is small, I'd call
strlen(str) and then MIN(len, bufsiz) (this is kind of equivalent to
what strlcpy(3bsd) does).
And if the input string is untrusted, I'd call strnlen(str, strsiz) and
MIN(len, bufsiz) (this is kind of what strncpy_s(3) does).
>
> Any thoughts about adding the following?
>
> NOTES
> Calling strlen with a NULL pointer is undefined behavior.
I'm waiting for Michael and/or Florian to comment on my proposal.
I don't want to fix a page and have 296 broken... I want to fix them
all at once, but am not sure which solution to apply.
But yes, adding a line like that one is likely.
Thanks!
Alex
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: strlen
2021-07-10 21:36 ` strlen Alejandro Colomar (man-pages)
@ 2021-07-12 21:16 ` Jonny Grant
0 siblings, 0 replies; 32+ messages in thread
From: Jonny Grant @ 2021-07-12 21:16 UTC (permalink / raw)
To: Alejandro Colomar (man-pages); +Cc: linux-man
On 10/07/2021 22:36, Alejandro Colomar (man-pages) wrote:
> On 7/10/21 10:49 PM, Jonny Grant wrote:
>> On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote:
>>> diff --git a/man3/wcslen.3 b/man3/wcslen.3
>>> index af3fcb9ca..868f748a8 100644
>>> --- a/man3/wcslen.3
>>> +++ b/man3/wcslen.3
>>> @@ -58,5 +58,12 @@ T} Thread safety MT-Safe
>>> .sp 1
>>> .SH CONFORMING TO
>>> POSIX.1-2001, POSIX.1-2008, C99.
>>> +.SH NOTES
>>> +.SS wcsnlen(3)
>>> +If there is a known buffer size,
>>> +it is probably better to use
>>> +.BR wcsnlen (3),
>>> +which can prevent some cases of buffer overrun/overflow.
>>> .SH SEE ALSO
>>> -.BR strlen (3)
>>> +.BR strlen (3),
>>> +.BR wcsnlen (3)
>>>
>>
>> Hi Alex
>>
>> Thank you for making the updates!
>>
>> As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as:
>>
>> "which will prevent reading beyond the end of the character buffer"
>
> I wrote both overrun and overflow on purpose.
> I was thinking of something like:
>
> const char *src;
> char dest[5];
>
> src = "This is a very ... long valid string";
> len = strnlen(src, 5 - 1);
> /* strlen(s) wouldn't overrun, but the next call would overflow */
> memcpy(dest, src, len);
> dest[len] = '\0';
Sounds like any overflow would be due to the application code, rather than strnlen.
>
> But after thinking about it a second time, I think I'll remove it, and keep only overrun (and I like your text, I'll copy part of it), as the overflow is a problem of considering that the size of the buffers is going to be the same, and the solution is not to use strnlen(3), but to differentiate both sizes, which allows to detect truncation.
>
> If the input string is known but the output buffer is small, I'd call strlen(str) and then MIN(len, bufsiz) (this is kind of equivalent to what strlcpy(3bsd) does).
> And if the input string is untrusted, I'd call strnlen(str, strsiz) and MIN(len, bufsiz) (this is kind of what strncpy_s(3) does).
>
>
strlcpy is considered harmful, best avoid it. It breaks ISO TR24731 "Do not unexpectedly truncate strings" by silently overwriting memory before checking there was enough space to copy all the bytes.
>
>>
>> Any thoughts about adding the following?
>>
>> NOTES
>> Calling strlen with a NULL pointer is undefined behavior.
>
>
> I'm waiting for Michael and/or Florian to comment on my proposal.
>
> I don't want to fix a page and have 296 broken... I want to fix them all at once, but am not sure which solution to apply.
>
> But yes, adding a line like that one is likely.
>
> Thanks!
>
> Alex
Ok, no immediate rush.
Jonny
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2021-07-12 21:16 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-04 18:01 strlen Jonny Grant
2020-09-04 19:21 ` strlen Florian Weimer
2020-09-04 23:14 ` strlen Jonny Grant
2020-09-05 7:12 ` strlen Florian Weimer
2021-07-06 20:30 ` strlen Jonny Grant
2021-07-06 22:11 ` strlen Florian Weimer
2021-07-07 11:36 ` strlen Jonny Grant
2021-07-07 12:22 ` strlen Alejandro Colomar (man-pages)
2021-07-07 12:31 ` strlen Alejandro Colomar (man-pages)
2021-07-07 13:31 ` strlen Jonny Grant
2021-07-07 16:57 ` strlen Alejandro Colomar (man-pages)
2021-07-07 17:23 ` strlen Alejandro Colomar (man-pages)
2021-07-07 17:33 ` strlen Alejandro Colomar (man-pages)
2021-07-09 13:48 ` strlen Jonny Grant
2021-07-08 10:07 ` strlen Jonny Grant
2021-07-08 11:06 ` strlen Alejandro Colomar (man-pages)
2021-07-08 12:13 ` strlen Xi Ruoyao
2021-07-08 23:49 ` strlen Segher Boessenkool
2021-07-09 13:54 ` strlen Jonny Grant
2021-07-09 14:17 ` strlen Alejandro Colomar (man-pages)
2021-07-09 16:11 ` strlen Xi Ruoyao
2021-07-10 1:00 ` strlen Segher Boessenkool
2021-07-09 10:50 ` strlen Jonny Grant
2021-07-09 11:27 ` strlen Alejandro Colomar (man-pages)
2021-07-09 11:43 ` strlen Alejandro Colomar (man-pages)
[not found] ` <1627912755.3783669.1625745946723@mail.yahoo.com>
[not found] ` <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com>
2021-07-09 17:26 ` strlen Martin Sebor
2021-07-09 20:19 ` strlen Alejandro Colomar (man-pages)
2021-07-09 20:44 ` strlen Jonny Grant
2021-07-10 18:37 ` strlen Alejandro Colomar (man-pages)
2021-07-10 20:49 ` strlen Jonny Grant
2021-07-10 21:36 ` strlen Alejandro Colomar (man-pages)
2021-07-12 21:16 ` strlen Jonny Grant
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).