All of lore.kernel.org
 help / color / mirror / Atom feed
* strlen
@ 2020-09-04 18:01 Jonny Grant
  2020-09-04 19:21 ` strlen Florian Weimer
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2020-09-04 18:01 UTC (permalink / raw)
  To: linux-man; +Cc: Michael Kerrisk

Hello

https://man7.org/linux/man-pages/man3/strlen.3.html

Is it possible to clarify :-

* glibc will SIGSEGV if 's' is NULL
* there are no ERROR returns

Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2020-09-04 18:01 strlen Jonny Grant
@ 2020-09-04 19:21 ` Florian Weimer
  2020-09-04 23:14   ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2020-09-04 19:21 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Michael Kerrisk

* Jonny Grant:

> Hello
>
> https://man7.org/linux/man-pages/man3/strlen.3.html
>
> Is it possible to clarify :-
>
> * glibc will SIGSEGV if 's' is NULL
> * there are no ERROR returns

That would be misleading.  Whether strlen (NULL) is undefined also
depends on the compiler.  They know that the argument cannot be zero
and optimize accordingly.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2020-09-04 19:21 ` strlen Florian Weimer
@ 2020-09-04 23:14   ` Jonny Grant
  2020-09-05  7:12     ` strlen Florian Weimer
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2020-09-04 23:14 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-man, Michael Kerrisk



On 04/09/2020 20:21, Florian Weimer wrote:
> * Jonny Grant:
> 
>> Hello
>>
>> https://man7.org/linux/man-pages/man3/strlen.3.html
>>
>> Is it possible to clarify :-
>>
>> * glibc will SIGSEGV if 's' is NULL
>> * there are no ERROR returns
> 
> That would be misleading.  Whether strlen (NULL) is undefined also
> depends on the compiler.  They know that the argument cannot be zero
> and optimize accordingly.
> 

Hi,

Do you know a compiler that has a different behaviour? I only tested gcc and clang. How would a compiler optimise?

The header has non-null attribute for the compiler (ie gcc) to use to give a nice warning

str.c:6:8: warning: null argument where non-null required (argument 1) [-Wnonnull]
    6 | return strlen(NULL);

However, if the implementation gets NULL it still crashes.

I was using -fno-builtin-strlen so it was glibc, but the __builtin_strlen ha SEGV just like glibc

clang is the same as gcc.

So it feels like it's pretty clear, there is no NULL check, can this be documented? 

Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2020-09-04 23:14   ` strlen Jonny Grant
@ 2020-09-05  7:12     ` Florian Weimer
  2021-07-06 20:30       ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2020-09-05  7:12 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Michael Kerrisk

* Jonny Grant:

> On 04/09/2020 20:21, Florian Weimer wrote:
>> * Jonny Grant:
>> 
>>> Hello
>>>
>>> https://man7.org/linux/man-pages/man3/strlen.3.html
>>>
>>> Is it possible to clarify :-
>>>
>>> * glibc will SIGSEGV if 's' is NULL
>>> * there are no ERROR returns
>> 
>> That would be misleading.  Whether strlen (NULL) is undefined also
>> depends on the compiler.  They know that the argument cannot be zero
>> and optimize accordingly.
>> 
>
> Hi,
>
> Do you know a compiler that has a different behaviour? I only tested
> gcc and clang. How would a compiler optimise?

Here's an example:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

void
f (const char *str)
{
  strlen (str);
  if (str == NULL)
    puts ("str is NULL");
}

int
main (void)
{
  f (NULL);
  return 0;
}

When built with -O2, GCC 8 prints nothing, and there is no crash.

My point it's not just the C *library* that makes strlen (NULL)
undefined.  It's undefined according to the language, and even if we
changed the glibc implementation, things would still go wrong.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2020-09-05  7:12     ` strlen Florian Weimer
@ 2021-07-06 20:30       ` Jonny Grant
  2021-07-06 22:11         ` strlen Florian Weimer
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-06 20:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-man, Michael Kerrisk



On 05/09/2020 08:12, Florian Weimer wrote:
> * Jonny Grant:
> 
>> On 04/09/2020 20:21, Florian Weimer wrote:
>>> * Jonny Grant:
>>>
>>>> Hello
>>>>
>>>> https://man7.org/linux/man-pages/man3/strlen.3.html
>>>>
>>>> Is it possible to clarify :-
>>>>
>>>> * glibc will SIGSEGV if 's' is NULL
>>>> * there are no ERROR returns
>>>
>>> That would be misleading.  Whether strlen (NULL) is undefined also
>>> depends on the compiler.  They know that the argument cannot be zero
>>> and optimize accordingly.
>>>
>>
>> Hi,
>>
>> Do you know a compiler that has a different behaviour? I only tested
>> gcc and clang. How would a compiler optimise?
> 
> Here's an example:
> 
> #include <stddef.h>
> #include <stdio.h>
> #include <string.h>
> 
> void
> f (const char *str)
> {
>   strlen (str);
>   if (str == NULL)
>     puts ("str is NULL");
> }
> 
> int
> main (void)
> {
>   f (NULL);
>   return 0;
> }
> 
> When built with -O2, GCC 8 prints nothing, and there is no crash.
> 
> My point it's not just the C *library* that makes strlen (NULL)
> undefined.  It's undefined according to the language, and even if we
> changed the glibc implementation, things would still go wrong.
> 

Hi Florian, Apologies for tardy response time.

The reason it does not crash appears to be because of this warning which removes the call to strlen due to the return not being checked?

strlen.c:11:3: warning: statement with no effect [-Wunused-value]
   11 |   strlen (str);
      |   ^~~~~~~~~~~~

https://godbolt.org/z/caoes5nGa


.LC0:
        .string "str is NULL"
f(char const*):
        test    rdi, rdi
        je      .L4
        ret
.L4:
        mov     edi, OFFSET FLAT:.LC0
        jmp     puts
main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:.LC0
        call    puts
        xor     eax, eax
        add     rsp, 8
        ret



Changing to this, it does crash


#include <stddef.h>
#include <stdio.h>
#include <string.h>

void
f (const char *str)
{
  size_t len = strlen (str);
  printf("len %zu\n", len);
  if (str == NULL)
    puts ("str is NULL");
}

int
main (void)
{
  f (NULL);
  return 0;
}



I could not see why in your original example it doesn't output as expected as the assembler does show it 'jmp puts' although seems to have optimised and added that code to 'main' as well?

I had expected to see:
$ ./strlen
str is NULL


Cheers, Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-06 20:30       ` strlen Jonny Grant
@ 2021-07-06 22:11         ` Florian Weimer
  2021-07-07 11:36           ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Weimer @ 2021-07-06 22:11 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Michael Kerrisk

* Jonny Grant:

> The reason it does not crash appears to be because of this warning
> which removes the call to strlen due to the return not being
> checked?

GCC uses the information that the argument to strlen cannot be null on
that particular path.

> strlen.c:11:3: warning: statement with no effect [-Wunused-value]
>    11 |   strlen (str);
>       |   ^~~~~~~~~~~~
>
> https://godbolt.org/z/caoes5nGa

That site probably uses different library headers.

As posted, with Debian's GCC 8.3, I get this for the main function:

main:
	xorl	%eax, %eax
	ret

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-06 22:11         ` strlen Florian Weimer
@ 2021-07-07 11:36           ` Jonny Grant
  2021-07-07 12:22             ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-07 11:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: linux-man, Michael Kerrisk, Jonny Grant



On 06/07/2021 23:11, Florian Weimer wrote:
> * Jonny Grant:
> 
>> The reason it does not crash appears to be because of this warning
>> which removes the call to strlen due to the return not being
>> checked?
> 
> GCC uses the information that the argument to strlen cannot be null on
> that particular path.

It's a shame GCC doesn't give a warning

It may be GCC is using '__builtin_strlen'

<string.h> marks the param as nonnull. However, I am surprised this does not trigger the GCC warning -Werror=nonnull

/* Return the length of S.  */
extern size_t strlen (const char *__s)
     __THROW __attribute_pure__ __nonnull ((1));

Perhaps that is just a macro that is not actually used......

If I add another function -Werror=nonnull does give a warning
void test(const char * const p) __attribute__((nonnull));

https://godbolt.org/z/x37sbfWaG

<source>:15:9: error: argument 1 null where non-null expected [-Werror=nonnull]
   15 |     test(NULL);

> 
>> strlen.c:11:3: warning: statement with no effect [-Wunused-value]
>>    11 |   strlen (str);
>>       |   ^~~~~~~~~~~~
>>
>> https://godbolt.org/z/caoes5nGa
> 
> That site probably uses different library headers.
> 
> As posted, with Debian's GCC 8.3, I get this for the main function:
> 
> main:
> 	xorl	%eax, %eax
> 	ret
> 


In that case maybe https://man7.org/linux/man-pages/man3/strlen.3.html should have a "NOTES" section stating something similar to your wording ?

NOTES

The behavior of strlen(NULL) is depends on different things. It may cause a SEGV exception, or the compiler may optimize and remove the code path handling NULL.


Cheers, Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 11:36           ` strlen Jonny Grant
@ 2021-07-07 12:22             ` Alejandro Colomar (man-pages)
  2021-07-07 12:31               ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 12:22 UTC (permalink / raw)
  To: Jonny Grant, Florian Weimer; +Cc: linux-man, Michael Kerrisk

Hello Jonny,

On 7/7/21 1:36 PM, Jonny Grant wrote:
> 
> 
> On 06/07/2021 23:11, Florian Weimer wrote:
>> * Jonny Grant:
>>
>>> The reason it does not crash appears to be because of this warning
>>> which removes the call to strlen due to the return not being
>>> checked?
>>
>> GCC uses the information that the argument to strlen cannot be null on
>> that particular path.
> 
> It's a shame GCC doesn't give a warning
> 
> It may be GCC is using '__builtin_strlen'
> 
> <string.h> marks the param as nonnull. However, I am surprised this does not trigger the GCC warning -Werror=nonnull

[[gnu::nunnull]], aka __attribute__((nonnull)), is only triggered by 
compile time NULL.  That is: passing NULL (either hardcoded or through a 
macro)

If after optimizations the compiler realizes that an argument is NULL at 
compile time, it may (but also may not) trigger the warning.  However, 
if you optimize too much, the expression may vanish, and the warning be 
silenced again.  The -fanalyzer option of GCC may (but also may not) 
realize of more NULL cases.

I managed to trigger the following, but only with '-O0 -fanalyzer':

[[
<source>:17:20: error: use of NULL 's' where non-null expected [CWE-476] 
[-Werror=analyzer-null-argument]
    17 |     int x = strlen (s);
       |             ~~~~~~~^~~
   'void f(const char*)': events 1-2
     |
     |   16 |     const char *s = NULL;
     |      |                 ^
     |      |                 |
     |      |                 (1) 's' is NULL
     |   17 |     int x = strlen(s);
     |      |             ~~~~~~~~~~
     |      |                   |
     |      |                   (2) argument 1 ('s') NULL where non-null 
expected
     |
In file included from <source>:4:
/usr/include/string.h:385:15: note: argument 1 of 'size_t strlen(const 
char*)' must be non-null
   385 | extern size_t strlen (const char *__s)
       |               ^~~~~~
]]


It is undefined behavior to pass NULL to a nonnull parameter, and it is 
a problem that the caller should check before the call.  The compiler 
may warn you of the most obvious ones, but in the end, the 
responsibility is on the programmer.

See 
<https://stackoverflow.com/questions/42035769/how-to-generate-warning-of-non-null-argument-for-my-custom-function>.

> 
> /* Return the length of S.  */
> extern size_t strlen (const char *__s)
>       __THROW __attribute_pure__ __nonnull ((1));
> 
> Perhaps that is just a macro that is not actually used......

It _is_ being used:

/usr/include$ grep -rn 'define __nonnull' -B3 -A1
x86_64-linux-gnu/sys/cdefs.h-290-/* The nonull function attribute allows 
to mark pointer parameters which
x86_64-linux-gnu/sys/cdefs.h-291-   must not be NULL.  */
x86_64-linux-gnu/sys/cdefs.h-292-#if __GNUC_PREREQ (3,3)
x86_64-linux-gnu/sys/cdefs.h:293:# define __nonnull(params) 
__attribute__ ((__nonnull__ params))
x86_64-linux-gnu/sys/cdefs.h-294-#else
x86_64-linux-gnu/sys/cdefs.h:295:# define __nonnull(params)
x86_64-linux-gnu/sys/cdefs.h-296-#endif

> 
> If I add another function -Werror=nonnull does give a warning
> void test(const char * const p) __attribute__((nonnull));
> 
> https://godbolt.org/z/x37sbfWaG
> 
> <source>:15:9: error: argument 1 null where non-null expected [-Werror=nonnull]
>     15 |     test(NULL);
> 
>>
>>> strlen.c:11:3: warning: statement with no effect [-Wunused-value]
>>>     11 |   strlen (str);
>>>        |   ^~~~~~~~~~~~
>>>
>>> https://godbolt.org/z/caoes5nGa
>>
>> That site probably uses different library headers.
>>
>> As posted, with Debian's GCC 8.3, I get this for the main function:
>>
>> main:
>> 	xorl	%eax, %eax
>> 	ret
>>
> 
> 
> In that case maybe https://man7.org/linux/man-pages/man3/strlen.3.html should have a "NOTES" section stating something similar to your wording ?
> 
> NOTES
> 
> The behavior of strlen(NULL) is depends on different things. It may cause a SEGV exception, or the compiler may optimize and remove the code path handling NULL.
> 

I disagree with this.  It is likely that the behavior is that, given the 
current implementation of Linux/GCC/glibc.  But it is undefined 
behavior, and anything can happen.  You should just try harder to avoid 
it, and not rely on any possible outcome of it.  GCC people may decide 
tomorrow to change the behavior to do some more agresive optimizations, 
and the documentation shouldn't preclude such a thing, as long as it's 
legal according to the relevant standards, and sane.


Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 12:22             ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 12:31               ` Alejandro Colomar (man-pages)
  2021-07-07 13:31                 ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 12:31 UTC (permalink / raw)
  To: Jonny Grant, Florian Weimer; +Cc: linux-man, Michael Kerrisk

On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
> I disagree with this.  It is likely that the behavior is that, given the 
> current implementation of Linux/GCC/glibc.  But it is undefined 
> behavior, and anything can happen.  You should just try harder to avoid 
> it, and not rely on any possible outcome of it.  GCC people may decide 
> tomorrow to change the behavior to do some more agresive optimizations, 
> and the documentation shouldn't preclude such a thing, as long as it's 
> legal according to the relevant standards, and sane.

The standard (and implementations) define a set of thing you can do in 
C.  Those are an equilibrium between usability and room for 
optimizations.  Some things must remain undefined for the language to be 
more efficient and simple.

If the language, or an implementation of it, attempted to provide a 
defined behavior for absolutely everything, some optimizations could not 
be done, and also, it would be much harder to actually implement it (and 
also document it).  So for good reasons, UB (undefined behavior) remains 
undefined.


Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 12:31               ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 13:31                 ` Jonny Grant
  2021-07-07 16:57                   ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-07 13:31 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages), Florian Weimer; +Cc: linux-man, Michael Kerrisk



On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
>> I disagree with this.  It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc.  But it is undefined behavior, and anything can happen.  You should just try harder to avoid it, and not rely on any possible outcome of it.  GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane.
> 
> The standard (and implementations) define a set of thing you can do in C.  Those are an equilibrium between usability and room for optimizations.  Some things must remain undefined for the language to be more efficient and simple.
> 
> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it).  So for good reasons, UB (undefined behavior) remains undefined.
> 
> 
> Cheers,
> 
> Alex
> 
> 

Hi Alex, Florian

Do you think this would get optimized out by GCC too?

size_t safestrlen(const char * s)
{
    if (NULL == s) return 0;
    else return strlen(s);
}



Maybe the man page could just state:


NOTES

The calling strlen with a NULL pointer is undefined behavior.

Cheers, Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 13:31                 ` strlen Jonny Grant
@ 2021-07-07 16:57                   ` Alejandro Colomar (man-pages)
  2021-07-07 17:23                     ` strlen Alejandro Colomar (man-pages)
                                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 16:57 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer

On 7/7/21 3:31 PM, Jonny Grant wrote:
> 
> 
> On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote:
>> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
>>> I disagree with this.  It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc.  But it is undefined behavior, and anything can happen.  You should just try harder to avoid it, and not rely on any possible outcome of it.  GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane.
>>
>> The standard (and implementations) define a set of thing you can do in C.  Those are an equilibrium between usability and room for optimizations.  Some things must remain undefined for the language to be more efficient and simple.
>>
>> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it).  So for good reasons, UB (undefined behavior) remains undefined.
>>
>>
>> Cheers,
>>
>> Alex
>>
>>
> 
> Hi Alex, Florian
> 
> Do you think this would get optimized out by GCC too?
> 
> size_t safestrlen(const char * s)
> {
>      if (NULL == s) return 0;
>      else return strlen(s);
> }

This would be optimized if the compiler can determine that s == NULL or 
s != NULL, which tipically would be that you ask the compiler to 
optimize, AND the compiler can deduce at compile time its relationship 
with NULL; OR you ask the compiler to optimize at link time (-flto) AND 
the relationship of s and NULL can be deduced at link time.

However, I don't see why that would be a problem.  Either you can 
guarantee that s is not NULL, and you don't need to call this 
safestrlen() wrapper, or you cannot guarantee it and then you are forced 
to call this wrapper.  The optimization, if it happens, will be good.

What I recommend you to do from time to time, to make sure you don't 
miss any warnings, is compile the whole project first with '-O3' and 
then with '-O0'.  If you are a bit paranoic, sporadically you can try 
all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I 
don't think that is necessary.  Of course, always use '-fanalyzer' (GCC 
10 and above).

> 
> 
> 
> Maybe the man page could just state:
> 
> 
> NOTES
> 
> The calling strlen with a NULL pointer is undefined behavior.

Okay.  I agree that should probably be documented.
I'm surprised it's not documented already.  Not even in the glibc manual 
(or I couldn't find it).

There are a lot of functions that should get this addition, though.  I'd 
like to patch them all at once.  I'll try to find a list of functions 
documented in the man pages and that have nonnull in the 
oimplementation.  If I don't come back soon with a list, please ping me.

Thanks,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 16:57                   ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 17:23                     ` Alejandro Colomar (man-pages)
  2021-07-07 17:33                       ` strlen Alejandro Colomar (man-pages)
  2021-07-08 10:07                     ` strlen Jonny Grant
  2021-07-09 20:44                     ` strlen Jonny Grant
  2 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 17:23 UTC (permalink / raw)
  To: Jonny Grant, Michael Kerrisk, Florian Weimer; +Cc: linux-man

Hi Jonny, Florian, Michael,

On 7/7/21 6:57 PM, Alejandro Colomar (man-pages) wrote:
>> Maybe the man page could just state:
>>
>>
>> NOTES
>>
>> The calling strlen with a NULL pointer is undefined behavior.
> 
> Okay.  I agree that should probably be documented.
> I'm surprised it's not documented already.  Not even in the glibc manual 
> (or I couldn't find it).
> 
> There are a lot of functions that should get this addition, though.  I'd 
> like to patch them all at once.  I'll try to find a list of functions 
> documented in the man pages and that have nonnull in the 
> oimplementation.  If I don't come back soon with a list, please ping me.

Humm, the list is huge, much bigger than I expected...

Just try yourselves:

~/src/gnu/glibc$ man_lsfunc ../../linux/man-pages/man3 > lsfunc;
~/src/gnu/glibc$ cat lsfunc \
                  | while read f; do \
                          grep_glibc_prototype $f \
                          | grep nonnull >/dev/null \
                          && grep_glibc_prototype $f; \
                  done \
                  > nonnull;
~/src/gnu/glibc$ wc -l  nonnull;
296 nonnull


grep_glibc_prototype is defined in the man-pages, in <scripts/bash_aliases>.

How do you think this should be handled?
Adding a line in NOTES for every such function?  Adding [[gnu::nonnull]] 
to every such prototype in SYNOPSIS (this might be too noisy)?  Else?


Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 17:23                     ` strlen Alejandro Colomar (man-pages)
@ 2021-07-07 17:33                       ` Alejandro Colomar (man-pages)
  2021-07-09 13:48                         ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-07 17:33 UTC (permalink / raw)
  To: Jonny Grant, Michael Kerrisk, Florian Weimer; +Cc: linux-man, Libc-alpha

On 7/7/21 7:23 PM, Alejandro Colomar (man-pages) wrote:
> How do you think this should be handled?
> Adding a line in NOTES for every such function?  Adding [[gnu::nonnull]] 
> to every such prototype in SYNOPSIS (this might be too noisy)?  Else?

As an example of how man pages could look like with the addition of 
[[gnu::nonnull]], you can have a look at this manual page of mine:

[[
...
SYNOPSIS
        #include <alx/base/stdlib.h>

        [[gnu::nonnull]] [[gnu::warn_unused_result]]
        int alx_callocs(type **ptr, ptrdiff_t nmemb);

        [[gnu::malloc]] [[gnu::warn_unused_result]]
        void *alx_mallocarray(ptrdiff_t nmemb, ssize_t size);

        [[gnu::nonnull]] [[gnu::warn_unused_result]]
        int alx_mallocarrays(type **ptr, ptrdiff_t nmemb);

        [[gnu::nonnull]] [[gnu::warn_unused_result]]
        int alx_mallocs(type **ptr, ssize_t nmemb);

        [[gnu::warn_unused_result]]
        void *alx_reallocarrayf(void *ptr, ptrdiff_t nmemb, ssize_t nmemb);

        [[gnu::nonnull]] [[gnu::warn_unused_result]]
        int alx_reallocarrayfs(type **ptr, ptrdiff_t nmemb);

        [[gnu::nonnull]] [[gnu::warn_unused_result]]
        int alx_reallocfs(type **ptr, ssize_t nmemb);

        [[gnu::nonnull]] [[gnu::warn_unused_result]]
        int alx_reallocs(type **ptr, ssize_t nmemb);

        [[gnu::nonnull]]
        int alx_frees(type **ptr);
...
]]

Source: 
<https://github.com/alejandro-colomar/libalx/tree/main/share/man/base/man3>.

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 16:57                   ` strlen Alejandro Colomar (man-pages)
  2021-07-07 17:23                     ` strlen Alejandro Colomar (man-pages)
@ 2021-07-08 10:07                     ` Jonny Grant
  2021-07-08 11:06                       ` strlen Alejandro Colomar (man-pages)
  2021-07-09 20:44                     ` strlen Jonny Grant
  2 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-08 10:07 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: linux-man, Michael Kerrisk, Florian Weimer, Jonny Grant

Hi Alex

On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 3:31 PM, Jonny Grant wrote:
>>
>>
>> On 07/07/2021 13:31, Alejandro Colomar (man-pages) wrote:
>>> On 7/7/21 2:22 PM, Alejandro Colomar (man-pages) wrote:
>>>> I disagree with this.  It is likely that the behavior is that, given the current implementation of Linux/GCC/glibc.  But it is undefined behavior, and anything can happen.  You should just try harder to avoid it, and not rely on any possible outcome of it.  GCC people may decide tomorrow to change the behavior to do some more agresive optimizations, and the documentation shouldn't preclude such a thing, as long as it's legal according to the relevant standards, and sane.
>>>
>>> The standard (and implementations) define a set of thing you can do in C.  Those are an equilibrium between usability and room for optimizations.  Some things must remain undefined for the language to be more efficient and simple.
>>>
>>> If the language, or an implementation of it, attempted to provide a defined behavior for absolutely everything, some optimizations could not be done, and also, it would be much harder to actually implement it (and also document it).  So for good reasons, UB (undefined behavior) remains undefined.
>>>
>>>
>>> Cheers,
>>>
>>> Alex
>>>
>>>
>>
>> Hi Alex, Florian
>>
>> Do you think this would get optimized out by GCC too?
>>
>> size_t safestrlen(const char * s)
>> {
>>      if (NULL == s) return 0;
>>      else return strlen(s);
>> }
> 
> This would be optimized if the compiler can determine that s == NULL or s != NULL, which tipically would be that you ask the compiler to optimize, AND the compiler can deduce at compile time its relationship with NULL; OR you ask the compiler to optimize at link time (-flto) AND the relationship of s and NULL can be deduced at link time.
> 
> However, I don't see why that would be a problem.  Either you can guarantee that s is not NULL, and you don't need to call this safestrlen() wrapper, or you cannot guarantee it and then you are forced to call this wrapper.  The optimization, if it happens, will be good.

Thank you for your reply.

We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.

I'd like to avoid the compiler removing certain execution paths.
I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.


Probably this would work:

size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
{
    if (NULL == s) return 0;
    else return strlen(s);
}

I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out.

> 
> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'.  If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary.  Of course, always use '-fanalyzer' (GCC 10 and above).

Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet.

https://developers.redhat.com/blog/2020/03/26/static-analysis-in-gcc-10#trying_it_out

I do have these other instrumentation options.
-fsanitize=null,returns-nonnull-attribute,signed-integer-overflow,leak,undefined,address


>>
>>
>>
>> Maybe the man page could just state:
>>
>>
>> NOTES
>>
>> The calling strlen with a NULL pointer is undefined behavior.
> 
> Okay.  I agree that should probably be documented.
> I'm surprised it's not documented already.  Not even in the glibc manual (or I couldn't find it).
> 
> There are a lot of functions that should get this addition, though.  I'd like to patch them all at once.  I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation.  If I don't come back soon with a list, please ping me.
> 
> Thanks,
> 
> Alex
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-08 10:07                     ` strlen Jonny Grant
@ 2021-07-08 11:06                       ` Alejandro Colomar (man-pages)
  2021-07-08 12:13                         ` strlen Xi Ruoyao
                                           ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-08 11:06 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help

On 7/8/21 12:07 PM, Jonny Grant wrote:
> Thank you for your reply.
> 
> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
> 
> I'd like to avoid the compiler removing certain execution paths.
> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
> 
> 
> Probably this would work:
> 
> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
> {
>      if (NULL == s) return 0;
>      else return strlen(s);
> }

I don't think you don't need that.  Unless there's a bug in GCC, it 
shouldn't optimize that path unless it is 100% sure that it will never 
be called.

Moreover, I recommend you to optimize as much as possible.
Even though NULL is possible in your code, I guess it's unlikely.

Also, calling a function safe is too generic.
I'd call it with the suffix null, as it act different on null.

Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
Use the POSIX type 'ssize_t'.
That also allows differentiating a length of 0 (i.e., "") from an 
invalid string (i.e., NULL), by returning -1 for NULL.

Recommended implementation (requires C99 or later due to the usage of 
inline):

[
// compiler.h

#define likely(x)    __builtin_expect((x), 1)
#define unlikely(x)  __builtin_expect((x), 0)
]


[
// strlennull.h

#include <string.h>
#include <sys/types.h>

#include "compiler.h"

[[gnu::pure]]
inline ssize_t strlennull(const char *s);

inline ssize_t strlennull(const char *s)
{
	if (unlikely(!s))
		return -1;
	return strlen(s);
}
]


[
// strlennull.c

#include "strlennull.h"

#include <sys/types.h>

extern inline ssize_t strlennull(const char *s);
]


BTW, if the input is so untrusted that NULL may be possible, I guess it 
might as well contain invalid strings (non-NUL-terminated), for which 
strlen(3) would behave as bad or worse.  So I recommend having a look at 
strnlen(3) and maybe implement strnlennull() instead of strlennull().


Unless you _know_ there's a compiler bug that doesn't allow you to 
optimize, please optimize.  Otherwise, if it's just a bit of paranoia, 
you could as well not optimize any code at all.  Specifically not 
optimizing this code by the use of pragmas or attributes is *wrong*. 
Just revise the preprocessor output, the compiler output, and also try 
introducing a NULL at run time, to check that everything's fine.

> 
> I also use 'volatile' for reads/writes to addresses that I don't want to be optimized out.

Are you sure you don't want to optimize?  Are those addresses hardware 
I/O or interrupts?

Maybe you just want membarrier(2).

> 
>>
>> What I recommend you to do from time to time, to make sure you don't miss any warnings, is compile the whole project first with '-O3' and then with '-O0'.  If you are a bit paranoic, sporadically you can try all of them : '-Og', '-O0', '-O1', '-Os', '-O2', '-O3', '-Ofast' but I don't think that is necessary.  Of course, always use '-fanalyzer' (GCC 10 and above).
> 
> Yes, I am looking forward to David Malcom's -fanalyzer when Ubuntu LTS next upgrades, I'm on gcc 9.3 today. But -fanalyzer is only for C anyway.. much of of code base I work with is compiled as C++ so I can't use -fanalyzer yet.

You can install gcc-10 for Bionic: apt-get install gcc-10
I recommend it.  It finds bugs very deep in the code.


Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-08 11:06                       ` strlen Alejandro Colomar (man-pages)
@ 2021-07-08 12:13                         ` Xi Ruoyao
  2021-07-08 23:49                         ` strlen Segher Boessenkool
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 32+ messages in thread
From: Xi Ruoyao @ 2021-07-08 12:13 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages), Jonny Grant
  Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk

On Thu, 2021-07-08 at 13:06 +0200, Alejandro Colomar (man-pages) via
Gcc-help wrote:
> On 7/8/21 12:07 PM, Jonny Grant wrote:
> > Thank you for your reply.
> > 
> > We can't guarantee safestrlen() won't be called with NULL. So because
> > strlen() itself doesn't check for NULL in C standard we'd need to call
> > the wrapper so that NULL can be checked for.
> > 
> > I'd like to avoid the compiler removing certain execution paths.
> > I'd rather keep all code paths, even if they are not taken, just in
> > case a NULL pointer creeps in due to an external device that is
> > connected to an embedded system.

If you are taking a pointer from external device "correctly", gcc won't
delete your NULL checking path.  For example:

// defined by linker script
extern volatile char *an_io_port_providing_a_pointer;

int f()
{
    char *ptr = an_io_port_providing_a_pointer;

    // C standard disallows to remove it
    if (ptr == NULL) {
        gracefully_report_bug("some message");
        return -EINVAL;
    }

    return g(ptr);
}

Or

// in assembly
extern char *read_pointer_from_io_port(int io_port_id);

int f()
{
    char *ptr = read_pointer_from_io_port(IO_PORT_A);

    // C standard disallows to remove it
    if (ptr == NULL) {
        gracefully_report_bug("some message");
        return -EINVAL;
    }

    return g(ptr);
}

OTOH, if you are taking the pointer from external input incorrectly (i.
e. violating C standard and invoking some UB), even if you used some way
to enforce the compiler to keep the NULL checking, it would be still
unsafe.

Even if you want to be "careful" (I'd rather call this "paranoid"), you
can use -fno-delete-null-pointer-checks, instead of turning off all
optimizations.

And, GCC "optimize" attribute/pragma is somewhat buggy and only intended
for debugging GCC.  If you need to turn off some optmization for a
function, it's better to put the function into a seperate TU and use
command line option to disable the optimization.

By the way, if C can't provide the safety feature you need (for example
programming something launching a nuclear missile :), maybe it's better
to use Ada or something.

-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-08 11:06                       ` strlen Alejandro Colomar (man-pages)
  2021-07-08 12:13                         ` strlen Xi Ruoyao
@ 2021-07-08 23:49                         ` Segher Boessenkool
  2021-07-09 13:54                           ` strlen Jonny Grant
  2021-07-09 10:50                         ` strlen Jonny Grant
       [not found]                         ` <1627912755.3783669.1625745946723@mail.yahoo.com>
  3 siblings, 1 reply; 32+ messages in thread
From: Segher Boessenkool @ 2021-07-08 23:49 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk

On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote:
> On 7/8/21 12:07 PM, Jonny Grant wrote:
> >We can't guarantee safestrlen() won't be called with NULL. So because 
> >strlen() itself doesn't check for NULL in C standard we'd need to call the 
> >wrapper so that NULL can be checked for.

> >size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
> >{
> >     if (NULL == s) return 0;
> >     else return strlen(s);
> >}

> That also allows differentiating a length of 0 (i.e., "") from an 
> invalid string (i.e., NULL), by returning -1 for NULL.

It is incorrect to return any particular value for strlen(0); not 0, not
-1, not anything.  Since there *is* no string, it doesn't have a length
either.

So instead of making some function for this, I recommend just writing
something like

  bla = s ? strlen(s) : 0;

wherever you need it.  If a function name isn't self-explanatory, and
even *cannot* be, your factoring is most likely not ideal.  Code is
primarily there for humans to read, it should be optimised for that.


Segher

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-08 11:06                       ` strlen Alejandro Colomar (man-pages)
  2021-07-08 12:13                         ` strlen Xi Ruoyao
  2021-07-08 23:49                         ` strlen Segher Boessenkool
@ 2021-07-09 10:50                         ` Jonny Grant
  2021-07-09 11:27                           ` strlen Alejandro Colomar (man-pages)
       [not found]                         ` <1627912755.3783669.1625745946723@mail.yahoo.com>
  3 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 10:50 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help



On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote:
> On 7/8/21 12:07 PM, Jonny Grant wrote:
>> Thank you for your reply.
>>
>> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
>>
>> I'd like to avoid the compiler removing certain execution paths.
>> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
>>
>>
>> Probably this would work:
>>
>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
>> {
>>      if (NULL == s) return 0;
>>      else return strlen(s);
>> }
> 
> I don't think you don't need that.  Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called.

That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL.

> Moreover, I recommend you to optimize as much as possible.
> Even though NULL is possible in your code, I guess it's unlikely.
> 
> Also, calling a function safe is too generic.
> I'd call it with the suffix null, as it act different on null.
> 
> Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
> See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
> Use the POSIX type 'ssize_t'.
> That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL.
> 

https://man7.org/linux/man-pages/man3/strlen.3.html
size_t strlen(const char *s);

I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement.

Cheers, Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 10:50                         ` strlen Jonny Grant
@ 2021-07-09 11:27                           ` Alejandro Colomar (man-pages)
  2021-07-09 11:43                             ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 11:27 UTC (permalink / raw)
  To: Jonny Grant
  Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help,
	Segher Boessenkool, Xi Ruoyao

Hi Jonny,

On 7/9/21 12:50 PM, Jonny Grant wrote:
> 
> 
> On 08/07/2021 12:06, Alejandro Colomar (man-pages) wrote:
>> On 7/8/21 12:07 PM, Jonny Grant wrote:
>>> Thank you for your reply.
>>>
>>> We can't guarantee safestrlen() won't be called with NULL. So because strlen() itself doesn't check for NULL in C standard we'd need to call the wrapper so that NULL can be checked for.
>>>
>>> I'd like to avoid the compiler removing certain execution paths.
>>> I'd rather keep all code paths, even if they are not taken, just in case a NULL pointer creeps in due to an external device that is connected to an embedded system.
>>>
>>>
>>> Probably this would work:
>>>
>>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
>>> {
>>>      if (NULL == s) return 0;
>>>      else return strlen(s);
>>> }
>>
>> I don't think you don't need that.  Unless there's a bug in GCC, it shouldn't optimize that path unless it is 100% sure that it will never be called.
> 
> That is good, so the code will always be kept! As compiler will never find all calls to strlen() and be sure those calls are never NULL.

Not always.  If you inline that function, that path may be removed in
some calls, if the compiler knows better than you that it can.  My point
is that you shouldn't care; your code is completely legal, and whatever
the compiler decides to do will also be legal (no undefined behavior,
and no crashes).  If it optimizes, it will be a good thing that you
shouldn't prevent.

If the compiler does otherwise, that's a bug in the compiler, and
something you can't solve by writing different code or preventing
optimizations, or at least there's no guarantee about it, since it's a
bug.  But I don't believe you'll find a bug in this case.  So please,
trust the compiler, at least when using perfectly defined behavior.  And
if you don't trust the compiler, which is perfectly reasonable, test it,
but don't try to workaround bugs that don't exist.

> 
>> Moreover, I recommend you to optimize as much as possible.
>> Even though NULL is possible in your code, I guess it's unlikely.
>>
>> Also, calling a function safe is too generic.
>> I'd call it with the suffix null, as it act different on null.
>>
>> Also, I recommend avoiding 'size_t' (and any other unsigned types, BTW).
>> See <https://google.github.io/styleguide/cppguide.html#Integer_Types>.
>> Use the POSIX type 'ssize_t'.
>> That also allows differentiating a length of 0 (i.e., "") from an invalid string (i.e., NULL), by returning -1 for NULL.
>>
> 
> https://man7.org/linux/man-pages/man3/strlen.3.html
> size_t strlen(const char *s);
> 
> I'd rather not change the return type from POSIX size_t in any wrapper of strlen. Unless it is part of C11 Annex K style standards improvement.

That's a historical accident.

A long time ago (much before I was born, and much before the first
standard, I mean in the early times of K&R C and Unix), unsigned types
were used more than they should, and the first C standards (I mean
ANSI/ISO standards (i.e., C89, C99, ...), not POSIX), with a lot of
already existing code, didn't attempt to change the language, but to
annotate common usage.

In POSIX.1 there's a mix, because POSIX has the type 'ssize_t', which is
not defined by the C standards.  POSIX in general tends to use the
signed type 'ssize_t' for its POSIX-only functions (i.e., not in the C
standards).

Annex K has been an attempt of Microsoft to provide safer functions, but
while there are some functions there that have good intentions, most of
them are just badly designed.  That annex K is DOA, and will probably be
marked as deprecated in C22 (currently C2x).

I think that a standard should not try to design new functions, and
instead just annotate common usage, as they did in the first ones.
Problems like the ones Annex K suffers could have been detected early if
they had been implemented as an extension to some compiler(s) decade(s)
before being standardized.  Therefore, if the implementation passes the
test of time, you standardize it, else not, IMO.  Otherwise, we have a
standard that is declared deprecated in the next version of the
standard, similar to what is happening with the C++ standards (which,
guess what, BTW I recently read that they are undeprecating a lot of C
stuff they deprecated in the first standards).

Using unsigned for anything else than bitfileds and similar stuff is
just *wrong*, as you can read from the Google C++ style guide I linked
before.
Another source you can read is this paper from Bjarne Stroustrup:
<http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf>
This is one of the few cases where I agree with something coming from
him.  I hope some day we get a ssizeof operator :-)


Cheers,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 11:27                           ` strlen Alejandro Colomar (man-pages)
@ 2021-07-09 11:43                             ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 11:43 UTC (permalink / raw)
  To: Jonny Grant
  Cc: linux-man, Michael Kerrisk, Florian Weimer, gcc-help,
	Segher Boessenkool, Xi Ruoyao



On 7/9/21 1:27 PM, Alejandro Colomar (man-pages) wrote:
> Annex K has been an attempt of Microsoft to provide safer functions, but
> while there are some functions there that have good intentions, most of
> them are just badly designed.  That annex K is DOA, and will probably be
> marked as deprecated in C22 (currently C2x).
> 
> I think that a standard should not try to design new functions, and
> instead just annotate common usage, as they did in the first ones.
> Problems like the ones Annex K suffers could have been detected early if
> they had been implemented as an extension to some compiler(s) decade(s)
> before being standardized.  Therefore, if the implementation passes the
> test of time, you standardize it, else not, IMO.  Otherwise, we have a
> standard that is declared deprecated in the next version of the
> standard, similar to what is happening with the C++ standards (which,
> guess what, BTW I recently read that they are undeprecating a lot of C
> stuff they deprecated in the first standards).

But let's analyze Annex K (C11):

It has been designed by Microsoft.

MS's compiler (MS Visual Studio) doesn't even fully support C99 yet (and
by that trend, I doubt it never will).  At most it supports C89.  Visual
Studio has a long history of not supporting C except for those parts
required to implement their C++ compiler.  Would you buy a car designed
by a bike manufacturer?


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 17:33                       ` strlen Alejandro Colomar (man-pages)
@ 2021-07-09 13:48                         ` Jonny Grant
  0 siblings, 0 replies; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 13:48 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages), Michael Kerrisk, Florian Weimer
  Cc: linux-man, Libc-alpha



On 07/07/2021 18:33, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 7:23 PM, Alejandro Colomar (man-pages) wrote:
>> How do you think this should be handled?
>> Adding a line in NOTES for every such function?  Adding [[gnu::nonnull]] to every such prototype in SYNOPSIS (this might be too noisy)?  Else?
> 
> As an example of how man pages could look like with the addition of [[gnu::nonnull]], you can have a look at this manual page of mine:
> 
> [[
> ...
> SYNOPSIS
>        #include <alx/base/stdlib.h>
> 
>        [[gnu::nonnull]] [[gnu::warn_unused_result]]
>        int alx_callocs(type **ptr, ptrdiff_t nmemb);
> 
>        [[gnu::malloc]] [[gnu::warn_unused_result]]
>        void *alx_mallocarray(ptrdiff_t nmemb, ssize_t size);
> 
>        [[gnu::nonnull]] [[gnu::warn_unused_result]]
>        int alx_mallocarrays(type **ptr, ptrdiff_t nmemb);
> 
>        [[gnu::nonnull]] [[gnu::warn_unused_result]]
>        int alx_mallocs(type **ptr, ssize_t nmemb);
> 
>        [[gnu::warn_unused_result]]
>        void *alx_reallocarrayf(void *ptr, ptrdiff_t nmemb, ssize_t nmemb);
> 
>        [[gnu::nonnull]] [[gnu::warn_unused_result]]
>        int alx_reallocarrayfs(type **ptr, ptrdiff_t nmemb);
> 
>        [[gnu::nonnull]] [[gnu::warn_unused_result]]
>        int alx_reallocfs(type **ptr, ssize_t nmemb);
> 
>        [[gnu::nonnull]] [[gnu::warn_unused_result]]
>        int alx_reallocs(type **ptr, ssize_t nmemb);
> 
>        [[gnu::nonnull]]
>        int alx_frees(type **ptr);
> ...
> ]]
> 
> Source: <https://github.com/alejandro-colomar/libalx/tree/main/share/man/base/man3>.


May I ask, could a note be added to the NOTES section as well?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-08 23:49                         ` strlen Segher Boessenkool
@ 2021-07-09 13:54                           ` Jonny Grant
  2021-07-09 14:17                             ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 13:54 UTC (permalink / raw)
  To: Segher Boessenkool, Alejandro Colomar (man-pages)
  Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk



On 09/07/2021 00:49, Segher Boessenkool wrote:
> On Thu, Jul 08, 2021 at 01:06:17PM +0200, Alejandro Colomar (man-pages) via Gcc-help wrote:
>> On 7/8/21 12:07 PM, Jonny Grant wrote:
>>> We can't guarantee safestrlen() won't be called with NULL. So because 
>>> strlen() itself doesn't check for NULL in C standard we'd need to call the 
>>> wrapper so that NULL can be checked for.
> 
>>> size_t __attribute__((optimize("O0"))) safestrlen(const char * s)
>>> {
>>>     if (NULL == s) return 0;
>>>     else return strlen(s);
>>> }
> 
>> That also allows differentiating a length of 0 (i.e., "") from an 
>> invalid string (i.e., NULL), by returning -1 for NULL.
> 
> It is incorrect to return any particular value for strlen(0); not 0, not
> -1, not anything.  Since there *is* no string, it doesn't have a length
> either.
> 
> So instead of making some function for this, I recommend just writing
> something like
> 
>   bla = s ? strlen(s) : 0;


Hi Segher

Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better.

inline size_t safestrlen(const char * s) {return s?strlen(s) : 0}

Perhaps there are too many email addresses on this cc list now.

I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s.


> 
> wherever you need it.  If a function name isn't self-explanatory, and
> even *cannot* be, your factoring is most likely not ideal.  Code is
> primarily there for humans to read, it should be optimised for that.
> 
> 
> Segher
> .

Good point
Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 13:54                           ` strlen Jonny Grant
@ 2021-07-09 14:17                             ` Alejandro Colomar (man-pages)
  2021-07-09 16:11                               ` strlen Xi Ruoyao
  2021-07-10  1:00                               ` strlen Segher Boessenkool
  0 siblings, 2 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 14:17 UTC (permalink / raw)
  To: Jonny Grant, Segher Boessenkool
  Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk

Hi Jonny, Segher,

On 7/9/21 3:54 PM, Jonny Grant wrote:
> Yes, this could work. But it does rely on programmer typing it like that every time... Maybe an inline function better.

I agree on that.

> 
> inline size_t safestrlen(const char * s) {return s?strlen(s) : 0}
> 
> Perhaps there are too many email addresses on this cc list now.
> 
> I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() - but there isn't one such as strnlen_s.

Please, consider not calling some function safesomething() or similar, 
as it isn't 100% safe.  It's like calling some thing "the new X".  How 
will you call the next version?  "the nova X"? And the next? "the 
supernew X"?

As I said before, unsigned types are unsafe, you may want to accept it 
or not, but they are.

Now, the day you realize that and develop an even safer function that 
doesn't use unsigned size_t, what will you call it?  supersafestrlen()?

Use names that define your functions as closely as possible.

> 
> 
> On 09/07/2021 00:49, Segher Boessenkool wrote:
>> wherever you need it.  If a function name isn't self-explanatory, and

Agree on this

>> even *cannot* be, your factoring is most likely not ideal.  Code is
But not on this.

You could call it strlennull(), that is, a strlen() that special-cases 
NULL.  I find it a good enough name, as long as you document your function.

It saves the problem of repeating yourself every time.

>> primarily there for humans to read, it should be optimised for that.
>>

Agree on this again, but I think the following is readable:

len = strlennull(maybenull);


Regards,

Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 14:17                             ` strlen Alejandro Colomar (man-pages)
@ 2021-07-09 16:11                               ` Xi Ruoyao
  2021-07-10  1:00                               ` strlen Segher Boessenkool
  1 sibling, 0 replies; 32+ messages in thread
From: Xi Ruoyao @ 2021-07-09 16:11 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages), Jonny Grant, Segher Boessenkool
  Cc: gcc-help, linux-man, Florian Weimer, Michael Kerrisk

On Fri, 2021-07-09 at 16:17 +0200, Alejandro Colomar (man-pages) via
Gcc-help wrote:
> Hi Jonny, Segher,
> 
> On 7/9/21 3:54 PM, Jonny Grant wrote:
> > Yes, this could work. But it does rely on programmer typing it like
> > that every time... Maybe an inline function better.
> 
> I agree on that.
> 
> > 
> > inline size_t safestrlen(const char * s) {return s?strlen(s) : 0}
> > 
> > Perhaps there are too many email addresses on this cc list now.

I think the discussion at least has nothing to do with linux-man or gcc-
help: man pages just describe the existing API (C or POSIX or Linux
specific), and GCC just compiles code and doesn't care what the API is.
Neither is a place to discuss "how to design an API".

And I think Jonny should discuss the API design with the users of his
API (maybe his collegue or downstream developers), instead of some
random guys in mail list.  The users are the ones who will call his
function anyway so it's better to choose an API they like.  Yes, Jonny
can "force" the users to do something for safety, but this decision
should also be discussed with them and documented.  Or they won't
understand the decision, and may "invent" or "improvise" some "new
wheels", breaking Jonny's design.

For example, I don't like a function silently treats NULL as an empty
string.  I prefer a function to abort() or print a log "strlen_checked()
is called with NULL, there is a bug in your code" when I (mis)use NULL.
But it's just my 2 cents: if the potential users of the API agree the
function to act as that, then it's good to go.

-- 
Xi Ruoyao <xry111@mengyan1223.wang>
School of Aerospace Science and Technology, Xidian University


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
       [not found]                           ` <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com>
@ 2021-07-09 17:26                             ` Martin Sebor
  2021-07-09 20:19                               ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Martin Sebor @ 2021-07-09 17:26 UTC (permalink / raw)
  To: LIU Hao, johnsfine, gcc-help, jg; +Cc: linux-man, fw, mtk.manpages

On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote:
> 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道:
>>   This is not the forum for such a discussion.  But I want to make 
>> people reading this aware that many expert C and C++ programmers 
>> (likely a majority) consider that advice to avoid unsigned types to be 
>> horrible advice.
>> I advise people to avoid signed types and I do so myself.  If an 
>> integer value won't be negative, it shouldn't be signed.  That makes 
>> your intent clearer to anyone reading your code, and (especially in 
>> x86-64) lets the compiler generate smaller and faster code.
> 
> That makes no sense. Would you prefer unsigned integers to signed ones, 
> for something that can only be one of {1,2,3,4}? Just because something 
> can't be negative, does not mean it should be unsigned.

There are benefits to making that explicit by choosing an unsigned
type: the result of converting a narrower unsigned type to a wider
unsigned type is unchanged.  The result of converting it to a wider
signed type may change.  This has an impact on value range propagation
which in turn can influence other optimizations as well as warnings.

Choosing an unsigned type has liabilities as well.  Because unsigned
arithmetic wraps around zero whereas signed arithmetic does not (it
overflows which is undefined), its results are less constrained than
those of signed arithmetic.

Martin

> 
> Conversion between `uint64_t` and `double` is much slower than `int64_t` 
> on some platforms, e.g. x86, so 'smaller and faster code' is also 
> groundless [1].
> 
> 
> A signed integer is a value that denotes the number or amount of 
> something, such as bytes, characters, files, seconds, kilometers, or 
> even dollars. An unsigned integer is merely a fixed-sized collection of 
> bits. That's the only fundamental difference.
> 
> 
> [1] https://gcc.godbolt.org/z/6YY61rhGe
> 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 17:26                             ` strlen Martin Sebor
@ 2021-07-09 20:19                               ` Alejandro Colomar (man-pages)
  0 siblings, 0 replies; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-09 20:19 UTC (permalink / raw)
  To: Martin Sebor, LIU Hao, johnsfine, gcc-help, jg
  Cc: linux-man, fw, mtk.manpages


On 7/9/21 7:26 PM, Martin Sebor wrote:
> On 7/8/21 8:08 PM, LIU Hao via Gcc-help wrote:
>> 在 7/8/21 8:05 PM, johnsfine--- via Gcc-help 写道:
>>>   This is not the forum for such a discussion.  But I want to make 
>>> people reading this aware that many expert C and C++ programmers 
>>> (likely a majority) consider that advice to avoid unsigned types to 
>>> be horrible advice.

I'm sorry if it's not the right place, I could remove the lists
from the CC if it's too noisy, but I think an important point is
here, and a couple of emails won't damage too much the mailing lists.

On the other hand, I consider bad etiquette removing CCs from
a discussion.

>>> I advise people to avoid signed types and I do so myself.  If an 
>>> integer value won't be negative, it shouldn't be signed.  That makes 
>>> your intent clearer to anyone reading your code, and (especially in 
>>> x86-64) lets the compiler generate smaller and faster code.

As others have showed with facts, and the Google style guide also hints,
using unsigned types just removes the opportunity of the compiler
to optimize on overflow, because it has to account for wrapping around.

>>
>> That makes no sense. Would you prefer unsigned integers to signed 
>> ones, for something that can only be one of {1,2,3,4}? Just because 
>> something can't be negative, does not mean it should be unsigned.
That.
The same way that because a number is never going to be
greater than 100 is it any better to use [u]int256_t.
Just use int, unless you have a reason not to.

Please John, read the paper from Bjarne about unsigned types,
it really covers a lot.  If you still disagree after reading that,
feel free to argument it.

<http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1428r0.pdf>

> 
> There are benefits to making that explicit by choosing an unsigned
> type: the result of converting a narrower unsigned type to a wider
> unsigned type is unchanged.  The result of converting it to a wider
> signed type may change.  This has an impact on value range propagation
> which in turn can influence other optimizations as well as warnings.

That problem with implicit conversions to larger types is not a
problem of signed integers, not even a problem of unsigned integers.
It's a problem of mixing both.
If you use signed integers for everything, you won't have that problem.


Alex


-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-07 16:57                   ` strlen Alejandro Colomar (man-pages)
  2021-07-07 17:23                     ` strlen Alejandro Colomar (man-pages)
  2021-07-08 10:07                     ` strlen Jonny Grant
@ 2021-07-09 20:44                     ` Jonny Grant
  2021-07-10 18:37                       ` strlen Alejandro Colomar (man-pages)
  2 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-09 20:44 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: linux-man, Michael Kerrisk, Florian Weimer



On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
> On 7/7/21 3:31 PM, Jonny Grant wrote:
[snip]
>>
>>
>>
>> Maybe the man page could just state:
>>
>>
>> NOTES
>>
>> The calling strlen with a NULL pointer is undefined behavior.
> 
> Okay.  I agree that should probably be documented.
> I'm surprised it's not documented already.  Not even in the glibc manual (or I couldn't find it).
> 
> There are a lot of functions that should get this addition, though.  I'd like to patch them all at once.  I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation.  If I don't come back soon with a list, please ping me.
> 
> Thanks,
> 
> Alex
> 

Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known.  Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know.

With kind regards
Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 14:17                             ` strlen Alejandro Colomar (man-pages)
  2021-07-09 16:11                               ` strlen Xi Ruoyao
@ 2021-07-10  1:00                               ` Segher Boessenkool
  1 sibling, 0 replies; 32+ messages in thread
From: Segher Boessenkool @ 2021-07-10  1:00 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages)
  Cc: Jonny Grant, gcc-help, linux-man, Florian Weimer, Michael Kerrisk

Fine, I'll bite :-)

On Fri, Jul 09, 2021 at 04:17:08PM +0200, Alejandro Colomar (man-pages) wrote:
> On 7/9/21 3:54 PM, Jonny Grant wrote:
> >Yes, this could work. But it does rely on programmer typing it like that 
> >every time... Maybe an inline function better.
> 
> I agree on that.

A function (or any other abstraction) can be fine for this, *iff* you
can make people use it correctly.  Since it is pretty much impossible to
give a good succinct name to this function, I posit that is not the
case.  Please feel free to prove me wrong (by coming up with a decent
name for it).

> >I'd prefer a Annex K of C11 style function ISO/IEC TR 24731-1 for strlen() 
> >- but there isn't one such as strnlen_s.
> 
> Please, consider not calling some function safesomething() or similar, 
> as it isn't 100% safe.  It's like calling some thing "the new X".  How 
> will you call the next version?  "the nova X"? And the next? "the 
> supernew X"?
> 
> As I said before, unsigned types are unsafe, you may want to accept it 
> or not, but they are.

I thought Annex K was great entertainment, but calling unsigned types
"unsafe" takes the cake.

Unsigned types are Z/nZ with n some power of two.  Signed types are not
even Z (overflow is undefined).  Unsigned types are useful for
describing many machine things.  They are useful for sizes, not only
because sizes cannot be negative, but also because sizes can be bigger
than the maximum positive number that can fit in the same size signed
number.  They are useful for bitty things, registers maybe, stuff in
memory, or device I/O registers.  And they are much more useful than C
signed numbers for holding memory addresses, where you need that (you
can do sane aritmetic on it).

Using unsigned types without range checking is often wrong ("unsafe" in
your words).  Using signed types without range checking is just as wrong
in the same cases, if not more (overflow is undefined).  At least in the
"unsigned" case it is *possible* its behaviour is what the programmer
intended!

> Agree on this again, but I think the following is readable:
> 
> len = strlennull(maybenull);

If you use it a million times, of course you can give it a short and
non-sensical name, and expect the users to learn what it means.  If not,
it is better to be slightly more verbose, and reduce the mental load.


Segher

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-09 20:44                     ` strlen Jonny Grant
@ 2021-07-10 18:37                       ` Alejandro Colomar (man-pages)
  2021-07-10 20:49                         ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-10 18:37 UTC (permalink / raw)
  To: Jonny Grant
  Cc: linux-man, Michael Kerrisk, Florian Weimer,
	Alejandro Colomar (man-pages)

Hi Jonny,

On 7/9/21 10:44 PM, Jonny Grant wrote:
> 
> 
> On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
>> On 7/7/21 3:31 PM, Jonny Grant wrote:
> [snip]
>>>
>>>
>>>
>>> Maybe the man page could just state:
>>>
>>>
>>> NOTES
>>>
>>> The calling strlen with a NULL pointer is undefined behavior.
>>
>> Okay.  I agree that should probably be documented.
>> I'm surprised it's not documented already.  Not even in the glibc manual (or I couldn't find it).
>>
>> There are a lot of functions that should get this addition, though.  I'd like to patch them all at once.  I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation.  If I don't come back soon with a list, please ping me.
>>
>> Thanks,
>>
>> Alex
>>
> 
> Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known.  Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know.

Agreed.

I applied the following patch.

Kind regards,

Alex

---
From a9ab4fdd530486450b84137dce1d869f6cbfcbe0 Mon Sep 17 00:00:00 2001
From: Alejandro Colomar <alx.manpages@gmail.com>
Date: Sat, 10 Jul 2021 20:34:59 +0200
Subject: strlen.3, wcslen.3: Add recommendations for safer variants

Reported-by: Jonny Grant <jg@jguk.org>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
---
 man3/strlen.3 | 6 ++++++
 man3/wcslen.3 | 9 ++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/man3/strlen.3 b/man3/strlen.3
index dea4c1050..fb734db1b 100644
--- a/man3/strlen.3
+++ b/man3/strlen.3
@@ -66,6 +66,12 @@ T}	Thread safety	MT-Safe
 .sp 1
 .SH CONFORMING TO
 POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.
+.SH NOTES
+.SS strnlen(3)
+If there is a known buffer size,
+it is probably better to use
+.BR strnlen (3),
+which can prevent some cases of buffer overrun/overflow.
 .SH SEE ALSO
 .BR string (3),
 .BR strnlen (3),
diff --git a/man3/wcslen.3 b/man3/wcslen.3
index af3fcb9ca..868f748a8 100644
--- a/man3/wcslen.3
+++ b/man3/wcslen.3
@@ -58,5 +58,12 @@ T}	Thread safety	MT-Safe
 .sp 1
 .SH CONFORMING TO
 POSIX.1-2001, POSIX.1-2008, C99.
+.SH NOTES
+.SS wcsnlen(3)
+If there is a known buffer size,
+it is probably better to use
+.BR wcsnlen (3),
+which can prevent some cases of buffer overrun/overflow.
 .SH SEE ALSO
-.BR strlen (3)
+.BR strlen (3),
+.BR wcsnlen (3)
-- 
2.32.0



-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-10 18:37                       ` strlen Alejandro Colomar (man-pages)
@ 2021-07-10 20:49                         ` Jonny Grant
  2021-07-10 21:36                           ` strlen Alejandro Colomar (man-pages)
  0 siblings, 1 reply; 32+ messages in thread
From: Jonny Grant @ 2021-07-10 20:49 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: linux-man, Michael Kerrisk, Florian Weimer



On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote:
> Hi Jonny,
> 
> On 7/9/21 10:44 PM, Jonny Grant wrote:
>>
>>
>> On 07/07/2021 17:57, Alejandro Colomar (man-pages) wrote:
>>> On 7/7/21 3:31 PM, Jonny Grant wrote:
>> [snip]
>>>>
>>>>
>>>>
>>>> Maybe the man page could just state:
>>>>
>>>>
>>>> NOTES
>>>>
>>>> The calling strlen with a NULL pointer is undefined behavior.
>>>
>>> Okay.  I agree that should probably be documented.
>>> I'm surprised it's not documented already.  Not even in the glibc manual (or I couldn't find it).
>>>
>>> There are a lot of functions that should get this addition, though.  I'd like to patch them all at once.  I'll try to find a list of functions documented in the man pages and that have nonnull in the oimplementation.  If I don't come back soon with a list, please ping me.
>>>
>>> Thanks,
>>>
>>> Alex
>>>
>>
>> Perhaps the NOTES section on strlen(3) could also give a hint that strnlen would be better to use than strlen if the max buffer size is known.  Likewise suggestion the same for wcslen(3) could suggest wcsnlen(3) where the buffer size is know.
> 
> Agreed.
> 
> I applied the following patch.
> 
> Kind regards,
> 
> Alex
> 
> ---
>>From a9ab4fdd530486450b84137dce1d869f6cbfcbe0 Mon Sep 17 00:00:00 2001
> From: Alejandro Colomar <alx.manpages@gmail.com>
> Date: Sat, 10 Jul 2021 20:34:59 +0200
> Subject: strlen.3, wcslen.3: Add recommendations for safer variants
> 
> Reported-by: Jonny Grant <jg@jguk.org>
> Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
> ---
>  man3/strlen.3 | 6 ++++++
>  man3/wcslen.3 | 9 ++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/man3/strlen.3 b/man3/strlen.3
> index dea4c1050..fb734db1b 100644
> --- a/man3/strlen.3
> +++ b/man3/strlen.3
> @@ -66,6 +66,12 @@ T}	Thread safety	MT-Safe
>  .sp 1
>  .SH CONFORMING TO
>  POSIX.1-2001, POSIX.1-2008, C89, C99, C11, SVr4, 4.3BSD.
> +.SH NOTES
> +.SS strnlen(3)
> +If there is a known buffer size,
> +it is probably better to use
> +.BR strnlen (3),
> +which can prevent some cases of buffer overrun/overflow.
>  .SH SEE ALSO
>  .BR string (3),
>  .BR strnlen (3),
> diff --git a/man3/wcslen.3 b/man3/wcslen.3
> index af3fcb9ca..868f748a8 100644
> --- a/man3/wcslen.3
> +++ b/man3/wcslen.3
> @@ -58,5 +58,12 @@ T}	Thread safety	MT-Safe
>  .sp 1
>  .SH CONFORMING TO
>  POSIX.1-2001, POSIX.1-2008, C99.
> +.SH NOTES
> +.SS wcsnlen(3)
> +If there is a known buffer size,
> +it is probably better to use
> +.BR wcsnlen (3),
> +which can prevent some cases of buffer overrun/overflow.
>  .SH SEE ALSO
> -.BR strlen (3)
> +.BR strlen (3),
> +.BR wcsnlen (3)
> 

Hi Alex

Thank you for making the updates!

As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as:

"which will prevent reading beyond the end of the character buffer"

Any thoughts about adding the following?

NOTES
    Calling strlen with a NULL pointer is undefined behavior.

With kind regards
Jonny

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-10 20:49                         ` strlen Jonny Grant
@ 2021-07-10 21:36                           ` Alejandro Colomar (man-pages)
  2021-07-12 21:16                             ` strlen Jonny Grant
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar (man-pages) @ 2021-07-10 21:36 UTC (permalink / raw)
  To: Jonny Grant; +Cc: linux-man, Michael Kerrisk, Florian Weimer

On 7/10/21 10:49 PM, Jonny Grant wrote:
> On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote:
>> diff --git a/man3/wcslen.3 b/man3/wcslen.3
>> index af3fcb9ca..868f748a8 100644
>> --- a/man3/wcslen.3
>> +++ b/man3/wcslen.3
>> @@ -58,5 +58,12 @@ T}	Thread safety	MT-Safe
>>   .sp 1
>>   .SH CONFORMING TO
>>   POSIX.1-2001, POSIX.1-2008, C99.
>> +.SH NOTES
>> +.SS wcsnlen(3)
>> +If there is a known buffer size,
>> +it is probably better to use
>> +.BR wcsnlen (3),
>> +which can prevent some cases of buffer overrun/overflow.
>>   .SH SEE ALSO
>> -.BR strlen (3)
>> +.BR strlen (3),
>> +.BR wcsnlen (3)
>>
> 
> Hi Alex
> 
> Thank you for making the updates!
> 
> As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as:
> 
> "which will prevent reading beyond the end of the character buffer"

I wrote both overrun and overflow on purpose.
I was thinking of something like:

const char *src;
char dest[5];

src = "This is a very ... long valid string";
len = strnlen(src, 5 - 1);
/* strlen(s) wouldn't overrun, but the next call would overflow */
memcpy(dest, src, len);
dest[len] = '\0';


But after thinking about it a second time, I think I'll remove it, and 
keep only overrun (and I like your text, I'll copy part of it), as the 
overflow is a problem of considering that the size of the buffers is 
going to be the same, and the solution is not to use strnlen(3), but to 
differentiate both sizes, which allows to detect truncation.

If the input string is known but the output buffer is small, I'd call 
strlen(str) and then MIN(len, bufsiz)  (this is kind of equivalent to 
what strlcpy(3bsd) does).
And if the input string is untrusted, I'd call strnlen(str, strsiz) and 
MIN(len, bufsiz) (this is kind of what strncpy_s(3) does).



> 
> Any thoughts about adding the following?
> 
> NOTES
>      Calling strlen with a NULL pointer is undefined behavior.


I'm waiting for Michael and/or Florian to comment on my proposal.

I don't want to fix a page and have 296 broken...  I want to fix them 
all at once, but am not sure which solution to apply.

But yes, adding a line like that one is likely.

Thanks!

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: strlen
  2021-07-10 21:36                           ` strlen Alejandro Colomar (man-pages)
@ 2021-07-12 21:16                             ` Jonny Grant
  0 siblings, 0 replies; 32+ messages in thread
From: Jonny Grant @ 2021-07-12 21:16 UTC (permalink / raw)
  To: Alejandro Colomar (man-pages); +Cc: linux-man



On 10/07/2021 22:36, Alejandro Colomar (man-pages) wrote:
> On 7/10/21 10:49 PM, Jonny Grant wrote:
>> On 10/07/2021 19:37, Alejandro Colomar (man-pages) wrote:
>>> diff --git a/man3/wcslen.3 b/man3/wcslen.3
>>> index af3fcb9ca..868f748a8 100644
>>> --- a/man3/wcslen.3
>>> +++ b/man3/wcslen.3
>>> @@ -58,5 +58,12 @@ T}    Thread safety    MT-Safe
>>>   .sp 1
>>>   .SH CONFORMING TO
>>>   POSIX.1-2001, POSIX.1-2008, C99.
>>> +.SH NOTES
>>> +.SS wcsnlen(3)
>>> +If there is a known buffer size,
>>> +it is probably better to use
>>> +.BR wcsnlen (3),
>>> +which can prevent some cases of buffer overrun/overflow.
>>>   .SH SEE ALSO
>>> -.BR strlen (3)
>>> +.BR strlen (3),
>>> +.BR wcsnlen (3)
>>>
>>
>> Hi Alex
>>
>> Thank you for making the updates!
>>
>> As "buffer overrun" refers to writing to a buffer, my 2 cents would be to express as:
>>
>> "which will prevent reading beyond the end of the character buffer"
> 
> I wrote both overrun and overflow on purpose.
> I was thinking of something like:
> 
> const char *src;
> char dest[5];
> 
> src = "This is a very ... long valid string";
> len = strnlen(src, 5 - 1);
> /* strlen(s) wouldn't overrun, but the next call would overflow */
> memcpy(dest, src, len);
> dest[len] = '\0';

Sounds like any overflow would be due to the application code, rather than strnlen.

> 
> But after thinking about it a second time, I think I'll remove it, and keep only overrun (and I like your text, I'll copy part of it), as the overflow is a problem of considering that the size of the buffers is going to be the same, and the solution is not to use strnlen(3), but to differentiate both sizes, which allows to detect truncation.
> 
> If the input string is known but the output buffer is small, I'd call strlen(str) and then MIN(len, bufsiz)  (this is kind of equivalent to what strlcpy(3bsd) does).
> And if the input string is untrusted, I'd call strnlen(str, strsiz) and MIN(len, bufsiz) (this is kind of what strncpy_s(3) does).
> 
> 

strlcpy is considered harmful, best avoid it. It breaks ISO TR24731 "Do not unexpectedly truncate strings" by silently overwriting memory before checking there was enough space to copy all the bytes.

> 
>>
>> Any thoughts about adding the following?
>>
>> NOTES
>>      Calling strlen with a NULL pointer is undefined behavior.
> 
> 
> I'm waiting for Michael and/or Florian to comment on my proposal.
> 
> I don't want to fix a page and have 296 broken...  I want to fix them all at once, but am not sure which solution to apply.
> 
> But yes, adding a line like that one is likely.
> 
> Thanks!
> 
> Alex

Ok, no immediate rush.

Jonny


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-07-12 21:16 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-04 18:01 strlen Jonny Grant
2020-09-04 19:21 ` strlen Florian Weimer
2020-09-04 23:14   ` strlen Jonny Grant
2020-09-05  7:12     ` strlen Florian Weimer
2021-07-06 20:30       ` strlen Jonny Grant
2021-07-06 22:11         ` strlen Florian Weimer
2021-07-07 11:36           ` strlen Jonny Grant
2021-07-07 12:22             ` strlen Alejandro Colomar (man-pages)
2021-07-07 12:31               ` strlen Alejandro Colomar (man-pages)
2021-07-07 13:31                 ` strlen Jonny Grant
2021-07-07 16:57                   ` strlen Alejandro Colomar (man-pages)
2021-07-07 17:23                     ` strlen Alejandro Colomar (man-pages)
2021-07-07 17:33                       ` strlen Alejandro Colomar (man-pages)
2021-07-09 13:48                         ` strlen Jonny Grant
2021-07-08 10:07                     ` strlen Jonny Grant
2021-07-08 11:06                       ` strlen Alejandro Colomar (man-pages)
2021-07-08 12:13                         ` strlen Xi Ruoyao
2021-07-08 23:49                         ` strlen Segher Boessenkool
2021-07-09 13:54                           ` strlen Jonny Grant
2021-07-09 14:17                             ` strlen Alejandro Colomar (man-pages)
2021-07-09 16:11                               ` strlen Xi Ruoyao
2021-07-10  1:00                               ` strlen Segher Boessenkool
2021-07-09 10:50                         ` strlen Jonny Grant
2021-07-09 11:27                           ` strlen Alejandro Colomar (man-pages)
2021-07-09 11:43                             ` strlen Alejandro Colomar (man-pages)
     [not found]                         ` <1627912755.3783669.1625745946723@mail.yahoo.com>
     [not found]                           ` <59a70222-a46f-1e65-c9db-6c9e577c8adc@126.com>
2021-07-09 17:26                             ` strlen Martin Sebor
2021-07-09 20:19                               ` strlen Alejandro Colomar (man-pages)
2021-07-09 20:44                     ` strlen Jonny Grant
2021-07-10 18:37                       ` strlen Alejandro Colomar (man-pages)
2021-07-10 20:49                         ` strlen Jonny Grant
2021-07-10 21:36                           ` strlen Alejandro Colomar (man-pages)
2021-07-12 21:16                             ` strlen Jonny Grant

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.