All of lore.kernel.org
 help / color / mirror / Atom feed
* ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char)
       [not found] <814B09B7-CAAB-487B-9F42-8C0A2169A015@bluewin.ch>
@ 2020-11-21  9:52 ` Phil
  2020-11-21 20:29   ` Denis Kenzior
  0 siblings, 1 reply; 6+ messages in thread
From: Phil @ 2020-11-21  9:52 UTC (permalink / raw)
  To: ell

[-- Attachment #1: Type: text/plain, Size: 444 bytes --]

Just a little reminder that a cast to (unsigned char) is needed when passing a possibly negative char to ctype functions like isalnum, isspace. Passing anything other than positive or -1 is undefined behaviour, on platforms where char is signed.

One way to find them all is to build from time to time with ctype.h include commented out. Static review tools may also help.

+#include <ctype.h>
+		while (base64_len && isspace(*base64)) {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char)
  2020-11-21  9:52 ` ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char) Phil
@ 2020-11-21 20:29   ` Denis Kenzior
  2020-11-21 20:57     ` Andrew Zaborowski
  0 siblings, 1 reply; 6+ messages in thread
From: Denis Kenzior @ 2020-11-21 20:29 UTC (permalink / raw)
  To: ell

[-- Attachment #1: Type: text/plain, Size: 837 bytes --]

Hi Phil,

On 11/21/20 3:52 AM, Phil wrote:
> Just a little reminder that a cast to (unsigned char) is needed when passing a possibly negative char to ctype functions like isalnum, isspace. Passing anything other than positive or -1 is undefined behaviour, on platforms where char is signed.
> 
> One way to find them all is to build from time to time with ctype.h include commented out. Static review tools may also help.
> 
> +#include <ctype.h>
> +		while (base64_len && isspace(*base64)) {

Thanks.  But I'm confused..? ell shouldn't even be using these functions in the 
first place.  We have our own l_ascii versions which cast already.  And I don't 
see any uses of them in git HEAD...

Or are you referring to Andrew's patch set? In which case, you might want to 
reply to the relevant thread...

Regards,
-Denis

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char)
  2020-11-21 20:29   ` Denis Kenzior
@ 2020-11-21 20:57     ` Andrew Zaborowski
  2020-11-21 22:49       ` Phil
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Zaborowski @ 2020-11-21 20:57 UTC (permalink / raw)
  To: ell

[-- Attachment #1: Type: text/plain, Size: 1095 bytes --]

On Sat, 21 Nov 2020 at 21:31, Denis Kenzior <denkenz@gmail.com> wrote:
> On 11/21/20 3:52 AM, Phil wrote:
> > Just a little reminder that a cast to (unsigned char) is needed when passing a possibly negative char to ctype functions like isalnum, isspace. Passing anything other than positive or -1 is undefined behaviour, on platforms where char is signed.

Ok, good point.  In this case the chars haven't been validated to be <= 127.

> >
> > One way to find them all is to build from time to time with ctype.h include commented out. Static review tools may also help.
> >
> > +#include <ctype.h>
> > +             while (base64_len && isspace(*base64)) {
>
> Thanks.  But I'm confused..? ell shouldn't even be using these functions in the
> first place.  We have our own l_ascii versions which cast already.  And I don't
> see any uses of them in git HEAD...
>
> Or are you referring to Andrew's patch set? In which case, you might want to
> reply to the relevant thread...

Thanks for both comments.  No need to reply separately, I noted it and
will fix it.

Best regards

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char)
  2020-11-21 20:57     ` Andrew Zaborowski
@ 2020-11-21 22:49       ` Phil
  2020-11-21 23:18         ` Andrew Zaborowski
  2020-11-22  3:23         ` Denis Kenzior
  0 siblings, 2 replies; 6+ messages in thread
From: Phil @ 2020-11-21 22:49 UTC (permalink / raw)
  To: ell

[-- Attachment #1: Type: text/plain, Size: 2689 bytes --]

The defect is already in HEAD and not just in that patch: ell/util.c calls isprint and toupper without a cast.

[My quick search for ctype did not find the macro definitions in "utf8.h". I didn't bother to look for the reserved identifiers to[a-z]+ and is[a-z]+ because it is quite fiddly to refine the regexp to not also match a lot of other identifiers. So I had no idea that the l_ascii_is* macros in utf8.h even existed, thanks for the tip. But utf8.h is only a partial replacement for <ctype.h> because there are no l_ascii_to* variants, and the names are off-putting, it looks like they are just for ascii not utf8. What value do they add to the ctype originals anyway, apart from providing the cast?]

That validation won't help on affected platforms where char is signed 8 bit because every char <= 127.

This defect is common, I have seen it in several projects. It will quite likely happen again and again until it becomes standard practice to have@least one static analysis tool in the build chain that can detect both existing and new STR37-C defects. The undefined behaviour may well be benign on most platforms, which may explain why it is low priority, but the standard does not guarantee that, so we are stuck with having to add an ugly non-obvious cast or roll our own with the cast built-in.


> On 21 Nov 2020, at 21:57, Andrew Zaborowski <andrew.zaborowski@intel.com> wrote:
> 
> On Sat, 21 Nov 2020 at 21:31, Denis Kenzior <denkenz@gmail.com> wrote:
>> On 11/21/20 3:52 AM, Phil wrote:
>>> Just a little reminder that a cast to (unsigned char) is needed when passing a possibly negative char to ctype functions like isalnum, isspace. Passing anything other than positive or -1 is undefined behaviour, on platforms where char is signed.
> 
> Ok, good point.  In this case the chars haven't been validated to be <= 127.
> 
>>> 
>>> One way to find them all is to build from time to time with ctype.h include commented out. Static review tools may also help.
>>> 
>>> +#include <ctype.h>
>>> +             while (base64_len && isspace(*base64)) {
>> 
>> Thanks.  But I'm confused..? ell shouldn't even be using these functions in the
>> first place.  We have our own l_ascii versions which cast already.  And I don't
>> see any uses of them in git HEAD...
>> 
>> Or are you referring to Andrew's patch set? In which case, you might want to
>> reply to the relevant thread...
> 
> Thanks for both comments.  No need to reply separately, I noted it and
> will fix it.
> 
> Best regards
> _______________________________________________
> ell mailing list -- ell(a)lists.01.org
> To unsubscribe send an email to ell-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char)
  2020-11-21 22:49       ` Phil
@ 2020-11-21 23:18         ` Andrew Zaborowski
  2020-11-22  3:23         ` Denis Kenzior
  1 sibling, 0 replies; 6+ messages in thread
From: Andrew Zaborowski @ 2020-11-21 23:18 UTC (permalink / raw)
  To: ell

[-- Attachment #1: Type: text/plain, Size: 1215 bytes --]

On Sat, 21 Nov 2020 at 23:50, Phil <philip.hodges@bluewin.ch> wrote:
> The defect is already in HEAD and not just in that patch: ell/util.c calls isprint and toupper without a cast.

The isprint() parameter is an unsigned char, but the first toupper
call could in theory be passed unchecked char values...

>
> [My quick search for ctype did not find the macro definitions in "utf8.h". I didn't bother to look for the reserved identifiers to[a-z]+ and is[a-z]+ because it is quite fiddly to refine the regexp to not also match a lot of other identifiers. So I had no idea that the l_ascii_is* macros in utf8.h even existed, thanks for the tip. But utf8.h is only a partial replacement for <ctype.h> because there are no l_ascii_to* variants, and the names are off-putting, it looks like they are just for ascii not utf8. What value do they add to the ctype originals anyway, apart from providing the cast?]
>
> That validation won't help on affected platforms where char is signed 8 bit because every char <= 127.

Right, I didn't mean comparing char values against 127.  Just that a
valid PEM file doesn't contain values > 127 but at that point in the
code we've not enforced this yet.

Best regards

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char)
  2020-11-21 22:49       ` Phil
  2020-11-21 23:18         ` Andrew Zaborowski
@ 2020-11-22  3:23         ` Denis Kenzior
  1 sibling, 0 replies; 6+ messages in thread
From: Denis Kenzior @ 2020-11-22  3:23 UTC (permalink / raw)
  To: ell

[-- Attachment #1: Type: text/plain, Size: 1450 bytes --]

Hi Phil,

On 11/21/20 4:49 PM, Phil wrote:
> The defect is already in HEAD and not just in that patch: ell/util.c calls isprint and toupper without a cast.

Ok, now I gotcha.  I thought I eradicated all uses of ctype.h, but guess we 
still had some hiding.  Thanks for pointing this out.

I removed these in commit ef25e0072d283217fc12e422f628f1af0920242a.

> 
> [My quick search for ctype did not find the macro definitions in "utf8.h". I didn't bother to look for the reserved identifiers to[a-z]+ and is[a-z]+ because it is quite fiddly to refine the regexp to not also match a lot of other identifiers. So I had no idea that the l_ascii_is* macros in utf8.h even existed, thanks for the tip. But utf8.h is only a partial replacement for <ctype.h> because there are no l_ascii_to* variants, and the names are off-putting, it looks like they are just for ascii not utf8. What value do they add to the ctype originals anyway, apart from providing the cast?]

So to answer your question about why the l_ascii stuff is in utf8...  ascii is a 
subset of utf8, so it seemed logical and didn't seem worth it to add another header.

We avoid ctype.h 'originals' like the plague because they're locale based, which 
we do not ever need or want.  Then there's the casting issue that you already 
pointed out.  It is also too easy to forget that the behavior changes depending 
on locale, which can lead to subtle bugs.

Regards,
-Denis

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-11-22  3:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <814B09B7-CAAB-487B-9F42-8C0A2169A015@bluewin.ch>
2020-11-21  9:52 ` ctype.h undefined behaviour on signed char platforms, needs cast to (unsigned char) Phil
2020-11-21 20:29   ` Denis Kenzior
2020-11-21 20:57     ` Andrew Zaborowski
2020-11-21 22:49       ` Phil
2020-11-21 23:18         ` Andrew Zaborowski
2020-11-22  3:23         ` Denis Kenzior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.