* Re: somebody dropped a (warning) bomb
From: Bodo Eggert <7eggert@gmx.de> @ 2007-02-15 20:08 UTC
To: Sergei Organov, Linus Torvalds, J.A. Magallón, Jan Engelhardt,
    Jeff Garzik, Linux Kernel Mailing List, Andrew Morton

Sergei Organov <osv@javad.com> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>> something like "strlen()" is concerned.
>
> Thanks, I now understand that you either don't see the difference
> between "indeterminate" and "implementation-defined" in this context or
> consider it non-essential, so I think I've got your point.

If you don't code for a specific compiler with specific settings, there is
no implementation defining the signedness of char, and each part of the
code using char* will be wrong unless it handles both cases correctly.
Therefore it's either always wrong to call your char* function with char*,
unsigned char* _and_ signed char* unless you can guarantee not to overflow
any of them, or it's always correct to call char* functions with any of
these.

Of course, if you happen to code for specific compiler settings, one
signedness of char will become real and one warning will be legit. And if
pigs fly, they should wear goggles to protect their eyes ...

>> THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *".
>
> So just don't pass it "unsigned char*". End of story.

Using signed chars for strings is wrong in most countries on earth. It was
wrong when the first IBM PC came out in 1981, and creating a compiler in
1987 defaulting to signed char is a sure sign of originating from an
isolated country and knowing nothing about this planet.
Using signed chars in comparisons is especially wrong, and casting each
char to unsigned before comparing them is likely to be forgotten.
Unsigned character strings are useless because there is no such thing as
char(-23), and if these strings weren't cast to signed inside all I/O
functions, they wouldn't work correctly. Only because many programmers
don't compare chars do most programs work outside the USA.

I repeat: thanks to using signed chars, the programs only work /by/
/accident/! Promoting the use of signed char strings is promoting bugs
and endangering the stability of all our systems. You should stop this
bullshit now, instead of increasing the pile.

>> That BY DEFINITION means that "strlen()" cannot care about the sign,
>> because the sign IS NOT DEFINED UNIVERSALLY!
>>
>> And if you cannot accept that fact, it's your problem. Not mine.
>
> I never had problems either with strlen() or with this warning, so I was
> curious why the warning is such a problem for the kernel.

The warning is a problem because either it clutters your build log,
hiding real bugs; you typecast, hiding errors whenever a char* function
is called; or you disable it globally, hiding real signedness bugs.
Either way you take it, it's wrong.

--
Funny quotes: 12. He who laughs last thinks slowest.

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb
From: Sergei Organov <osv@javad.com> @ 2007-02-16 11:21 UTC
To: 7eggert
Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik,
    Linux Kernel Mailing List, Andrew Morton

Bodo Eggert <7eggert@gmx.de> writes:
> Sergei Organov <osv@javad.com> wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
[...]
> Using signed chars for strings is wrong in most countries on earth. It was
> wrong when the first IBM PC came out in 1981, and creating a compiler in
> 1987 defaulting to signed char is a sure sign of originating from an
> isolated country and knowing nothing about this planet.

Maybe I'm wrong, but wasn't it the case that C had no "signed char" type
at that time, so "char" had to be abused for the "tiny signed integer"?

> Using signed chars in comparisons is especially wrong, and casting
> each char to unsigned before comparing them is likely to be
> forgotten.

If we start talking about the C language, my opinion is that it's a C
problem that it allows numeric comparison of "char" variables at all. If
one actually needs to compare alphabetic characters numerically, he
should first cast to the required integer type.

> Unsigned character strings are useless because there is no such thing
> as char(-23), and if these strings weren't cast to signed inside all
> I/O functions, they wouldn't work correctly.

Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or do
you indeed think that using "unsigned char*" for strings is useless?

> Only because many programmers don't compare chars, most programs will
> work outside the USA.

Comparison of characters being numeric is not a very good property of
the C language.

> I repeat: Thanks to using signed chars, the programs only
> work /by/ /accident/!
> Promoting the use of signed char strings is promoting
> bugs and endangering the stability of all our systems. You should stop this
> bullshit now, instead of increasing the pile.

Where did you see me promote the use of "signed char strings"?! If you
don't like the fact that in the C language characters are "char", strings
are "char*", and the sign of char is implementation-defined, please argue
with the C committee, not with me.

Or use -funsigned-char to get a dialect of C that fits your requirements
better.

--
Sergei.
* Re: somebody dropped a (warning) bomb
From: Bodo Eggert <7eggert@gmx.de> @ 2007-02-16 14:51 UTC
To: Sergei Organov
Cc: 7eggert, Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik,
    Linux Kernel Mailing List, Andrew Morton

On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <7eggert@gmx.de> writes:
> > Sergei Organov <osv@javad.com> wrote:
> >> Linus Torvalds <torvalds@linux-foundation.org> writes:
> [...]
> > Using signed chars for strings is wrong in most countries on earth. It was
> > wrong when the first IBM PC came out in 1981, and creating a compiler in
> > 1987 defaulting to signed char is a sure sign of originating from an
> > isolated country and knowing nothing about this planet.
>
> Maybe I'm wrong, but wasn't it the case that C had no "signed char" type
> at that time, so "char" had to be abused for the "tiny signed integer"?

Mentioning the availability of unsigned char was a change in K&R's book
"The C Programming Language" from 1987 to 1988:

"For some time, type unsigned char has been available. The standard
introduces the signed keyword to make signedness explicit for char and
other integral objects."

So if there was no "signed char" before, the blame passes on to those
who didn't introduce it alongside "unsigned char".

> > Using signed chars in comparisons is especially wrong, and casting
> > each char to unsigned before comparing them is likely to be
> > forgotten.
>
> If we start talking about the C language, my opinion is that it's a C
> problem that it allows numeric comparison of "char" variables at
> all. If one actually needs to compare alphabetic characters numerically,
> he should first cast to the required integer type.

Char values are indexes in a character set. Taking the difference of
indexes is legal.
Other languages have ord('x'), but it's only an expensive typecast
(byte('x') does exactly the same thing).

> > Unsigned character strings are useless because there is no such thing
> > as char(-23), and if these strings weren't cast to signed inside all
> > I/O functions, they wouldn't work correctly.
>
> Didn't you swap "signed" and "unsigned" by mistake in this phrase? Or do
> you indeed think that using "unsigned char*" for strings is useless?

ACK

> > Only because many programmers don't compare chars, most programs will
> > work outside the USA.
>
> Comparison of characters being numeric is not a very good property of
> the C language.

NACK, as above

> > I repeat: Thanks to using signed chars, the programs only
> > work /by/ /accident/! Promoting the use of signed char strings is promoting
> > bugs and endangering the stability of all our systems. You should stop this
> > bullshit now, instead of increasing the pile.
>
> Where did you see me promote the use of "signed char strings"?!

char strings are often signed char strings; only using unsigned char
prevents this in a portable way. If you care about ordinal values of
your strings, use unsigned.

> If you don't like the fact that in the C language characters are "char",
> strings are "char*", and the sign of char is implementation-defined,
> please argue with the C committee, not with me.
>
> Or use -funsigned-char to get a dialect of C that fits your requirements
> better.

If you restrict yourself to a specific C compiler, you are doomed, and
that's not a good start.

--
What about those red balls they have on car aerials so you can spot your
car in a park. I think all cars should have them! -- Homer Simpson
* Re: somebody dropped a (warning) bomb
From: Sergei Organov <osv@javad.com> @ 2007-02-19 11:56 UTC
To: Bodo Eggert
Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik,
    Linux Kernel Mailing List, Andrew Morton

Bodo Eggert <7eggert@gmx.de> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:
>> Bodo Eggert <7eggert@gmx.de> writes:
>> > Sergei Organov <osv@javad.com> wrote:
>> >> Linus Torvalds <torvalds@linux-foundation.org> writes:
> [...]
>> If we start talking about the C language, my opinion is that it's a C
>> problem that it allows numeric comparison of "char" variables at
>> all. If one actually needs to compare alphabetic characters numerically,
>> he should first cast to the required integer type.
>
> Char values are indexes in a character set. Taking the difference of
> indexes is legal. Other languages have ord('x'), but it's only an
> expensive typecast (byte('x') does exactly the same thing).

Why should this typecast be expensive? Is

  char c1, c2;
  ...
  if ((unsigned char)c1 < (unsigned char)c2)

any more expensive than

  if (c1 < c2)

?

[...]

>> Comparison of characters being numeric is not a very good property of
>> the C language.
>
> NACK, as above

The point was not to entirely disable it or to make it any more
expensive, but to somehow force the programmer to be explicit that he
indeed needs numeric comparison of characters' indexes. Well, I do
realize that the above is not in the C "spirit", which is to allow
implicit conversions almost everywhere.

>> > I repeat: Thanks to using signed chars, the programs only
>> > work /by/ /accident/! Promoting the use of signed char strings is promoting
>> > bugs and endangering the stability of all our systems. You should stop this
>> > bullshit now, instead of increasing the pile.
>>
>> Where did you see me promote the use of "signed char strings"?!
> char strings are often signed char strings; only using unsigned char
> prevents this in a portable way. If you care about ordinal values of
> your strings, use unsigned.

It means that (as I don't want to change the type of my
strings/characters in case I suddenly start to care about ordinal values
of the characters) I must always use "unsigned char" for storing
characters in practice. Right? There are at least three problems then:

1. The type of "abc" remains "char*", so effectively I'll have two
   different representations of strings.

2. I can't mix two different representations of strings in C without
   compromising type safety. In C, the type for characters is "char",
   which by definition is a different type than "unsigned char".

3. As I now represent characters by unsigned tiny integers, I lose the
   distinct type for unsigned tiny integers. Effectively this brings me
   back in time to the old days when C didn't have a distinct type for
   signed tiny integers.

Therefore, your "solution" to one problem creates a bunch of others.

Worse yet, you in fact don't solve the initial problem of comparing
strings. Take, for example, the KOI8-R encoding. Comparing characters
from this encoding using the "unsigned char" representation of character
indexes makes no more sense than comparing "signed char" representations,
as both comparisons will bring meaningless results, due to the fact that
the order of (some of) the characters in the table doesn't match their
alphabetical order.

>> If you don't like the fact that in the C language characters are "char",
>> strings are "char*", and the sign of char is implementation-defined,
>> please argue with the C committee, not with me.
>>
>> Or use -funsigned-char to get a dialect of C that fits your requirements
>> better.
>
> If you restrict yourself to a specific C compiler, you are doomed, and
> that's not a good start.

Isn't it you who insists that "char" should always be unsigned?
It's your problem then to use a compiler that understands that imaginary
language that meets the requirement.

--
Sergei.
* Re: somebody dropped a (warning) bomb
From: Sergei Organov <osv@javad.com> @ 2007-02-16 12:46 UTC
To: 7eggert
Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik,
    Linux Kernel Mailing List, Andrew Morton

Bodo Eggert <7eggert@gmx.de> writes:
> Sergei Organov <osv@javad.com> wrote:
>> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>>> Exactly because "char" *by*definition* is "indeterminate sign" as far as
>>> something like "strlen()" is concerned.
>>
>> Thanks, I now understand that you either don't see the difference
>> between "indeterminate" and "implementation-defined" in this context or
>> consider it non-essential, so I think I've got your point.
>
> If you don't code for a specific compiler with specific settings, there is
> no implementation defining the signedness of char, and each part of the code
> using char* will be wrong unless it handles both cases correctly.

The problem here is that due to historical reasons, there could be code
out there that abuses "char" for "signed char" (not sure about "unsigned
char"). Old code and old habits are rather persistent.

> Therefore it's either always wrong to call your char* function with char*,
> unsigned char* _and_ signed char* unless you can guarantee not to overflow
> any of them, or it's always correct to call char* functions with any of
> these.

How are you sure those who wrote foo(char*) agree with your opinion or
even understand all the involved issues?

> Of course, if you happen to code for specific compiler settings, one
> signedness of char will become real and one warning will be legit. And if
> pigs fly, they should wear goggles to protect their eyes ...
And if people were writing perfect portable code all the time, there
would be no need for warnings in the first place...

--
Sergei.
* Re: somebody dropped a (warning) bomb
From: Bodo Eggert <7eggert@gmx.de> @ 2007-02-16 17:40 UTC
To: Sergei Organov
Cc: 7eggert, Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik,
    Linux Kernel Mailing List, Andrew Morton

On Fri, 16 Feb 2007, Sergei Organov wrote:
> Bodo Eggert <7eggert@gmx.de> writes:
> > Sergei Organov <osv@javad.com> wrote:
> >> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> > If you don't code for a specific compiler with specific settings, there is
> > no implementation defining the signedness of char, and each part of the code
> > using char* will be wrong unless it handles both cases correctly.
>
> The problem here is that due to historical reasons, there could be code
> out there that abuses "char" for "signed char" (not sure about "unsigned
> char"). Old code and old habits are rather persistent.

There could be code using trigraphs ... and gcc has an option for that.

If this code uses signed chars, using it on unsigned-char archs is
broken and should be warned about, but the compiler will not warn about
this, because this code will not use "signed char" and is therefore
bug-to-bug syntax compatible, waiting for a semantic breakdown.

I'll say it again: either the code using unspecified chars is correct,
or it isn't. If it's correct, using it with either signed or unsigned
chars is not a bug and you should not warn at all; and if it's not
correct, you should always warn. Instead, gcc warns on "code compiles
for $arch".

> > Therefore it's either always wrong to call your char* function with char*,
> > unsigned char* _and_ signed char* unless you can guarantee not to overflow
> > any of them, or it's always correct to call char* functions with any of
> > these.
>
> How are you sure those who wrote foo(char*) agree with your opinion or
> even understand all the involved issues?

Let's assume we have this piece of buggy code. We compile it on an
unsigned-char architecture. No warning. *BOOM*

Let's assume there is correct code, and we use it as designed:

  Warning: Wrong arch
  Warning: Wrong arch
  Warning: Wrong arch
  Warning: real issue
  Warning: Wrong arch
  Warning: Wrong arch
  Warning: Wrong arch
  Warning: Wrong arch
  <scroll off screen/>
  Warning: Wrong arch
  Warning: Wrong arch
  Warning: Wrong arch
  Warning: Wrong arch
  Warning: Wrong arch

You don't see "real issue". *BOOM*

What can you do about this warning? Let's assume we cast everywhere:

  struct foo *p;
  printf("%zu", strlen((char *)p)); /* compiles cleanly -- *BOOM* */

Let's assume we disable this warning:

  int f(unsigned short x)
  {
          if (!x)
                  return 0;
          return (int)x + f(x - 1);
  }

  f(-1); /* *BOOM* */

Therefore, unless you program for one arch with one set of compiler
flags, this warning is useless, and I did not see much code explicitly
designed to be non-portable.

Warning on wrong signedness is good, but if you can't enable it on
portable code, it's useless.

--
Funny quotes: 39. Ever wonder about those people who spend $2.00 apiece
on those little bottles of Evian water? Try spelling Evian backwards:
NAIVE
* Re: somebody dropped a (warning) bomb
From: Sergei Organov <osv@javad.com> @ 2007-02-19 12:17 UTC
To: Bodo Eggert
Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik,
    Linux Kernel Mailing List, Andrew Morton

Bodo Eggert <7eggert@gmx.de> writes:
> On Fri, 16 Feb 2007, Sergei Organov wrote:
[...]
> I'll say it again: either the code using unspecified chars is correct,
> or it isn't. If it's correct, using it with either signed or unsigned
> chars is not a bug and you should not warn at all; and if it's not
> correct, you should always warn. Instead, gcc warns on "code compiles
> for $arch".

Here is where we disagree. In my opinion, no matter what the sign of
char for a given architecture is, it's incorrect to pass either "signed
char*" or "unsigned char*" arguments to a function expecting "char*".

>> > Therefore it's either always wrong to call your char* function with char*,
>> > unsigned char* _and_ signed char* unless you can guarantee not to overflow
>> > any of them, or it's always correct to call char* functions with any of
>> > these.
>>
>> How are you sure those who wrote foo(char*) agree with your opinion or
>> even understand all the involved issues?
>
> Let's assume we have this piece of buggy code. We compile it on an
> unsigned-char architecture. No warning. *BOOM*

There should be a warning -- that's my point. "char" is different.
"char" is distinct from either "signed char" or "unsigned char".
Always. At least that's how C is defined.
> Let's assume there is correct code, and we use it as designed:
>
>   Warning: Wrong arch
>   Warning: Wrong arch
>   Warning: real issue
>   Warning: Wrong arch
>   <scroll off screen/>
>   Warning: Wrong arch
>
> You don't see "real issue". *BOOM*
>
> What can you do about this warning? Let's assume we cast everywhere:

I already gave an answer in response to Linus:

  static inline size_t ustrlen(const unsigned char *s)
  {
          return strlen((const char *)s);
  }

>   struct foo *p;
>   printf("%zu", strlen((char *)p)); /* *BOOM* */

  printf("%zu", ustrlen(p));  /* *NO BOOM* */

  unsigned char *u;
  printf("%zu", ustrlen(u));  /* *NO WARNING* */

--
Sergei.
* somebody dropped a (warning) bomb
From: Jeff Garzik @ 2007-02-08 15:00 UTC
To: Linux Kernel Mailing List; Cc: Andrew Morton, Linus Torvalds

Just did a pull for the first time since 2.6.20, and a /megaton/ of new
warnings has appeared, on Fedora Core 6/x86-64 "make allyesconfig".

All of the new warning spew appears to be the "pointers differ in
signedness" warning. Granted, we should care about these, but even a
"silent" build (make -s) now spews far more output than any human could
possibly care about, and digging important nuggets out of all this noise
is beginning to require automated tools rather than just human eyes.

	Jeff
* Re: somebody dropped a (warning) bomb
From: Linus Torvalds @ 2007-02-08 16:33 UTC
To: Jeff Garzik; Cc: Linux Kernel Mailing List, Andrew Morton

On Thu, 8 Feb 2007, Jeff Garzik wrote:
>
> Just did a pull for the first time since 2.6.20, and a /megaton/ of new
> warnings has appeared, on Fedora Core 6/x86-64 "make allyesconfig".

That was due to the Makefile rule breakage. Hopefully it now _really_ is
fixed. The

  CFLAGS += $(call cc-option,-Wno-pointer-sign,)

part effectively got dropped.

> All of the new warning spew appears to be the "pointers differ in
> signedness" warning. Granted, we should care about these

The thing is, I'm _really_ sad we have to use that -Wno-pointer-sign
thing. It's a real and valid warning. It finds really ugly code issues
(unlike a lot of the other things).

So it got dropped by mistake, but that is one warning that I'd actually
like to resurrect (unlike a lot of other gcc warnings that I just think
are crap to begin with). However, we've got a metric shitload of code
that passes in pointers of the wrong sign, so a lot of it would involve
mindless changes that are likely to break real things.

And the thing is, sometimes -Wpointer-sign is just horribly broken. For
example, you cannot write a "strlen()" that doesn't complain about
"unsigned char *" vs "char *". And that is just a BUG.

I'm continually amazed at how totally clueless some gcc warnings are.
You'd think that the people writing them are competent. And then
sometimes they show just how totally incompetent they are, by making it
impossible to use "unsigned char" arrays together with standard
functions.

So a _lot_ of the warnings are real.
But with no way to shut up the ones that aren't, it sadly doesn't matter
one whit ;(

(And adding casts is just much worse than shutting up the warning
entirely.)

There should be a rule about warnings: if you can't shut up stylistic
warnings on a site-by-site level without adding horrible syntactic stuff
that is worse than what the warning is all about, the warning shouldn't
exist.

		Linus

^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb
From: Jan Engelhardt @ 2007-02-08 18:42 UTC
To: Linus Torvalds; Cc: Jeff Garzik, Linux Kernel Mailing List, Andrew Morton

On Feb 8 2007 08:33, Linus Torvalds wrote:
>
> And the thing is, sometimes -Wpointer-sign is just horribly broken.
> For example, you cannot write a "strlen()" that doesn't complain
> about "unsigned char *" vs "char *". And that is just a BUG.
>
> I'm continually amazed at how totally clueless some gcc warnings are.
> You'd think that the people writing them are competent. And then
> sometimes they show just how totally incompetent they are, by making it
> impossible to use "unsigned char" arrays together with standard
> functions.

I generally have to agree with you about unsigned char* vs char*. It is
a problem of the C language that char can be signed and unsigned, and
that people, as a result, have used it for storing
"shorter_than_short_t"s.

What C needs is a distinction between char and int8_t, rendering "char"
unsigned at all times, basically, and making "unsigned char" and
"signed char" illegal types in turn.

Jan
--
ft: http://freshmeat.net/p/chaostables/
* Re: somebody dropped a (warning) bomb
From: Linus Torvalds @ 2007-02-08 19:53 UTC
To: Jan Engelhardt; Cc: Jeff Garzik, Linux Kernel Mailing List, Andrew Morton

On Thu, 8 Feb 2007, Jan Engelhardt wrote:
>
> I generally have to agree with you about unsigned char* vs char*. It is
> a problem of the C language that char can be signed and unsigned, and
> that people, as a result, have used it for storing
> "shorter_than_short_t"s.
>
> What C needs is a distinction between char and int8_t, rendering "char"
> unsigned at all times and making "unsigned char" and "signed char"
> illegal types in turn.

No, it's really more fundamental than that.

Exactly because "char *" doesn't have a defined sign, only a TOTALLY
INCOMPETENT compiler will warn about its signedness.

The user has clearly stated "I don't care about the sign". If a compiler
complains about us passing "unsigned char *" (or, if "char" is naturally
unsigned on that platform, "signed char *") to strcmp, then that
compiler IS BROKEN. Because "strcmp()" takes "char *", which simply DOES
NOT HAVE a uniquely defined sign.

That's why we can't have -Wpointer-sign on by default. The gcc warning
is simply *crap*.

If you were to have

  extern int strcmp(signed char *);

and would pass *that* an "unsigned char *", it would be a real and valid
sign error. But just plain "char *" isn't signed or unsigned. It's
"implementation defined", and as such, if the programmer used it, the
programmer clearly doesn't care. If he cared, he would just state the
signedness explicitly.

It really is that simple. gcc is broken. The C language isn't; it's
purely a broken-compiler issue.

The other things in the kernel I'd be willing to fix up. But I simply
refuse to work around a known-buggy warning.
		Linus
* Re: somebody dropped a (warning) bomb
From: Jan Engelhardt @ 2007-02-08 21:10 UTC
To: Linus Torvalds; Cc: Jeff Garzik, Linux Kernel Mailing List, Andrew Morton

On Feb 8 2007 11:53, Linus Torvalds wrote:
> On Thu, 8 Feb 2007, Jan Engelhardt wrote:
>
> Exactly because "char *" doesn't have a defined sign,
> the user has clearly stated "I don't care about the sign". If a compiler
> complains about us passing "unsigned char *" (or, if "char" is naturally
> unsigned on that platform, "signed char *") to strcmp, then that
> compiler IS BROKEN. Because "strcmp()" takes "char *", which simply
> DOES NOT HAVE a uniquely defined sign.

Thank you for this insight, I don't usually read standards, only RFCs :)
Uh, does that also apply to the longer types, int, long etc.? I hope not.

> Only a TOTALLY INCOMPETENT compiler will warn about its signedness.
> That's why we can't have -Wpointer-sign on by default. The gcc warning
> is simply *crap*.
> [...]
> It really is that simple. gcc is broken. The C language isn't; it's
> purely a broken-compiler issue.

Maybe you could send in a patch to gcc that fixes the issue?

Jan
* Re: somebody dropped a (warning) bomb
From: Linus Torvalds @ 2007-02-08 21:37 UTC
To: Jan Engelhardt; Cc: Jeff Garzik, Linux Kernel Mailing List, Andrew Morton

On Thu, 8 Feb 2007, Jan Engelhardt wrote:
>
> Thank you for this insight, I don't usually read standards, only RFCs :)
> Uh, does that also apply to the longer types, int, long etc.? I hope not.

No. An "int" or "long" is always signed. But bitfields, enums or "char"
can - and do - have signedness that depends on the architecture and/or
compiler writers' whims. So if you do

  enum my_enum {
          orange = 1,
          blue = 2,
          green = 3
  };

you don't know if "enum my_enum" is signed or not. The compiler could
decide to use "int". Or it could decide to use "unsigned char" for it.
You simply don't know.

Same goes for

  struct dummy {
          int flag:1;
  } a_variable;

which could make "a_variable.flag" be either signed or unsigned. In the
absence of an explicit "signed" or "unsigned" by the programmer, you
really won't even _know_ whether "flag" can contain the values {0,1} or
{-1,0}.

Normally you wouldn't ever even care. A one-bit bitfield would only ever
really be tested against 0, but it really is possible that when you
assign "1" to that value and read it back, it will read back -1. Try it.

Those are the only three types I can think of, but the point is, they
really don't have any standard-defined sign. It's up to the compiler.

Now, passing a "pointer to bitfield" is impossible, and if you pass a
pointer to an enum you'd better *always* use the same enum anyway
(because not only is the signedness implementation-defined, the _size_
of the enum is also implementation-defined), so in those two cases,
-Wpointer-sign doesn't hurt.

But in the case of "char", it really is insane.
It's doubly insane exactly because the C standard defines a lot of
standard functions that take "char *" and that make tons of sense with
unsigned chars too, and you may have 100% good reasons to use "unsigned
char" for many of them.

> > It really is that simple. gcc is broken. The C language isn't; it's
> > purely a broken-compiler issue.
>
> Maybe you could send in a patch to gcc that fixes the issue?

I've butted heads with too many broken gcc decisions. I've occasionally
tried to discuss these kinds of things on the gcc mailing lists, but I
got too fed up with language lawyers that said "the standard _allows_ us
to be total *ssholes, so stop complaining". It seemed to be an almost
universal defense.

(And yes, the standard allows any warnings at all. So technically, gcc
can warn about whatever the hell it wants. It doesn't make it a good
idea.)

		Linus
* Re: somebody dropped a (warning) bomb 2007-02-08 21:37 ` Linus Torvalds @ 2007-02-08 23:12 ` David Rientjes 2007-02-08 23:37 ` Linus Torvalds 0 siblings, 1 reply; 67+ messages in thread From: David Rientjes @ 2007-02-08 23:12 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, Linus Torvalds wrote: > Same goes for > > struct dummy { > int flag:1; > } a_variable; > > which could make "a_variable.flag" be either signed or unsigned. In the > absence of an explicit "signed" or "unsigned" by the programmer, you > really won't even _know_ whether "flag" can contain the values {0,1} or > {-1,0}. > > Normally you wouldn't ever even care. A one-bit bitfield would only ever > really be tested against 0, but it really is possible that when you assign > "1" to that value, and read it back, it will read back -1. Try it. > > Those are the only three types I can think of, but the point is, they > really don't have any standard-defined sign. It's up to the compiler. > And a compiler that makes a_variable.flag unsigned would be brain-dead because "int" is always signed. The signed specifier is only useful for forcing chars to carry a sign and it's redundant with any other type. Using bitfields requires knowledge of the sign bit in two's complement just as when you use a signed qualifier for any other definition. Having inconsistent behavior on whether it is unsigned or signed based on how large the bitfield is (i.e. one bit compared to multiple bits) is inappropriate. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 23:12 ` David Rientjes @ 2007-02-08 23:37 ` Linus Torvalds 2007-02-09 0:24 ` David Rientjes 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-08 23:37 UTC (permalink / raw) To: David Rientjes Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, David Rientjes wrote: > > And a compiler that makes a_variable.flag unsigned would be brain-dead > because "int" is always signed. No, making bitfields unsigned is actually usually a good idea. It allows you to often generate better code, and it actually tends to be what programmers _expect_. A lot of people seem to be surprised to hear that a one-bit bitfield actually often encodes -1/0, and not 0/1. So unsigned bitfields are not only traditional K&R, they are also usually _faster_ (which is probably why they are traditional K&R - along with allowing "char" to be unsigned by default). Don't knock them. It's much better to just remember that bitfields simply don't _have_ any standard sign unless you specify it explicitly, than saying "it should be signed because 'int' is signed". I will actually argue that having signed bit-fields is almost always a bug, and that as a result you should _never_ use "int" at all. Especially as you might as well just write it as signed a:1; if you really want a signed bitfield. So I would really recommend that you never use "int a:<bits>" AT ALL, because there really is never any good reason to do so. Do it as unsigned a:3; signed b:2; but never int c:4; because the latter really isn't sensible. "sparse" will actually complain about single-bit signed bitfields, and it found a number of cases where people used that "int x:1" kind of syntax. Just don't do it. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
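Linus's recommendation as a compilable sketch: with the sign spelled out, the value ranges are unambiguous (the -2 case assumes two's-complement storage, which every compiler under discussion uses):

```c
#include <assert.h>

struct flags {
    unsigned a:3;   /* holds 0..7, no surprises       */
    signed   b:2;   /* holds -2..1, sign bit explicit */
};

/* In-range stores read back exactly as written. */
int stored_a(void)
{
    struct flags f;
    f.a = 7;
    return f.a;
}

int stored_b(void)
{
    struct flags f;
    f.b = -2;
    return f.b;
}
```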
* Re: somebody dropped a (warning) bomb 2007-02-08 23:37 ` Linus Torvalds @ 2007-02-09 0:24 ` David Rientjes 2007-02-09 0:42 ` Linus Torvalds 0 siblings, 1 reply; 67+ messages in thread From: David Rientjes @ 2007-02-09 0:24 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, Linus Torvalds wrote: > No, making bitfields unsigned is actually usually a good idea. It allows > you to often generate better code, and it actually tends to be what > programmers _expect_. A lot of people seem to be surprised to hear that a > one-bit bitfield actually often encodes -1/0, and not 0/1. > Your struct: struct dummy { int flag:1; } a_variable; should expect a_variable.flag to be signed, that's what the int says. There is no special case here with regard to type. It's traditional K&R that writing signed flag:1 here is redundant (K&R pg. 211). Now whether that's what you wanted or not is irrelevant. As typical with C, it gives you exactly what you asked for. You're arguing for inconsistency in how bitfields are qualified based on their size. If you declare a bitfield as int byte:8, then you're going to expect this to behave exactly like a signed char, which it does. struct { int byte:8; } __attribute__((packed)) b_variable; b_variable.byte is identical to a signed char. Just because a_variable.flag happens to be one bit, you cannot say that programmers _expect_ it to be unsigned, or you would also expect b_variable.byte to act as an unsigned char. signed is the default behavior for all types besides char, so the behavior is appropriate based on what most programmers would expect. > So unsigned bitfields are not only traditional K&R, they are also usually > _faster_ (which is probably why they are traditional K&R - along with > allowing "char" to be unsigned by default). Don't knock them.
It's much > better to just remember that bitfields simply don't _have_ any standard > sign unless you specify it explicitly, than saying "it should be signed > because 'int' is signed". > Of course they're faster, it doesn't require the sign extension. But you're adding additional semantics to the language by your interpretation of what is expected by the programmer in the bitfield case as opposed to a normal variable definition. > I will actually argue that having signed bit-fields is almost always a > bug, and that as a result you should _never_ use "int" at all. Especially > as you might as well just write it as > > signed a:1; > > if you really want a signed bitfield. > How can signed bitfields almost always be a bug? I'm the programmer and I want to store my data in a variable that I define, so I must follow the two's-complement rule and allot a sign bit if I declare it as a signed value. In C, you do this with "int"; otherwise, you use "unsigned". > So I would really recommend that you never use "int a:<bits>" AT ALL, > because there really is never any good reason to do so. Do it as > > unsigned a:3; > signed b:2; > > but never > > int c:4; > > because the latter really isn't sensible. > That _is_ sensible because anything you declare "int" is going to be signed as far as gcc is concerned. "c" just happens to be 4 bits wide instead of 32. But for everything else, it's the same as an int. I know exactly what I'm getting by writing "int c:4", as would most programmers who use bitfields to begin with. David ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:24 ` David Rientjes @ 2007-02-09 0:42 ` Linus Torvalds 2007-02-09 0:59 ` Linus Torvalds ` (3 more replies) 0 siblings, 4 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-09 0:42 UTC (permalink / raw) To: David Rientjes Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, David Rientjes wrote: > > Your struct: > > struct dummy { > int flag:1; > } a_variable; > > should expect a_variable.flag to be signed, that's what the int says. No it's not. You just don't understand the C language. And if you don't understand the C language, you can't say "that's what the int says". It says no such thing. The C language clearly says that bitfields have implementation-defined types. So when you see struct dummy { int flag:1; } a_variable; if you don't read that as "oh, the sign of 'flag' is implementation-defined", then you simply aren't reading it right. Is it "intuitive"? Nobody ever really called C an _intuitive_ language. C has a lot of rules that you simply have to know. The bitfield sign rule is just one such rule. > There is no special case here with regard to type. Sure there is. Read the spec. I don't understand why you are arguing. YOU ARE WRONG. Bitfields simply have implementation-defined signedness. As do enums. As does "char". It really is that simple. The *real* special case is actually "int" and "long". In many ways, the fact that those *do* have a well-specified signedness is actually the exception rather than the rule. Most C types don't, and some you can't even tell (do pointers generate "signed" or "unsigned" comparisons? I'll argue that a compiler that generates signed comparisons for them is broken, but it tends to be something you can only see with a standards-conforming program if you can allocate memory across the sign boundary, which may or may not be true..) Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:42 ` Linus Torvalds @ 2007-02-09 0:59 ` Linus Torvalds 2007-02-09 0:59 ` David Rientjes ` (2 subsequent siblings) 3 siblings, 0 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-09 0:59 UTC (permalink / raw) To: David Rientjes Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, Linus Torvalds wrote: > > The C language clearly says that bitfields have implementation-defined > types. Btw, this isn't even some "theoretical discussion". LOTS of compilers do unsigned bitfields. It may even be the majority. There may (or may not) be some correlation with what the default for "char" is, I haven't looked into it. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:42 ` Linus Torvalds 2007-02-09 0:59 ` Linus Torvalds @ 2007-02-09 0:59 ` David Rientjes 2007-02-09 1:11 ` Linus Torvalds 2007-02-09 3:27 ` D. Hazelton 2007-02-09 12:34 ` Jan Engelhardt 3 siblings, 1 reply; 67+ messages in thread From: David Rientjes @ 2007-02-09 0:59 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, Linus Torvalds wrote: > No it's not. > > You just don't understand the C language. > > And if you don't understand the C language, you can't say "that's what the > int says". It says no such thing. > > The C language clearly says that bitfields have implementation-defined > types. So when you see > > struct dummy { > int flag:1; > } a_variable; > > if you don't read that as "oh, the sign of 'flag' is implementation- > defined", then you simply aren't reading it right. > Maybe you should read my first post, we're talking about gcc's behavior here, not the C standard. My criticism was that any compiler that makes a_variable.flag unsigned is brain-dead and I was arguing in favor of gcc treating plain int bitfields as signed ints (6.7.2, 6.7.2.1). This has _nothing_ to do with the fact that the standard leaves it implementation defined. Naturally you should define its signedness explicitly in your code since it is implementation defined. That's not the point. Just because a compiler CAN consider a_variable.flag as unsigned doesn't mean it makes sense. It makes no sense, and thus is brain-dead. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:59 ` David Rientjes @ 2007-02-09 1:11 ` Linus Torvalds 2007-02-09 1:18 ` David Rientjes 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-09 1:11 UTC (permalink / raw) To: David Rientjes Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, David Rientjes wrote: > > Maybe you should read my first post, we're talking about gcc's behavior > here, not the C standard. Give it up, David. Even gcc DOES DIFFERENT THINGS! Have you even read the docs? By default it is treated as signed int but this may be changed by the -funsigned-bitfields option. So even for gcc, it's just a default. I think you can even change the default in the spec file. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 1:11 ` Linus Torvalds @ 2007-02-09 1:18 ` David Rientjes 2007-02-09 15:38 ` Linus Torvalds 0 siblings, 1 reply; 67+ messages in thread From: David Rientjes @ 2007-02-09 1:18 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, Linus Torvalds wrote: > Even gcc DOES DIFFERENT THINGS! Have you even read the docs? > > By default it is treated as signed int but this may be changed by > the -funsigned-bitfields option. > Yes, I read the 4.1.1 docs: By default, such a bit-field is signed, because this is consistent: the basic integer types such as int are signed types. That is the whole basis for my argument: when you declare something "int," most programmers would consider that to be SIGNED regardless of whether it is 32 bits, 13 bits, or 1 bit. No doubt it is configurable because of the existence of brain-dead compilers that treat them as unsigned. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 1:18 ` David Rientjes @ 2007-02-09 15:38 ` Linus Torvalds 0 siblings, 0 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-09 15:38 UTC (permalink / raw) To: David Rientjes Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, David Rientjes wrote: > > Yes, I read the 4.1.1 docs: > > By default, such a bit-field is signed, because this is > consistent: the basic integer types such as int are signed > types. Ok, so the gcc people have added some language. So what? The fact is, that language has absolutely nothing to do with the C language, and doesn't change anything at all. > That is the whole basis for my argument, when you declare something "int," > most programmers would consider that to be SIGNED regardless of whether it > is 32 bits, 13 bits, or 1 bit. And MY argument is that YOUR argument is CRAP. The fact is, signs of bitfields, chars and enums aren't well-defined. That's a FACT. Your argument makes no sense. It's like arguing against "gravity", or like arguing against the fact that the earth circles around the sun. It is STUPID to argue against facts. Your argument is exactly the same argument that people used to say that the earth is the center of the universe: "because it makes sense that we're the center". The fact that "most programmers" or "it makes sense" doesn't make anything at all true. Only verifiable FACTS make something true. There's a great saying: "Reality is that which, when you stop believing in it, doesn't go away." - Philip K Dick. So wake up and smell the coffee: reality is that bitfields don't have a specified sign. It's just a FACT. Whether you _like_ it or not is simply not relevant. The same is true of "char". You can argue that it should be signed by default, since "short", "int" and "long" are signed. But THAT IS NOT TRUE. Arguing against reality really isn't very smart.
Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:42 ` Linus Torvalds 2007-02-09 0:59 ` Linus Torvalds 2007-02-09 0:59 ` David Rientjes @ 2007-02-09 3:27 ` D. Hazelton 2007-02-09 19:54 ` Pete Zaitcev 2007-02-09 12:34 ` Jan Engelhardt 3 siblings, 1 reply; 67+ messages in thread From: D. Hazelton @ 2007-02-09 3:27 UTC (permalink / raw) To: Linus Torvalds Cc: David Rientjes, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thursday 08 February 2007 19:42, Linus Torvalds wrote: <snip> > Most C types don't, and some you can't even tell (do pointers generate > "signed" or "unsigned" comparisons? I'll argue that a compiler that > generates signed comparisons for them is broken, but it tends to be > something you can only see with a standards- conforming proghram if you > can allocate memory across the sign boundary, which may or may not be > true..) And this is, because as Dennis Ritchie says in "The Development of the C Language" (http://cm.bell-labs.com/cm/cs/who/dmr/chist.html) C evolved from a typeless language, B - which, in turn, had originated as a stripped down version of BCPL. B and BCPL used a "cell addressing" scheme, but because of the move off the PDP-7, and the difficulties run into by Ken Thompson in rewriting the code for Unix in B (the original Unix was written in assembler) types were added - 'char' and 'int'. These represented the byte and word addressing modes of the PDP-11, and were needed because Ritchie was in the process of having the compiler generate assembler instead of "threaded code". This "NB" language, after also having data structures added and the rules for defining pointers finalized, was renamed "C" by Ritchie. So almost all the rules around the signs of types are because of a single, historical machine. Hence the rules about "char" being unsigned by default and "int" being signed by default are because of the nature of the PDP-11. 
The implementation-defined nature of bitfields is because Dennis Ritchie had freedom there - the PDP-11 didn't have anything like them in the hardware and they were being stuffed in to make Thompson's life easier. Now: There is no reason for the behavior that came from the nature of the PDP-11 to have survived, but because it was in "The White Book" it made it through the ANSI standardization process. Now that this history lesson is over... DRH ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 3:27 ` D. Hazelton @ 2007-02-09 19:54 ` Pete Zaitcev 0 siblings, 0 replies; 67+ messages in thread From: Pete Zaitcev @ 2007-02-09 19:54 UTC (permalink / raw) To: D. Hazelton; +Cc: Kernel Mailing List On Thu, 8 Feb 2007 22:27:43 -0500, "D. Hazelton" <dhazelton@enter.net> wrote: > So almost all the rules around the signs of types are because of a single, > historical machine. Hence the rules about "char" being unsigned by default > and "int" being signed by default are because of the nature of the PDP-11. >[...] > Now: There is no reason for the behavior that came from the nature of the > PDP11 to have survived, but because it was in "The White Book" it made it > through the ANSI standardization process. > > Now that this history lesson is over... ... we can remember that char is signed on PDP-11. Mvu hah hah hah ha, only serious. Check how movb works. -- Pete ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:42 ` Linus Torvalds ` (2 preceding siblings ...) 2007-02-09 3:27 ` D. Hazelton @ 2007-02-09 12:34 ` Jan Engelhardt 2007-02-09 13:16 ` linux-os (Dick Johnson) 2007-02-13 15:14 ` Dick Streefland 3 siblings, 2 replies; 67+ messages in thread From: Jan Engelhardt @ 2007-02-09 12:34 UTC (permalink / raw) To: Linus Torvalds Cc: David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Feb 8 2007 16:42, Linus Torvalds wrote: > >Most C types don't, and some you can't even tell (do pointers generate >"signed" or "unsigned" comparisons? I'd say "neither", because both signed void *ptr; and unsigned void *xyz; are impossible (and dull at that). That's why you will have to explicitly cast one operand in a comparison to make it evident what sort of comparison is intended, i.e. if((unsigned long)ptr < PAGE_OFFSET) Further, to answer again the question of whether they generate signed or unsigned comparisons: Have you ever seen a computer which addresses memory with negative numbers? Since the answer is most likely no, signed comparisons would not make sense to me. > I'll argue that a compiler that >generates signed comparisons for them is broken, but it tends to be >something you can only see with a standards- conforming proghram if you >can allocate memory across the sign boundary, which may or may not be >true..) Jan -- ft: http://freshmeat.net/p/chaostables/ ^ permalink raw reply [flat|nested] 67+ messages in thread
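Jan's casting advice can be sketched in portable C. uintptr_t is the standard spelling for "pointer as unsigned integer", and the boundary value below is an arbitrary stand-in for PAGE_OFFSET:

```c
#include <stdint.h>
#include <assert.h>

/* Compare a pointer against a boundary as an unsigned number.  The
 * explicit cast documents which comparison is intended instead of
 * leaving the signedness to the compiler. */
int below_boundary(const void *p, uintptr_t boundary)
{
    return (uintptr_t)p < boundary;
}
```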
* Re: somebody dropped a (warning) bomb 2007-02-09 12:34 ` Jan Engelhardt @ 2007-02-09 13:16 ` linux-os (Dick Johnson) 2007-02-09 17:45 ` Jan Engelhardt 2007-02-13 15:14 ` Dick Streefland 1 sibling, 1 reply; 67+ messages in thread From: linux-os (Dick Johnson) @ 2007-02-09 13:16 UTC (permalink / raw) To: Jan Engelhardt Cc: Linus Torvalds, David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Fri, 9 Feb 2007, Jan Engelhardt wrote: > > On Feb 8 2007 16:42, Linus Torvalds wrote: >> >> Most C types don't, and some you can't even tell (do pointers generate >> "signed" or "unsigned" comparisons? > > I'd say "neither", because both > > signed void *ptr; and > unsigned void *xyz; > > are impossible (and dull at that). That's why you explicitly will > have to cast one operand in a comparison to make it evident > what sort of comparison is intended, i.e. > > if((unsigned long)ptr < PAGE_OFFSET) > > Further, giving again answer to the question whether they generate signed or > unsigned comparisons: Have you ever seen a computer which addresses memory with > negative numbers? Since the answer is most likely no, signed comparisons would Yes, most all do. Indexed memory operands are signed displacements. See page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This gets hidden by higher-level languages such as 'C'. It is also non-obvious when using the AT&T (GNU) assembly language. However, the Intel documentation shows it clearly: MOV AL, [EBX+0xFFFFFFFF] ; -1 ...will put the byte existing at the location just BEFORE the offset in EBX, into register AL, while: MOV AL, [EBX+0x00000001] ; +1 ...will put the byte existing at the location just AFTER the offset in EBX, into register AL. Also, the documentation specifically states that sign extension is provided for all effective address calculations. > not make sense for me. 
> >> I'll argue that a compiler that >> generates signed comparisons for them is broken, but it tends to be >> something you can only see with a standards- conforming proghram if you >> can allocate memory across the sign boundary, which may or may not be >> true..) > > Jan > -- > ft: http://freshmeat.net/p/chaostables/ > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > Cheers, Dick Johnson Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips). New book: http://www.AbominableFirebug.com/ _ \x1a\x04 **************************************************************** The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them. Thank you. ^ permalink raw reply [flat|nested] 67+ messages in thread
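The signed displacements Dick describes surface in C as negative array indices, which are perfectly legal as long as the resulting pointer stays within the same object. A small sketch:

```c
#include <assert.h>

/* Sum the elements on either side of *p.  On x86 the compiler emits
 * exactly the base+signed-displacement addressing discussed above:
 * p[-1] is a load at offset -sizeof(int) from the pointer. */
int sum_neighbors(const int *p)
{
    return p[-1] + p[1];
}
```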
* Re: somebody dropped a (warning) bomb 2007-02-09 13:16 ` linux-os (Dick Johnson) @ 2007-02-09 17:45 ` Jan Engelhardt 2007-02-09 20:29 ` linux-os (Dick Johnson) 0 siblings, 1 reply; 67+ messages in thread From: Jan Engelhardt @ 2007-02-09 17:45 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Linus Torvalds, David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote: >On Fri, 9 Feb 2007, Jan Engelhardt wrote: >> On Feb 8 2007 16:42, Linus Torvalds wrote: >>> >> Further, giving again answer to the question whether they generate >> signed or unsigned comparisons: Have you ever seen a computer which >> addresses memory with negative numbers? Since the answer is most likely >> no, signed comparisons would > >Yes, most all do. Indexed memory operands are signed displacements. See >page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This I was referring to "absolute memory", not the offset magic that assembler allows. After all, (reg+relativeOffset) will yield an absolute address. What I was getting at: for machines that have more than 2 GB of memory, you don't call the address that is given by 0x80000000U actually "byte -2147483648", but "byte 2147483648". >gets hidden by higher-level languages such as 'C'. It is also non-obvious >when using the AT&T (GNU) assembly language. However, the Intel documentation >shows it clearly: > MOV AL, [EBX+0xFFFFFFFF] ; -1 It can be debated what exactly this is... negative offset or "using overflow to get where we want". Jan -- ft: http://freshmeat.net/p/chaostables/ ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 17:45 ` Jan Engelhardt @ 2007-02-09 20:29 ` linux-os (Dick Johnson) 2007-02-09 22:05 ` Jan Engelhardt 0 siblings, 1 reply; 67+ messages in thread From: linux-os (Dick Johnson) @ 2007-02-09 20:29 UTC (permalink / raw) To: Jan Engelhardt Cc: Linus Torvalds, David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Fri, 9 Feb 2007, Jan Engelhardt wrote: > > On Feb 9 2007 08:16, linux-os (Dick Johnson) wrote: >> On Fri, 9 Feb 2007, Jan Engelhardt wrote: >>> On Feb 8 2007 16:42, Linus Torvalds wrote: >>>> >>> Further, giving again answer to the question whether they generate >>> signed or unsigned comparisons: Have you ever seen a computer which >>> addresses memory with negative numbers? Since the answer is most likely >>> no, signed comparisons would >> >> Yes, most all do. Indexed memory operands are signed displacements. See >> page 2-16, Intel486 Microprocessor Programmer's Reference Manual. This > > I was referring to "absolute memory", not the offset magic that assembler > allows. After all, (reg+relativeOffset) will yield an absolute address. > What I was out at: for machines that have more than 2 GB of memory, you > don't call the address that is given by 0x80000000U actually "byte > -2147483648", but "byte 2147483648". Don't make any large bets on that! char foo() { volatile char *p = (char *)0x80000000; return *p; } Optimized.... .file "zzz.c" .text .p2align 2,,3 .globl foo .type foo, @function foo: pushl %ebp movb -2147483648, %al movl %esp, %ebp movsbl %al,%eax leave ret .size foo, .-foo .section .note.GNU-stack,"",@progbits .ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)" Not optimized... 
.file "zzz.c" .text .globl foo .type foo, @function foo: pushl %ebp movl %esp, %ebp subl $4, %esp movl $-2147483648, -4(%ebp) movl -4(%ebp), %eax movb (%eax), %al movsbl %al,%eax leave ret .size foo, .-foo .section .note.GNU-stack,"",@progbits .ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)" ... Still negative numbers! > >> gets hidden by higher-level languages such as 'C'. It is also non-obvious >> when using the AT&T (GNU) assembly language. However, the Intel documentation >> shows it clearly: >> MOV AL, [EBX+0xFFFFFFFF] ; -1 > > It can be debated what exactly this is... negative offset or "using > overflow to get where we want". > No debate at all, and no overflow. The construction is even used in 'C' to optimize memory access while unrolling loops: /*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/ /* * This converts an array of integers to an array of double floats. */ void int2float(double *fp, int *ip, size_t len) { while(len >= 0x10) { fp += 0x10; ip += 0x10; len -= 0x10; fp[-0x10] = (double) ip[-0x10]; fp[-0x0f] = (double) ip[-0x0f]; fp[-0x0e] = (double) ip[-0x0e]; fp[-0x0d] = (double) ip[-0x0d]; fp[-0x0c] = (double) ip[-0x0c]; fp[-0x0b] = (double) ip[-0x0b]; fp[-0x0a] = (double) ip[-0x0a]; fp[-0x09] = (double) ip[-0x09]; fp[-0x08] = (double) ip[-0x08]; fp[-0x07] = (double) ip[-0x07]; fp[-0x06] = (double) ip[-0x06]; fp[-0x05] = (double) ip[-0x05]; fp[-0x04] = (double) ip[-0x04]; fp[-0x03] = (double) ip[-0x03]; fp[-0x02] = (double) ip[-0x02]; fp[-0x01] = (double) ip[-0x01]; } while(len--) *fp++ = (double) *ip++; } This makes certain that each memory access uses the same type and the address-calculation for the next memory access can be done in parallel with the current fetch. This is one of the methods used to save a few machine cycles. Doesn't mean much until you end up with large arrays such as used in image processing. Then, it may be the difference that makes or breaks the project! 
> > Jan > -- > ft: http://freshmeat.net/p/chaostables/ > Cheers, Dick Johnson Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips). New book: http://www.AbominableFirebug.com/ ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 20:29 ` linux-os (Dick Johnson) @ 2007-02-09 22:05 ` Jan Engelhardt 2007-02-09 22:58 ` Martin Mares 2007-02-12 18:50 ` linux-os (Dick Johnson) 0 siblings, 2 replies; 67+ messages in thread From: Jan Engelhardt @ 2007-02-09 22:05 UTC (permalink / raw) To: linux-os (Dick Johnson) Cc: Linus Torvalds, David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote: >> >> I was referring to "absolute memory", not the offset magic that assembler >> allows. After all, (reg+relativeOffset) will yield an absolute address. >> What I was out at: for machines that have more than 2 GB of memory, you >> don't call the address that is given by 0x80000000U actually "byte >> -2147483648", but "byte 2147483648". > >Don't make any large bets on that! > >char foo() >{ > volatile char *p = (char *)0x80000000; > return *p; >} >Optimized.... > .file "zzz.c" > .text > .p2align 2,,3 >.globl foo > .type foo, @function >foo: > pushl %ebp > movb -2147483648, %al > movl %esp, %ebp > movsbl %al,%eax > leave > ret > .size foo, .-foo > .section .note.GNU-stack,"",@progbits > .ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)" 00000000 <foo>: 0: 55 push %ebp 1: 0f b6 05 00 00 00 80 movzbl 0x80000000,%eax 8: 89 e5 mov %esp,%ebp a: 5d pop %ebp b: 0f be c0 movsbl %al,%eax e: c3 ret You do know that there is a bijection between the set of signed [32bit] integers and unsigned [32bit] integers, don't you? For the CPU, it's just bits. Being signed or unsigned is not important when just accessing memory. It will, when a comparison is involved, but that was not the point here. void* comparisons are unsigned. Period. Because a compiler doing signed comparisons will "map" the memory [from 2 GB to 4 GB] as part of the signed comparison before the memory [from 0 GB to 2 GB], which collides with - let's call it - "the world view". 
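Jan's bijection, and the one place it matters, in a few lines. int32_t/uint32_t pin the widths; the uint32_t-to-int32_t conversion of an out-of-range value is implementation-defined before C23, but wraps on gcc and every two's-complement machine:

```c
#include <stdint.h>
#include <assert.h>

/* The same 32 bits reinterpreted: 0x80000000 is 2147483648 as
 * unsigned and -2147483648 as signed.  Loads and stores don't care;
 * comparisons do. */
int32_t as_signed(uint32_t bits)
{
    return (int32_t)bits;
}

/* A signed compare sorts 0x80000000 before 0; an unsigned compare
 * sorts it after - this is the "world view" difference. */
int signed_less(uint32_t a, uint32_t b)
{
    return (int32_t)a < (int32_t)b;
}

int unsigned_less(uint32_t a, uint32_t b)
{
    return a < b;
}
```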
Jan -- ft: http://freshmeat.net/p/chaostables/ ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 22:05 ` Jan Engelhardt @ 2007-02-09 22:58 ` Martin Mares 2007-02-12 18:50 ` linux-os (Dick Johnson) 1 sibling, 0 replies; 67+ messages in thread From: Martin Mares @ 2007-02-09 22:58 UTC (permalink / raw) To: Jan Engelhardt Cc: linux-os (Dick Johnson), Linus Torvalds, David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Hello! > void* comparisons are unsigned. Period. As far as the C standard is concerned, there is no relationship between comparison on pointers and comparison of their values casted to uintptr_t. The address space needn't be linear and on some machines it isn't. So speaking about signedness of pointer comparisons doesn't make sense, except for concrete implementations. Have a nice fortnight -- Martin `MJ' Mares <mj@ucw.cz> http://mj.ucw.cz/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Top ten reasons to procrastinate: 1. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 22:05 ` Jan Engelhardt 2007-02-09 22:58 ` Martin Mares @ 2007-02-12 18:50 ` linux-os (Dick Johnson) 1 sibling, 0 replies; 67+ messages in thread From: linux-os (Dick Johnson) @ 2007-02-12 18:50 UTC (permalink / raw) To: Jan Engelhardt Cc: Linus Torvalds, David Rientjes, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Fri, 9 Feb 2007, Jan Engelhardt wrote: > > On Feb 9 2007 15:29, linux-os (Dick Johnson) wrote: >>> >>> I was referring to "absolute memory", not the offset magic that assembler >>> allows. After all, (reg+relativeOffset) will yield an absolute address. >>> What I was out at: for machines that have more than 2 GB of memory, you >>> don't call the address that is given by 0x80000000U actually "byte >>> -2147483648", but "byte 2147483648". >> >> Don't make any large bets on that! >> >> char foo() >> { >> volatile char *p = (char *)0x80000000; >> return *p; >> } >> Optimized.... >> .file "zzz.c" >> .text >> .p2align 2,,3 >> .globl foo >> .type foo, @function >> foo: >> pushl %ebp >> movb -2147483648, %al >> movl %esp, %ebp >> movsbl %al,%eax >> leave >> ret >> .size foo, .-foo >> .section .note.GNU-stack,"",@progbits >> .ident "GCC: (GNU) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)" > > 00000000 <foo>: > 0: 55 push %ebp > 1: 0f b6 05 00 00 00 80 movzbl 0x80000000,%eax > 8: 89 e5 mov %esp,%ebp > a: 5d pop %ebp > b: 0f be c0 movsbl %al,%eax > e: c3 ret > > You do know that there is a bijection between the set of signed [32bit] > integers and unsigned [32bit] integers, don't you? > For the CPU, it's just bits. Being signed or unsigned is not important > when just accessing memory. I would have normally just let that comment go, but it is symptomatic of a complete misunderstanding of how many (most) processors address memory. This may come about because modern compilers tend to isolate programmers from the underlying mechanisms. 
There isn't a 1:1 correspondence between a signed displacement and an unsigned one. Signed displacements are mandatory with processors that provide protection. Here is the reason: assume that you could execute code anywhere (a flat module) and you didn't have 'C' complaining about NULL pointers. If you were to execute code at memory location 0 that told the processor to jump to location -1 in absolute memory, you would need the processor to generate a bounds-error trap, not wrap to offset 0xffffffff. The same thing is true if code is executing at or near offset 0xffffffff. If it jumps to a location that would wrap past 0xffffffff, one needs to generate a trap as well. Also, except for the "FAR" jumps and FAR calls, where the segment register is part of the operand, all programmed jumps or calls are relative, i.e., based upon the current program-counter's value. This means that the program counter will have a value added to, or subtracted from, its current value to control program flow. For such addition to work either as an addition of signed numbers or as a relative displacement, the math must use signed (two's complement) arithmetic. If you look at the opcode output from the compiler, when performing conditional jumps, you will note that the jump usually doesn't include a 32-bit address. Instead, these are mostly short jumps, using an 8-bit signed displacement (-128 to +127 bytes). Other conditional jumps use 32-bit signed displacements. The exact same addressing mechanism exists for memory read and write accesses. They are all signed displacements. > It will, when a comparison is involved, but > that was not the point here. void* comparisons are unsigned. Period. > Because a compiler doing signed comparisons will "map" the memory [from > 2 GB to 4 GB] as part of the signed comparison before the memory [from 0 > GB to 2 GB], which collides with - let's call it - "the world view". 
> > Jan > -- > ft: http://freshmeat.net/p/chaostables/ > Cheers, Dick Johnson Penguin : Linux version 2.6.16.24 on an i686 machine (5592.61 BogoMips). New book: http://www.AbominableFirebug.com/ ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 12:34 ` Jan Engelhardt 2007-02-09 13:16 ` linux-os (Dick Johnson) @ 2007-02-13 15:14 ` Dick Streefland 1 sibling, 0 replies; 67+ messages in thread From: Dick Streefland @ 2007-02-13 15:14 UTC (permalink / raw) To: linux-kernel Jan Engelhardt <jengelh@linux01.gwdg.de> wrote: | Further, giving again answer to the question whether they generate signed or | unsigned comparisons: Have you ever seen a computer which addresses memory with | negative numbers? Since the answer is most likely no, signed comparisons would | not make sense for me. The Transputer has (had?) a signed address space: http://maven.smith.edu/~thiebaut/transputer/chapter2/chap2-3.html -- Dick Streefland //// Altium BV dick.streefland@altium.nl (@ @) http://www.altium.com --------------------------------oOO--(_)--OOo--------------------------- ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 19:53 ` Linus Torvalds 2007-02-08 21:10 ` Jan Engelhardt @ 2007-02-08 21:13 ` J.A. Magallón 2007-02-08 21:42 ` Linus Torvalds 1 sibling, 1 reply; 67+ messages in thread From: J.A. Magallón @ 2007-02-08 21:13 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007 11:53:38 -0800 (PST), Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 8 Feb 2007, Jan Engelhardt wrote: > > > > I generally have to agree with you about the unsigned char* vs char*. It > > is a problem of the C language that char can be signed and unsigned, and > > that people, as a result, have used it for storing > > "shorter_than_short_t"s. > > > > What C needs is a distinction between char and int8_t, rendering "char" > > an unsigned at all times basically and making "unsigned char" and > > "signed char" illegal types in turn. > > No, it's really more fundamental than that. > > Exactly because "char *" doesn't have a defined sign, only a TOTALLY > INCOMPETENT compiler will warn about its signedness. > Perhaps the problem is that gcc warns you something like 'if you care anything about the sign behaviour of what you send to strlen() you're doomed'. Ie, you declared the var unsigned, so you care about it. But I can do anything without any regard to the sign. -- J.A. Magallon <jamagallon()ono!com> \ Software is like sex: \ It's better when it's free Mandriva Linux release 2007.1 (Cooker) for i586 Linux 2.6.19-jam06 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #1 SMP PREEMPT ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 21:13 ` J.A. Magallón @ 2007-02-08 21:42 ` Linus Torvalds 2007-02-08 22:03 ` Linus Torvalds 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-08 21:42 UTC (permalink / raw) To: J.A. Magallón Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton [-- Attachment #1: Type: TEXT/PLAIN, Size: 1282 bytes --] On Thu, 8 Feb 2007, J.A. Magallón wrote: > > Perhaps the problem is that gcc warns you something like 'if you care > anything about the sign behaviour of what you send to strlen() you're doomed'. But you *cannot* care. That's the point. The warning is insane. For any function that takes "char *", you very fundamentally *cannot* care about the signedness of the argument, because doing so would be wrong. If you do care, you're already buggy. > Ie, you declared the var unsigned, so you care about it. But I can do > anything without any regard to the sign. That makes no sense. First off, "strlen()" doesn't care about the sign of the buffer, so it's a bad example anyway. But take something that *can* care, namely "strcmp()", which can return different things depending on whether "char" is signed or not (is 0xff larger than 0x00? Depends on whether char is signed or not..). But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is implementation-defined, so if you call "strcmp()", you are already basically saying: I don't care (and I _cannot_ care) what sign you are using. So having the compiler warn about it is insane. It's like warning about the fact that it's Thursday today. It's not something the programmer cares about. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 21:42 ` Linus Torvalds @ 2007-02-08 22:03 ` Linus Torvalds 2007-02-08 22:19 ` Willy Tarreau ` (2 more replies) 0 siblings, 3 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-08 22:03 UTC (permalink / raw) To: J.A. Magallón Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007, Linus Torvalds wrote: > > But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is > implementation-defined, so if you call "strcmp()", you are already > basically saying: I don't care (and I _cannot_ care) what sign you are > using. Let me explain it another way. Say you use signed char *myname, *yourname; if (strcmp(myname,yourname) < 0) printf("Ha, I win!\n") and you compile this on an architecture where "char" is signed even without the explicit "signed". What should happen? Should you get a warning? The types really *are* the same, so getting a warning sounds obviously insane. But on the other hand, if you really care about the sign that strcmp() uses internally, the code is wrong *anyway*, because with another compiler, or with the *same* compiler on another architecture or some other compiler flags, the very same code is buggy. In other words, either you should get a warning *regardless* of whether the sign actually matches or not, or you shouldn't get a warning at all for the above code. Either it's buggy code, or it isn't. Warning only when the sign doesn't _happen_ to match is crap. In that case, it's not a warning about bad code, it's a warning about a bad *compiler*. My suggestion is that if you *really* care about the sign so much that you want the sign warning, make it really obvious to the compiler. Don't ever call functions that have implicit signs. Make even "int" arguments (which is well-defined in its sign) use "signed int", and then you can make the compiler warn if anybody ever passes it an "unsigned int". 
Never mind even a pointer - if somebody actually took the time and effort to spell out "signed int" in a function prototype, and you pass that function an unsigned integer, maybe a warning is perfectly fine. Clearly the programmer really cared, and if he didn't care about the sign that much, he could have used just "int". Conversely, if somebody has a function with a "unsigned int" prototype, and you pass it a regular "int", a compiler shouldn't complain, because an "int" will just silently promote to unsigned. But perhaps the programmer passes it something that he had _explicitly_ marked with "signed int". Would it make sense to warn then? Makes sense to me. And no, none of this is about "strict C standards". All of it is about "what makes sense". It simply doesn't make sense to complain about the sign of "char", because it's not something that has a very hard definition. Similarly, you shouldn't complain about regular "int" conversions, because they are normal, and the standard defines them, but maybe you can take a hint when the programmer gives you a hint by doing something that is "obviously unnecessary", like explicitly saying that "signed int" thing. Just an idea. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 22:03 ` Linus Torvalds @ 2007-02-08 22:19 ` Willy Tarreau 2007-02-09 0:03 ` J.A. Magallón 2007-02-09 12:38 ` Sergei Organov 2 siblings, 0 replies; 67+ messages in thread From: Willy Tarreau @ 2007-02-08 22:19 UTC (permalink / raw) To: Linus Torvalds Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, Feb 08, 2007 at 02:03:06PM -0800, Linus Torvalds wrote: > > > On Thu, 8 Feb 2007, Linus Torvalds wrote: > > > > But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is > > implementation-defined, so if you call "strcmp()", you are already > > basically saying: I don't care (and I _cannot_ care) what sign you are > > using. > > Let me explain it another way. > > Say you use > > signed char *myname, *yourname; > > if (strcmp(myname,yourname) < 0) > printf("Ha, I win!\n") > > and you compile this on an architecture where "char" is signed even > without the explicit "signed". > > What should happen? > > Should you get a warning? The types really *are* the same, so getting a > warning sounds obviously insane. But on the other hand, if you really care > about the sign that strcmp() uses internally, the code is wrong *anyway*, > because with another compiler, or with the *same* compiler on another > architecture or some other compiler flags, the very same code is buggy. > > In other words, either you should get a warning *regardless* of whether > the sign actually matches or not, or you shouldn't get a warning at all > for the above code. Either it's buggy code, or it isn't. > > Warning only when the sign doesn't _happen_ to match is crap. In that > case, it's not a warning about bad code, it's a warning about a bad > *compiler*. > > My suggestion is that if you *really* care about the sign so much that you > want the sign warning, make it really obvious to the compiler. Don't ever > call functions that have implicit signs. 
Make even "int" arguments (which > is well-defined in its sign) use "signed int", and then you can make the > compiler warn if anybody ever passes it an "unsigned int". > > Never mind even a pointer - if somebody actually took the time and effort > to spell out "signed int" in a function prototype, and you pass that > function an unsigned integer, maybe a warning is perfectly fine. Clearly > the programmer really cared, and if he didn't care about the sign that > much, he could have used just "int". > > Conversely, if somebody has a function with a "unsigned int" prototype, > and you pass it a regular "int", a compiler shouldn't complain, because an > "int" will just silently promote to unsigned. But perhaps the programmer > passes it something that he had _explicitly_ marked with "signed int". > Would it make sense to warn then? Makes sense to me. > > And no, none of this is about "strict C standards". All of it is about > "what makes sense". It simply doesn't make sense to complain about the > sign of "char", because it's not something that has a very hard > definition. Similarly, you shouldn't complain about regular "int" > conversions, because they are normal, and the standard defines them, but > maybe you can take a hint when the programmer gives you a hint by doing > something that is "obviously unnecessary", like explicitly saying that > "signed int" thing. > > Just an idea. > > Linus I perfectly agree with your analysis here. I'm used to specify "signed" or "unsigned" when I really expect such a sign, whether it's an int or char, and because of this, I've spent a lot of time stuffing useless casts everywhere to shut stupid warning with common functions such as strcmp(), or even my functions when I don't care about the sign. IOW, the compiler's excessive warnings discourage you from being strict on your types. That could be called "funny" if it did not make us waste so much time. I would really like to get the "I don't care" possibility by default. 
Willy ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 22:03 ` Linus Torvalds 2007-02-08 22:19 ` Willy Tarreau @ 2007-02-09 0:03 ` J.A. Magallón 2007-02-09 0:22 ` Linus Torvalds 2007-02-09 12:38 ` Sergei Organov 2 siblings, 1 reply; 67+ messages in thread From: J.A. Magallón @ 2007-02-09 0:03 UTC (permalink / raw) To: Linus Torvalds Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 8 Feb 2007 14:03:06 -0800 (PST), Linus Torvalds <torvalds@linux-foundation.org> wrote: > > > On Thu, 8 Feb 2007, Linus Torvalds wrote: > > > > But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is > > implementation-defined, so if you call "strcmp()", you are already > > basically saying: I don't care (and I _cannot_ care) what sign you are > > using. > > Let me explain it another way. > > Say you use > > signed char *myname, *yourname; > > if (strcmp(myname,yourname) < 0) > printf("Ha, I win!\n") > > and you compile this on an architecture where "char" is signed even > without the explicit "signed". > > What should happen? > > Should you get a warning? The types really *are* the same, so getting a > warning sounds obviously insane. But on the other hand, if you really care > about the sign that strcmp() uses internally, the code is wrong *anyway*, Thats the point. Mmmm, I think I see it the other way around. I defined a variable as 'signed' or 'unsigned', because the sign info matters for me. And gcc warns about using a function on it that will _ignore_ or even misinterpret that info. Could it be a BUG ? Yes. > because with another compiler, or with the *same* compiler on another > architecture or some other compiler flags, the very same code is buggy. > > In other words, either you should get a warning *regardless* of whether > the sign actually matches or not, or you shouldn't get a warning at all > for the above code. Either it's buggy code, or it isn't. 
>

In fact, I tried to look for an arch where this gives only _one_ error:

#include <string.h>

void
f()
{
	const unsigned char *a;
	const signed char *b;
	int la, lb;

	la = strlen(a);
	lb = strlen(b);
}

and guess, all give _both_ errors.

Linux/x86, gcc 4.1.2-0.20070115:

werewolf:~> gcc -Wpointer-sign -c t.c
t.c: In function ‘f’:
t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness

OSX ppc, gcc-4.0.1:

belly:~> gcc -Wpointer-sign -c t.c
t.c: In function ‘f’:
t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness
t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness

Solaris5.7 Ultra, cc WorkShop Compilers 5.0:

den:~> cc -Xa -c t.c
ucbcc: Warning: "-Xa" redefines compatibility mode from "SunC transition" to "ANSI"
"t.c", line 10: warning: argument #1 is incompatible with prototype:
	prototype: pointer to const char : "/usr/include/string.h", line 71
	argument : pointer to const uchar
"t.c", line 11: warning: argument #1 is incompatible with prototype:
	prototype: pointer to const char : "/usr/include/string.h", line 71
	argument : pointer to const schar

This makes sense. Someone can try on something psychedelic, like ARM ;) ? -- J.A. Magallon <jamagallon()ono!com> \ Software is like sex: \ It's better when it's free Mandriva Linux release 2007.1 (Cooker) for i586 Linux 2.6.19-jam06 (gcc 4.1.2 20070115 (prerelease) (4.1.2-0.20070115.1mdv2007.1)) #1 SMP PREEMPT ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 0:03 ` J.A. Magallón @ 2007-02-09 0:22 ` Linus Torvalds 0 siblings, 0 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-09 0:22 UTC (permalink / raw) To: J.A. Magallón Cc: Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton [-- Attachment #1: Type: TEXT/PLAIN, Size: 1251 bytes --] On Fri, 9 Feb 2007, J.A. Magallón wrote: > > Thats the point. Mmmm, I think I see it the other way around. I defined > a variable as 'signed' or 'unsigned', because the sign info matters for me. > And gcc warns about using a function on it that will _ignore_ or even > misinterpret that info. Could it be a BUG ? Yes. Sure. The other way of seeing it is that *anything* could be a bug. Could adding 1 to "a" be a bug? Yes. "a" might overflow. So maybe the compiler should warn about that too? So do you think a compiler should warn when you do int a = i + 1; and say "warning: Expression on line x might overflow"? Could it be a BUG? Hell yeah. Is warning for things that _could_ be bugs sane? Hell NO. > Linux/x86, gcc 4.1.2-0.20070115: > werewolf:~> gcc -Wpointer-sign -c t.c > t.c: In function ‘f’: > t.c:10: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness > t.c:11: warning: pointer targets in passing argument 1 of ‘strlen’ differ in signedness Yeah, and that's what I think is crazy. Is it consistent? Yes. Does it help people? No. A warning that is consistent is not necessarily a good warning. It needs to MAKE SENSE too. And this one doesn't. I'm sorry if you can't see that. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-08 22:03 ` Linus Torvalds 2007-02-08 22:19 ` Willy Tarreau 2007-02-09 0:03 ` J.A. Magallón @ 2007-02-09 12:38 ` Sergei Organov 2007-02-09 15:58 ` Linus Torvalds 2 siblings, 1 reply; 67+ messages in thread From: Sergei Organov @ 2007-02-09 12:38 UTC (permalink / raw) To: Linus Torvalds Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 8 Feb 2007, Linus Torvalds wrote: >> >> But THE CALLER CANNOT AND MUST NOT CARE! Because the sign of "char" is >> implementation-defined, so if you call "strcmp()", you are already >> basically saying: I don't care (and I _cannot_ care) what sign you are >> using. > > Let me explain it another way. > > Say you use > > signed char *myname, *yourname; > > if (strcmp(myname,yourname) < 0) > printf("Ha, I win!\n") > > and you compile this on an architecture where "char" is signed even > without the explicit "signed". As far as I can read the C99 standard, the "char", "signed char", and "unsigned char", are all different types: "The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.(35) .... (35) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ from the other two and is not compatible with either. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ " If so, then the code above is broken no matter what representation of "char" is chosen for given arch, as strcmp() takes "char*" arguments, that are not compatible with either "signed char*" or "unsigned char*". -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 12:38 ` Sergei Organov @ 2007-02-09 15:58 ` Linus Torvalds 2007-02-12 11:12 ` Sergei Organov 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-09 15:58 UTC (permalink / raw) To: Sergei Organov Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Fri, 9 Feb 2007, Sergei Organov wrote: > > As far as I can read the C99 standard, the "char", "signed char", and > "unsigned char", are all different types: Indeed. Search for "pseudo-unsigned", and you'll see more. There are actually cases where "char" can act differently from _both_ "unsigned char" and "signed char". > If so, then the code above is broken no matter what representation of > "char" is chosen for given arch, as strcmp() takes "char*" arguments, > that are not compatible with either "signed char*" or "unsigned char*". ..and my argument is that a warning which doesn't allow you to call "strlen()" on a "unsigned char" array without triggering is a bogus warning, and must be removed. Which is all I've ever said from the beginning. I've never disputed that "char" and "unsigned char" are different types. They *clearly* are different types. But that doesn't make the warning valid. There is a real need for NOT warning about sign differences. But this argument has apparently again broken down and become a "language lawyer" argument instead of being about what a reasonable compiler actually can and should do. As long as gcc warns about idiotic things, the kernel will keep shutting that warning off. I'd _like_ to use the warning, but if the gcc implementation is on crack, we clearly cannot. There's simply an issue of "quality of implementation". In the case of this warning, gcc is just not very high quality. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-09 15:58 ` Linus Torvalds @ 2007-02-12 11:12 ` Sergei Organov 2007-02-12 16:26 ` Linus Torvalds 0 siblings, 1 reply; 67+ messages in thread From: Sergei Organov @ 2007-02-12 11:12 UTC (permalink / raw) To: Linus Torvalds Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Fri, 9 Feb 2007, Sergei Organov wrote: >> >> As far as I can read the C99 standard, the "char", "signed char", and >> "unsigned char", are all different types: > > Indeed. Search for "pseudo-unsigned", and you'll see more. There are > actually cases where "char" can act differently from _both_ "unsigned > char" and "signed char". > >> If so, then the code above is broken no matter what representation of >> "char" is chosen for given arch, as strcmp() takes "char*" arguments, >> that are not compatible with either "signed char*" or "unsigned char*". > > ..and my argument is that a warning which doesn't allow you to call > "strlen()" on a "unsigned char" array without triggering is a bogus > warning, and must be removed.

Why should strlen() be allowed to be called with an incompatible pointer type? My point is that gcc should issue a *different* warning -- the same warning it issues here:

$ cat incompat.c
void foo(int *p);
void boo(long *p) { foo(p); }
$ gcc -W -Wall -c incompat.c
incompat.c:2: warning: passing argument 1 of 'foo' from incompatible pointer type

Calling strlen(char*) with an "unsigned char*" argument does pass an argument of incompatible pointer type, due to the fact that in C "char" and "unsigned char" are two incompatible types, and it has nothing to do with signedness. [...] -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-12 11:12 ` Sergei Organov @ 2007-02-12 16:26 ` Linus Torvalds 2007-02-13 18:06 ` Sergei Organov 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-12 16:26 UTC (permalink / raw) To: Sergei Organov Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Mon, 12 Feb 2007, Sergei Organov wrote: > > Why should strlen() be allowed to be called with an incompatible pointer > type? My point is that gcc should issue a *different* warning -- the same > warning it issues here: I agree that "strlen()" per se isn't different. The issue is not that the warning isn't "technically correct". IT IS. Nobody should ever argue that the warning isn't "correct". I hope people didn't think I argued that. I've argued that the warning is STUPID. That's a totally different thing. I can say totally idiotic things in perfectly reasonable English grammar and spelling. Does that make the things I say "good"? No. The same is true of this gcc warning. It's technically perfectly reasonable both in English grammar and spelling (well, as far as any compiler warning ever is) _and_ in "C grammar and spelling" too. But being grammatically correct does not make it "smart". IT IS STILL STUPID. Can people not see the difference between "grammatically correct" and "intelligent"? People on the internet seem to have often acquired the understanding that "bad grammar and spelling" => "stupid", and yes, there is definitely some kind of correlation there. But as any logician and mathematician hopefully knows, "a => b" does NOT imply "!a => !b". Some people think that "warnings are always good". HELL NO! A warning is only as good as (a) the thing it warns about (b) the thing you can do about it And THAT is the fundamental problem with that *idiotic* warning. Yes, it's technically correct. Yes, it's "proper C grammar". 
But if you can't get over the hump of realizing that there is a difference between "grammar" and "intelligent speech", you shouldn't be doing compilers. So the warning sucks because: - the thing it warns about (passing "unsigned char" to something that doesn't specify a sign at all!) is not something that sounds wrong in the first place. Yes, it's unsigned. But no, the thing it is passed to didn't specify that it wanted a "signed" thing in the first place. The "strlen()" function literally says "I want a char of indeterminate sign"! which implies that strlen really doesn't care about the sign. The same is true of *any* function that takes a "char *". Such a function doesn't care, and fundamentally CANNOT care about the sign, since it's not even defined! So the warning fails the (a) criterion. The warning isn't valid, because the thing it warns about isn't a valid problem! - it _also_ fails the (b) criterion, because quite often there is nothing you can do about it. Yes, you can add a cast, but adding a cast actually causes _worse_ code (but the warning is certainly gone). But that makes the _other_ argument for the warning totally pointless: if the reason for the warning was "bad code", then having the warning is actively BAD, because the end result is actually "worse code". See? The second point is why it's important to also realize that there is a lot of real and valid code that actually _does_ pass "strlen()" an unsigned string. There are tons of reasons for that to happen: the part of the program that _does_ care wants to use a "unsigned char" array, because it ends up doing things like "isspace(array[x])", and that is not well-defined if you use a "char *" array. So there are lots of reasons to use "unsigned char" arrays for strings. Look it up. Look up any half-way reasonable man-page for the "isspace()" kind of functions, and if they don't actually explicitly say that you should use unsigned characters for it, those man-pages are crap. 
Because those functions really *are* defined in "int", but it's the same kind of namespace that "getchar()" works in (ie "unsigned char" + EOF, where EOF _usually_ is -1, although other values are certainly technically legal too). So: - in practice, a lot of "good programming" uses "unsigned char" pointers for doing strings. There are LOTS of reasons for that, but "isspace()" and friends is the most obvious one. - if you can't call "strlen()" on your strings without the compiler warning, there's two choices: the compiler warning is CRAP, or your program is bad. But as I just showed you, "unsigned char *" is actually often the *right* thing to use for string work, so it clearly wasn't the program that was bad. So *please* understand: - yes, the warning is "correct" from a C grammatical standpoint - the warning is STILL CRAP, because grammar isn't the only thing about a computer language. Sane usage is MUCH MORE important than any grammar. Thus ends the sacred teachings of Linus "always right" Torvalds. Go and ponder these words, and please send me all your money (certified checks only, please - sending small unmarked bills is against USPS rules) to show your support of the holy church of good taste. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-12 16:26 ` Linus Torvalds @ 2007-02-13 18:06 ` Sergei Organov 2007-02-13 18:26 ` Pekka Enberg ` (2 more replies) 0 siblings, 3 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-13 18:06 UTC (permalink / raw) To: Linus Torvalds Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Mon, 12 Feb 2007, Sergei Organov wrote: >> >> Why should strlen() be allowed to be called with an incompatible pointer >> type? My point is that gcc should issue a *different warning*, -- the same >> warning it issues here: > > I agree that "strlen()" per se isn't different. I agree that making the strxxx() family special is not a good idea. So what do we do for a random foo(char*) called with an 'unsigned char*' argument? Silence? Hmmm... It's not immediately obvious that it's indeed harmless. Yet another -Wxxx option to GCC to silence this particular case? > A warning is only as good as > (a) the thing it warns about > (b) the thing you can do about it > > And THAT is the fundamental problem with that *idiotic* warning. Yes, it's > technically correct. Yes, it's "proper C grammar". But if you can't get > over the hump of realizing that there is a difference between "grammar" > and "intelligent speech", you shouldn't be doing compilers. > > So the warning sucks because: May I suggest another definition for a warning that entirely sucks? "A warning entirely sucks if and only if it never has true positives." In all other cases it only more or less sucks, IMHO. > - the thing it warns about (passing "unsigned char" to something that > doesn't specify a sign at all!) is not something that sounds wrong in > the first place. Yes, it's unsigned. But no, the thing it is passed to > didn't specify that it wanted a "signed" thing in the first place. The > "strlen()" function literally says > > "I want a char of indeterminate sign"!
I'm afraid I don't follow. Do we have a way to say "I want an int of indeterminate sign" in C? In the same way, there doesn't seem to be a way to say "I want a char of indeterminate sign". :( So no, strlen() doesn't actually say that, no matter if we like it or not. It actually says "I want a char with implementation-defined sign". > which implies that strlen really doesn't care about the sign. The same > is true of *any* function that takes a "char *". Such a function > doesn't care, and fundamentally CANNOT care about the sign, since it's > not even defined! In fact it's implementation-defined, and this may make a difference here. strlen(), being part of the C library, could be specifically implemented for a given architecture, and as an architecture is free to define the sign of "char", strlen() could in theory rely on the particular sign of "char" as defined for that architecture. [Not that I think that any strlen() implementation actually depends on the sign.] > So the warning fails the (a) criterion. The warning isn't valid, > because the thing it warns about isn't a valid problem! Can we be sure that no function taking 'char*' ever cares about the sign? I'm not sure, and I'm not a language lawyer, but if it's indeed the case, I'd probably agree that it might be a good idea for GCC to extend the C language so that a function argument declared "char*" means any of "char*", "signed char*", or "unsigned char*", even though there is no precedent in the language. BTW, the next logical step would be for an "int*" argument to stop meaning "signed int*" and become any of "int*", "signed int*" or "unsigned int*". Isn't it cool to be able to declare a function that won't produce a warning no matter what kind of int is passed to it? ;) > - it _also_ fails the (b) criterion, because quite often there is nothing > you can do about it. Yes, you can add a cast, but adding a cast > actually causes _worse_ code (but the warning is certainly gone).
But > that makes the _other_ argument for the warning totally pointless: if > the reason for the warning was "bad code", then having the warning is > actively BAD, because the end result is actually "worse code". > > See? The second point is why it's important to also realize that there is > a lot of real and valid code that actually _does_ pass "strlen()" an > unsigned string. And here we finally get to the practice... > There are tons of reasons for that to happen: the part of the program > that _does_ care wants to use an "unsigned char" array, because it ends > up doing things like "isspace(array[x])", and that is not well-defined > if you use a "char *" array. Yes, indeed. So the real problem of the C language is the inconsistency between the strxxx() and isxxx() families of functions? If so, what is wrong with actually fixing the problem, say, by using wrappers over isxxx()? Checking... The kernel already uses isxxx() macros that do the conversion to "unsigned char" themselves, and the few invocations of isspace() I've checked pass "char" as the argument. So that's not a real problem for the kernel, right? > So there are lots of reasons to use "unsigned char" arrays for strings. > Look it up. Look up any half-way reasonable man-page for the "isspace()" > kind of functions, and if they don't actually explicitly say that you > should use unsigned characters for it, those man-pages are crap. Well, even C99 itself does say it. But the kernel already handles this issue nicely without requiring a conversion of char to unsigned char, doesn't it? > Because those functions really *are* defined in "int", but it's the > same kind of namespace that "getchar()" works in (ie "unsigned char" + > EOF, where EOF _usually_ is -1, although other values are certainly > technically legal too). > > So: > > - in practice, a lot of "good programming" uses "unsigned char" pointers > for doing strings.
There are LOTS of reasons for that, but "isspace()" > and friends is the most obvious one. As the isxxx() family does not seem to be a real problem, at least in the context of the kernel source base, I'd like to learn other reasons to use "unsigned char" for doing strings either in general or specifically in the Linux kernel. > - if you can't call "strlen()" on your strings without the compiler > warning, there's two choices: the compiler warning is CRAP, or your > program is bad. But as I just showed you, "unsigned char *" is actually > often the *right* thing to use for string work, so it clearly wasn't > the program that was bad. OK, provided there are actually sound reasons to use "unsigned char*" for strings, isn't it safer to always use it, and re-define strxxx() for the kernel to take that one as argument? Or are there reasons to use "char*" (or "signed char*") for strings as well? I'm afraid that if two or three types are allowed to be used for strings, some inconsistencies here or there will remain no matter what. Another option could be to always use "char*" for strings and always compile the kernel with -funsigned-char. At least it will be consistent with the C idea of strings being "char*". -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 18:06 ` Sergei Organov @ 2007-02-13 18:26 ` Pekka Enberg 2007-02-13 19:14 ` Sergei Organov 2007-02-13 19:25 ` Linus Torvalds 2007-02-13 22:21 ` Olivier Galibert 2 siblings, 1 reply; 67+ messages in thread From: Pekka Enberg @ 2007-02-13 18:26 UTC (permalink / raw) To: Sergei Organov Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On 2/13/07, Sergei Organov <osv@javad.com> wrote: > May I suggest another definition for a warning that entirely sucks? > "A warning entirely sucks if and only if it never has true > positives." In all other cases it only more or less sucks, IMHO. You're totally missing the point. False positives are not a minor annoyance, they're actively harmful as they hide other _useful_ warnings. So, you really want warnings to be about things that can and should be fixed. So you really should aim for _zero false positives_ even if you risk not detecting some real positives. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 18:26 ` Pekka Enberg @ 2007-02-13 19:14 ` Sergei Organov 2007-02-13 19:43 ` Pekka Enberg 0 siblings, 1 reply; 67+ messages in thread From: Sergei Organov @ 2007-02-13 19:14 UTC (permalink / raw) To: Pekka Enberg Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton "Pekka Enberg" <penberg@cs.helsinki.fi> writes: > On 2/13/07, Sergei Organov <osv@javad.com> wrote: >> May I suggest another definition for a warning that entirely sucks? >> "A warning entirely sucks if and only if it never has true >> positives." In all other cases it only more or less sucks, IMHO. > > You're totally missing the point. No, I'm not. > False positives are not a minor annoyance, they're actively harmful as > they hide other _useful_ warnings. Every warning I'm aware of does have false positives. They are indeed harmful, so one takes steps to get rid of them. If a warning had no false positives, it should be an error, not a warning, in the first place. > So, you really want warnings to be about things that can and > should be fixed. That's the definition of a "perfect" warning, which is actually called an "error". > So you really should aim for _zero false positives_ even if you risk > not detecting some real positives. With almost any warning out there one makes more or less effort to suppress the warning where it gives false positives, doesn't one? At least that's my experience so far. -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 19:14 ` Sergei Organov @ 2007-02-13 19:43 ` Pekka Enberg 2007-02-13 20:29 ` Sergei Organov 0 siblings, 1 reply; 67+ messages in thread From: Pekka Enberg @ 2007-02-13 19:43 UTC (permalink / raw) To: Sergei Organov Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On 2/13/07, Sergei Organov <osv@javad.com> wrote: > With almost any warning out there one makes more or less effort to > suppress the warning where it gives false positives, doesn't one? Yes, as long as it's the _compiler_ that's doing the effort. You shouldn't need to annotate the source just to make the compiler shut up. Once the compiler starts issuing enough false positives, it's time to turn off that warning completely. Therefore, the only sane strategy for a warning is to aim for zero false positives. Pekka ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 19:43 ` Pekka Enberg @ 2007-02-13 20:29 ` Sergei Organov 2007-02-13 21:31 ` Jeff Garzik 2007-02-13 23:21 ` Linus Torvalds 0 siblings, 2 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-13 20:29 UTC (permalink / raw) To: Pekka Enberg Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton "Pekka Enberg" <penberg@cs.helsinki.fi> writes: > On 2/13/07, Sergei Organov <osv@javad.com> wrote: >> With almost any warning out there one makes more or less effort to >> suppress the warning where it gives false positives, doesn't one? > > Yes, as long as it's the _compiler_ that's doing the effort. > You shouldn't need to annotate the source just to make the compiler > shut up. Sorry, what do you do with the "variable 'xxx' might be used uninitialized" warning when it's false? Turn it off? Annotate the source? Assign a fake initialization value? Change the compiler so that it does "the effort" for you? Never encountered a false positive from this warning? > Once the compiler starts issuing enough false positives, it's > time to turn off that warning completely. Yes, I don't argue with that. I said "otherwise the warning more or less sucks", and then it's up to programmers to decide if it sucks enough to be turned off. The decision then depends on the importance of its true positives. Only if a warning never has true positives is it unconditional, total, unhelpful crap -- that was my point. > Therefore, the only sane strategy for a warning is to aim for zero > false positives. Sure. But unfortunately this is an unreachable aim in most cases. -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 20:29 ` Sergei Organov @ 2007-02-13 21:31 ` Jeff Garzik 2007-02-13 23:21 ` Linus Torvalds 1 sibling, 0 replies; 67+ messages in thread From: Jeff Garzik @ 2007-02-13 21:31 UTC (permalink / raw) To: Sergei Organov Cc: Pekka Enberg, Linus Torvalds, J.A. Magallón, Jan Engelhardt, Linux Kernel Mailing List, Andrew Morton On Tue, Feb 13, 2007 at 11:29:44PM +0300, Sergei Organov wrote: > Sorry, what do you do with the "variable 'xxx' might be used uninitialized" > warning when it's false? Turn it off? Annotate the source? Assign a fake > initialization value? Change the compiler so that it does "the effort" > for you? Never encountered a false positive from this warning? You pull git://git.kernel.org/...jgarzik/misc-2.6.git#gccbug Jeff ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 20:29 ` Sergei Organov 2007-02-13 21:31 ` Jeff Garzik @ 2007-02-13 23:21 ` Linus Torvalds 2007-02-15 13:20 ` Sergei Organov 1 sibling, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-13 23:21 UTC (permalink / raw) To: Sergei Organov Cc: Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Tue, 13 Feb 2007, Sergei Organov wrote: > > Sorry, what do you do with the "variable 'xxx' might be used uninitialized" > warning when it's false? Turn it off? Annotate the source? Assign a fake > initialization value? Change the compiler so that it does "the effort" > for you? Never encountered a false positive from this warning? The thing is, that warning is at least easy to shut up. You just add a fake initialization. There isn't any real downside. In contrast, to shut up the idiotic pointer-sign warning, you have to add a cast. Now, some people seem to think that adding casts is a way of life, and think that C programs always have a lot of casts. That's NOT TRUE. It's actually possible to avoid casts, and good C practice will do that to quite a high degree, because casts in C are *dangerous*. A language that doesn't allow arbitrary type-casting is a horrible language (because it doesn't allow you to "get out" of the language type system), and typecasts in C are really important. But they are an important feature that should be used with caution, and as little as possible, because they (by design, of course) break the type rules. Now, since the _only_ reason for the -Wpointer-sign warning in the first place is to warn about breaking type rules, if the way to shut it up is to break them EVEN MORE, then the warning is obviously totally broken. IT CAUSES PEOPLE TO WRITE WORSE CODE! Which was against the whole point of having the warning in the first place. This is why certain warnings are fine. For example, the warning about if (a=b) ..
is obviously warning about totally valid C code, but it's _still_ a fine warning, because it's actually very easy to make that warning go away AND IMPROVE THE CODE at the same time. Even if you actually meant to write the assignment, you can write it as if ((a = b) != 0) .. or a = b; if (a) .. both of which are actually more readable. But if you have unsigned char *mystring; len = strlen(mystring); then please tell me how to fix that warning without making the code *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it explicitly (or assign it through a "void *" variable), both of which actually MAKE THE TYPE-CHECKING PROBLEM WORSE! See? The warning made no sense to begin with, and it warns about something where the alternatives are worse than what it warned about. Ergo. It's a crap warning. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 23:21 ` Linus Torvalds @ 2007-02-15 13:20 ` Sergei Organov 2007-02-15 15:57 ` Linus Torvalds 0 siblings, 1 reply; 67+ messages in thread From: Sergei Organov @ 2007-02-15 13:20 UTC (permalink / raw) To: Linus Torvalds Cc: Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 13 Feb 2007, Sergei Organov wrote: >> >> Sorry, what do you do with the "variable 'xxx' might be used uninitialized" >> warning when it's false? Turn it off? Annotate the source? Assign a fake >> initialization value? Change the compiler so that it does "the effort" >> for you? Never encountered a false positive from this warning? > > The thing is, that warning is at least easy to shut up. > > You just add a fake initialization. There isn't any real downside. Yes, there is. There are at least two drawbacks. The minor one is the initialization eating cycles. The major one is that if I later change the code and a path appears that by mistake uses that fake value, I won't get the warning anymore. The latter drawback is somewhat similar to the drawbacks of explicit type casts. [I'd, personally, try to do a code reorganization instead, so that it becomes obvious to the compiler that the variable can't be used uninitialized. Quite often it makes the code better; at least that has been my experience so far.] > In contrast, to shut up the idiotic pointer-sign warning, you have to add > a cast. [Did I already say that I think it's the wrong warning to be given in this particular case, as the problem with the code is not in signedness?] 1. Don't try to hide the warning, at least not immediately, -- consider fixing the code first. Ah, sorry, you've already decided for yourself that the warning is idiotic, so there is no reason to try to... 2.
If you add a cast, it's not in contrast, it's rather similar to the fake initialization above, as they have somewhat similar drawbacks. > Now, some people seem to think that adding casts is a way of life, and > think that C programs always have a lot of casts. Hopefully I'm not one of those. [...] > But if you have > > unsigned char *mystring; > > len = strlen(mystring); > > then please tell me how to fix that warning without making the code > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it > explicitly (or assign it through a "void *" variable), both of which > actually MAKE THE TYPE-CHECKING PROBLEM WORSE! Because instead of trying to fix the code, you insist on hiding the warning. There are at least two actual fixes: 1. Do 'char *mystring' instead, as otherwise the compiler can't figure out that you are going to use 'mystring' to point to a zero-terminated array of characters. 2. Use "len = ustrlen(mystring)" if you indeed need something like C strings, but built from "unsigned char" elements. Similar to (1), this will explain your intent to the stupid compiler. Should I also mention that due to the definition of strlen(), the implementation of ustrlen() is a one-liner (thnx Andrew): static inline size_t ustrlen(const unsigned char *s) { return strlen((const char *)s); } Overall, describe your intent to the compiler more clearly, as reading the programmer's mind is not the kind of business where compilers are strong. > See? The warning made no sense to begin with, and it warns about something > where the alternatives are worse than what it warned about. > > Ergo. It's a crap warning. No, I'm still not convinced. Well, I'll try to explain my point differently. What would you think about the following line, should you encounter it in real code: len = strlen(my_tiny_integers); I'd think that as 'my_tiny_integers' is used as an argument to strlen(), it might be a zero-terminated array of characters. But why does it have such a name then?!
Maybe it in fact holds an array of integers where 0 is not necessarily terminating, and the intent of this code is just to find the first occurrence of 0? It's confusing, and I'd probably go check the declaration. The declaration happens to be "unsigned char *my_tiny_integers", which at least is rather consistent with the variable name, but doesn't resolve my initial doubts. So I need to go check other usages of this 'my_tiny_integers' variable to figure out what's actually going on here. Does that sound like good programming practice? I don't think so. Now, you probably realize that in the example you gave above you've effectively declared "mystring" to be a pointer to "tiny_unsigned_integer". As the compiler (fortunately) doesn't guess intent from variable names, it can't give us a warning at the point of declaration, but I still think we must be thankful that it gave us a warning (even though gcc gives the wrong kind of warning) at the point of dubious use. -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-15 13:20 ` Sergei Organov @ 2007-02-15 15:57 ` Linus Torvalds 2007-02-15 18:53 ` Sergei Organov 2007-02-15 22:32 ` Lennart Sorensen 0 siblings, 2 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-15 15:57 UTC (permalink / raw) To: Sergei Organov Cc: Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 15 Feb 2007, Sergei Organov wrote: > > Yes, there is. There are at least two drawbacks. The minor one is > the initialization eating cycles. The major one is that if I later change the > code and a path appears that by mistake uses that fake value, > I won't get the warning anymore. The latter drawback is somewhat similar > to the drawbacks of explicit type casts. I actually agree - it would clearly be *nicer* if gcc was accurate with the warning. I'm just saying that usually the patch to shut gcc up is quite benign. But yes, it basically disables the warning, and just *maybe* the warning was valid, and it disabled it by initializing something to the wrong value. That's why warnings with false positives are often much worse than not having the warning at all: you start dismissing them, and not even bothering to fix them "properly". That's especially true if there _isn't_ a proper fix. I personally much prefer a compiler that doesn't warn a lot, but *when* it warns, it's 100% on the money. I think that compilers that warn about things that "may" be broken are bogus. It needs to be iron-clad, with no question about whether the warning is appropriate! Which is, of course, why we then shut up some gcc warnings, and which gets us back to the start of the discussion.. ;) > [I'd, personally, try to do a code reorganization instead, so that it > becomes obvious to the compiler that the variable can't be used > uninitialized. Quite often it makes the code better; at least that has > been my experience so far.] It's true that it sometimes works that way. Not always, though.
The gcc "uninitialized variable" thing is *usually* a good sign that the code wasn't straightforward for the compiler to follow, and if the compiler cannot follow it, then probably neither can most programmers. So together with the fact that it's at least not _syntactically_ ugly to shut up, it's not a warning I personally object to most of the time (unlike, say, the pointer-sign one ;) > [Did I already say that I think it's the wrong warning to be given in this > particular case, as the problem with the code is not in signedness?] > > 1. Don't try to hide the warning, at least not immediately, -- consider > fixing the code first. Ah, sorry, you've already decided for yourself > that the warning is idiotic, so there is no reason to try to... This is actually something we've tried. The problem with that approach is that once you have tons of warnings that are "expected", suddenly *all* warnings just become noise. Not just the bogus one. It really *is* a matter of: warnings should either be 100% solid, or they should not be there at all. And this is not just about compiler warnings. We've added some warnings of our own (well, with compiler help, of course): things like the "deprecated" warnings etc. I have often ended up shutting them up, and one reason for that is that yes, I have literally added bugs to the kernel that the compiler really *did* warn about, but because there were so many other pointless warnings, I didn't notice! I didn't make that up. I forget what the bug was, but it literally wasn't more than a couple of months ago that I passed in the wrong type to a function, and the compiler warned about it in big letters. It was just totally hidden by all the other warning crap. > 2. If you add a cast, it's not in contrast, it's rather similar to the fake > initialization above, as they have somewhat similar drawbacks. I agree that it's "unnecessary code", and in many ways exactly the same thing.
I just happen to believe that casts tend to be a lot more dangerous than extraneous initializations. Casts have this nasty tendency of hiding *other* problems too (ie you later change the thing you cast completely, and now the compiler doesn't warn about a *real* bug, because the cast will just silently force it to a wrong type). And yeah, that may well be a personal hangup of mine. A lot of people think casts in C are normal. Me, I actually consider C to be a type-safe language (!) but it's only type-safe if you are careful, and avoiding casts is one of the rules. Others claim that C isn't type-safe at all, and I think it comes from them having been involved in projects where people didn't share the same "casts are good, but only when you use them really really carefully" attitude. > > But if you have > > > > unsigned char *mystring; > > > > len = strlen(mystring); > > > > then please tell me how to fix that warning without making the code > > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it > > explicitly (or assign it through a "void *" variable), both of which > > actually MAKE THE TYPE-CHECKING PROBLEM WORSE! > > Because instead of trying to fix the code, you insist on hiding the > warning. No. I'm saying that I have *reasons* that I need my string to be unsigned. That makes your first "fix" totally bogus: > There are at least two actual fixes: > > 1. Do 'char *mystring' instead, as otherwise the compiler can't figure out > that you are going to use 'mystring' to point to a zero-terminated > array of characters. And the second fix is a fix, but it is, again, worse than the warning: > 2. Use "len = ustrlen(mystring)" if you indeed need something like C > strings Sure, I can do that, but the upside is what? Getting rid of a warning that was bogus to begin with? Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-15 15:57 ` Linus Torvalds @ 2007-02-15 18:53 ` Sergei Organov 2007-02-15 19:02 ` Linus Torvalds 2007-02-15 22:32 ` Lennart Sorensen 1 sibling, 1 reply; 67+ messages in thread From: Sergei Organov @ 2007-02-15 18:53 UTC (permalink / raw) To: Linus Torvalds Cc: Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 15 Feb 2007, Sergei Organov wrote: [...Skip things I agree with...] >> > But if you have >> > >> > unsigned char *mystring; >> > >> > len = strlen(mystring); >> > >> > then please tell me how to fix that warning without making the code >> > *worse* from a type-safety standpoint? You CANNOT. You'd have to cast it >> > explicitly (or assign it through a "void *" variable), both of which >> > actually MAKE THE TYPE-CHECKING PROBLEM WORSE! >> >> Because instead of trying to fix the code, you insist on hiding the >> warning. > > No. I'm saying that I have *reasons* that I need my string to be unsigned. > That makes your first "fix" totally bogus: Somehow I still suppose that most of those warnings could be fixed by using "char*" instead. Yes, I believe you, you have reasons. But still, the only one I've heard about is isxxx(), which appeared not to be a reason in practice (in the context of the kernel tree). As you didn't mention other reasons, my expectation is that they are not in fact very common. I'm curious what the actual reasons are, besides the fact that quite a lot of code is apparently written this way in the kernel. The latter is indeed a very sound reason for the warning to suck for kernel developers, but it might not be a reason for it to suck in general. >> There are at least two actual fixes: >> >> 1. Do 'char *mystring' instead, as otherwise the compiler can't figure out >> that you are going to use 'mystring' to point to a zero-terminated >> array of characters.
> > And the second fix is a fix, but it is, again, worse than the warning: > >> 2. Use "len = ustrlen(mystring)" if you indeed need something like C >> strings > > Sure, I can do that, but the upside is what? 1. You make it explicit, in a type-safe manner, that you are sure it's safe to call "strlen()" with an "unsigned char*" argument. We all agree about it. Let's write it down in the program. And we can already add "strcmp()" to the list. And, say, ustrdup() and ustrchr(), returning "unsigned char *", would be more type-safe as well. 2. You make it more explicit that you indeed mean to use your own representation of strings that is different from what C idiomatically uses for strings, along with an explicitly defined allowed set of operations on this representation. The actual divergence of opinion seems to be that you are sure it's safe to call any foo(char*) with an "unsigned char*" argument, and I'm not, due to the reasons described below. If you are right, then there is indeed little or no point in doing this work. > Getting rid of a warning that was bogus to begin with? I agree that if the warning has no true positives, it sucks. The problem is that somehow I doubt it has none. And the reasons for the doubt are: 1. The C standard does explicitly say that "char" is a different type from either "signed char" or "unsigned char". There should be some reason for that, otherwise they'd simply write that "char" is a synonym for one of "signed char" or "unsigned char", depending on the implementation. 2. If "char" in fact matches one of the other two types for a given architecture, it is definitely different from the other one, so substituting the different one for "char" might be unsafe. As a consequence, for a portable program it might be unsafe to substitute either of signed or unsigned for "char", as "char" may happen to differ from either of them.
Until the doubts are resolved, I prefer to at least be careful with things that I don't entirely understand, so I wish the compiler would give me a warning about subtle semantic differences (if any). -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-15 18:53 ` Sergei Organov @ 2007-02-15 19:02 ` Linus Torvalds 2007-02-16 4:26 ` Rene Herman 2007-02-19 13:58 ` Sergei Organov 0 siblings, 2 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-15 19:02 UTC (permalink / raw) To: Sergei Organov Cc: Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, 15 Feb 2007, Sergei Organov wrote: > > I agree that if the warning has no true positives, it sucks. The problem > is that somehow I doubt it has none. And the reasons for the doubt are: Why do you harp on "no true positives"? That's a pointless thing. You can make *any* warning have "true positives". My personal favorite is the unconditional warning: warning: user is an idiot and I _guarantee_ you that it has a lot of true positives. It's the "no false positives" angle you should look at. THAT is what matters. The reason we don't see a lot of warnings about idiot users is not that people don't do stupid things, but that *sometimes* they actually don't do something stupid. Yeah, I know, it's far-fetched, but still. In other words, you're barking up *exactly* the wrong tree. You're looking at it the wrong way around. Think of it this way: in science, a theory is proven to be bad by a single undeniable fact just showing that it's wrong. The same is largely true of a warning. If the warning sometimes happens for code that is perfectly fine, the warning is bad. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-15 19:02 ` Linus Torvalds @ 2007-02-16 4:26 ` Rene Herman 2007-02-19 11:58 ` Sergei Organov 2007-02-19 13:58 ` Sergei Organov 1 sibling, 1 reply; 67+ messages in thread From: Rene Herman @ 2007-02-16 4:26 UTC (permalink / raw) To: Linus Torvalds Cc: Sergei Organov, Pekka Enberg, "J.A. Magallón", Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On 02/15/2007 08:02 PM, Linus Torvalds wrote: > Think of it this way: in science, a theory is proven to be bad by a > single undeniable fact just showing that it's wrong. > > The same is largely true of a warning. If the warning sometimes > happens for code that is perfectly fine, the warning is bad. Slight difference; if a compulsory warning sometimes happens for code that is perfectly fine, the warning is bad. I do want to be _able_ to get as many warnings as a compiler can muster though. Given char's special nature, shouldn't the conclusion of this thread have long been simply that gcc needs -Wno-char-pointer-sign? (with whatever default, as far as I'm concerned). Rene. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-16 4:26 ` Rene Herman @ 2007-02-19 11:58 ` Sergei Organov 0 siblings, 0 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-19 11:58 UTC (permalink / raw) To: Rene Herman Cc: Linus Torvalds, Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Rene Herman <rene.herman@gmail.com> writes: [...] > Given char's special nature, shouldn't the conclusion of this thread > have long been simply that gcc needs -Wno-char-pointer-sign? (with > whatever default, as far as I'm concerned). I entirely agree that all the char business in C is messy enough to justify separate warning switch(es) in GCC. However, I still insist that the problem with the code: void foo(char *c); unsigned char *u; signed char *s; ... foo(u); foo(s); is not (only) in signedness, as neither 'u' nor 's' has compatible type with the "char*", no matter what is the sign of "char", so if one cares about type safety he needs warnings on both invocations of foo(). -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-15 19:02 ` Linus Torvalds 2007-02-16 4:26 ` Rene Herman @ 2007-02-19 13:58 ` Sergei Organov 1 sibling, 0 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-19 13:58 UTC (permalink / raw) To: Linus Torvalds Cc: Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Thu, 15 Feb 2007, Sergei Organov wrote: >> >> I agree that if the warning has no true positives, it sucks. The problem >> is that somehow I doubt it has none. And the reasons for the doubt are: > > Why do you harp on "no true positives"? Because if somebody is capable of proving that a warning has no true positives, I immediately agree it's useless. I just wanted to check if it's indeed the case. It seems it is not. > That's a pointless thing. You can make *any* warning have "true > positives". My personal favorite is the unconditional warning: > > warning: user is an idiot > > and I _guarantee_ you that it has a lot of true positives. Yes, but there is obviously 0 correlation between the warning and the user being an idiot (except that a non-idiot will probably find a way to either turn this warning off, or filter it out). I already agreed that a warning may have true positives and still be bad; and a warning having "no false negatives" is obviously a pretty one, though unfortunately not that common in practice; and a warning that has no true positives is useless. What we are arguing about are the intermediate cases that are most common in practice. As programmers, we are in fact interested in the correlation between a warning and an actual problem in the software, but for the kind of warnings we are discussing (sign-correctness and type-safety in general), the correlation depends on the style the program is written in, making it difficult to use as a criterion in the discussion. > It's the "no false negatives" angle you should look at.
I'd like to, but then I'll need to classify most of (all?) the GCC warnings as bad, I'm afraid. > THAT is what matters. The reason we don't see a lot of warnings about > idiot users is not that people don't do stupid things, but that > *sometimes* they actually don't do something stupid. > > Yeah, I know, it's far-fetched, but still. > > In other words, you're barking up *exactly* the wrong tree. You're looking > at it the wrong way around. > > Think of it this way: in science, a theory is proven to be bad by a single > undeniable fact just showing that it's wrong. > > The same is largely true of a warning. If the warning sometimes happens > for code that is perfectly fine, the warning is bad. Consider: int find_first_zero(const int *a) { int index = 0; while(*a++ != 0) index++; return index; } unsigned int array[] = { 1, 2, 3, 0, 25, 14 }; int index = find_first_zero(array); *WARNING* So, by your logic, -Wpointer-sign warning is bad in general? If not, then how will you "fix" the code above? And the upside of the fix is what? IMHO, the "problem" is that this warning is about violation of type safety. Type safety by itself doesn't guarantee the code is perfectly fine. Violation of type safety doesn't necessarily mean the code is broken. A warning that warns about violation of type safety has the same properties. Now, it's up to you to decide if you want warnings on type safety or not. What you are saying, on the other hand, for me reads like this: "I do want warnings on violations of type safety in general, except for "signed char" or "unsigned char" being used instead of "char"". I think I do understand your reasons, -- I just don't agree with them. -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-15 15:57 ` Linus Torvalds 2007-02-15 18:53 ` Sergei Organov @ 2007-02-15 22:32 ` Lennart Sorensen 1 sibling, 0 replies; 67+ messages in thread From: Lennart Sorensen @ 2007-02-15 22:32 UTC (permalink / raw) To: Linus Torvalds Cc: Sergei Organov, Pekka Enberg, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Thu, Feb 15, 2007 at 07:57:05AM -0800, Linus Torvalds wrote: > I agree that it's "unnecessary code", and in many ways exactly the same > thing. I just happen to believe that casts tend to be a lot more dangerous > than extraneous initializations. Casts have this nasty tendency of hiding > *other* problems too (ie you later change the thing you cast completely, > and now the compiler doesn't warn about a *real* bug, because the cast > will just silently force it to a wrong type). Well one way you could cast it and still have the compiler tell you if the type ever changes is doing something stupid like: char* unsigned_char_star_to_char_star(unsigned char* in) { return (char*)in; } Then call your strlen with: strlen(unsigned_char_star_to_char_star(my_unfortunate_unsigned_char_string)); If you ever changed the type the compiler would warn you again since the conversion function doesn't accept anything other than unsigned char * being passed in for "conversion". Seems safer than a direct cast, since with a direct cast you get no type checking anymore. I hope it doesn't come to this though. Hopefully gcc will change its behaviour back or give an option of --dontcomplainaboutcharstar :) -- Len Sorensen ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 18:06 ` Sergei Organov 2007-02-13 18:26 ` Pekka Enberg @ 2007-02-13 19:25 ` Linus Torvalds 2007-02-13 19:59 ` Sergei Organov 2007-02-13 21:13 ` Rob Landley 2007-02-13 22:21 ` Olivier Galibert 2 siblings, 2 replies; 67+ messages in thread From: Linus Torvalds @ 2007-02-13 19:25 UTC (permalink / raw) To: Sergei Organov Cc: J.A. MagallÃÃón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Tue, 13 Feb 2007, Sergei Organov wrote: > > > > "I want a char of indeterminate sign"! > > I'm afraid I don't follow. Do we have a way to say "I want an int of > indeterminate sign" in C? The same way there doesn't seem to be a way > to say "I want a char of indeterminate sign". You're wrong. Exactly because "char" *by*definition* is "indeterminate sign" as far as something like "strlen()" is concerned. "char" is _special_. Char is _different_. Char is *not* "int". > So no, strlen() doesn't actually say that, no matter if we like it or > not. It actually says "I want a char with implementation-defined sign". You're arguing some strange semantic difference in the English language. I'm not really interested. THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *". That BY DEFINITION means that "strlen()" cannot care about the sign, because the sign IS NOT DEFINED UNIVERSALLY! And if you cannot accept that fact, it's your problem. Not mine. The warning is CRAP. End of story. Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 19:25 ` Linus Torvalds @ 2007-02-13 19:59 ` Sergei Organov 2007-02-13 20:24 ` Linus Torvalds 2007-02-15 15:15 ` Sergei Organov 1 sibling, 1 reply; 67+ messages in thread From: Sergei Organov @ 2007-02-13 19:59 UTC (permalink / raw) To: Linus Torvalds Cc: J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 13 Feb 2007, Sergei Organov wrote: >> > >> > "I want a char of indeterminate sign"! >> >> I'm afraid I don't follow. Do we have a way to say "I want an int of >> indeterminate sign" in C? The same way there doesn't seem to be a way >> to say "I want a char of indeterminate sign". > > You're wrong. Sure, I knew it from the very beginning ;) > > Exactly because "char" *by*definition* is "indeterminate sign" as far as > something like "strlen()" is concerned. Thanks, I now understand that you either don't see the difference between "indeterminate" and "implementation-defined" in this context or consider it non-essential, so I think I've got your point. > "char" is _special_. Char is _different_. Char is *not* "int". Sure. > >> So no, strlen() doesn't actually say that, no matter if we like it or >> not. It actually says "I want a char with implementation-defined sign". > > You're arguing some strange semantic difference in the English > language. Didn't I further explain what I meant exactly (which you've skipped)? OK, never mind. > I'm not really interested. OK, no problem. > > THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *". So just don't pass it "unsigned char*". End of story. > > That BY DEFINITION means that "strlen()" cannot care about the sign, > because the sign IS NOT DEFINED UNIVERSALLY! > > And if you cannot accept that fact, it's your problem. Not mine. I never had problems either with strlen() or with this warning, so I was curious why the warning is such a problem for the kernel.
I'm sorry I took your time without getting an answer that I entirely agree with. Thank you very much for your explanations anyway. > The warning is CRAP. End of story. Amen! -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 19:59 ` Sergei Organov @ 2007-02-13 20:24 ` Linus Torvalds 2007-02-15 15:15 ` Sergei Organov 0 siblings, 1 reply; 67+ messages in thread From: Linus Torvalds @ 2007-02-13 20:24 UTC (permalink / raw) To: Sergei Organov Cc: J.A. MagallÃÃÃón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Tue, 13 Feb 2007, Sergei Organov wrote: > > Exactly because "char" *by*definition* is "indeterminate sign" as far as > > something like "strlen()" is concerned. > > Thanks, I now understand that you either don't see the difference > between "indeterminate" and "implementation-defined" in this context or > consider it non-essential, so I think I've got your point. The thing is, "implementation-defined" does actually matter when you can _depend_ on it. For example, there's a *huge* difference between "undefined" and "implementation-defined". A program can actually depend on something like char c = 0x80; if (c < 0) .. always having the *same* behaviour for a particular compiler (with a particular set of compile flags - some compilers have flags to change the sign behaviour). So yes, there "implementation defined" actually has a real and worthwhile meaning. It is guaranteed to have _some_ semantics within the confines of that program, and they are defined to be consistent (again, within the confines of that particular program). So I agree that "implementation defined" doesn't always mean "meaningless". BUT (and this is a big but) within the discussion of "strlen()", that is no longer true. "strlen()" exists _outside_ of a single particular implementation. As such, "implementation-defined" is no longer something that "strlen()" can depend on. 
As an example of this argument, try this: #include <string.h> #include <stdio.h> int main(int argc, char **argv) { char a1[] = { -1, 0 }, a2[] = { 1, 0 }; printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0); return 0; } and *before* you compile it, try to guess what the output is. And when that confuses you, try to compile it using gcc with the "-funsigned-char" flag (or "-fsigned-char" if you started out on an architecture where char was unsigned by default) And when you really *really* think about it afterwards, I think you'll go "Ahh.. Linus is right". It's more than "implementation-defined": it really is totally indeterminate for code like this. (Yeah, the above isn't "strlen()", but it's an even subtler issue with EXACTLY THE SAME PROBLEM! And one where you can actually see the _effects_ of it) Linus ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 20:24 ` Linus Torvalds @ 2007-02-15 15:15 ` Sergei Organov 0 siblings, 0 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-15 15:15 UTC (permalink / raw) To: Linus Torvalds Cc: J.A. MagallÃÃÃón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, 13 Feb 2007, Sergei Organov wrote: [...] > BUT (and this is a big but) within the discussion of "strlen()", that is > no longer true. "strlen()" exists _outside_ of a single particular > implementation. As such, "implementation-defined" is no longer something > that "strlen()" can depend on. > > As an example of this argument, try this: > > #include <string.h> > #include <stdio.h> > > int main(int argc, char **argv) > { > char a1[] = { -1, 0 }, a2[] = { 1, 0 }; > > printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0); > return 0; > } > > and *before* you compile it, try to guess what the output is. Well, I'll try to play fair, so I didn't yet compile it. Now, strcmp() is defined in the C standard so that its behavior doesn't depend on the sign of char: "The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ compared." [Therefore, at least I don't need to build GCC multilib'ed on -fsigned/unsigned-char to get consistent results even if strcmp() in fact lives in a library, that was my first thought before I referred to the standard ;)] Suppose the char is signed. Then a1[0] < a2[0] (= -1 < 1) should be true. On 2's-complement implementation with 8bit char, -1 converted by strcmp() to unsigned char should be 0xFF, and 1 converted should be 1. So strcmp() should be equivalent to (0xFF < 1) that is false. 
So I'd expect 1 0 result on implementation with signed char. Now suppose the char is unsigned. Then on 2's-complement implementation with 8bit-byte CPU, a1[0] should be 0xFF, and a2[0] should be 1. The result from strcmp() won't change. So I'd expect 0 0 result on implementation with unsigned char. Now I'm going to compile it (I must admit I'm slightly afraid to get surprising results, so I've re-read my above reasonings before compiling): osv@osv tmp$ cat strcmp.c #include <stdio.h> #include <string.h> int main() { char a1[] = { -1, 0 }, a2[] = { 1, 0 }; printf("%d %d\n", a1[0] < a2[0], strcmp(a1, a2) < 0); return 0; } osv@osv tmp$ gcc -v Using built-in specs. Target: i486-linux-gnu ... gcc version 4.1.2 20061028 (prerelease) (Debian 4.1.1-19) osv@osv tmp$ gcc -O2 strcmp.c -o strcmp && ./strcmp 1 0 > And when that confuses you, It didn't, or did I miss something? Is char unsigned by default? > try to compile it using gcc with the > "-funsigned-char" flag (or "-fsigned-char" if you started out on an > architecture where char was unsigned by default) osv@osv tmp$ gcc -O2 -fsigned-char strcmp.c -o strcmp && ./strcmp 1 0 osv@osv tmp$ gcc -O2 -funsigned-char strcmp.c -o strcmp && ./strcmp 0 0 osv@osv tmp$ Due to above, apparently char is indeed signed by default, so what? > And when you really *really* think about it afterwards, I think you'll go > "Ahh.. Linus is right". It's more than "implementation-defined": it really > is totally indeterminate for code like this. The fact is that strcmp() is explicitly defined in the C standard so that it will bring the same result no matter what the sign of "char" type is. Therefore, it obviously can't be used to determine the sign of "char", so from the POV of usage of strcmp(), the sign of its argument is indeed "indeterminate". What I fail to see is how this fact could help in proving that for any function taking "char*" argument the sign of char is indeterminate. 
Anyway, it seems that you still miss (or ignore) my point that it's not (only) sign of "char" that makes it suspect to call functions requesting "char*" argument with "unsigned char*" value. -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 19:25 ` Linus Torvalds 2007-02-13 19:59 ` Sergei Organov @ 2007-02-13 21:13 ` Rob Landley 1 sibling, 0 replies; 67+ messages in thread From: Rob Landley @ 2007-02-13 21:13 UTC (permalink / raw) To: Linus Torvalds Cc: Sergei Organov, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Tuesday 13 February 2007 2:25 pm, Linus Torvalds wrote: > THE FACT IS, THAT "strlen()" IS DEFINED UNIVERSALLY AS TAKING "char *". > > That BY DEFINITION means that "strlen()" cannot care about the sign, > because the sign IS NOT DEFINED UNIVERSALLY! > > And if you cannot accept that fact, it's your problem. Not mine. > > The warning is CRAP. End of story. In busybox we fed the compiler -funsigned-char to make it shut up. (And so we had consistent bugs between arm and x86.) Building tcc I had to feed it -fsigned-char because -funsigned-char broke stuff. :) Rob -- "Perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away." - Antoine de Saint-Exupery ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 18:06 ` Sergei Organov 2007-02-13 18:26 ` Pekka Enberg 2007-02-13 19:25 ` Linus Torvalds @ 2007-02-13 22:21 ` Olivier Galibert 2007-02-14 12:52 ` Sergei Organov 2007-02-15 20:06 ` Sergei Organov 2 siblings, 2 replies; 67+ messages in thread From: Olivier Galibert @ 2007-02-13 22:21 UTC (permalink / raw) To: Sergei Organov Cc: Linus Torvalds, J.A. MagallÃón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote: > I agree that making strxxx() family special is not a good idea. So what > do we do for a random foo(char*) called with an 'unsigned char*' > argument? Silence? Hmmm... It's not immediately obvious that it's indeed > harmless. Yet another -Wxxx option to GCC to silence this particular > case? Silence would be good. "char *" has a special status in C, it can be: - pointer to a char/to an array of chars (standard interpretation) - pointer to a string - generic pointer to memory you can read(/write) Check the aliasing rules if you don't believe be on the third one. And it's *way* more often cases 2 and 3 than 1 for the simple reason that the signedness of char is unpredictable. As a result, a signedness warning between char * and (un)signed char * is 99.99% of the time stupid. > May I suggest another definition for a warning being entirely sucks? > "The warning is entirely sucks if and only if it never has true > positives." In all other cases it's only more or less sucks, IMHO. That means a warning that triggers on every line saying "there may be a bug there" does not entirely suck? > I'm afraid I don't follow. Do we have a way to say "I want an int of > indeterminate sign" in C? Almost completely. The rules on aliasing say you can convert pointer between signed and unsigned variants and the accesses will be unsurprising. 
The only problem is that the implicit conversion of incompatible pointer parameters to a function looks impossible in the draft I have. Probably it has been corrected in the final version. In any case, having for instance unsigned int * in a prototype really means in the language "I want a pointer to integers, and I'm probably going to use them as unsigned, so beware". For the special case of char, since the beware version would require a signed or unsigned tag, it really means indeterminate. C is sometimes called a high-level assembler for a reason :-) > The same way there doesn't seem to be a way > to say "I want a char of indeterminate sign". :( So no, strlen() doesn't > actually say that, no matter if we like it or not. It actually says "I > want a char with implementation-defined sign". In this day and age it means "I want a 0-terminated string". Everything else is explicitly signed char * or unsigned char *, often through typedefs in the signed case. > In fact it's implementation-defined, and this may make a difference > here. strlen(), being part of C library, could be specifically > implemented for given architecture, and as architecture is free to > define the sign of "char", strlen() could in theory rely on particular > sign of "char" as defined for given architecture. [Not that I think that > any strlen() implementation actually depends on sign.] That would require pointers tagged one way or another; you can't distinguish between pointers to differently-signed versions of the same integer type otherwise (they're required to have the same size and alignment). You don't have that on modern architectures. > Can we assure that no function taking 'char*' ever cares about the sign?
> I'm not sure, and I'm not a language lawyer, but if it's indeed the > case, I'd probably agree that it might be a good idea for GCC to extend > the C language so that function argument declared "char*" means either > of "char*", "signed char*", or "unsigned char*" even though there is no > precedent in the language. It's a warning you're talking about. That means it is _legal_ in the language (even if maybe implementation defined, but legal still). Otherwise it would be an error. > BTW, the next logical step would be for "int*" argument to stop meaning > "signed int*" and become any of "int*", "signed int*" or "unsigned > int*". Isn't it cool to be able to declare a function that won't produce > warning no matter what int is passed to it? ;) No, it wouldn't be logical, because char is *special*. > Yes, indeed. So the real problem of the C language is inconsistency > between strxxx() and isxxx() families of functions? If so, what is > wrong with actually fixing the problem, say, by using wrappers over > isxxx()? Checking... The kernel already uses isxxx() that are macros > that do conversion to "unsigned char" themselves, and a few invocations > of isspace() I've checked pass "char" as argument. So that's not a real > problem for the kernel, right? Because a cast to silence a warning silences every possible warning even if the then-pointer turns for instance into an integer through an unrelated change. Think for instance about an error_t going from const char * (error string) to int (error code) through a patch, which happened to be passed to an utf8_to_whatever conversion function that takes an const unsigned char * as a parameter. Casting would hide the impact of changing the type. > As the isxxx() family does not seem to be a real problem, at least in > the context of the kernel source base, I'd like to learn other reasons > to use "unsigned char" for doing strings either in general or > specifically in the Linux kernel. 
Everybody who has ever done text manipulation in languages other than English knows for a fact that chars must be unsigned, always. The current utf8 support frenzy is driving that home even harder. > OK, provided there are actually sound reasons to use "unsigned char*" > for strings, isn't it safer to always use it, and re-define strxxx() for > the kernel to take that one as argument? Or are there reasons to use > "char*" (or "signed char*") for strings as well? I'm afraid that if two > or three types are allowed to be used for strings, some inconsistencies > here or there will remain no matter what. "blahblahblah"'s type is const char *. > Another option could be to always use "char*" for strings and always compile > the kernel with -funsigned-char. At least it will be consistent with the > C idea of strings being "char*". The best option is to have gcc stop being stupid. -funsigned-char creates an assumption which is actually invisible in the code itself, making it a ticking bomb for reuse in other projects. OG. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 22:21 ` Olivier Galibert @ 2007-02-14 12:52 ` Sergei Organov 2007-02-15 20:06 ` Sergei Organov 1 sibling, 0 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-14 12:52 UTC (permalink / raw) To: Olivier Galibert Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Olivier Galibert <galibert@pobox.com> writes: > On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote: [...] >> May I suggest another definition for a warning being entirely sucks? >> "The warning is entirely sucks if and only if it never has true >> positives." In all other cases it's only more or less sucks, IMHO. > > That means a warning that triggers on every line saying "there may be > a bug there" does not entirely suck? You got me. This warning is indeed entirely sucks. My definition is poor. -- Sergei. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: somebody dropped a (warning) bomb 2007-02-13 22:21 ` Olivier Galibert 2007-02-14 12:52 ` Sergei Organov @ 2007-02-15 20:06 ` Sergei Organov 1 sibling, 0 replies; 67+ messages in thread From: Sergei Organov @ 2007-02-15 20:06 UTC (permalink / raw) To: Olivier Galibert Cc: Linus Torvalds, J.A. Magallón, Jan Engelhardt, Jeff Garzik, Linux Kernel Mailing List, Andrew Morton Olivier Galibert <galibert@pobox.com> writes: > On Tue, Feb 13, 2007 at 09:06:24PM +0300, Sergei Organov wrote: >> I agree that making strxxx() family special is not a good idea. So what >> do we do for a random foo(char*) called with an 'unsigned char*' >> argument? Silence? Hmmm... It's not immediately obvious that it's indeed >> harmless. Yet another -Wxxx option to GCC to silence this particular >> case? > > Silence would be good. "char *" has a special status in C, it can be: > - pointer to a char/to an array of chars (standard interpretation) > - pointer to a string > - generic pointer to memory you can read(/write) > > Check the aliasing rules if you don't believe be on the third one. The generic pointer still reads 'void*', doesn't it? If you don't believe me, check memcpy() and friends. From the POV of aliasing rules, "char*", "unsigned char*", and "signed char*" are all the same, as the aliasing rules mention "character type", which according to the standard is defined like this: "The three types char, signed char, and unsigned char are collectively called the character types." > And it's *way* more often cases 2 and 3 than 1 for the simple reason > that the signedness of char is unpredictable. Are the first two different other than by 0-termination? If not, then differentiating 1 and 2 by signedness doesn't make sense. If we throw away the signedness argument, then yes, 2 is more common than 1, due to the obvious reason that it's much more convenient to use 0-terminated arrays of characters in C than plain arrays of characters.
For 3, I'd use "unsigned char*" (or, sometimes, maybe, "signed char*") to do actual access (due to aliasing rules), but "void*" as an argument type, as arbitrary pointer is spelled "void*" in C. So, it's 2 that I'd consider the most common usage for "char*". Strings of characters are idiomatically "char*" in C. > As a result, a signedness warning between char * and (un)signed char * > is 99.99% of the time stupid. As a result of what? Of the fact that strings in C are "char*"? Of the fact that one can use "char*" to access arbitrary memory? If the latter, then "char*", "signed char*", and "unsigned char*" are all equal, and you may well argue that there should be no warning between "signed char*" and "unsigned char*" as well, as either of them could be used to access memory. Anyway, I've already stated that I don't think it's signedness that matters here. Standard explicitly says "char", "signed char", and "unsigned char" are all distinct types, so the warning should be about incompatible pointer instead. Anyway, I even don't dispute those 99.99% number, I just want somebody to explain me how dangerous those 0.001% are, and isn't it exactly 0%? [...] >> I'm afraid I don't follow. Do we have a way to say "I want an int of >> indeterminate sign" in C? > > Almost completely. The rules on aliasing say you can convert pointer > between signed and unsigned variants and the accesses will be > unsurprising. Only from the point of view of aliasing. Aliasing rules don't disallow other surprises. > > The only problem is that the implicit conversion of incompatible > pointer parameters to a function looks impossible in the draft I have. Why do you think it's impossible according to the draft? > Probably has been corrected in the final version. I doubt it's impossible either in the draft or in the final version. 
> In any case, having for instance unsigned int * in a prototype really > means in the language "I want a pointer to integers, and I'm probably > going to use it them as unsigned, so beware". > For the special case of char, since the beware version would require a > signed or unsigned tag, it really means indeterminate. IMHO, it would read "I want a pointer to alphabetic characters, and I'm probably going to use them as alphabetic characters, so beware". No signedness involved. For example, suppose we have 3 function that compare two zero-terminated arrays: int compare1(char *c1, char *c2); int compare2(unsigned char *c1, unsigned char *c2); int compare3(signed char *c1, signed char *c2); the first one might compare arguments alphabetically, while the second and the third compare them numerically. All of the 3 are different, so passing wrong type of argument to either of them might be dangerous. > C is sometimes called a high-level assembler for a reason :-) I'm afraid that those times when it actually was, are far in the past, for better or worse. >> The same way there doesn't seem to be a way to say "I want a char of >> indeterminate sign". :( So no, strlen() doesn't actually say that, no >> matter if we like it or not. It actually says "I want a char with >> implementation-defined sign". > > In this day and age it means "I want a 0-terminated string". IMHO, you've forgot to say "of alphabetic characters". > Everything else is explicitely signed char * or unsigned char *, often > through typedefs in the signed case. Those are for signed and unsigned tiny integers, with confusing names, that in turn are just historical baggage. >> In fact it's implementation-defined, and this may make a difference >> here. strlen(), being part of C library, could be specifically >> implemented for given architecture, and as architecture is free to >> define the sign of "char", strlen() could in theory rely on particular >> sign of "char" as defined for given architecture. 
>> [Not that I think that any strlen() implementation actually depends
>> on sign.]
>
> That would require pointers tagged in one way or another; you can't
> distinguish between pointers to differently-signed versions of the
> same integer type otherwise (they're required to have the same size and
> alignment). You don't have that on modern architectures.

How is that relevant? According to this argument, e.g., foo(int*) can't actually rely on the sign of its argument, so any warning about the signedness of a pointer argument is crap.

>> Can we assure that no function taking 'char*' ever cares about the sign?
>> I'm not sure, and I'm not a language lawyer, but if it's indeed the
>> case, I'd probably agree that it might be a good idea for GCC to extend
>> the C language so that a function argument declared "char*" means any
>> of "char*", "signed char*", or "unsigned char*" even though there is no
>> precedent in the language.
>
> It's a warning you're talking about. That means it is _legal_ in the
> language (even if maybe implementation-defined, but legal still).
> Otherwise it would be an error.

It's legal to pass "unsigned int*" to a function requesting "int*", yet the warning about that is useful, isn't it?

>> BTW, the next logical step would be for an "int*" argument to stop meaning
>> "signed int*" and become any of "int*", "signed int*" or "unsigned
>> int*". Isn't it cool to be able to declare a function that won't produce
>> a warning no matter what kind of int is passed to it? ;)
>
> No, it wouldn't be logical, because char is *special*.

Yes, it's special, but in what manner? I fail to see in the standard that "char" means either of "signed char" or "unsigned char".

>> Yes, indeed. So the real problem of the C language is the inconsistency
>> between the strxxx() and isxxx() families of functions? If so, what is
>> wrong with actually fixing the problem, say, by using wrappers over
>> isxxx()? Checking...
>> The kernel already uses isxxx() macros
>> that do the conversion to "unsigned char" themselves, and the few invocations
>> of isspace() I've checked pass "char" as argument. So that's not a real
>> problem for the kernel, right?
>
> Because a cast to silence a warning silences every possible warning,
> even if the then-pointer turns, for instance, into an integer through an
> unrelated change.

The isxxx() family doesn't have pointer arguments, so this is irrelevant. The isxxx() family, as implemented in the kernel, casts its argument to "unsigned char", which is the right thing to do.

>> As the isxxx() family does not seem to be a real problem, at least in
>> the context of the kernel source base, I'd like to learn the other reasons
>> to use "unsigned char" for doing strings, either in general or
>> specifically in the Linux kernel.
>
> Everybody who has ever done text manipulation in languages other than
> English knows for a fact that chars must be unsigned, always. The
> current utf8 support frenzy is driving that home even harder.

Well, maybe, but then it won't be C anymore. Strings in C are "char*", and the sign of "char" is implementation-defined. Though you can get what you are asking for even now by using -funsigned-char.

>> OK, provided there are actually sound reasons to use "unsigned char*"
>> for strings, isn't it safer to always use it, and re-define strxxx() for
>> the kernel to take that one as an argument? Or are there reasons to use
>> "char*" (or "signed char*") for strings as well? I'm afraid that if two
>> or three types are allowed to be used for strings, some inconsistencies
>> here or there will remain no matter what.
>
> "blahblahblah"'s type is const char *.

Yes, how did I forget about that? So be it: strings in C are "char*".

>> Another option could be to always use "char*" for strings and always compile
>> the kernel with -funsigned-char. At least it would be consistent with the
>> C idea of strings being "char*".
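The two classic traps discussed in this subthread can be sketched concretely (the function names are invented for illustration): comparing bytes through a signed character type flips the ordering as soon as a byte >= 0x80 appears, and the ctype functions must never be handed a negative plain char, which is why casting to unsigned char first, as the kernel's isxxx() macros do, is the right thing. This assumes the usual two's-complement platforms, where converting 0xE9 to signed char yields -23 (that conversion is implementation-defined).

```c
#include <ctype.h>

/* Compare single bytes under the two possible signedness views. */
static int cmp_as_unsigned(unsigned char a, unsigned char b)
{
    return (a > b) - (a < b);
}

static int cmp_as_signed(unsigned char a, unsigned char b)
{
    signed char sa = (signed char)a;  /* implementation-defined for >= 0x80 */
    signed char sb = (signed char)b;
    return (sa > sb) - (sa < sb);
}

/* In the spirit of the kernel's isxxx() macros: cast to unsigned char
 * before calling the ctype function, so a negative plain char never
 * reaches isspace(), whose argument must be representable as
 * unsigned char (or be EOF). */
static int is_space_safe(char c)
{
    return isspace((unsigned char)c);
}
```

With 0xE9 ('é' in Latin-1) and 'A' (0x41), the unsigned comparison says 'é' sorts after 'A' while the signed one says the opposite, which is exactly the "works only by accident outside the USA" problem.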
>
> The best option is to have gcc stop being stupid.

Any compiler is stupid. AI hasn't matured enough yet ;)

> -funsigned-char creates an assumption which is actually invisible in
> the code itself, making it a ticking bomb for reuse in other projects.

Well, on one hand you are asking for char to always be unsigned, but on the other hand you reject the gcc option that provides exactly that. The only option that remains, I'm afraid, is to argue to the C committee that char should always be unsigned.

-- 
Sergei.
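The implementation-defined signedness the whole thread revolves around is visible in <limits.h>, and it is exactly what gcc's -fsigned-char/-funsigned-char toggle. A minimal sketch (the helper name is made up for illustration):

```c
#include <limits.h>

/* Plain char's signedness is implementation-defined (and can be
 * flipped with gcc's -funsigned-char / -fsigned-char).  <limits.h>
 * reflects the choice: CHAR_MIN is 0 when plain char is unsigned,
 * SCHAR_MIN when it is signed, and CHAR_MAX moves accordingly. */
static int char_is_unsigned(void)
{
    return CHAR_MIN == 0;
}
```

This is precisely the "invisible assumption" objection: nothing in the source of a project built with -funsigned-char records that char_is_unsigned() was true, so reusing the code elsewhere silently changes its behaviour.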
* Re: somebody dropped a (warning) bomb
  2007-02-08 18:42 ` Jan Engelhardt
  2007-02-08 19:53 ` Linus Torvalds
@ 2007-02-09 15:10 ` Sergei Organov

From: Sergei Organov @ 2007-02-09 15:10 UTC (permalink / raw)
To: Jan Engelhardt; +Cc: Linux Kernel Mailing List

Jan Engelhardt <jengelh@linux01.gwdg.de> writes:
> On Feb 8 2007 08:33, Linus Torvalds wrote:
>> [...]
> What C needs is a distinction between char and int8_t, rendering "char"
> unsigned at all times, basically, and making "unsigned char" and
> "signed char" illegal types in turn.

AFAIK, C already has 3 *distinct* types: "char", "signed char", and "unsigned char". So your int8_t reads "signed char" in C, and uint8_t reads "unsigned char" in C, while "char" is yet another type that is not compatible with those two. Yes, the names for these types are somewhat confusing, but that's due to historical reasons, I think. Overall:

  signed char   == tiny signed int == tiny int
  unsigned char == tiny unsigned int
  char          != signed char
  char          != unsigned char

where tiny == short short ;)

Rules of thumb: don't use bare char to store integers; don't use signed/unsigned char to store characters.

[strxxx() routines take char* arguments as they work on '\0'-terminated character arrays (strings). Using them on arrays of tiny integers is unsafe no matter how char behaves on a given platform (like a signed or like an unsigned type).]

-- 
Sergei.
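The claim that "char", "signed char", and "unsigned char" are three distinct types can nowadays be demonstrated directly with C11's _Generic, which selects a different association for each of the three (C11 of course postdates this 2007 thread; the macro name is made up for this sketch):

```c
#include <string.h>  /* strcmp, for checking the selected names */

/* C11 _Generic may list char, signed char and unsigned char as three
 * separate associations precisely because the standard makes them
 * three distinct types, even though plain char always has the same
 * representation as one of the other two. */
#define CHAR_TYPE_NAME(x) _Generic((x), \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    default:       "other")
```

If the three were not distinct types, the duplicate associations would be a constraint violation; instead, each variant selects its own branch.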
* Re: somebody dropped a (warning) bomb
  2007-02-08 15:00 Jeff Garzik
  2007-02-08 16:33 ` Linus Torvalds
@ 2007-02-08 16:35 ` Kumar Gala

From: Kumar Gala @ 2007-02-08 16:35 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Linux Kernel Mailing List, Andrew Morton, Linus Torvalds

On Thu, 8 Feb 2007, Jeff Garzik wrote:
> Just did a pull for the first time since 2.6.20, and a /megaton/ of new
> warnings have appeared, on Fedora Core 6/x86-64 "make allyesconfig".
>
> All of the new warning spew appears to be "pointers differ in signedness"
> warnings. Granted, we should care about these, but even a "silent" build
> (make -s) now spews far more output than any human could possibly care about,
> and digging important nuggets out of all this noise is beginning to require
> automated tools rather than just human eyes.

I think this is because of changes to Kbuild and the ongoing discussion related to it. On PPC I see that we are losing options via $(call cc-option...).

- k