linux-toolchains.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-toolchains@vger.kernel.org,
	Masahiro Yamada <masahiroy@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH] kbuild: treat char as always signed
Date: Wed, 19 Oct 2022 11:11:16 -0700
Message-ID: <CAHk-=wiNNKLFfa0d+Hk=Wm5caiKjLY4V9wwu9DhcSSwPuMbxrg@mail.gmail.com>
In-Reply-To: <20221019174345.GM25951@gate.crashing.org>

On Wed, Oct 19, 2022 at 10:45 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> When I did this more than a decade ago there indeed was a LOT of noise,
> mostly caused by dubious code.

It really happens with explicitly *not* dubious code.

Using 'unsigned char[]' is very common in code that actually does
anything where you care about the actual byte values. Things like
utf-8 handling, things like compression, lots and lots of cases.

But a number of those cases are still dealing with *strings*. UTF-8 is
still a perfectly valid C string format, and using 'strlen()' on a
buffer that contains UTF-8 is neither unusual nor wrong. It is still
the proper way to get the byte length of the thing. It's how UTF-8 is
literally designed.

And -Wpointer-sign will complain about that, unless you start doing
explicit casting, which is a cure worse than the disease.
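
To make the complaint concrete, here is a minimal sketch (not from the
original message). The helper name utf8_byte_length and the macro
SHOW_POINTER_SIGN_WARNING are hypothetical. Building with the macro
defined should reproduce the diagnostic on a compiler that has
-Wpointer-sign enabled; the default build shows the cast that people
add to silence it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of a NUL-terminated UTF-8 buffer. */
size_t utf8_byte_length(const unsigned char *s)
{
#ifdef SHOW_POINTER_SIGN_WARNING
	/*
	 * 'const unsigned char *' passed where 'const char *' is expected:
	 * this is the call that -Wpointer-sign complains about.
	 */
	return strlen(s);
#else
	/* The cast silences the warning; the result is the same. */
	return strlen((const char *)s);
#endif
}
```

Either way the function returns the number of bytes before the
terminating NUL, so "caf\xc3\xa9" (the last two bytes are the UTF-8
encoding of U+00E9) has a length of 5.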

Explicit casts are bad (unless, of course, you are explicitly trying
to violate the type system, in which case they are both required and
a great way to say "look, I'm doing something dangerous").

So people who say "just cast it" don't understand that casts *should*
be seen as "this code is doing something special, tread carefully". If
you just randomly add casts to shut up a warning, the casts become
normalized and don't raise the kind of warning signs that they
*should* raise.

And it's really annoying, because the code ends up using 'unsigned
char' exactly _because_ it's trying to be careful and explicit about
signs, and then the warning makes that carefully written code worse.

> Then suggest something better?  Or suggest improvements to the existing
> warning?

As I mentioned in the next email, I tried to come up with something
better in sparse, which wasn't based on the pointer type comparison,
but on the actual 'char' itself.

My (admittedly only ever half-implemented) thing actually worked fine
for the simple cases (where simplification would end up just undoing
all the "expand char to int" because the end use was just assigned to
another char, or it was masked for other reasons).

But while sparse does a lot of basic optimizations, it still left
enough "look, you're doing sign-extensions on a 'char'" on the table
that it warned about perfectly valid stuff.
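
A small sketch of what "sign-extension on a char" means here (not from
the original message; the function names are hypothetical, and
'signed char' is spelled out so the result does not depend on the
platform's default for plain 'char'):

```c
/* Widening a signed byte to int sign-extends it. */
int widen_signed(signed char c)
{
	return c;
}

/* Widening an unsigned byte to int zero-extends it. */
int widen_unsigned(unsigned char c)
{
	return c;
}

/* Masking after widening discards the sign-extended bits again. */
int widen_masked(signed char c)
{
	return c & 0xff;
}

/*
 * Narrowing straight back to a byte also undoes the widening, which
 * is the kind of "simple case" where the extension is harmless.
 */
signed char round_trip(signed char c)
{
	int wide = c;

	return (signed char)wide;
}
```

For the byte 0xE9, the signed widening yields -23 and the unsigned one
yields 233, while the masked and round-trip variants give the same
answer regardless of how the byte was declared.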

And maybe that's fundamentally hard.

The "-Wpointer-sign" thing could probably be fairly easily improved,
by just recognizing that things like 'strlen()' and friends do not
care about the sign of 'char', and neither does a 'strcmp()' that only
checks for equality (but if you check the *sign* of strcmp, it does
matter).
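
A sketch of that distinction (not from the original message; both
helpers are hypothetical stand-ins, not the libc strcmp()). They are
identical except for the signedness of the bytes they compare:

```c
/* Byte-wise compare treating each byte as signed. */
int compare_bytes_signed(const signed char *a, const signed char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	return (*a > *b) - (*a < *b);
}

/* Byte-wise compare treating each byte as unsigned. */
int compare_bytes_unsigned(const unsigned char *a, const unsigned char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	return (*a > *b) - (*a < *b);
}
```

Both return 0 for exactly the same pairs of strings, so an equality
check cannot tell them apart. Once a byte >= 0x80 is involved, the
ordering differs: comparing "\xe9" against "a" gives a negative result
with the signed variant and a positive one with the unsigned variant.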

It's been some time since I last tried it, but at least from memory,
it really was mostly the standard C string functions that caused
almost all problems.  For your *own* functions you can just make sure
the signedness is right, but it's really really annoying when you try
to be careful about the byte signs, and the compiler starts complaining
just because you want to use the bog-standard 'strlen()' function.

And no, something like 'ustrlen()' with a hidden cast is just noise
for a warning that really shouldn't exist.

So some way to say 'this function really doesn't care about the sign
of this pointer' (and having the compiler know that for the string
functions it already knows about anyway) would probably make almost
all problems with -Wpointer-sign go away.

Put another way: 'char *' is so fundamental and inherent in C, that
you can't just warn when people use it in contexts where sign really
doesn't matter.

                 Linus