linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>,
	linux-kernel@vger.kernel.org, linux-kbuild@vger.kernel.org,
	linux-arch@vger.kernel.org, linux-toolchains@vger.kernel.org,
	Masahiro Yamada <masahiroy@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: make ctype ascii only? (was [PATCH] kbuild: treat char as always signed)
Date: Thu, 27 Oct 2022 11:28:02 -0700	[thread overview]
Message-ID: <CAHk-=wgadqCK7xAOzCdDHa7CQWC1z3df6a-pHQaF73ZjSZ58wg@mail.gmail.com> (raw)
In-Reply-To: <915a104b-0e70-dfb8-3c85-54fd1e5e63e5@rasmusvillemoes.dk>

On Thu, Oct 27, 2022 at 12:59 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> AFAICT, the differences are:
>
> - 0xaa (FEMININE ORDINAL INDICATOR), 0xb5 (MICRO SIGN), 0xba (FEMININE
> ORDINAL INDICATOR) should be lower (hence alpha and alnum), not punct.
>
> - depending a little on just exactly what one wants latin1 to mean, but
> if it does mean "first 256 codepoints of Unicode", 0x80-0x9f should be cntrl
>
> - for some reason at least glibc seems to classify 0xa0 as punctuation
> and not space (hence also as isgraph)
>
> - 0xdf and 0xff are correctly classified as lower, but since they don't
> have upper-case versions (at least not any that are representable in
> latin1), correct toupper() behaviour is to return them unchanged, but we
> just subtract 0x20, so 0xff becomes 0xdf which isn't isupper() and 0xdf
> becomes something that isn't even isalpha().

Heh.

Honestly, I don't think we should care at all.

For the byte range 128-255, anybody who uses ctype on them gets what
they get. In the kernel, the most likely use of it is for 'isprint()',
and if those care, they can (and some do) use 'isascii()' in addition.

I don't know if you realize, but the kernel already says "screw libc",
and makes all the isxyz() things just cast the argument to 'unsigned
char', and doesn't care about EOF.

And for the rest, let's just call it the "kernel locale", and just
admit that the kernel locale is entirely historical.

Boom - problem solved, and it's entirely standards conformant (apart
possibly from the EOF case, I think that is marked as a "lower case
character" right now ;)

Looking through

    https://pubs.opengroup.org/onlinepubs/9699919799/

I'm not actually seeing anything that says that we don't do *exactly*
what the standard requires.

You thinking that the kernel locale is US-ASCII is just wrong.

              Linus

  reply	other threads:[~2022-10-27 18:29 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-19 16:26 [PATCH] kbuild: treat char as always signed Jason A. Donenfeld
2022-10-19 16:54 ` Segher Boessenkool
2022-10-19 17:14   ` Linus Torvalds
2022-10-19 17:26     ` Linus Torvalds
2022-10-19 18:10       ` Nick Desaulniers
2022-10-19 18:35         ` Linus Torvalds
2022-10-19 19:23           ` Andy Shevchenko
2022-10-19 19:36             ` Linus Torvalds
2022-10-19 17:43     ` Segher Boessenkool
2022-10-19 18:11       ` Linus Torvalds
2022-10-19 18:20         ` Nick Desaulniers
2022-10-19 18:56           ` Linus Torvalds
2022-10-19 19:11             ` Kees Cook
2022-10-19 19:30               ` Linus Torvalds
2022-10-19 20:35                 ` Jason A. Donenfeld
2022-10-20  0:10                   ` Linus Torvalds
2022-10-20  3:11                     ` Jason A. Donenfeld
2022-10-19 20:15             ` Segher Boessenkool
2022-10-19 21:07         ` David Laight
2022-10-19 21:26           ` Segher Boessenkool
2022-10-20 10:41         ` Gabriel Paubert
2022-10-21 22:46           ` Linus Torvalds
2022-10-22  6:06             ` Gabriel Paubert
2022-10-22 18:16               ` Linus Torvalds
2022-10-23 20:23                 ` Gabriel Paubert
2022-10-25 23:00                   ` Kees Cook
2022-10-26  0:04                     ` Jason A. Donenfeld
2022-10-26 15:41                       ` Kees Cook
2022-10-19 19:54 ` Linus Torvalds
2022-10-19 20:23   ` Jason A. Donenfeld
2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
2022-10-19 23:56       ` Linus Torvalds
2022-10-20  0:02         ` Jason A. Donenfeld
2022-10-20  0:38           ` Linus Torvalds
2022-10-20  2:59             ` Jason A. Donenfeld
2022-10-20 18:41             ` Kees Cook
2022-10-21  1:01               ` Jason A. Donenfeld
2022-10-20 20:24         ` Segher Boessenkool
2022-10-24  9:24       ` Dan Carpenter
2022-10-24  9:30         ` Dan Carpenter
2022-10-24 16:33           ` Jason A. Donenfeld
2022-10-24 17:10             ` Linus Torvalds
2022-10-24 17:17               ` Jason A. Donenfeld
2022-10-25 19:22                 ` Kalle Valo
2022-10-25 10:16               ` David Laight
2022-10-24 15:17         ` Jason A. Donenfeld
2022-12-21 14:53       ` Guenter Roeck
2022-12-21 15:05         ` Geert Uytterhoeven
2022-12-21 15:23           ` Guenter Roeck
2022-12-21 15:29           ` Rasmus Villemoes
2022-12-21 15:56             ` Guenter Roeck
2022-12-21 17:06               ` Linus Torvalds
2022-12-21 17:19                 ` Guenter Roeck
2022-12-21 18:46                   ` Linus Torvalds
2022-12-21 19:08                     ` Linus Torvalds
2022-12-21 21:01                     ` Guenter Roeck
2022-12-22 13:05                     ` Geert Uytterhoeven
2022-12-22 10:41                 ` David Laight
     [not found]                   ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
2022-12-24  9:47                     ` Geert Uytterhoeven
2022-12-30 11:39                     ` David Laight
2022-12-30 13:13                       ` David Laight
2023-01-02  8:29                       ` Geert Uytterhoeven
2022-12-21 17:49               ` Andreas Schwab
2022-12-21 16:57             ` Geert Uytterhoeven
2022-10-19 20:58   ` [PATCH] kbuild: treat char as always signed David Laight
2022-10-26  0:10   ` make ctype ascii only? (was [PATCH] kbuild: treat char as always signed) Rasmus Villemoes
2022-10-26 18:10     ` Linus Torvalds
2022-10-27  7:59       ` Rasmus Villemoes
2022-10-27 18:28         ` Linus Torvalds [this message]
     [not found] ` <202210201618.8XhEGsLd-lkp@intel.com>
2022-10-20 16:33   ` [PATCH] kbuild: treat char as always signed Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wgadqCK7xAOzCdDHa7CQWC1z3df6a-pHQaF73ZjSZ58wg@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=keescook@chromium.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kbuild@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=masahiroy@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).