From: Linus Torvalds
Date: Wed, 26 Oct 2022 11:10:57 -0700
Subject: Re: make ctype ascii only? (was [PATCH] kbuild: treat char as always signed)
To: Rasmus Villemoes
Cc: "Jason A. Donenfeld", linux-kernel@vger.kernel.org,
    linux-kbuild@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-toolchains@vger.kernel.org, Masahiro Yamada, Kees Cook,
    Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman
References: <20221019162648.3557490-1-Jason@zx2c4.com>
    <3a2fa7c1-2e31-0479-761f-9c189f8ed8c3@rasmusvillemoes.dk>
In-Reply-To: <3a2fa7c1-2e31-0479-761f-9c189f8ed8c3@rasmusvillemoes.dk>
X-Mailing-List: linux-toolchains@vger.kernel.org

On Tue, Oct 25, 2022 at 5:10 PM Rasmus Villemoes wrote:
>
> Only very tangentially related (because it has to do with chars...): Can
> we switch our ctype to be ASCII only, just as it was back in the good'ol
> mid 90s

Those US-ASCII days weren't really very "good" old days, but I forget
why we did this (it's attributed to me, but that's from the
pre-BK/pre-git days before we actually tracked things all that well,
so..)

Anyway, I think anybody using ctype.h on 8-bit chars gets what they
deserve, and I think Latin1 (or something close to it) is better than
US-ASCII, in that it's at least the same as Unicode in the low 8 bits.

So no, I'm disinclined to go back in time to what I think is an even
worse situation. Latin1 isn't great, but it sure beats US-ASCII. And
if you really want just US-ASCII, then don't use the high bit, and
make your disgusting 7-bit code be *explicitly* 7-bit.

Now, if there are errors in that table wrt Latin1 / "first 256
codepoints of Unicode" too, then we can fix those. Not that anybody
has apparently cared since 2.0.1 was released back in July of 1996
(btw, it's sad how none of the old linux git archive creations seem to
have tried to import the dates, so you have to look those up
separately).

And if nobody has cared since 1996, I don't really think it matters.

But fundamentally, I think anybody calling US-ASCII "good" is either
very very very confused, or is comparing it to EBCDIC.

                Linus
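
For reference, "explicitly 7-bit" code as suggested above might look
something like the sketch below. It is only an illustration, not the
kernel's ctype implementation, and the helper names (ascii_is7bit,
ascii_isalpha, ascii_isdigit) are hypothetical. The assumption is that
callers pass raw bytes as unsigned char, so the result does not depend
on whether plain char is signed.

```c
#include <stdbool.h>

/*
 * Hypothetical ASCII-only helpers (names invented for illustration).
 * They never consult a locale or a 256-entry table, so any byte with
 * the high bit set is rejected by construction.
 */

/* True only for bytes 0x00..0x7f. */
static inline bool ascii_is7bit(unsigned char c)
{
	return (c & 0x80) == 0;
}

/* True only for 'A'..'Z' and 'a'..'z'. */
static inline bool ascii_isalpha(unsigned char c)
{
	/*
	 * Setting bit 0x20 folds 'A'..'Z' onto 'a'..'z'. Bytes >= 0x80
	 * keep their high bit, so after subtracting 'a' they can never
	 * land in the range 0..25.
	 */
	return (unsigned char)((c | 0x20) - 'a') < 26;
}

/* True only for '0'..'9'. */
static inline bool ascii_isdigit(unsigned char c)
{
	return (unsigned char)(c - '0') < 10;
}
```

With these, ascii_isalpha(0xe9) is false even though 0xe9 is a letter
(e with acute accent) in Latin1, so the 7-bit restriction is visible
at the call site rather than hidden in a shared table.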