From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-toolchains-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 77FEAC433EF
	for <linux-toolchains@archiver.kernel.org>; Thu,  9 Dec 2021 16:42:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235119AbhLIQpm (ORCPT
        <rfc822;linux-toolchains@archiver.kernel.org>);
        Thu, 9 Dec 2021 11:45:42 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37644 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S236862AbhLIQpl (ORCPT
        <rfc822;linux-toolchains@vger.kernel.org>);
        Thu, 9 Dec 2021 11:45:41 -0500
Received: from mail-oi1-x231.google.com (mail-oi1-x231.google.com [IPv6:2607:f8b0:4864:20::231])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7B459C061746
        for <linux-toolchains@vger.kernel.org>; Thu,  9 Dec 2021 08:42:07 -0800 (PST)
Received: by mail-oi1-x231.google.com with SMTP id bf8so9451942oib.6
        for <linux-toolchains@vger.kernel.org>; Thu, 09 Dec 2021 08:42:07 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=subject:to:cc:references:from:message-id:date:user-agent
         :mime-version:in-reply-to:content-language:content-transfer-encoding;
        bh=f09PG3amylLsT2P9FcP12/St8yDY2IFzuCsBPGTEMWg=;
        b=ZAQvbdDKabVWeAmlwA4hUlE12eYZka26oJjmp5dpoKVMsx4ezcaBdB8o58NKzEN3G+
         Ech6oZL1RzPLqXmaA089vL6KbR3l3LypIb90+pVBP92u2jrpWwI4X5lYLtpklY/lYlWX
         8E/Tyc9oC5jt63c1yjNCwoQLmZYfILYoXZbnUpKtX4uEyzP5p4cT9Z7e5Y/1248zbym4
         QbXLm+VDWpBeJYKQzEnCsUE54+T7AdE9PufW8EGsdTewUOS9NsMrujgbWIf23iCAfJv+
         unxR1r5O6GjJRz80EpgiOVOfIgwfXnZEoWoHBPvPzE+fFlNPPy6pQ9Zm+VIncwE2+JWV
         P2Ew==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:subject:to:cc:references:from:message-id:date
         :user-agent:mime-version:in-reply-to:content-language
         :content-transfer-encoding;
        bh=f09PG3amylLsT2P9FcP12/St8yDY2IFzuCsBPGTEMWg=;
        b=MxPG5kFrsieGdhrHJVMLi8/j01g3AA9GNHiXBTRhvywY5qhIs1Lks/7vSENs6i7fUt
         Wc/uaeZrGvioMQOYwERUWb9QWdJb0T9+gQbd0GTPa8CDMVfH+mMSwxaFfpyPgxuDBQ4U
         a4p8D87iIkNWhIeCwZo2NmD3Iy6MpNxCF0C0OMqZCLLrr+9UmYQf3AJGP6OEmbY5V8uU
         W+77bcEavSxQADk4FI5w1x/o2eegSktBqSBNq8DSooP1i8LPzms6T7u7+sfHqu9Z1kmo
         NU5QKyh2cBxNHpDK9FMoK1HxhfJ+JlA2kJJ9sXgxAKh3X2by4kw+VvjwE3UyvptK5trh
         bknA==
X-Gm-Message-State: AOAM531o+Ay7ZvEuUrqW1yCG8pPhUHGO94mamvEpYNC8AqHucGUb17RZ
        aVDQLbNcVhO3aiUOime8mBYrS9uQJ/8=
X-Google-Smtp-Source: ABdhPJzFpeeNVmGm3GgLCE9g/0mpm7EsWqSukVaBbWh007HvZcOSS07cfyJ2xXHJdXOT4h3K93Ak4Q==
X-Received: by 2002:a05:6808:d4e:: with SMTP id w14mr6815277oik.49.1639068126208;
        Thu, 09 Dec 2021 08:42:06 -0800 (PST)
Received: from [192.168.0.41] (184-96-227-137.hlrn.qwest.net. [184.96.227.137])
        by smtp.gmail.com with ESMTPSA id s13sm63492otv.34.2021.12.09.08.42.05
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Thu, 09 Dec 2021 08:42:06 -0800 (PST)
Subject: Re: [PATCH 0/6] RFC: adding support to GCC for detecting trust
 boundaries
To:     Segher Boessenkool <segher@kernel.crashing.org>
Cc:     David Malcolm <dmalcolm@redhat.com>, gcc-patches@gcc.gnu.org,
        linux-toolchains@vger.kernel.org
References: <20211113203732.2098220-1-dmalcolm@redhat.com>
 <c5961336-3d27-20a4-9094-e96eb6262cec@gmail.com>
 <20211206194025.GQ614@gate.crashing.org>
From:   Martin Sebor <msebor@gmail.com>
Message-ID: <b5453a10-193d-855d-8f7c-97651a1c8135@gmail.com>
Date:   Thu, 9 Dec 2021 09:42:04 -0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.2.2
MIME-Version: 1.0
In-Reply-To: <20211206194025.GQ614@gate.crashing.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-toolchains.vger.kernel.org>
X-Mailing-List: linux-toolchains@vger.kernel.org

On 12/6/21 12:40 PM, Segher Boessenkool wrote:
> On Mon, Dec 06, 2021 at 11:12:00AM -0700, Martin Sebor wrote:
>> On 11/13/21 1:37 PM, David Malcolm via Gcc-patches wrote:
>>> Approach 1: Custom Address Spaces
>>> =================================
>>>
>>> GCC's C frontend supports target-specific address spaces; see:
>>>    https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
>>> Quoting the N1275 draft of ISO/IEC DTR 18037:
>>>    "Address space names are ordinary identifiers, sharing the same name
>>>    space as variables and typedef names.  Any such names follow the same
>>>    rules for scope as other ordinary identifiers (such as typedef names).
>>>    An implementation may provide an implementation-defined set of
>>>    intrinsic address spaces that are, in effect, predefined at the start
>>>    of every translation unit.  The names of intrinsic address spaces must
>>>    be reserved identifiers (beginning with an underscore and an uppercase
>>>    letter or with two underscores).  An implementation may also
>>>    optionally support a means for new address space names to be defined
>>>    within a translation unit."
>>>
>>> Patch 1a in the following patch kit for GCC implements such a means to
>>> define new address spaces names in a translation unit, via a pragma:
>>>    #prgama GCC custom_address_space(NAME_OF_ADDRESS_SPACE)
>>>
>>> For example, the Linux kernel could perhaps write:
>>>
>>>    #define __kernel
>>>    #pragma GCC custom_address_space(__user)
>>>    #pragma GCC custom_address_space(__iomem)
>>>    #pragma GCC custom_address_space(__percpu)
>>>    #pragma GCC custom_address_space(__rcu)
>>>
>>> and thus the C frontend can complain about code that mismatches __user
>>> and kernel pointers, e.g.:
>>>
>>> custom-address-space-1.c: In function ‘test_argpass_to_p’:
>>> custom-address-space-1.c:29:14: error: passing argument 1 of
>>> ‘accepts_p’
>> >from pointer to non-enclosed address space
>>>     29 |   accepts_p (p_user);
>>>        |              ^~~~~~
>>> custom-address-space-1.c:21:24: note: expected ‘void *’ but argument is
>>> of type ‘__user void *’
>>>     21 | extern void accepts_p (void *);
>>>        |                        ^~~~~~
>>> custom-address-space-1.c: In function ‘test_cast_k_to_u’:
>>> custom-address-space-1.c:135:12: warning: cast to ‘__user’ address
>>> space
>>> pointer from disjoint generic address space pointer
>>>    135 |   p_user = (void __user *)p_kernel;
>>>        |            ^
>>
>> This seems like an excellent use of named address spaces :)
> 
> It has some big problems though.
> 
> Named address spaces are completely target-specific.

My understanding of these kernel/user address spaces that David
is adding for the benefit of the analyzer is that the correspond
to what TR 18037 calls nested namespaces.  They're nested within
the generic namespace that's a union of the twp.  With that, I'd
expect them to be fully handled early on and be transparent
afterwards.  Is implementing this idea not feasible in the GCC
design?

Martin

> Defining them with
> a pragma like this does not allow you to set the pointer mode or
> anything related to a custom LEGITIMATE_ADDRESS_P.  It does not allow
> you to sayy zero pointers are invalid in some address spaces and not in
> others.  You cannot provide any of the DWARF address space stuff this
> way.  But most importantly, there are only four bits for the address
> space field internally, and they are used by however a backend wants to
> use them.
> 
> None of this cannot be solved, but all of it will have to be solved.
> 
> IMO it will be best to not mix this with address spaces in the user
> interface (it is of course fine to *implement* it like that, or with
> big overlap at least).
> 
>>> The patch doesn't yet maintain a good distinction between implicit
>>> target-specific address spaces and user-defined address spaces,
> 
> And that will have to be fixed in the user code syntax at least.
> 
>>> has at
>>> least one known major bug, and has only been lightly tested.  I can
>>> fix these issues, but was hoping for feedback that this approach is the
>>> right direction from both the GCC and Linux development communities.
> 
> Allowing the user to define new address spaces does not jibe well with
> how targets do (validly!) use them.
> 
>>> Approach 2: An "untrusted" attribute
>>> ====================================
>>>
>>> Alternatively, patch 1b in the kit implements:
>>>
>>>    __attribute__((untrusted))
>>>
>>> which can be applied to types as a qualifier (similarly to const,
>>> volatile, etc) to mark a trust boundary, hence the kernel could have:
>>>
>>>    #define __user __attribute__((untrusted))
>>>
>>> where my patched GCC treats
>>>    T *
>>> vs
>>>    T __attribute__((untrusted)) *
>>> as being different types and thus the C frontend can complain (even without
>>> -fanalyzer) about e.g.:
>>>
>>> extern void accepts_p(void *);
>>>
>>> void test_argpass_to_p(void __user *p_user)
>>> {
>>>    accepts_p(p_user);
>>> }
>>>
>>> untrusted-pointer-1.c: In function ‘test_argpass_to_p’:
>>> untrusted-pointer-1.c:22:13: error: passing argument 1 of ‘accepts_p’
>> >from pointer with different trust level
>>>     22 |   accepts_p(p_user);
>>>        |              ^~~~~~
>>> untrusted-pointer-1.c:14:23: note: expected ‘void *’ but argument is of
>>> type ‘__attribute__((untrusted)) void *’
>>>     14 | extern void accepts_p(void *);
>>>        |                        ^~~~~~
>>>
>>> So you'd get enforcement of __user vs non-__user pointers as part of
>>> GCC's regular type-checking.  (You need an explicit cast to convert
>>> between the untrusted vs trusted types).
>>
>> As with the named address space idea, this approach also looks
>> reasonable to me.  If you anticipate using the attribute only
>> in the analyzer I would suggest to consider introducing it in
>> the analyzer's namespace (e.g., analyzer::untrusted, or even
>> gnu::analyzer::untrusted).
> 
> I don't see any fundamental problems with this approach.  It also is
> very much in line with how Perl handles this (and some copycat languages
> do as well), the "tainted" flag on data.
> 
>>> This approach is much less expressive that the custom addres space
>>> approach; it would only cover the trust boundary aspect; it wouldn't
>>> cover any differences between generic pointers and __user, vs __iomem,
>>> __percpu, and __rcu which I admit I only dimly understand.
> 
> Yes, it does not have any of the big problems that come with those
> address spaces either!  :-)
> 
>>> Other attributes
>>> ================
>>>
>>> Patch 2 in the kit adds:
>>>    __attribute__((returns_zero_on_success))
>>> and
>>>    __attribute__((returns_nonzero_on_success))
>>> as hints to the analyzer that it's worth bifurcating the analysis of
>>> such functions (to explore failure vs success, and thus to better
>>> explore error-handling paths).  It's also a hint to the human reader of
>>> the source code.
>>
>> I thing being able to express something along these lines would
>> be useful even outside the analyzer, both for warnings and, when
>> done right, perhaps also for optimization.  So I'm in favor of
>> something like this.  I'll just reiterate here the comment on
>> this attribute I sent you privately some time ago.
> 
> What is "success" though?  You probably want it so some checker can make
> sure you do handle failure some way, but how do you see what is handling
> failure and what is handling the successful case?
> 
> 
> Segher
>