From: Jann Horn
Date: Fri, 2 Oct 2020 07:45:20 +0200
Subject: Re: [PATCH v4 02/11] x86, kfence: enable KFENCE for x86
To: Marco Elver
Cc: Andrew Morton, Alexander Potapenko, "H. Peter Anvin",
 "Paul E. McKenney", Andrey Konovalov, Andrey Ryabinin, Andy Lutomirski,
 Borislav Petkov, Catalin Marinas, Christoph Lameter, Dave Hansen,
 David Rientjes, Dmitry Vyukov, Eric Dumazet, Greg Kroah-Hartman,
 Hillf Danton, Ingo Molnar, Jonathan.Cameron@huawei.com, Jonathan Corbet,
 Joonsoo Kim, Kees Cook, Mark Rutland, Pekka Enberg, Peter Zijlstra,
 sjpark@amazon.com, Thomas Gleixner, Vlastimil Babka, Will Deacon,
 the arch/x86 maintainers, linux-doc@vger.kernel.org, kernel list,
 kasan-dev, Linux ARM, Linux-MM
In-Reply-To: <20200929133814.2834621-3-elver@google.com>
References: <20200929133814.2834621-1-elver@google.com>
 <20200929133814.2834621-3-elver@google.com>

On Tue, Sep 29, 2020 at 3:38 PM Marco Elver wrote:
> Add architecture specific implementation details for KFENCE and enable
> KFENCE for the x86 architecture. In particular, this implements the
> required interface in <asm/kfence.h> for setting up the pool and
> providing helper functions for protecting and unprotecting pages.
>
> For x86, we need to ensure that the pool uses 4K pages, which is done
> using the set_memory_4k() helper function.
[...]
> diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
[...]
> +/* Protect the given page and flush TLBs. */
> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> +{
> +	unsigned int level;
> +	pte_t *pte = lookup_address(addr, &level);
> +
> +	if (!pte || level != PG_LEVEL_4K)

Do we actually expect this to happen, or is this just a "robustness"
check? If we don't expect this to happen, there should be a WARN_ON()
around the condition.
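To illustrate, a WARN_ON() version could look roughly like this (just a
sketch, untested; it only wraps the patch's existing check, on the
assumption that a non-4K mapping here would be a bug rather than a
normal case):

/* Protect the given page and flush TLBs. */
static inline bool kfence_protect_page(unsigned long addr, bool protect)
{
	unsigned int level;
	pte_t *pte = lookup_address(addr, &level);

	/* The KFENCE pool is expected to always be mapped with 4K PTEs. */
	if (WARN_ON(!pte || level != PG_LEVEL_4K))
		return false;

	if (protect)
		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
	else
		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));

	flush_tlb_one_kernel(addr);
	return true;
}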
> +		return false;
> +
> +	if (protect)
> +		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> +	else
> +		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));

Hmm... do we have this helper (instead of using the existing helpers
for modifying memory permissions) to work around the allocation out of
the data section?

> +	flush_tlb_one_kernel(addr);
> +	return true;
> +}
> +
> +#endif /* _ASM_X86_KFENCE_H */
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
[...]
> @@ -701,6 +702,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  	}
>  #endif
>
> +	if (kfence_handle_page_fault(address))
> +		return;
> +
>  	/*
>  	 * 32-bit:
>  	 *

The standard 5 lines of diff context don't really make it obvious
what's going on here. Here's a diff with more context:

	/*
	 * Stack overflow? During boot, we can fault near the initial
	 * stack in the direct map, but that's not an overflow -- check
	 * that we're in vmalloc space to avoid this.
	 */
	if (is_vmalloc_addr((void *)address) &&
	    (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
	     address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
		unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *);

		/*
		 * We're likely to be running with very little stack space
		 * left. It's plausible that we'd hit this condition but
		 * double-fault even before we get this far, in which case
		 * we're fine: the double-fault handler will deal with it.
		 *
		 * We don't want to make it all the way into the oops code
		 * and then double-fault, though, because we're likely to
		 * break the console driver and lose most of the stack dump.
		 */
		asm volatile ("movq %[stack], %%rsp\n\t"
			      "call handle_stack_overflow\n\t"
			      "1: jmp 1b"
			      : ASM_CALL_CONSTRAINT
			      : "D" ("kernel stack overflow (page fault)"),
				"S" (regs), "d" (address),
				[stack] "rm" (stack));
		unreachable();
	}
#endif

+	if (kfence_handle_page_fault(address))
+		return;
+
	/*
	 * 32-bit:
	 *
	 * Valid to do another page fault here, because if this fault
	 * had been triggered by is_prefetch fixup_exception would have
	 * handled it.
	 *
	 * 64-bit:
	 *
	 * Hall of shame of CPU/BIOS bugs.
	 */
	if (is_prefetch(regs, error_code, address))
		return;

	if (is_errata93(regs, address))
		return;

	/*
	 * Buggy firmware could access regions which might page fault, try to
	 * recover from such faults.
	 */
	if (IS_ENABLED(CONFIG_EFI))
		efi_recover_from_page_fault(address);

oops:
	/*
	 * Oops. The kernel tried to access some bad page. We'll have to
	 * terminate things with extreme prejudice:
	 */
	flags = oops_begin();

Shouldn't kfence_handle_page_fault() happen after prefetch handling, at
least? Maybe directly above the "oops" label?
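To make that concrete, the placement I have in mind would be something
like this (untested sketch, not part of the patch):

	if (is_errata93(regs, address))
		return;

	/*
	 * Buggy firmware could access regions which might page fault, try to
	 * recover from such faults.
	 */
	if (IS_ENABLED(CONFIG_EFI))
		efi_recover_from_page_fault(address);

+	/*
+	 * Only treat the fault as a KFENCE report once the cheaper
+	 * bailout paths above have had their chance.
+	 */
+	if (kfence_handle_page_fault(address))
+		return;
+
oops:
	/*
	 * Oops. The kernel tried to access some bad page. We'll have to
	 * terminate things with extreme prejudice:
	 */
	flags = oops_begin();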