From: Marco Elver <elver@google.com>
To: Jann Horn <jannh@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Alexander Potapenko <glider@google.com>,
	 "H . Peter Anvin" <hpa@zytor.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	 Andrey Konovalov <andreyknvl@google.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	 Andy Lutomirski <luto@kernel.org>,
	Borislav Petkov <bp@alien8.de>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	Christoph Lameter <cl@linux.com>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	 Dmitry Vyukov <dvyukov@google.com>,
	Eric Dumazet <edumazet@google.com>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Hillf Danton <hdanton@sina.com>,  Ingo Molnar <mingo@redhat.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	 Jonathan Corbet <corbet@lwn.net>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 Kees Cook <keescook@chromium.org>,
	Mark Rutland <mark.rutland@arm.com>,
	 Pekka Enberg <penberg@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	 SeongJae Park <sjpark@amazon.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vlastimil Babka <vbabka@suse.cz>,  Will Deacon <will@kernel.org>,
	"the arch/x86 maintainers" <x86@kernel.org>,
	 "open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	 kasan-dev <kasan-dev@googlegroups.com>,
	 Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH v4 02/11] x86, kfence: enable KFENCE for x86
Date: Wed, 7 Oct 2020 15:08:55 +0200
Message-ID: <CANpmjNP6mukCZ931_aW9dDqbkOyv=a2zbS7MuEMkE+unb7nYeg@mail.gmail.com>
In-Reply-To: <CAG48ez3OKj5Y8BURmqU9BAYWFJH8E8B5Dj9c0=UHutqf7r3hhg@mail.gmail.com>

On Fri, 2 Oct 2020 at 07:45, Jann Horn <jannh@google.com> wrote:
>
> On Tue, Sep 29, 2020 at 3:38 PM Marco Elver <elver@google.com> wrote:
> > Add architecture specific implementation details for KFENCE and enable
> > KFENCE for the x86 architecture. In particular, this implements the
> > required interface in <asm/kfence.h> for setting up the pool and
> > providing helper functions for protecting and unprotecting pages.
> >
> > For x86, we need to ensure that the pool uses 4K pages, which is done
> > using the set_memory_4k() helper function.
> [...]
> > diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
> [...]
> > +/* Protect the given page and flush TLBs. */
> > +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> > +{
> > +       unsigned int level;
> > +       pte_t *pte = lookup_address(addr, &level);
> > +
> > +       if (!pte || level != PG_LEVEL_4K)
>
> Do we actually expect this to happen, or is this just a "robustness"
> check? If we don't expect this to happen, there should be a WARN_ON()
> around the condition.

It's not obvious here, but we already have this covered with a WARN:
the core.c code has a KFENCE_WARN_ON, which disables KFENCE on a
warning.
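
To illustrate the pattern, here is a rough userspace sketch, not the
actual core.c code: a failing arch hook trips a warning, and the
warning switches the detector off. All demo_* names are hypothetical
stand-ins; only the idea of "warn once, then disable" is taken from
the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for KFENCE's global enable flag. */
static bool demo_kfence_enabled = true;

/* Warn and disable the detector if cond is true; returns cond. */
static bool demo_warn_on_disable(bool cond, const char *what)
{
	assert(what != NULL);
	if (cond) {
		fprintf(stderr, "WARNING: %s\n", what);
		demo_kfence_enabled = false;
	}
	return cond;
}

/* Hypothetical arch hook: fails unless a 4K PTE was found. */
static bool demo_arch_protect_page(bool pte_found, bool is_4k)
{
	return pte_found && is_4k;
}

/* Core-side wrapper: a failing arch hook trips the warning. */
static bool demo_protect(bool pte_found, bool is_4k)
{
	bool failed = !demo_arch_protect_page(pte_found, is_4k);

	return !demo_warn_on_disable(failed, "arch hook failed to protect page");
}
```

With this shape the arch helper can simply return false, and the
caller decides how loudly to complain.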

> > +               return false;
> > +
> > +       if (protect)
> > +               set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> > +       else
> > +               set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
>
> Hmm... do we have this helper (instead of using the existing helpers
> for modifying memory permissions) to work around the allocation out of
> the data section?

I just tried the set_memory.c functions again to remind myself why
this didn't work. I experimented with set_memory_{np,p}();
set_memory_p() isn't implemented, but is easily added (which I did
for the experiment below). However, this didn't quite work:

WARNING: CPU: 6 PID: 107 at kernel/smp.c:490
smp_call_function_many_cond+0x9c/0x2a0 kernel/smp.c:490
[...]
Call Trace:
 smp_call_function_many kernel/smp.c:577 [inline]
 smp_call_function kernel/smp.c:599 [inline]
 on_each_cpu+0x3e/0x90 kernel/smp.c:698
 __purge_vmap_area_lazy+0x58/0x670 mm/vmalloc.c:1352
 _vm_unmap_aliases.part.0+0x10b/0x140 mm/vmalloc.c:1770
 change_page_attr_set_clr+0xb4/0x1c0 arch/x86/mm/pat/set_memory.c:1732
 change_page_attr_set arch/x86/mm/pat/set_memory.c:1782 [inline]
 set_memory_p+0x21/0x30 arch/x86/mm/pat/set_memory.c:1950
 kfence_protect_page arch/x86/include/asm/kfence.h:55 [inline]
 kfence_protect_page arch/x86/include/asm/kfence.h:43 [inline]
 kfence_unprotect+0x42/0x70 mm/kfence/core.c:139
 no_context+0x115/0x300 arch/x86/mm/fault.c:705
 handle_page_fault arch/x86/mm/fault.c:1431 [inline]
 exc_page_fault+0xa7/0x170 arch/x86/mm/fault.c:1486
 asm_exc_page_fault+0x1e/0x30 arch/x86/include/asm/idtentry.h:538

For one, smp_call_function_many_cond() must not be called with
interrupts disabled, and we may very well get a KFENCE allocation or
page fault with interrupts disabled or in interrupt context.

Therefore, to be safe, we should avoid IPIs. It follows that setting
the page attribute is best-effort, and we can tolerate some
inaccuracy. Lazy fault handling should take care of faults after we
set the page as PRESENT.
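
A toy userspace model of what "best-effort" means here, under stated
assumptions: one shared PTE, a per-CPU cached translation standing in
for the TLB, and a flush that only touches the calling CPU. The
demo_* names and the two-CPU setup are made up for illustration, and
the model assumes not-present entries are never cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_CPUS      2
#define DEMO_PAGE_PRESENT ((uint64_t)1 << 0) /* illustrative present bit */

struct demo_mmu {
	uint64_t pte;                   /* the single shared page-table entry */
	bool tlb_cached[DEMO_NR_CPUS];  /* CPU holds a cached "present" translation */
};

/* Update the PTE and flush only the calling CPU's entry (no IPI). */
static void demo_set_protected(struct demo_mmu *m, int cpu, bool protect)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	if (protect)
		m->pte &= ~DEMO_PAGE_PRESENT;
	else
		m->pte |= DEMO_PAGE_PRESENT;
	m->tlb_cached[cpu] = false;     /* local flush only */
}

/* Returns true if the access succeeds, false if it would fault. */
static bool demo_access(struct demo_mmu *m, int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	if (m->tlb_cached[cpu])
		return true;            /* possibly stale, but still usable */
	if (!(m->pte & DEMO_PAGE_PRESENT))
		return false;           /* walk finds a not-present PTE: fault */
	m->tlb_cached[cpu] = true;      /* cache the present translation */
	return true;
}
```

In this model, a CPU that cached the translation before the page was
protected keeps accessing it without a fault until its own entry is
flushed; that is the inaccuracy being tolerated in exchange for never
sending IPIs.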

Which hopefully also answers your other comment:

> flush_tlb_one_kernel() -> flush_tlb_one_user() ->
> __flush_tlb_one_user() -> native_flush_tlb_one_user() only flushes on
> the local CPU core, not on others. If you want to leave it this way, I
> think this needs a comment explaining why we're not doing a global
> flush (locking context / performance overhead / ... ?).

We'll add a comment to clarify why it's done this way.

> > +       flush_tlb_one_kernel(addr);
> > +       return true;
> > +}
> > +
> > +#endif /* _ASM_X86_KFENCE_H */
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> [...]
> > @@ -701,6 +702,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
> >         }
> >  #endif
> >
> > +       if (kfence_handle_page_fault(address))
> > +               return;
> > +
> >         /*
> >          * 32-bit:
> >          *
>
> The standard 5 lines of diff context don't really make it obvious
> what's going on here. Here's a diff with more context:
>
>
>         /*
>          * Stack overflow?  During boot, we can fault near the initial
>          * stack in the direct map, but that's not an overflow -- check
>          * that we're in vmalloc space to avoid this.
>          */
>         if (is_vmalloc_addr((void *)address) &&
>             (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) ||
>              address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) {
>                 unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *);
>                 /*
>                  * We're likely to be running with very little stack space
>                  * left.  It's plausible that we'd hit this condition but
>                  * double-fault even before we get this far, in which case
>                  * we're fine: the double-fault handler will deal with it.
>                  *
>                  * We don't want to make it all the way into the oops code
>                  * and then double-fault, though, because we're likely to
>                  * break the console driver and lose most of the stack dump.
>                  */
>                 asm volatile ("movq %[stack], %%rsp\n\t"
>                               "call handle_stack_overflow\n\t"
>                               "1: jmp 1b"
>                               : ASM_CALL_CONSTRAINT
>                               : "D" ("kernel stack overflow (page fault)"),
>                                 "S" (regs), "d" (address),
>                                 [stack] "rm" (stack));
>                 unreachable();
>         }
>  #endif
>
> +       if (kfence_handle_page_fault(address))
> +               return;
> +
>         /*
>          * 32-bit:
>          *
>          *   Valid to do another page fault here, because if this fault
>          *   had been triggered by is_prefetch fixup_exception would have
>          *   handled it.
>          *
>          * 64-bit:
>          *
>          *   Hall of shame of CPU/BIOS bugs.
>          */
>         if (is_prefetch(regs, error_code, address))
>                 return;
>
>         if (is_errata93(regs, address))
>                 return;
>
>         /*
>          * Buggy firmware could access regions which might page fault, try to
>          * recover from such faults.
>          */
>         if (IS_ENABLED(CONFIG_EFI))
>                 efi_recover_from_page_fault(address);
>
>  oops:
>         /*
>          * Oops. The kernel tried to access some bad page. We'll have to
>          * terminate things with extreme prejudice:
>          */
>         flags = oops_begin();
>
>
>
> Shouldn't kfence_handle_page_fault() happen after prefetch handling,
> at least? Maybe directly above the "oops" label?

Good question. AFAIK it doesn't matter, as is_kfence_address() should
never be true for any of the faults that the checks below handle,
right? In any case, it shouldn't hurt to move it down.
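
For reference, a minimal userspace sketch of why the ordering should
not matter: the handler bails out early for any address outside the
pool, so faults at other addresses fall through to the later checks
unchanged. This assumes a statically allocated pool and a plain range
check; demo_pool, DEMO_POOL_SIZE and the demo_* functions are
hypothetical and not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE ((size_t)4096)
#define DEMO_POOL_SIZE (DEMO_PAGE_SIZE * 8)

static_assert(DEMO_POOL_SIZE % DEMO_PAGE_SIZE == 0, "pool must be whole pages");

/* Hypothetical pool living in the data section. */
static char demo_pool[DEMO_POOL_SIZE];

/*
 * Unsigned subtraction wraps around for addresses below the pool, so a
 * single comparison covers both the lower and the upper bound.
 */
static bool demo_is_pool_address(const void *addr)
{
	return (uintptr_t)addr - (uintptr_t)demo_pool < DEMO_POOL_SIZE;
}

/* Returns true only if the fault was in the pool and has been handled. */
static bool demo_handle_fault(const void *addr)
{
	if (!demo_is_pool_address(addr))
		return false;   /* not ours: later checks and the oops path run */
	/* A real handler would report the error and unprotect the page here. */
	return true;
}
```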

Thanks,
-- Marco

