From: Andrew Morton <akpm@linux-foundation.org>
To: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	John Hubbard <jhubbard@nvidia.com>, Jan Kara <jack@suse.cz>,
	stable@vger.kernel.org, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
Date: Mon, 14 Jun 2021 19:00:32 -0700
Message-ID: <20210614190032.09d8b7ac530c8b14ace44b82@linux-foundation.org>
In-Reply-To: <20210615012014.1100672-1-jannh@google.com>

On Tue, 15 Jun 2021 03:20:14 +0200 Jann Horn <jannh@google.com> wrote:

> try_grab_compound_head() is used to grab a reference to a page from
> get_user_pages_fast(), which is only protected against concurrent
> freeing of page tables (via local_irq_save()), but not against
> concurrent TLB flushes, freeing of data pages, or splitting of compound
> pages.
> 
> Because no reference is held to the page when try_grab_compound_head()
> is called, the page may have been freed and reallocated by the time its
> refcount has been elevated; therefore, once we're holding a stable
> reference to the page, the caller re-checks whether the PTE still points
> to the same page (with the same access rights).
> 
> The problem is that try_grab_compound_head() has to grab a reference on
> the head page; but between the time we look up what the head page is and
> the time we actually grab a reference on the head page, the compound
> page may have been split up (either explicitly through split_huge_page()
> or by freeing the compound page to the buddy allocator and then
> allocating its individual order-0 pages).
> If that happens, get_user_pages_fast() may end up returning the right
> page but lifting the refcount on a now-unrelated page, leading to
> use-after-free of pages.
> 
> To fix it:
> Re-check whether the pages still belong together after lifting the
> refcount on the head page.
> Move anything else that checks compound_head(page) below the refcount
> increment.
> 
> This can't actually happen on bare-metal x86 (because there, disabling
> IRQs locks out remote TLB flushes), but it can happen on virtualized x86
> (e.g. under KVM) and probably also on arm64. The race window is pretty
> narrow, and constantly allocating and shattering hugepages isn't exactly
> fast; for now I've only managed to reproduce this in an x86 KVM guest with
> an artificially widened timing window (by adding a loop that repeatedly
> calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits,
> so that PV TLB flushes are used instead of IPIs).
> 
> As requested on the list, also replace the existing VM_BUG_ON_PAGE()
> with a warning and bailout. Since the existing code only performed the
> BUG_ON check on DEBUG_VM kernels, ensure that the new code also only
> performs the check under that configuration - I don't want to mix two
> logically separate changes together too much.
> The macro VM_WARN_ON_ONCE_PAGE() doesn't return a value on !DEBUG_VM,
> so wrap the whole check in an #ifdef block.
> An alternative would be to change the VM_WARN_ON_ONCE_PAGE() definition
> for !DEBUG_VM such that it always returns false, but since that would
> differ from the behavior of the normal WARN macros, it might be too
> confusing for readers.
> 
> ...
>
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -43,8 +43,25 @@ static void hpage_pincount_sub(struct page *page, int refs)
>  
>  	atomic_sub(refs, compound_pincount_ptr(page));
>  }
>  
> +/* Equivalent to calling put_page() @refs times. */
> +static void put_page_refs(struct page *page, int refs)
> +{
> +#ifdef CONFIG_DEBUG_VM
> +	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
> +		return;
> +#endif

Well dang those ifdefs.

With CONFIG_DEBUG_VM=n, this expands to

	if (((void)(sizeof((__force long)(page_ref_count(page) < refs)))))
		return;

which will fail with "void value not ignored as it ought to be",
because VM_WARN_ON_ONCE_PAGE() yields a value (is an rvalue) with
CONFIG_DEBUG_VM=y but expands to a void expression with
CONFIG_DEBUG_VM=n.  So the ifdefs are needed.
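
For anyone who wants to see the mismatch outside the kernel tree, here is a
minimal userspace sketch. WARN_VALUE(), WARN_VOID() and should_bail() are
hypothetical stand-ins, not kernel names, and the value-yielding form assumes
GNU statement expressions (GCC or Clang):

```c
#include <stdio.h>

/* Hypothetical stand-in for the CONFIG_DEBUG_VM=y form: a GNU statement
 * expression, so the macro yields the truth value of cond. */
#define WARN_VALUE(cond) ({					\
	int ret_ = !!(cond);					\
	if (ret_)						\
		fprintf(stderr, "warning: %s\n", #cond);	\
	ret_;							\
})

/* Hypothetical stand-in for the CONFIG_DEBUG_VM=n form: cond is only
 * type-checked inside sizeof and the result is cast to void, so there
 * is no value left to test. */
#define WARN_VOID(cond) ((void)sizeof((long)(cond)))

/* Returns 1 if the caller should bail out, 0 otherwise. */
static int should_bail(int refcount, int refs)
{
	if (WARN_VALUE(refcount < refs))
		return 1;

	/* "if (WARN_VOID(refcount < refs))" would not compile here:
	 * "void value not ignored as it ought to be". */
	WARN_VOID(refcount < refs);	/* fine as a plain statement */
	return 0;
}
```

Only the value-yielding form can sit inside an if (), which is why the caller
has to hide the whole check behind the ifdef when the other form is in effect.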

I know we've been around this loop before, but it still sucks!  Someone
please remind me of the reasoning?

Can we do

#define VM_WARN_ON_ONCE_PAGE(cond, page) ({	\
	BUILD_BUG_ON_INVALID(cond);		\
	cond;					\
})

?
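
A rough userspace sketch of what a value-yielding variant could look like,
under some assumptions: CHECK_VALID() is a hypothetical stand-in for
BUILD_BUG_ON_INVALID(), WARN_ON_ONCE_PAGE_NODEBUG(), struct fake_page and
put_refs() are made-up names, and GNU statement expressions are available.
Note that this form evaluates cond at runtime, which the sizeof-only form
does not, so it changes what a non-debug build executes:

```c
/* Hypothetical stand-in for BUILD_BUG_ON_INVALID(): type-checks the
 * expression inside sizeof without evaluating it. */
#define CHECK_VALID(e) ((void)sizeof((long)(e)))

/* Hypothetical value-yielding variant of the non-debug macro. The page
 * argument is accepted but unused, as no warning is printed here. */
#define WARN_ON_ONCE_PAGE_NODEBUG(cond, page) ({	\
	CHECK_VALID(cond);				\
	!!(cond);					\
})

/* Toy page with just a reference count (assumption for illustration). */
struct fake_page {
	int refcount;
};

/* Drops refs references; returns -1 and leaves the count untouched if
 * fewer than refs references are held. */
static int put_refs(struct fake_page *page, int refs)
{
	if (WARN_ON_ONCE_PAGE_NODEBUG(page->refcount < refs, page))
		return -1;

	page->refcount -= refs;
	return 0;
}
```

With a macro of this shape the caller needs no ifdef, at the cost of always
performing the comparison.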

