All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: Jann Horn <jannh@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	John Hubbard <jhubbard@nvidia.com>, Jan Kara <jack@suse.cz>,
	stable@vger.kernel.org, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
Date: Mon, 14 Jun 2021 19:00:32 -0700	[thread overview]
Message-ID: <20210614190032.09d8b7ac530c8b14ace44b82@linux-foundation.org> (raw)
In-Reply-To: <20210615012014.1100672-1-jannh@google.com>

On Tue, 15 Jun 2021 03:20:14 +0200 Jann Horn <jannh@google.com> wrote:

> try_grab_compound_head() is used to grab a reference to a page from
> get_user_pages_fast(), which is only protected against concurrent
> freeing of page tables (via local_irq_save()), but not against
> concurrent TLB flushes, freeing of data pages, or splitting of compound
> pages.
> 
> Because no reference is held to the page when try_grab_compound_head()
> is called, the page may have been freed and reallocated by the time its
> refcount has been elevated; therefore, once we're holding a stable
> reference to the page, the caller re-checks whether the PTE still points
> to the same page (with the same access rights).
> 
> The problem is that try_grab_compound_head() has to grab a reference on
> the head page; but between the time we look up what the head page is and
> the time we actually grab a reference on the head page, the compound
> page may have been split up (either explicitly through split_huge_page()
> or by freeing the compound page to the buddy allocator and then
> allocating its individual order-0 pages).
> If that happens, get_user_pages_fast() may end up returning the right
> page but lifting the refcount on a now-unrelated page, leading to
> use-after-free of pages.
> 
> To fix it:
> Re-check whether the pages still belong together after lifting the
> refcount on the head page.
> Move anything else that checks compound_head(page) below the refcount
> increment.
> 
> This can't actually happen on bare-metal x86 (because there, disabling
> IRQs locks out remote TLB flushes), but it can happen on virtualized x86
> (e.g. under KVM) and probably also on arm64. The race window is pretty
> narrow, and constantly allocating and shattering hugepages isn't exactly
> fast; for now I've only managed to reproduce this in an x86 KVM guest with
> an artificially widened timing window (by adding a loop that repeatedly
> calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits,
> so that PV TLB flushes are used instead of IPIs).
> 
> As requested on the list, also replace the existing VM_BUG_ON_PAGE()
> with a warning and bailout. Since the existing code only performed the
> BUG_ON check on DEBUG_VM kernels, ensure that the new code also only
> performs the check under that configuration - I don't want to mix two
> logically separate changes together too much.
> The macro VM_WARN_ON_ONCE_PAGE() doesn't return a value on !DEBUG_VM,
> so wrap the whole check in an #ifdef block.
> An alternative would be to change the VM_WARN_ON_ONCE_PAGE() definition
> for !DEBUG_VM such that it always returns false, but since that would
> differ from the behavior of the normal WARN macros, it might be too
> confusing for readers.
> 
> ...
>
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -43,8 +43,25 @@ static void hpage_pincount_sub(struct page *page, int refs)
>  
>  	atomic_sub(refs, compound_pincount_ptr(page));
>  }
>  
> +/* Equivalent to calling put_page() @refs times. */
> +static void put_page_refs(struct page *page, int refs)
> +{
> +#ifdef CONFIG_DEBUG_VM
> +	if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
> +		return;
> +#endif

Well dang those ifdefs.

With CONFIG_DEBUG_VM=n, this expands to

	if (((void)(sizeof((__force long)(page_ref_count(page) < refs))))
		return;

which will fail with "void value not ignored as it ought to be". 
Because VM_WARN_ON_ONCE_PAGE() is an rval with CONFIG_DEBUG_VM=y and is
not an rval with CONFIG_DEBUG_VM=n.    So the ifdefs are needed.

I know we've been around this loop before, but it still sucks!  Someone
please remind me of the reasoning?

Can we do

#define VM_WARN_ON_ONCE_PAGE(cond, page) {
	BUILD_BUG_ON_INVALID(cond);
	cond;
}

?


  reply	other threads:[~2021-06-15  2:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15  1:20 [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page() Jann Horn
2021-06-15  2:00 ` Andrew Morton [this message]
2021-06-15  2:36   ` Jann Horn
2021-06-15  2:36     ` Jann Horn
2021-06-15  2:38     ` Jann Horn
2021-06-15  2:38       ` Jann Horn
2021-06-15  6:37 ` John Hubbard
2021-06-15 12:09   ` Jann Horn
2021-06-15 12:09     ` Jann Horn
2021-06-15 23:10     ` Yang Shi
2021-06-15 23:10       ` Yang Shi
2021-06-16 17:27       ` Vlastimil Babka
2021-06-16 18:40         ` Yang Shi
2021-06-16 18:40           ` Yang Shi
2021-06-17 16:09           ` Vlastimil Babka
2021-06-18 13:25     ` Jason Gunthorpe
2021-06-18 13:50       ` Matthew Wilcox
2021-06-18 14:58         ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210614190032.09d8b7ac530c8b14ace44b82@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=jannh@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.