From: Jason Gunthorpe <jgg@ziepe.ca>
To: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>, John Hubbard <jhubbard@nvidia.com>,
    Andrew Morton <akpm@linux-foundation.org>, Linux-MM <linux-mm@kvack.org>,
    kernel list <linux-kernel@vger.kernel.org>,
    "Kirill A . Shutemov" <kirill@shutemov.name>, Jan Kara <jack@suse.cz>,
    stable <stable@vger.kernel.org>
Subject: Re: [PATCH v2] mm/gup: fix try_grab_compound_head() race with split_huge_page()
Date: Fri, 18 Jun 2021 11:58:30 -0300
Message-ID: <20210618145830.GZ1096940@ziepe.ca>
In-Reply-To: <YMykiGuZYMqF7DuU@casper.infradead.org>

On Fri, Jun 18, 2021 at 02:50:00PM +0100, Matthew Wilcox wrote:
> On Fri, Jun 18, 2021 at 10:25:56AM -0300, Jason Gunthorpe wrote:
> > On Tue, Jun 15, 2021 at 02:09:38PM +0200, Jann Horn wrote:
> > > On Tue, Jun 15, 2021 at 8:37 AM John Hubbard <jhubbard@nvidia.com> wrote:
> > > > On 6/14/21 6:20 PM, Jann Horn wrote:
> > > > > @@ -55,8 +72,23 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> > > > >      if (WARN_ON_ONCE(page_ref_count(head) < 0))
> > > > >          return NULL;
> > > > >      if (unlikely(!page_cache_add_speculative(head, refs)))
> > > > >          return NULL;
> > > > > +
> > > > > +    /*
> > > > > +     * At this point we have a stable reference to the head page; but it
> > > > > +     * could be that between the compound_head() lookup and the refcount
> > > > > +     * increment, the compound page was split, in which case we'd end up
> > > > > +     * holding a reference on a page that has nothing to do with the page
> > > > > +     * we were given anymore.
> > > > > +     * So now that the head page is stable, recheck that the pages still
> > > > > +     * belong together.
> > > > > +     */
> > > > > +    if (unlikely(compound_head(page) != head)) {
> > > >
> > > > I was just wondering about what all could happen here. Such as: page gets split,
> > > > reallocated into a different-sized compound page, one that still has page pointing
> > > > to head. I think that's OK, because we don't look at or change other huge page
> > > > fields.
> > > >
> > > > But I thought I'd mention the idea in case anyone else has any clever ideas about
> > > > how this simple check might be insufficient here. It seems fine to me, but I
> > > > routinely lack enough imagination about concurrent operations. :)
> > >
> > > Hmmm... I think the scariest aspect here is probably the interaction
> > > with concurrent allocation of a compound page on architectures with
> > > store-store reordering (like ARM). *If* the page allocator handled
> > > compound pages with lockless, non-atomic percpu freelists, I think it
> > > might be possible that the zeroing of tail_page->compound_head in
> > > put_page() could be reordered after the page has been freed,
> > > reallocated and set to refcount 1 again?
> >
> > Oh wow, yes, this all looks sketchy! Doing a RCU access to page->head
> > is a really challenging thing :\
> >
> > On the simplified store side:
> >
> >   page->head = my_compound
> >   *ptep = page
> >
> > There must be some kind of release barrier between those two
> > operations or this is all broken.. That definitely deserves a comment.
>
> set_compound_head() includes a WRITE_ONCE. Is that enough, or does it
> need an smp_wmb()?

Probably, at least the generic code maps smp_store_release() to
__smp_wmb.

I think Jann was making the argument that there is going to be some
other release operation due to locking between the two above, eg a
lock unlock or something.

> > Ideally we'd use smp_store_release to install the *pte :\
> >
> > Assuming we cover the release barrier, I would think the algorithm
> > should be broadly:
> >
> >   struct page *target_page = READ_ONCE(pte)
> >   struct page *target_folio = READ_ONCE(target_page->head)
>
> compound_head() includes a READ_ONCE already.

Ah, see I obviously haven't memorized that detail :\

> >   page_cache_add_speculative(target_folio, refs)
>
> That's spelled folio_ref_try_add_rcu() right now.

That seems a much better name

> >   if (target_folio != READ_ONCE(target_page->head) ||
> >       target_page != READ_ONCE(pte))
> >      goto abort
> >
> > Which is what this patch does but I would like to see the
> > READ_ONCE's.
>
> ... you want them to be uninlined from compound_head(), et al?

Not really (though see below), I was mostly looking at the pte which
just does pte_val(), no READ_ONCE in there

> > And there possibly should be two try_grab_compound_head()'s since we
> > don't need this overhead on the fully locked path, especially the
> > double atomic on page_ref_add()
>
> There's only one atomic on page_ref_add().

Look at the original patch, it adds this:

+    else
+        page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));

Where page is the folio, which is now two atomics to do the same
ref. This is happening because we can't do hpage_pincount_available()
before having initially locked the folio, thus we can no longer
precompute what 'ref' to give to the first folio_ref_try_add_rcu()

> And you need more of this overhead on the fully locked path than you
> realise; the page might be split without holding the mmap_sem, for
> example.

Fully locked here means holding the PTL spinlocks, so we know the pte
cannot change and particularly the refcount of a folio can't go to
zero. We can't change compound_head if the refcount is elevated.

Keep in mind we also do this in gup:

  folio_ref_try_add_rcu(READ_ONCE(target_page->head), 1)
  [..]
  folio_put_refs(READ_ONCE(target_page->head), 1)

Which makes me wonder why we have READ_ONCE inside compound_head?

I'm reading the commit message of 1d798ca3f164 ("mm: make
compound_head() robust"), and to me that looks like another special
lockless algorithm that should have the READ_ONCE in it, not the
general code.

Jason
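
[Editorial sketch, not part of the original message.] The lookup,
speculative reference, then recheck sequence outlined in the message
above can be modelled in userspace with C11 atomics. This is only a
rough illustration under stated assumptions: struct model_page,
model_ref_try_add() and model_grab() are hypothetical names invented
here, the acquire loads merely stand in for READ_ONCE() plus whatever
barrier the real code relies on, and none of it is the kernel
implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct model_page {
	_Atomic(struct model_page *) head; /* models the compound head pointer */
	atomic_int refcount;               /* only meaningful on a head page */
};

/* Add refs unless the count is already zero (models a speculative add). */
static bool model_ref_try_add(struct model_page *p, int refs)
{
	int old = atomic_load_explicit(&p->refcount, memory_order_relaxed);

	do {
		if (old == 0)
			return false; /* page is being freed; do not revive it */
	} while (!atomic_compare_exchange_weak_explicit(&p->refcount, &old,
							old + refs,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/*
 * slot models the pte: a location that publishes a page pointer.
 * Returns the head page with refs references held, or NULL on abort.
 */
static struct model_page *model_grab(_Atomic(struct model_page *) *slot,
				     int refs)
{
	struct model_page *page, *head;

	assert(refs > 0);

	page = atomic_load_explicit(slot, memory_order_acquire);
	if (!page)
		return NULL;
	head = atomic_load_explicit(&page->head, memory_order_acquire);

	if (!model_ref_try_add(head, refs))
		return NULL;

	/* Recheck both links now that the head can no longer be freed. */
	if (atomic_load_explicit(&page->head, memory_order_acquire) != head ||
	    atomic_load_explicit(slot, memory_order_acquire) != page) {
		atomic_fetch_sub_explicit(&head->refcount, refs,
					  memory_order_release);
		return NULL;
	}
	return head;
}
```

The abort path drops exactly the references that were speculatively
taken, so a lost race leaves the count unchanged. A single-threaded
caller only ever sees the success path or the zero-refcount path;
exercising the recheck needs a concurrent writer that changes
page->head or the slot between the two loads.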