linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: John Hubbard <jhubbard@nvidia.com>
Cc: "Jan Kara" <jack@suse.cz>, "Michal Hocko" <mhocko@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Yu Zhao" <yuzhao@google.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Hugh Dickins" <hughd@google.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Lance Roy" <ldr709@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Dave Airlie" <airlied@redhat.com>,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Aaron Lu" <ziqian.lzq@antfin.com>,
	"Omar Sandoval" <osandov@fb.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely
Date: Tue, 1 Oct 2019 09:10:08 +0200	[thread overview]
Message-ID: <20191001071008.GA25062@quack2.suse.cz> (raw)
In-Reply-To: <6bba357a-1706-7cdb-8a11-359157a21ae8@nvidia.com>

On Mon 30-09-19 10:57:08, John Hubbard wrote:
> On 9/30/19 2:20 AM, Jan Kara wrote:
> > On Fri 27-09-19 12:31:41, John Hubbard wrote:
> > > On 9/27/19 5:33 AM, Michal Hocko wrote:
> > > > On Thu 26-09-19 20:26:46, John Hubbard wrote:
> > > > > On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
> ...
> > > CPU 0                              CPU 1
> > > --------                           ---------------
> > >                                     get_user_pages_fast()
> > > 
> > > do_unmap()
> > >   unmap_region()
> > >    free_pgtables()
> > >                                     /*
> > >                                      * speculative reads, re-ordered
> > >                                      * by the CPU at run time, not
> > >                                      * compile time. The function calls
> > >                                      * are not really in this order, but
> > >                                      * the corresponding reads could be.
> > >                                      */
> > >                                     gup_pgd_range()
> > >                                      gup_p4d_range()
> > >                                       gup_pud_range()
> > >                                        gup_pmd_range()
> > >                                         pmd = READ_ONCE(*pmdp)
> > >                                          /* This is stale */
> > > 
> > >    tlb_finish_mmu()
> > >      tlb_flush_mmu()
> > >       tlb_flush_mmu_tlbonly()
> > >         tlb_flush()
> > >          flush_tlb_mm
> > >           flush_tlb_mm_range
> > >            flush_tlb_others
> > >             native_flush_tlb_others
> > >              smp_call_function_many: IPIs
> > >               ...blocks until CPU1 reenables
> > >                  interrupts
> > > 
> > >                                          local_irq_disable()
> > >                                           ...continue the page walk based
> > >                                              on stale/freed data...
> > 
> > Yes, but then we have:
> > 
> >                  head = try_get_compound_head(page, 1);
> > 
> > which has an atomic operation providing barrier in it and then we have:
> > 
> >                  if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> >                          put_page(head);
> >                          goto pte_unmap;
> >                  }
> > 
> > So we reload PTE again and check the value didn't change. Which should
> > prevent the race you explain above. Or do I miss anything?
> 
> 
> Well, a couple of questions:
> 
> 1. Is there *really* a memory barrier in try_get_compound_head()? Because
> I only see a READ_ONCE compile barrier, which would mean that run time
> reordering is still possible.

try_get_compound_head() has page_cache_add_speculative() which is
atomic_add_unless() which is guaranteed to provide ordering.

> 2. Your code above is after the "pmd = READ_ONCE(*pmdp)" line, so by then,
> it's already found the pte based on reading a stale pmd. So checking the
> pte seems like it's checking the wrong thing--it's too late, for this case,
> right?

Well, if PMD is getting freed, all PTEs in it should be cleared by that
time, shouldn't they? So although we read from stale PMD, either we already
see cleared PTE or the check pte_val(pte) != pte_val(*ptep) will fail and
so we never actually succeed in getting stale PTE entry (at least unless
the page table page that used to be PMD can get freed and reused
- which is not the case in the example you've shown above).

So I still don't see a problem. That being said I don't feel being expert
in this particular area. I just always thought GUP prevents races like this
by the scheme I describe so I'd like to see what I'm missing :).

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


  reply	other threads:[~2019-10-01  7:09 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-08 22:56 [PATCH] mm: don't expose page to fast gup before it's ready Yu Zhao
2018-01-09  8:46 ` Michal Hocko
2018-01-09 10:10   ` Yu Zhao
2018-01-31 23:07     ` Andrew Morton
2019-05-14 21:25     ` Andrew Morton
2019-05-14 23:07       ` Yu Zhao
2019-09-14  7:05         ` [PATCH v2] mm: don't expose page to fast gup prematurely Yu Zhao
2019-09-24 11:23           ` Kirill A. Shutemov
2019-09-24 22:05             ` Yu Zhao
2019-09-25 12:17               ` Kirill A. Shutemov
2019-09-26  3:58                 ` Yu Zhao
2019-09-24 23:24           ` [PATCH v3 1/4] mm: remove unnecessary smp_wmb() in collapse_huge_page() Yu Zhao
2019-09-24 23:24             ` [PATCH v3 2/4] mm: don't expose hugetlb page to fast gup prematurely Yu Zhao
2019-09-24 23:24             ` [PATCH v3 3/4] mm: don't expose non-hugetlb " Yu Zhao
2019-09-25  8:25               ` Peter Zijlstra
2019-09-25 22:26                 ` Yu Zhao
2019-09-26 10:20                   ` Kirill A. Shutemov
2019-09-27  3:26                     ` John Hubbard
2019-09-27  5:06                       ` Yu Zhao
2019-10-01 22:31                         ` John Hubbard
2019-10-02  0:00                           ` Yu Zhao
2019-09-27 12:33                       ` Michal Hocko
2019-09-27 18:31                         ` Yu Zhao
2019-09-27 19:31                         ` John Hubbard
2019-09-29 22:47                           ` John Hubbard
2019-09-30  9:20                           ` Jan Kara
2019-09-30 17:57                             ` John Hubbard
2019-10-01  7:10                               ` Jan Kara [this message]
2019-10-01  8:36                                 ` Peter Zijlstra
2019-10-01  8:40                                   ` Jan Kara
2019-10-01 18:43                                 ` John Hubbard
2019-10-02  9:24                                   ` Jan Kara
2019-10-02 17:33                                     ` John Hubbard
2019-09-24 23:24             ` [PATCH v3 4/4] mm: remove unnecessary smp_wmb() in __SetPageUptodate() Yu Zhao
2019-09-24 23:50               ` Matthew Wilcox
2019-09-25 22:03                 ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191001071008.GA25062@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=aarcange@redhat.com \
    --cc=acme@kernel.org \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jgross@suse.com \
    --cc=jhubbard@nvidia.com \
    --cc=joel@joelfernandes.org \
    --cc=jolsa@redhat.com \
    --cc=jrdr.linux@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=ldr709@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=osandov@fb.com \
    --cc=pavel.tatashin@microsoft.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thellstrom@vmware.com \
    --cc=vbabka@suse.cz \
    --cc=vpillai@digitalocean.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=ziqian.lzq@antfin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).