linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Jan Kara <jack@suse.cz>
Cc: "Michal Hocko" <mhocko@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Yu Zhao" <yuzhao@google.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Hugh Dickins" <hughd@google.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Lance Roy" <ldr709@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Dave Airlie" <airlied@redhat.com>,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Aaron Lu" <ziqian.lzq@antfin.com>,
	"Omar Sandoval" <osandov@fb.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"Paul E. McKenney" <paulmck@kernel.org>
Subject: Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely
Date: Wed, 2 Oct 2019 10:33:03 -0700	[thread overview]
Message-ID: <a6d5480b-fbe7-f481-6b34-243b4094793b@nvidia.com> (raw)
In-Reply-To: <20191002092447.GC9320@quack2.suse.cz>

On 10/2/19 2:24 AM, Jan Kara wrote:
> 
> OK, so you are concerned that the page table walk of pgd->p4d->pud->pmd
> will get prefetched by the CPU before interrupts are disabled, somewhere in
> the middle of the walk IPI flushing TLBs on the cpu will happen allowing
> munmap() on another CPU to proceed and free page tables so the rest of the
> walk will happen on freed and possibly reused pages. Do I understand you
> right?
> 

Yes, that's it.

thanks,
-- 
John Hubbard
NVIDIA


> Realistically, I don't think this can happen as I'd expect the CPU to throw
> away the speculation state on interrupt. But that's just my expectation and
> I agree that I don't see anything in Documentation/memory-barriers.txt that
> would prevent what you are concerned about. Let's ask Paul :)
> 
> Paul, we are discussing here a possible races between
> mm/gup.c:__get_user_pages_fast() and mm/mmap.c:unmap_region(). The first
> has a code like:
> 
> local_irq_save(flags);
> load pgd from current->mm
> load p4d from pgd
> load pud from p4d
> load pmd from pud
> ...
> local_irq_restore(flags);
> 
> while the second has code like:
> 
> unmap_region()
>    walk pgd
>      walk p4d
>        walk pud
>          walk pmd
>            clear ptes
>            flush tlb
>    free page tables
> 
> Now the serialization between these two relies on the fact that flushing
> TLBs from unmap_region() requires IPI to be served on each CPU so in naive
> understanding unmap_region() shouldn't be able to get to 'free unused page
> tables' part until __get_user_pages_fast() enables interrupts again. Now as
> John points out we don't see anything in Documentation/memory-barriers.txt
> that would actually guarantee this. So do we really need something like
> smp_rmb() after disabling interrupts in __get_user_pages_fast() or is the
> race John is concerned about impossible? Thanks!
> 
>> This one has a problem, because Documentation/memory-barriers.txt points out:
>>
>> INTERRUPT DISABLING FUNCTIONS
>> -----------------------------
>>
>> Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
>> (RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
>> barriers are required in such a situation, they must be provided from some
>> other means.
>>
>>
>> ...and so I'm suggesting that we need something approximately like this:
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 23a9f9c9d377..1678d50a2d8b 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2415,7 +2415,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
>>          if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
>>              gup_fast_permitted(start, end)) {
>>                  local_irq_disable();
>> +               smp_mb();
>>                  gup_pgd_range(addr, end, gup_flags, pages, &nr);
>> +               smp_mb();
>>                  local_irq_enable();
>>                  ret = nr;
> 
> 								Honza
> 


  reply	other threads:[~2019-10-02 17:35 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-08 22:56 [PATCH] mm: don't expose page to fast gup before it's ready Yu Zhao
2018-01-09  8:46 ` Michal Hocko
2018-01-09 10:10   ` Yu Zhao
2018-01-31 23:07     ` Andrew Morton
2019-05-14 21:25     ` Andrew Morton
2019-05-14 23:07       ` Yu Zhao
2019-09-14  7:05         ` [PATCH v2] mm: don't expose page to fast gup prematurely Yu Zhao
2019-09-24 11:23           ` Kirill A. Shutemov
2019-09-24 22:05             ` Yu Zhao
2019-09-25 12:17               ` Kirill A. Shutemov
2019-09-26  3:58                 ` Yu Zhao
2019-09-24 23:24           ` [PATCH v3 1/4] mm: remove unnecessary smp_wmb() in collapse_huge_page() Yu Zhao
2019-09-24 23:24             ` [PATCH v3 2/4] mm: don't expose hugetlb page to fast gup prematurely Yu Zhao
2019-09-24 23:24             ` [PATCH v3 3/4] mm: don't expose non-hugetlb " Yu Zhao
2019-09-25  8:25               ` Peter Zijlstra
2019-09-25 22:26                 ` Yu Zhao
2019-09-26 10:20                   ` Kirill A. Shutemov
2019-09-27  3:26                     ` John Hubbard
2019-09-27  5:06                       ` Yu Zhao
2019-10-01 22:31                         ` John Hubbard
2019-10-02  0:00                           ` Yu Zhao
2019-09-27 12:33                       ` Michal Hocko
2019-09-27 18:31                         ` Yu Zhao
2019-09-27 19:31                         ` John Hubbard
2019-09-29 22:47                           ` John Hubbard
2019-09-30  9:20                           ` Jan Kara
2019-09-30 17:57                             ` John Hubbard
2019-10-01  7:10                               ` Jan Kara
2019-10-01  8:36                                 ` Peter Zijlstra
2019-10-01  8:40                                   ` Jan Kara
2019-10-01 18:43                                 ` John Hubbard
2019-10-02  9:24                                   ` Jan Kara
2019-10-02 17:33                                     ` John Hubbard [this message]
2019-09-24 23:24             ` [PATCH v3 4/4] mm: remove unnecessary smp_wmb() in __SetPageUptodate() Yu Zhao
2019-09-24 23:50               ` Matthew Wilcox
2019-09-25 22:03                 ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6d5480b-fbe7-f481-6b34-243b4094793b@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=aarcange@redhat.com \
    --cc=acme@kernel.org \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jgross@suse.com \
    --cc=joel@joelfernandes.org \
    --cc=jolsa@redhat.com \
    --cc=jrdr.linux@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=ldr709@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=osandov@fb.com \
    --cc=paulmck@kernel.org \
    --cc=pavel.tatashin@microsoft.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thellstrom@vmware.com \
    --cc=vbabka@suse.cz \
    --cc=vpillai@digitalocean.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=ziqian.lzq@antfin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).