All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Hubbard <jhubbard@nvidia.com>
To: Jan Kara <jack@suse.cz>
Cc: "Michal Hocko" <mhocko@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	"Yu Zhao" <yuzhao@google.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	"Ingo Molnar" <mingo@redhat.com>,
	"Arnaldo Carvalho de Melo" <acme@kernel.org>,
	"Alexander Shishkin" <alexander.shishkin@linux.intel.com>,
	"Jiri Olsa" <jolsa@redhat.com>,
	"Namhyung Kim" <namhyung@kernel.org>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Hugh Dickins" <hughd@google.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	"David Rientjes" <rientjes@google.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Lance Roy" <ldr709@gmail.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Jason Gunthorpe" <jgg@ziepe.ca>,
	"Dave Airlie" <airlied@redhat.com>,
	"Thomas Hellstrom" <thellstrom@vmware.com>,
	"Souptick Joarder" <jrdr.linux@gmail.com>,
	"Mel Gorman" <mgorman@suse.de>,
	"Mike Kravetz" <mike.kravetz@oracle.com>,
	"Huang Ying" <ying.huang@intel.com>,
	"Aaron Lu" <ziqian.lzq@antfin.com>,
	"Omar Sandoval" <osandov@fb.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Vineeth Remanan Pillai" <vpillai@digitalocean.com>,
	"Daniel Jordan" <daniel.m.jordan@oracle.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	"Joel Fernandes" <joel@joelfernandes.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Alexander Duyck" <alexander.h.duyck@linux.intel.com>,
	"Pavel Tatashin" <pavel.tatashin@microsoft.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Anthony Yznaga" <anthony.yznaga@oracle.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely
Date: Tue, 1 Oct 2019 11:43:30 -0700	[thread overview]
Message-ID: <d7371e9b-c68b-1793-d10d-c1e7a504855c@nvidia.com> (raw)
In-Reply-To: <20191001071008.GA25062@quack2.suse.cz>

On 10/1/19 12:10 AM, Jan Kara wrote:
> On Mon 30-09-19 10:57:08, John Hubbard wrote:
>> On 9/30/19 2:20 AM, Jan Kara wrote:
>>> On Fri 27-09-19 12:31:41, John Hubbard wrote:
>>>> On 9/27/19 5:33 AM, Michal Hocko wrote:
>>>>> On Thu 26-09-19 20:26:46, John Hubbard wrote:
>>>>>> On 9/26/19 3:20 AM, Kirill A. Shutemov wrote:
>> ...
>> 2. Your code above is after the "pmd = READ_ONCE(*pmdp)" line, so by then,
>> it's already found the pte based on reading a stale pmd. So checking the
>> pte seems like it's checking the wrong thing--it's too late, for this case,
>> right?
> 
> Well, if PMD is getting freed, all PTEs in it should be cleared by that
> time, shouldn't they? So although we read from stale PMD, either we already
> see cleared PTE or the check pte_val(pte) != pte_val(*ptep) will fail and
> so we never actually succeed in getting stale PTE entry (at least unless
> the page table page that used to be PMD can get freed and reused
> - which is not the case in the example you've shown above).
>

Right, that's not what the example shows, but there is nothing here to prevent
the page table pages from being freed and re-used.

 
> So I still don't see a problem. That being said I don't feel being expert
> in this particular area. I just always thought GUP prevents races like this
> by the scheme I describe so I'd like to see what I'm missing :).
> 

I'm very much still "in training" here, so I hope I'm not wasting everyone's 
time. But I feel confident in stating at least this much:

There are two distinct lockless synchronization mechanisms here, each protecting
against a different issue, and it's important not to conflate them and think that
one protects against the other. I still see a hole in (2) below. The mechanisms 
are:

1) Protection against a page (not the page table itself) getting freed
while get_user_pages*() is trying to take a reference to it. This is avoided
by the try-get and the re-checking of pte values that you mention above.
It's an elegant little thing, too. :)

2) Protection against page tables getting freed while a get_user_pages_fast()
call is in progress. This relies on disabling interrupts in gup_fast(), while
firing interrupts in the freeing path (via tlb flushing, IPIs). And on memory
barriers (doh--missing!) to avoid leaking memory references outside of the
irq disabling. 

This one has a problem, because Documentation/memory-barriers.txt points out:

INTERRUPT DISABLING FUNCTIONS
-----------------------------

Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
barriers are required in such a situation, they must be provided from some
other means.


...and so I'm suggesting that we need something approximately like this:

diff --git a/mm/gup.c b/mm/gup.c
index 23a9f9c9d377..1678d50a2d8b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2415,7 +2415,9 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
        if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
            gup_fast_permitted(start, end)) {
                local_irq_disable();
+               smp_mb();
                gup_pgd_range(addr, end, gup_flags, pages, &nr);
+               smp_mb();
                local_irq_enable();
                ret = nr;


thanks,
-- 
John Hubbard
NVIDIA


  parent reply	other threads:[~2019-10-01 18:43 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-08 22:56 [PATCH] mm: don't expose page to fast gup before it's ready Yu Zhao
2018-01-08 22:56 ` Yu Zhao
2018-01-09  8:46 ` Michal Hocko
2018-01-09  8:46   ` Michal Hocko
2018-01-09 10:10   ` Yu Zhao
2018-01-09 10:10     ` Yu Zhao
2018-01-31 23:07     ` Andrew Morton
2018-01-31 23:07       ` Andrew Morton
2019-05-14 21:25     ` Andrew Morton
2019-05-14 23:07       ` Yu Zhao
2019-09-14  7:05         ` [PATCH v2] mm: don't expose page to fast gup prematurely Yu Zhao
2019-09-24 11:23           ` Kirill A. Shutemov
2019-09-24 22:05             ` Yu Zhao
2019-09-25 12:17               ` Kirill A. Shutemov
2019-09-26  3:58                 ` Yu Zhao
2019-09-24 23:24           ` [PATCH v3 1/4] mm: remove unnecessary smp_wmb() in collapse_huge_page() Yu Zhao
2019-09-24 23:24             ` Yu Zhao
2019-09-24 23:24             ` [PATCH v3 2/4] mm: don't expose hugetlb page to fast gup prematurely Yu Zhao
2019-09-24 23:24               ` Yu Zhao
2019-09-24 23:24             ` [PATCH v3 3/4] mm: don't expose non-hugetlb " Yu Zhao
2019-09-24 23:24               ` Yu Zhao
2019-09-25  8:25               ` Peter Zijlstra
2019-09-25 22:26                 ` Yu Zhao
2019-09-26 10:20                   ` Kirill A. Shutemov
2019-09-27  3:26                     ` John Hubbard
2019-09-27  5:06                       ` Yu Zhao
2019-10-01 22:31                         ` John Hubbard
2019-10-02  0:00                           ` Yu Zhao
2019-09-27 12:33                       ` Michal Hocko
2019-09-27 18:31                         ` Yu Zhao
2019-09-27 19:31                         ` John Hubbard
2019-09-29 22:47                           ` John Hubbard
2019-09-30  9:20                           ` Jan Kara
2019-09-30 17:57                             ` John Hubbard
2019-10-01  7:10                               ` Jan Kara
2019-10-01  8:36                                 ` Peter Zijlstra
2019-10-01  8:40                                   ` Jan Kara
2019-10-01 18:43                                 ` John Hubbard [this message]
2019-10-02  9:24                                   ` Jan Kara
2019-10-02 17:33                                     ` John Hubbard
2019-09-24 23:24             ` [PATCH v3 4/4] mm: remove unnecessary smp_wmb() in __SetPageUptodate() Yu Zhao
2019-09-24 23:24               ` Yu Zhao
2019-09-24 23:50               ` Matthew Wilcox
2019-09-25 22:03                 ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d7371e9b-c68b-1793-d10d-c1e7a504855c@nvidia.com \
    --to=jhubbard@nvidia.com \
    --cc=aarcange@redhat.com \
    --cc=acme@kernel.org \
    --cc=airlied@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anthony.yznaga@oracle.com \
    --cc=daniel.m.jordan@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jgross@suse.com \
    --cc=joel@joelfernandes.org \
    --cc=jolsa@redhat.com \
    --cc=jrdr.linux@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=ldr709@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=osandov@fb.com \
    --cc=pavel.tatashin@microsoft.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rientjes@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=thellstrom@vmware.com \
    --cc=vbabka@suse.cz \
    --cc=vpillai@digitalocean.com \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=ziqian.lzq@antfin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.