All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Jens Axboe <axboe@kernel.dk>,
	Matthew Wilcox <willy@infradead.org>,
	Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>,
	Leon Romanovsky <leon@kernel.org>,
	Christian Benvenuti <benve@cisco.com>,
	Nelson Escobar <neescoba@cisco.com>,
	Bernard Metzler <bmt@zurich.ibm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Bjorn Topel <bjorn@kernel.org>,
	Magnus Karlsson <magnus.karlsson@intel.com>,
	Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Jonathan Lemon <jonathan.lemon@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	Richard Cochran <richardcochran@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	Oleg Nesterov <oleg@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>, Jan Kara <jack@suse.cz>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Mika Penttila <mpenttil@redhat.com>,
	Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
	Peter Xu <peterx@redhat.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCH v7 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast writing to file-backed mappings
Date: Tue, 2 May 2023 20:51:21 -0300	[thread overview]
Message-ID: <ZFGh+UUd3IMZk0lb@nvidia.com> (raw)
In-Reply-To: <434c60e6-7ac4-229b-5db0-5175afbcfff5@redhat.com>

On Tue, May 02, 2023 at 09:33:45PM +0200, David Hildenbrand wrote:

> I'll just stress again that I think there are cases where we unmap and free
> a page without synchronizing against GUP-fast using an IPI or RCU.

AFAIK this is true on ARM64 and other arches that do not use IPIs for
TLB shootdown.

Eg the broadcast TLBI described here:

https://developer.arm.com/documentation/101811/0102/Translation-Lookaside-Buffer-maintenance

TLB invalidation of remote CPUs Is done at the interconnect level and
does not trigger any interrupt.

So, arches that don't use IPI have to RCU free the page table entires
to work with GUP fast. They will set MMU_GATHER_RCU_TABLE_FREE. The
whole interrupt disable thing in GUP turns into nothing more than a
hacky RCU on those arches.

The ugly bit is the comment:

 * We do not adopt an rcu_read_lock() here as we also want to block IPIs
 * that come from THPs splitting.

Which, I think, today can be summarized in today's code base as
serializing with split_huge_page_to_list().

I don't know this code well, but the comment there says "Only caller
must hold pin on the @page" which is only possible if all the PTEs
have been replaced with migration entries or otherwise before we get
here. So the IPI the comment talks about, I suppose, is from the
installation of the migration entry?

However gup_huge_pmd() does the usual read, ref, check pattern, and
split_huge_page_to_list() uses page_ref_freeze() which is cmpxchg

So the three races seem to resolve themselves without IPI magic
 - GUP sees the THP folio before the migration entry and +1's the ref
   split_huge_page_to_list() bails beacuse the ref is elevated
 - GUP fails to +1 the ref because it is zero and bails,
   split_huge_page_to_list() made it zero, so it runs
 - GUP sees the THP folio, split_huge_page_to_list() ran to
   completion, and then GUP +1's a 4k page. The recheck of pmd_val
   will see the migration entry, or the new PTE table pointer, but
   never the original THP folio. So GUP will bail. The speculative
   ref on the 4k page is harmless.

I can't figure out what this 2014 comment means in today's code.

Or where ARM64 hid the "IPI broadcast on THP split" :)

See commit 2667f50e8b81457fcb4a3dbe6aff3e81ea009e13

> That's one of the reasons why we recheck if the PTE changed to back off, so
> I've been told.

Yes, with no synchronizing point the refcount in GUP fast could be
taken on the page after it has been free'd and reallocated, but this
is only possible on RCU

> I'm happy if someone proves me wrong and a page we just (temporarily) pinned
> cannot have been freed+reused in the meantime.

Sadly I think no.. We'd need to RCU free the page itself as well to
make this true.

Jason

  parent reply	other threads:[~2023-05-02 23:51 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-02 16:34 [PATCH v7 0/3] mm/gup: disallow GUP writing to file-backed mappings by default Lorenzo Stoakes
2023-05-02 16:34 ` [PATCH v7 1/3] mm/mmap: separate writenotify and dirty tracking logic Lorenzo Stoakes
2023-05-02 16:38   ` David Hildenbrand
2023-05-02 16:53     ` Lorenzo Stoakes
2023-05-02 17:09       ` Lorenzo Stoakes
2023-05-02 17:16         ` David Hildenbrand
2023-05-02 16:34 ` [PATCH v7 2/3] mm/gup: disallow FOLL_LONGTERM GUP-nonfast writing to file-backed mappings Lorenzo Stoakes
2023-05-02 16:42   ` David Hildenbrand
2023-05-02 16:56     ` Lorenzo Stoakes
2023-05-02 16:34 ` [PATCH v7 3/3] mm/gup: disallow FOLL_LONGTERM GUP-fast " Lorenzo Stoakes
2023-05-02 17:13   ` David Hildenbrand
2023-05-02 17:22     ` Peter Zijlstra
2023-05-02 17:34       ` David Hildenbrand
2023-05-02 18:17         ` Lorenzo Stoakes
2023-05-02 19:07           ` David Hildenbrand
2023-05-02 19:07           ` Jason Gunthorpe
2023-05-02 19:25             ` Lorenzo Stoakes
2023-05-02 19:33               ` David Hildenbrand
2023-05-02 19:37                 ` Lorenzo Stoakes
2023-05-02 23:51                 ` Jason Gunthorpe [this message]
2023-05-03  0:22               ` Jason Gunthorpe
2023-05-02 18:59         ` Peter Zijlstra
2023-05-02 19:05           ` David Hildenbrand
2023-05-02 17:34       ` Lorenzo Stoakes
2023-05-02 17:31     ` Lorenzo Stoakes
2023-05-02 17:38       ` David Hildenbrand
2023-05-02 17:45         ` Lorenzo Stoakes
2023-05-02 19:17   ` David Hildenbrand
2023-05-02 19:45     ` Lorenzo Stoakes
2023-05-02 18:45 ` [PATCH v7 0/3] mm/gup: disallow GUP writing to file-backed mappings by default Matthew Rosato
2023-05-02 18:53   ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZFGh+UUd3IMZk0lb@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=asml.silence@gmail.com \
    --cc=ast@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=benve@cisco.com \
    --cc=bjorn@kernel.org \
    --cc=bmt@zurich.ibm.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=david@fromorbit.com \
    --cc=david@redhat.com \
    --cc=dennis.dalessandro@cornelisnetworks.com \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=irogers@google.com \
    --cc=jack@suse.cz \
    --cc=jhubbard@nvidia.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=jonathan.lemon@gmail.com \
    --cc=kirill@shutemov.name \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=lstoakes@gmail.com \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mjrosato@linux.ibm.com \
    --cc=mpenttil@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=neescoba@cisco.com \
    --cc=netdev@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=pabeni@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=richardcochran@gmail.com \
    --cc=rppt@linux.ibm.com \
    --cc=tytso@mit.edu \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.