From: Hugh Dickins <hughd@google.com>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Hugh Dickins <hughd@google.com>,
Jerome Glisse <jglisse@redhat.com>, Joe Perches <joe@perches.com>,
Lokesh Gidra <lokeshgidra@google.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>, Shaohua Li <shli@fb.com>,
Shuah Khan <shuah@kernel.org>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Wang Qing <wangqing@vivo.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, Linux MM <linux-mm@kvack.org>,
linux-kselftest@vger.kernel.org,
Brian Geffon <bgeffon@google.com>,
Cannon Matthews <cannonmatthews@google.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
David Rientjes <rientjes@google.com>,
Michel Lespinasse <walken@google.com>,
Mina Almasry <almasrymina@google.com>,
Oliver Upton <oupton@google.com>
Subject: Re: [PATCH v5] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior
Date: Thu, 8 Apr 2021 20:08:07 -0700 (PDT) [thread overview]
Message-ID: <alpine.LSU.2.11.2104081912360.19277@eggly.anvils> (raw)
In-Reply-To: <CAJHvVcjqu7XymFBOMJTuF03Tts7=pOcs0nTZy25Y=t6sYQPJrw@mail.gmail.com>
On Thu, 8 Apr 2021, Axel Rasmussen wrote:
> On Tue, Apr 6, 2021 at 4:49 PM Peter Xu <peterx@redhat.com> wrote:
> > On Mon, Apr 05, 2021 at 10:19:17AM -0700, Axel Rasmussen wrote:
...
> > > --- a/mm/userfaultfd.c
> > > +++ b/mm/userfaultfd.c
...
> > > +
> > > + if (is_file_backed) {
> > > + /* The shmem MAP_PRIVATE case requires checking the i_size */
> > > + inode = dst_vma->vm_file->f_inode;
> > > + offset = linear_page_index(dst_vma, dst_addr);
> > > + max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> > > + ret = -EFAULT;
> > > + if (unlikely(offset >= max_off))
> > > + goto out_unlock;
> >
> > Frankly I don't really know why this must be put into pgtable lock.. Since if
> > not required then it can be moved into UFFDIO_COPY path, as CONTINUE doesn't
> > need it iiuc. Just raise it up as a pure question.
>
> It's not clear to me either. shmem_getpage_gfp() does check this twice
> kinda like we're doing, but it doesn't ever touch the PTL. What it
> seems to be worried about is, what happens if a concurrent
> FALLOC_FL_PUNCH_HOLE happens somewhere in the middle of whatever
> manipulation we're doing? From looking at shmem_fallocate(), I think
> the basic point is that truncation happens while "inode_lock(inode)"
> is held, but neither shmem_mcopy_atomic_pte() or the new
> mcopy_atomic_install_ptes() take that lock.
>
> I'm a bit hesitant to just remove it, run some tests, and then declare
> victory, because it seems plausible it's there to catch some
> semi-hard-to-induce race. I'm not sure how to prove that *isn't*
> needed, so my inclination is to just keep it?
>
> I'll send a series addressing the feedback so far this afternoon, and
> I'll leave this alone for now - at least, it doesn't seem to hurt
> anything. Maybe Hugh or someone else has some more advice about it. If
> so, I'm happy to remove it in a follow-up.
It takes some thinking about, but the i_size check is required to be
under the pagetable lock, for the MAP_PRIVATE UFFDIO_COPY path, where
it is inserting an anonymous page into the file-backed vma (skipping
actually inserting a page into page cache, as an ordinary fault would).
Not because of FALLOC_FL_PUNCH_HOLE (which makes no change to i_size;
and it's okay if a race fills in the hole immediately afterwards),
but because of truncation (which must remove all beyond i_size).
In the MAP_SHARED case, with a locked page inserted into page cache,
the page lock is enough to exclude concurrent truncation. But even
in that case the second i_size check (I'm looking at 5.12-rc's
shmem_mfill_atomic_pte(), rather than recent patches which might differ)
is required: because the first i_size check was done before the page
became visible in page cache, so a concurrent truncation could miss it).
Maybe that first check is redundant, though I'm definitely for doing it;
or maybe shmem_add_to_page_cache() would be better if it made that check
itself, under xas_lock (I think the reason it does not is historical).
The second check, in the MAP_SHARED case, does not need to be under
pagetable lock - the page lock on the page cache page is enough -
but probably Andrea placed it there to resemble the anonymous case.
You might then question, how come there is no i_size check in all of
mm/memory.c, where ordinary faulting is handled. I'll answer that
the pte_same() check, under pagetable lock in wp_page_copy(), is
where the equivalent to userfaultfd's MAP_PRIVATE UFFDIO_COPY check
is made: if the page cache page has already been truncated, that pte
will have been cleared.
Or, if the page cache page is truncated an instant after wp_page_copy()
drops page table lock, then the unmap_mapping_range(,,, even_cows = 1)
which follows truncation has to clean it up. Er, does that mean that
the i_size check I started off insisting is required, actually is not
required? Um, maybe, but let's just keep it and say goodnight!
Hugh
prev parent reply other threads:[~2021-04-09 3:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-05 17:19 [PATCH v5] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior Axel Rasmussen
2021-04-06 23:49 ` Peter Xu
2021-04-08 22:08 ` Axel Rasmussen
2021-04-09 3:08 ` Hugh Dickins [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.11.2104081912360.19277@eggly.anvils \
--to=hughd@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=axelrasmussen@google.com \
--cc=bgeffon@google.com \
--cc=cannonmatthews@google.com \
--cc=dgilbert@redhat.com \
--cc=jglisse@redhat.com \
--cc=joe@perches.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=oupton@google.com \
--cc=peterx@redhat.com \
--cc=rientjes@google.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=sfr@canb.auug.org.au \
--cc=shli@fb.com \
--cc=shuah@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=wangqing@vivo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).