From: Hugh Dickins <hughd@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Hugh Dickins <hughd@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Jerome Glisse <jglisse@redhat.com>, Joe Perches <joe@perches.com>,
Lokesh Gidra <lokeshgidra@google.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>, Shaohua Li <shli@fb.com>,
Shuah Khan <shuah@kernel.org>,
Stephen Rothwell <sfr@canb.auug.org.au>,
Wang Qing <wangqing@vivo.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
Brian Geffon <bgeffon@google.com>,
Cannon Matthews <cannonmatthews@google.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
David Rientjes <rientjes@google.com>,
Michel Lespinasse <walken@google.com>,
Mina Almasry <almasrymina@google.com>,
Oliver Upton <oupton@google.com>
Subject: Re: [PATCH v4] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior
Date: Mon, 12 Apr 2021 17:51:14 -0700 (PDT) [thread overview]
Message-ID: <alpine.LSU.2.11.2104121657050.1097@eggly.anvils> (raw)
In-Reply-To: <20210412215437.GA1001332@xz-x1>
On Mon, 12 Apr 2021, Peter Xu wrote:
> On Tue, Apr 06, 2021 at 11:14:30PM -0700, Hugh Dickins wrote:
> > > +static int mcopy_atomic_install_ptes(struct mm_struct *dst_mm, pmd_t *dst_pmd,
> > > + struct vm_area_struct *dst_vma,
> > > + unsigned long dst_addr, struct page *page,
> > > + enum mcopy_atomic_mode mode, bool wp_copy)
> > > +{
>
> [...]
>
> > > + if (writable) {
> > > + _dst_pte = pte_mkdirty(_dst_pte);
> > > + if (wp_copy)
> > > + _dst_pte = pte_mkuffd_wp(_dst_pte);
> > > + else
> > > + _dst_pte = pte_mkwrite(_dst_pte);
> > > + } else if (vm_shared) {
> > > + /*
> > > + * Since we didn't pte_mkdirty(), mark the page dirty or it
> > > + * could be freed from under us. We could do this
> > > + * unconditionally, but doing it only if !writable is faster.
> > > + */
> > > + set_page_dirty(page);
> >
> > I do not remember why Andrea or I preferred set_page_dirty() here to
> > pte_mkdirty(); but I suppose there might somewhere be a BUG_ON(pte_dirty)
> > which this would avoid. Risky to change it, though it does look odd.
>
> Is any of the possible BUG_ON(pte_dirty) going to trigger because the pte has
> write bit cleared? That's one question I was not very sure, e.g., whether one
> pte is allowed to be "dirty" if it's not writable.
>
> To me it's okay, it's actually very suitable for UFFDIO_COPY case, where it is
> definitely dirty data (so we must never drop it) even if it's installed as RO,
> however to achieve that we can still set the dirty on the page rather than the
> pte as what we do here. It's just a bit awkward as you said.
>
> Meanwhile today I just noticed this in arm64 code:
>
> static inline pte_t pte_wrprotect(pte_t pte)
> {
> /*
> * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY
> * clear), set the PTE_DIRTY bit.
> */
> if (pte_hw_dirty(pte))
> pte = pte_mkdirty(pte);
>
> pte = clear_pte_bit(pte, __pgprot(PTE_WRITE));
> pte = set_pte_bit(pte, __pgprot(PTE_RDONLY));
> return pte;
> }
>
> So arm64 will explicitly set the dirty bit (from the HW dirty bit) when
> wr-protect. It seems to prove that at least for arm64 it's very valid to have
> !write && dirty pte.
I did not mean to imply that it's wrong to have pte_dirty without
pte_write: no, I agree with you, I believe that there are accepted
and generic ways in which we can have pte_dirty without pte_write
(and we could each probably add a warning somewhere which would
very quickly prove that - but those would not prove that there
are not BUG_ONs on some other path, which had been my fear).
I wanted now to demonstrate that by pointing to change_pte_range() in
mm/mprotect.c, showing that it does not clear pte_dirty when it clears
pte_write. But alarmingly found rather the reverse: that it appears to
set pte_write when it finds pte_dirty - if dirty_accountable.
That looks very wrong, but if I spent long enough following up
dirty_accountable in detail, I think I would be reassured to find that
it is only adding the pte_write there when it had removed it from the
prot passed down, for dirty accounting reasons (which apply !VM_SHARED
protections in the VM_SHARED case, so that page_mkwrite() is called
and dirty accounting done when necessary).
What I did mean to imply is that changing set_page_dirty to pte_mkdirty,
to make that userfaultfd code block look nicer, is not a change to be
done lightly: by all means try it out, test it, and send a patch after
Axel's series is in, but please do not ask Axel to make that change as
a part of his series - it would be taking a risk, just for a cleanup.
Now, I have also looked up the mail exchange with Andrea which led to
his dcf7fe9d8976 ("userfaultfd: shmem: UFFDIO_COPY: set the page dirty
if VM_WRITE is not set") - it had to be off-list at the time. And he
was rather led to that set_page_dirty by following old patterns left
over in shmem_getpage_gfp(); but when I said "or it could be done with
pte_mkdirty without pte_mkwrite", he answered "I explicitly avoided
that because pte_dirty then has side effects on mprotect to decide
pte_write. It looks safer to do set_page_dirty and not set dirty bits
in not writable ptes unnecessarily".
Haha: I think Andrea is referring to exactly the dirty_accountable code
in change_pte_protection() which worried me above. Now, I think that
will turn out okay (shmem does not have a page_mkwrite(), and does not
participate in dirty accounting), but you will have to do some work to
assure us all of that, before sending in a cleanup patch.
Hugh
next prev parent reply other threads:[~2021-04-13 0:51 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-01 18:37 [PATCH v4] userfaultfd/shmem: fix MCOPY_ATOMIC_CONTINUE behavior Axel Rasmussen
2021-04-07 6:14 ` Hugh Dickins
2021-04-07 20:33 ` Axel Rasmussen
2021-04-07 22:50 ` Hugh Dickins
2021-04-12 21:54 ` Peter Xu
2021-04-13 0:51 ` Hugh Dickins [this message]
2021-04-13 1:21 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.11.2104121657050.1097@eggly.anvils \
--to=hughd@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=axelrasmussen@google.com \
--cc=bgeffon@google.com \
--cc=cannonmatthews@google.com \
--cc=dgilbert@redhat.com \
--cc=jglisse@redhat.com \
--cc=joe@perches.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=oupton@google.com \
--cc=peterx@redhat.com \
--cc=rientjes@google.com \
--cc=rppt@linux.vnet.ibm.com \
--cc=sfr@canb.auug.org.au \
--cc=shli@fb.com \
--cc=shuah@kernel.org \
--cc=viro@zeniv.linux.org.uk \
--cc=walken@google.com \
--cc=wangqing@vivo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).