From: Axel Rasmussen <axelrasmussen@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Shuah Khan <shuah@kernel.org>, Linux MM <linux-mm@kvack.org>,
Linuxkselftest <linux-kselftest@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
James Houghton <jthoughton@google.com>
Subject: Re: [PATCH 1/3] userfaultfd/selftests: fix feature support detection
Date: Wed, 22 Sep 2021 15:29:42 -0700 [thread overview]
Message-ID: <CAJHvVcijQdS_hfUnasz7BhhQeiHmNu=C5j8xfX=uWsfVO9-+Eg@mail.gmail.com> (raw)
In-Reply-To: <YUulep3+YADkwlUu@t490s>
On Wed, Sep 22, 2021 at 2:52 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Sep 22, 2021 at 01:54:53PM -0700, Axel Rasmussen wrote:
> > On Wed, Sep 22, 2021 at 10:33 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Hello, Axel,
> > >
> > > On Wed, Sep 22, 2021 at 10:04:03AM -0700, Axel Rasmussen wrote:
> > > > Thanks for discussing the design Peter. I have some ideas which might
> > > > make for a nicer v2; I'll massage the code a bit and see what I can
> > > > come up with.
> > >
> > > Sure thing. Note again that as I don't have a strong opinion on that, feel
> > > free to keep it. However if you provide v2, I'll read.
> > >
> > > [off-topic below]
> > >
> > > Another thing I probably have forgot but need your confirmation is, when you
> > > worked on uffd minor mode, did you explicitly disable thp, or is it allowed?
> >
> > I gave a more detailed answer in the other thread, but: currently it
> > is allowed, but this was a bug / oversight on my part. :) THP collapse
> > can break the guarantees minor fault registration is trying to
> > provide.
>
> I've replied there:
>
> https://lore.kernel.org/linux-mm/YUueOUfoamxOvEyO@t490s/
>
> We can try to keep the discussion unified there regarding this.
>
> > But there's another scenario: what if the collapse happened well
> > before registration happened?
>
> Maybe yes, but my understanding of the current uffd-minor scenario tells me
> that this is fine too. Meanwhile I actually have another idea regarding minor
> mode, please continue reading.
>
> Firstly, let me try to re-cap on how minor mode is used in your production
> systems: I believe there should have two processes A and B, if A is the main
> process, B could be the migration process. B migrates pages in the background,
> while A so far should have been stopped and never ran. When we want to start
> A, we should register A with uffd-minor upon the whole range (note: I think so
> far A does not have any pgtable mapped within uffd-minor range). Then any page
> access of A should kick B and asking "whether it is the latest page", if yes
> then UFFDIO_CONTINUE, if no then B modifies the page, plus UFFDIO_CONTINUE
> afterwards. Am I right above?
>
> So if that's the case, then A should have no page table at all.
>
> Then, is that a problem if the shmem file that A maps contains huge thps? I
> think no - because UFFDIO_CONTINUE will only install small pages.
>
> Let me know if I'm understanding it right above; I'll be happy to be corrected.
Right, except that our use case is even more similar to QEMU: the code
doing UFFDIO_CONTINUE / demand paging, and the code running the vCPUs,
are in the same process (same mm) - just different threads.
>
> Actually besides this scenario, I'm also thinking of another scenario of using
> minor fault in a single process - that's mostly what QEMU is doing right now,
> as QEMU has the vcpu threads and migration thread sharing a single mm/pgtable.
> So I think it'll be great to have a new madvise(MADV_ZAP) which will tear down
> all the file-backed memory pgtables of a specific range. I think it'll suite
> perfectly for the minor fault use case, and it can be used for other things
> too. Let me know what you think about this idea, and whether that'll help in
> your case too (e.g., if you worry a current process A mapped huge shmem thp
> somewhere, we can use madvise(MADV_ZAP) to drop it).
Yes, this would be convenient for our implementation too. :) There are
workarounds if the feature doesn't exist, but it would be nice to
have. It's also useful for memory poisoning, I think, if the host
decides some page(s) are "bad" and wants to intercept any future guest
accesses to those page(s).
>
> > I *think* the existing code deals with THPs correctly in that case, but then
> > again I don't think our selftest really covers this case, and it's not
> > something I've tested in production either (to work around the other bug, we
> > currently MADV_NOHUGEPAGE the area until after VM demand paging completes,
> > and the UFFD registration is removed), so I am not super confident this is
> > the case.
>
> In all cases, enhancing the test program will always be welcomed.
>
> Thanks,
>
> --
> Peter Xu
>
next prev parent reply other threads:[~2021-09-22 22:30 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-21 16:33 [PATCH 1/3] userfaultfd/selftests: fix feature support detection Axel Rasmussen
2021-09-21 16:33 ` [PATCH 2/3] userfaultfd/selftests: fix calculation of expected ioctls Axel Rasmussen
2021-09-21 16:33 ` [PATCH 3/3] userfaultfd/selftests: don't rely on GNU extensions for random numbers Axel Rasmussen
2021-09-21 18:03 ` Peter Xu
2021-09-21 17:44 ` [PATCH 1/3] userfaultfd/selftests: fix feature support detection Peter Xu
2021-09-21 18:26 ` Axel Rasmussen
2021-09-21 19:21 ` Peter Xu
2021-09-21 20:31 ` Axel Rasmussen
2021-09-22 0:29 ` Peter Xu
2021-09-22 17:04 ` Axel Rasmussen
2021-09-22 17:32 ` Peter Xu
2021-09-22 20:54 ` Axel Rasmussen
2021-09-22 21:51 ` Peter Xu
2021-09-22 22:29 ` Axel Rasmussen [this message]
2021-09-22 23:49 ` Peter Xu
2021-09-23 4:17 ` James Houghton
2021-09-23 5:43 ` Jue Wang
2021-09-24 20:09 ` Peter Xu
2021-09-24 20:22 ` Jue Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJHvVcijQdS_hfUnasz7BhhQeiHmNu=C5j8xfX=uWsfVO9-+Eg@mail.gmail.com' \
--to=axelrasmussen@google.com \
--cc=akpm@linux-foundation.org \
--cc=jthoughton@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=peterx@redhat.com \
--cc=shuah@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).