linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Axel Rasmussen <axelrasmussen@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Shuah Khan <shuah@kernel.org>, Linux MM <linux-mm@kvack.org>,
	Linuxkselftest <linux-kselftest@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	James Houghton <jthoughton@google.com>
Subject: Re: [PATCH 1/3] userfaultfd/selftests: fix feature support detection
Date: Wed, 22 Sep 2021 15:29:42 -0700	[thread overview]
Message-ID: <CAJHvVcijQdS_hfUnasz7BhhQeiHmNu=C5j8xfX=uWsfVO9-+Eg@mail.gmail.com> (raw)
In-Reply-To: <YUulep3+YADkwlUu@t490s>

On Wed, Sep 22, 2021 at 2:52 PM Peter Xu <peterx@redhat.com> wrote:
>
> On Wed, Sep 22, 2021 at 01:54:53PM -0700, Axel Rasmussen wrote:
> > On Wed, Sep 22, 2021 at 10:33 AM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > Hello, Axel,
> > >
> > > On Wed, Sep 22, 2021 at 10:04:03AM -0700, Axel Rasmussen wrote:
> > > > Thanks for discussing the design Peter. I have some ideas which might
> > > > make for a nicer v2; I'll massage the code a bit and see what I can
> > > > come up with.
> > >
> > > Sure thing.  Note again that as I don't have a strong opinion on that, feel
> > > free to keep it.  However if you provide v2, I'll read.
> > >
> > > [off-topic below]
> > >
> > > Another thing I probably have forgot but need your confirmation is, when you
> > > worked on uffd minor mode, did you explicitly disable thp, or is it allowed?
> >
> > I gave a more detailed answer in the other thread, but: currently it
> > is allowed, but this was a bug / oversight on my part. :) THP collapse
> > can break the guarantees minor fault registration is trying to
> > provide.
>
> I've replied there:
>
> https://lore.kernel.org/linux-mm/YUueOUfoamxOvEyO@t490s/
>
> We can try to keep the discussion unified there regarding this.
>
> > But there's another scenario: what if the collapse happened well
> > before registration happened?
>
> Maybe yes, but my understanding of the current uffd-minor scenario tells me
> that this is fine too.  Meanwhile I actually have another idea regarding minor
> mode, please continue reading.
>
> Firstly, let me try to re-cap on how minor mode is used in your production
> systems: I believe there should have two processes A and B, if A is the main
> process, B could be the migration process.  B migrates pages in the background,
> while A so far should have been stopped and never ran.  When we want to start
> A, we should register A with uffd-minor upon the whole range (note: I think so
> far A does not have any pgtable mapped within uffd-minor range).  Then any page
> access of A should kick B and asking "whether it is the latest page", if yes
> then UFFDIO_CONTINUE, if no then B modifies the page, plus UFFDIO_CONTINUE
> afterwards.  Am I right above?
>
> So if that's the case, then A should have no page table at all.
>
> Then, is that a problem if the shmem file that A maps contains huge thps?  I
> think no - because UFFDIO_CONTINUE will only install small pages.
>
> Let me know if I'm understanding it right above; I'll be happy to be corrected.

Right, except that our use case is even more similar to QEMU: the code
doing UFFDIO_CONTINUE / demand paging, and the code running the vCPUs,
are in the same process (same mm) - just different threads.

>
> Actually besides this scenario, I'm also thinking of another scenario of using
> minor fault in a single process - that's mostly what QEMU is doing right now,
> as QEMU has the vcpu threads and migration thread sharing a single mm/pgtable.
> So I think it'll be great to have a new madvise(MADV_ZAP) which will tear down
> all the file-backed memory pgtables of a specific range.  I think it'll suite
> perfectly for the minor fault use case, and it can be used for other things
> too.  Let me know what you think about this idea, and whether that'll help in
> your case too (e.g., if you worry a current process A mapped huge shmem thp
> somewhere, we can use madvise(MADV_ZAP) to drop it).

Yes, this would be convenient for our implementation too. :) There are
workarounds if the feature doesn't exist, but it would be nice to
have. It's also useful for memory poisoning, I think, if the host
decides some page(s) are "bad" and wants to intercept any future guest
accesses to those page(s).

>
> > I *think* the existing code deals with THPs correctly in that case, but then
> > again I don't think our selftest really covers this case, and it's not
> > something I've tested in production either (to work around the other bug, we
> > currently MADV_NOHUGEPAGE the area until after VM demand paging completes,
> > and the UFFD registration is removed), so I am not super confident this is
> > the case.
>
> In all cases, enhancing the test program will always be welcomed.
>
> Thanks,
>
> --
> Peter Xu
>

  reply	other threads:[~2021-09-22 22:30 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-21 16:33 [PATCH 1/3] userfaultfd/selftests: fix feature support detection Axel Rasmussen
2021-09-21 16:33 ` [PATCH 2/3] userfaultfd/selftests: fix calculation of expected ioctls Axel Rasmussen
2021-09-21 16:33 ` [PATCH 3/3] userfaultfd/selftests: don't rely on GNU extensions for random numbers Axel Rasmussen
2021-09-21 18:03   ` Peter Xu
2021-09-21 17:44 ` [PATCH 1/3] userfaultfd/selftests: fix feature support detection Peter Xu
2021-09-21 18:26   ` Axel Rasmussen
2021-09-21 19:21     ` Peter Xu
2021-09-21 20:31       ` Axel Rasmussen
2021-09-22  0:29         ` Peter Xu
2021-09-22 17:04           ` Axel Rasmussen
2021-09-22 17:32             ` Peter Xu
2021-09-22 20:54               ` Axel Rasmussen
2021-09-22 21:51                 ` Peter Xu
2021-09-22 22:29                   ` Axel Rasmussen [this message]
2021-09-22 23:49                     ` Peter Xu
2021-09-23  4:17                       ` James Houghton
2021-09-23  5:43                         ` Jue Wang
2021-09-24 20:09                           ` Peter Xu
2021-09-24 20:22                             ` Jue Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJHvVcijQdS_hfUnasz7BhhQeiHmNu=C5j8xfX=uWsfVO9-+Eg@mail.gmail.com' \
    --to=axelrasmussen@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).