All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: qemu-devel@nongnu.org,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	James Houghton <jthoughton@google.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [PATCH RFC 15/21] migration: Teach qemu about minor faults and doublemap
Date: Mon, 30 Jan 2023 17:50:26 -0500	[thread overview]
Message-ID: <Y9hJshP8p9S0FaZF@x1n> (raw)
In-Reply-To: <87k014pocv.fsf@secure.mitica>

On Mon, Jan 30, 2023 at 06:45:20AM +0100, Juan Quintela wrote:
> Peter Xu <peterx@redhat.com> wrote:
> > When a ramblock is backed by hugetlbfs and the user specified using
> > double-map feature, we trap the faults on these regions using minor mode.
> > Teach QEMU about that.
> >
> > Add some sanity check on the fault flags when receiving a uffd message.
> > For minor fault trapped ranges, we should always see the MINOR flag set,
> > while when using generic missing faults we should never see it.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> 
> 
> > -    if (!(reg_struct.ioctls & ((__u64)1 << _UFFDIO_COPY))) {
> 
> Does qemu have a macro to do this bitmap handling?

Not yet that's suitable.  It's open coded like this in many places of
postcopy.  One thing close enough is bitmap_test_and_clear() but too heavy.

> 
> >  {
> >      MigrationIncomingState *mis = opaque;
> >      struct uffd_msg msg;
> > +    uint64_t address;
> >      int ret;
> >      size_t index;
> >      RAMBlock *rb = NULL;
> > @@ -945,6 +980,7 @@ static void *postcopy_ram_fault_thread(void *opaque)
> >      }
> >  
> >      while (true) {
> > +        bool use_minor_fault, minor_flag;
> 
> I think that something on the lines of:
>            bool src_minor_fault, dst_minor_fault;
> 
> will make things simpler.  Reviewing, I have to go back to definition
> place to know which is which.

These two values represents "what we expect" and "what we got from the
message", so the only thing is I'm not sure whether src/dst matches the
best here.

How about "expect_minor_fault" and "has_minor_fault" instead?

> 
> >          ram_addr_t rb_offset;
> >          int poll_result;
> >  
> > @@ -1022,22 +1058,37 @@ static void *postcopy_ram_fault_thread(void *opaque)
> >                  break;
> >              }
> >  
> > -            rb_offset = ROUND_DOWN(rb_offset, migration_ram_pagesize(rb));
> > -            trace_postcopy_ram_fault_thread_request(msg.arg.pagefault.address,
> > -                                                qemu_ram_get_idstr(rb),
> > -                                                rb_offset,
> > -                                                msg.arg.pagefault.feat.ptid);
> > -            mark_postcopy_blocktime_begin(
> > -                    (uintptr_t)(msg.arg.pagefault.address),
> > -                                msg.arg.pagefault.feat.ptid, rb);
> > +            address = ROUND_DOWN(msg.arg.pagefault.address,
> > +                                 migration_ram_pagesize(rb));
> > +            use_minor_fault = postcopy_use_minor_fault(rb);
> > +            minor_flag = !!(msg.arg.pagefault.flags &
> > +                            UFFD_PAGEFAULT_FLAG_MINOR);
> >  
> > +            /*
> > +             * Do sanity check on the message flags to make sure this is
> > +             * the one we expect to receive.  When using minor fault on
> > +             * this ramblock, it should _always_ be set; when not using
> > +             * minor fault, it should _never_ be set.
> > +             */
> > +            if (use_minor_fault ^ minor_flag) {
> > +                error_report("%s: Unexpected page fault flags (0x%"PRIx64") "
> > +                             "for address 0x%"PRIx64" (mode=%s)", __func__,
> > +                             (uint64_t)msg.arg.pagefault.flags,
> > +                             (uint64_t)msg.arg.pagefault.address,
> > +                             use_minor_fault ? "MINOR" : "MISSING");
> > +            }
> > +
> > +            trace_postcopy_ram_fault_thread_request(
> > +                address, qemu_ram_get_idstr(rb), rb_offset,
> > +                msg.arg.pagefault.feat.ptid);
> > +            mark_postcopy_blocktime_begin(
> > +                    (uintptr_t)(address), msg.arg.pagefault.feat.ptid, rb);
> >  retry:
> >              /*
> >               * Send the request to the source - we want to request one
> >               * of our host page sizes (which is >= TPS)
> >               */
> > -            ret = postcopy_request_page(mis, rb, rb_offset,
> > -                                        msg.arg.pagefault.address);
> > +            ret = postcopy_request_page(mis, rb, rb_offset, address);
> 
> This is the only change that I find 'problematic'.
> On old code, rb_offset has been ROUND_DOWN, on new code it is not.
> On old code we pass msg.arg.pagefault.address, now we use
> ROUND_DOW(msg.arg.pagefault.address, mighration_ram_pagesize(rb)).

Thanks for spotting such a detail even for a RFC series. :)

It's actually rounded down to target psize, here since we're in postcopy we
should require target psize equals to host psize (or I bet it won't really
work at all).  So the relevant rounddown was actually done here:

            rb = qemu_ram_block_from_host(
                     (void *)(uintptr_t)msg.arg.pagefault.address,
                     true, &rb_offset);

In which there's:

    *offset = (host - block->host);
    if (round_offset) {
        *offset &= TARGET_PAGE_MASK;
    }

So when I rework that chunk of code I directly dropped the ROUND_DOWN()
because I find it duplicated.

> 
> >              if (ret) {
> >                  /* May be network failure, try to wait for recovery */
> >                  postcopy_pause_fault_thread(mis);
> > @@ -1694,3 +1745,13 @@ void *postcopy_preempt_thread(void *opaque)
> >  
> >      return NULL;
> >  }
> > +
> > +/*
> > + * Whether we should use MINOR fault to trap page faults?  It will be used
> > + * when doublemap is enabled on hugetlbfs.  The default value will be
> > + * false, which means we'll keep using the legacy MISSING faults.
> > + */
> > +bool postcopy_use_minor_fault(RAMBlock *rb)
> > +{
> > +    return migrate_hugetlb_doublemap() && qemu_ram_is_hugetlb(rb);
> > +}
> 
> Are you planing using this function outside postocpy-ram.c?  Otherwise
> if you move up its definition you can make it static and drop the header
> change.

Yes, it'll be further used in ram.c later in the patch "migration: Rework
ram discard logic for hugetlb double-map" right below.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2023-01-30 22:51 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 22:08 [PATCH RFC 00/21] migration: Support hugetlb doublemaps Peter Xu
2023-01-17 22:08 ` [PATCH RFC 01/21] update linux headers Peter Xu
2023-01-17 22:08 ` [PATCH RFC 02/21] util: Include osdep.h first in util/mmap-alloc.c Peter Xu
2023-01-18 12:00   ` Dr. David Alan Gilbert
2023-01-25  0:19   ` Philippe Mathieu-Daudé
2023-01-30  4:57   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 03/21] physmem: Add qemu_ram_is_hugetlb() Peter Xu
2023-01-18 12:02   ` Dr. David Alan Gilbert
2023-01-30  5:00   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 04/21] madvise: Include linux/mman.h under linux-headers/ Peter Xu
2023-01-18 12:08   ` Dr. David Alan Gilbert
2023-01-30  5:01   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 05/21] madvise: Add QEMU_MADV_SPLIT Peter Xu
2023-01-30  5:01   ` Juan Quintela
2023-01-17 22:08 ` [PATCH RFC 06/21] madvise: Add QEMU_MADV_COLLAPSE Peter Xu
2023-01-18 18:51   ` Dr. David Alan Gilbert
2023-01-18 20:21     ` Peter Xu
2023-01-30  5:02   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 07/21] ramblock: Cache file offset for file-backed ramblocks Peter Xu
2023-01-30  5:02   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 08/21] ramblock: Cache the length to do file mmap() on ramblocks Peter Xu
2023-01-23 18:51   ` Dr. David Alan Gilbert
2023-01-24 20:28     ` Peter Xu
2023-01-30  5:05   ` Juan Quintela
2023-01-30 22:07     ` Peter Xu
2023-01-17 22:09 ` [PATCH RFC 09/21] ramblock: Add RAM_READONLY Peter Xu
2023-01-23 19:42   ` Dr. David Alan Gilbert
2023-01-30  5:06   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 10/21] ramblock: Add ramblock_file_map() Peter Xu
2023-01-24 10:06   ` Dr. David Alan Gilbert
2023-01-24 20:47     ` Peter Xu
2023-01-25  9:24       ` Dr. David Alan Gilbert
2023-01-25 14:46         ` Peter Xu
2023-01-30  5:09   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 11/21] migration: Add hugetlb-doublemap cap Peter Xu
2023-01-24 12:45   ` Dr. David Alan Gilbert
2023-01-24 21:15     ` Peter Xu
2023-01-30  5:13   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 12/21] migration: Introduce page size for-migration-only Peter Xu
2023-01-24 13:20   ` Dr. David Alan Gilbert
2023-01-24 21:36     ` Peter Xu
2023-01-24 22:03       ` Peter Xu
2023-01-30  5:17   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 13/21] migration: Add migration_ram_pagesize_largest() Peter Xu
2023-01-24 17:34   ` Dr. David Alan Gilbert
2023-01-30  5:19   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 14/21] migration: Map hugetlbfs ramblocks twice, and pre-allocate Peter Xu
2023-01-25 14:25   ` Dr. David Alan Gilbert
2023-01-30  5:24   ` Juan Quintela
2023-01-30 22:35     ` Peter Xu
2023-02-01 18:53       ` Juan Quintela
2023-02-06 21:40         ` Peter Xu
2023-01-17 22:09 ` [PATCH RFC 15/21] migration: Teach qemu about minor faults and doublemap Peter Xu
2023-01-30  5:45   ` Juan Quintela
2023-01-30 22:50     ` Peter Xu [this message]
2023-02-01 18:55       ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 16/21] migration: Enable doublemap with MADV_SPLIT Peter Xu
2023-02-01 18:59   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 17/21] migration: Rework ram discard logic for hugetlb double-map Peter Xu
2023-02-01 19:03   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 18/21] migration: Allow postcopy_register_shared_ufd() to fail Peter Xu
2023-02-01 19:09   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 19/21] migration: Add postcopy_mark_received() Peter Xu
2023-02-01 19:10   ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 20/21] migration: Handle page faults using UFFDIO_CONTINUE Peter Xu
2023-02-01 19:24   ` Juan Quintela
2023-02-01 19:52     ` Juan Quintela
2023-01-17 22:09 ` [PATCH RFC 21/21] migration: Collapse huge pages again after postcopy finished Peter Xu
2023-02-01 19:49   ` Juan Quintela

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9hJshP8p9S0FaZF@x1n \
    --to=peterx@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=jthoughton@google.com \
    --cc=lsoaresp@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.