Re: [PATCH RFC 12/21] migration: Introduce page size for-migration-only

From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
	James Houghton <jthoughton@google.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH RFC 12/21] migration: Introduce page size for-migration-only
Date: Tue, 24 Jan 2023 17:03:48 -0500	[thread overview]
Message-ID: <Y9BVxKj1O0MKHlTl@x1n> (raw)
In-Reply-To: <Y9BPVDBpR25kRpoc@x1n>

On Tue, Jan 24, 2023 at 04:36:20PM -0500, Peter Xu wrote:
> On Tue, Jan 24, 2023 at 01:20:37PM +0000, Dr. David Alan Gilbert wrote:
> > > @@ -3970,7 +3984,8 @@ int ram_load_postcopy(QEMUFile *f, int channel)
> > >                  break;
> > >              }
> > >              tmp_page->target_pages++;
> > > -            matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;
> > > +            matches_target_page_size =
> > > +                migration_ram_pagesize(block) == TARGET_PAGE_SIZE;
> > >              /*
> > >               * Postcopy requires that we place whole host pages atomically;
> > >               * these may be huge pages for RAMBlocks that are backed by
> > 
> > Hmm do you really want this change?
> 
> Yes that's intended.  I want to reuse the same logic here when receiving
> small pages from huge pages, just like when we're receiving small pages on
> non-hugetlb mappings.
> 
> matches_target_page_size majorly affects two things:
> 
>   1) For a small zero page, whether we want to pre-set the page_buffer, or
>      simply use postcopy_place_page_zero():
>   
>         case RAM_SAVE_FLAG_ZERO:
>             ch = qemu_get_byte(f);
>             /*
>              * Can skip to set page_buffer when
>              * this is a zero page and (block->page_size == TARGET_PAGE_SIZE).
>              */
>             if (ch || !matches_target_page_size) {
>                 memset(page_buffer, ch, TARGET_PAGE_SIZE);
>             }
> 
>   2) For normal page, whether we need to use a page buffer or we can
>      directly reuse the page buffer in QEMUFile:
> 
>             if (!matches_target_page_size) {
>                 /* For huge pages, we always use temporary buffer */
>                 qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
>             } else {
>                 /*
>                  * For small pages that matches target page size, we
>                  * avoid the qemu_file copy.  Instead we directly use
>                  * the buffer of QEMUFile to place the page.  Note: we
>                  * cannot do any QEMUFile operation before using that
>                  * buffer to make sure the buffer is valid when
>                  * placing the page.
>                  */
>                 qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
>                                          TARGET_PAGE_SIZE);
>             }
> 
> Here:
> 
> I want 1) to reuse postcopy_place_page_zero().  For the doublemap case,
> it'll reuse postcopy_tmp_zero_page() (because qemu_ram_is_uf_zeroable()
> will return false for such a ramblock).
> 
> I want 2) to reuse qemu_get_buffer_in_place(), so we avoid a copy process
> for the small page which is faster (even if it's hugetlb backed, now we can
> reuse the qemufile buffer safely).

Since at it, one more thing worth mentioning is I didn't actually know
whether the original code is always correct when target and host small
psizes don't match..  This is the original line:

  matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;

The problem is we're comparing block page size against target page size,
however block page size should be in host page size granule:

  RAMBlock *qemu_ram_alloc_internal()
  {
    new_block->page_size = qemu_real_host_page_size();

IOW, I am not sure whether postcopy will run at all in that case.  For
example, when we run an Alpha emulator upon x86_64, we can have target
psize 8K while host psize 4K.

The migration protocol should be TARGET_PAGE_SIZE based.  It means, for
postcopy when receiving a single page for Alpha VM being migrated, maybe we
should call UFFDIO_COPY (or UFFDIO_CONTINUE; doesn't matter here) twice
because one guest page contains two host pages.

I'm not sure whether I get all these right.. if so, we have two options:

  a) Forbid postcopy as a whole when detecting qemu_real_host_page_size()
     != TARGET_PAGE_SIZE.

  b) Implement postcopy for that case

I'd go with a) even if it's an issue because it means no one is migrating
that thing in postcopy way in the past N years, so it justifies that maybe
b) doesn't worth it.

-- 
Peter Xu