From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "sunhao2 [孙昊]" <sunhao2@kingsoft.com>,
	"YANGFENG1 [杨峰]" <YANGFENG1@kingsoft.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"DENGLINWEN [邓林文]" <DENGLINWEN@kingsoft.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"LIZHAOXIN1 [李照鑫]" <LIZHAOXIN1@kingsoft.com>
Subject: Re: [PATCH] migration/rdma: Use huge page register VM memory
Date: Mon, 7 Jun 2021 16:00:28 +0100
Message-ID: <YL40jJgKFQBnq3Us@work-vm>
In-Reply-To: <YL4qh35GquFrbSfq@redhat.com>

* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Jun 07, 2021 at 01:57:02PM +0000, LIZHAOXIN1 [李照鑫] wrote:
> > When using libvirt for RDMA live migration, if the VM's memory is large,
> > deregistering it on the source side takes a long time, resulting in a
> > long downtime (for a 64G VM, deregistration takes about 400ms).
> >
> > Although the VM's memory is backed by 2M huge pages, the MLNX driver
> > still pins and unpins memory in 4K pages, so we register with huge pages
> > to skip the per-4K-page pin and unpin work and reduce downtime.
> >    
> > The test environment:
> > kernel: linux-5.12
> > MLNX: ConnectX-4 LX
> > libvirt command:
> > virsh migrate --live --p2p --persistent --copy-storage-inc --listen-address \
> > 0.0.0.0 --rdma-pin-all --migrateuri rdma://192.168.0.2 [VM] qemu+tcp://192.168.0.2/system
> >     
> > Signed-off-by: lizhaoxin <lizhaoxin1@kingsoft.com>
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index 1cdb4561f3..9823449297 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -1123,13 +1123,26 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >  
> >      for (i = 0; i < local->nb_blocks; i++) {
> > -        local->block[i].mr =
> > -            ibv_reg_mr(rdma->pd,
> > -                    local->block[i].local_host_addr,
> > -                    local->block[i].length,
> > -                    IBV_ACCESS_LOCAL_WRITE |
> > -                    IBV_ACCESS_REMOTE_WRITE
> > -                    );
> > +        if (strcmp(local->block[i].block_name, "pc.ram") == 0) {
> 
> 'pc.ram' is an x86-specific name, so this will still leave a problem
> on other architectures, I assume.

Yes, and it also breaks even on a PC when using NUMA, since guest RAM is
then split into per-node blocks that aren't named "pc.ram".
I think the thing to do here is to call qemu_ram_pagesize on the
RAMBlock:

  if (qemu_ram_pagesize(RAMBlock....) != qemu_real_host_page_size)
     it's a huge page

I guess it's probably best to do that in qemu_rdma_init_one_block or
something?

I wonder how that all works when there's a mix of different huge page
sizes?
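
For illustration, a minimal sketch of that per-block check (not the posted
patch: the "pagesize" field on RDMALocalBlock and where it gets filled in
are assumptions):

  /*
   * Hypothetical: record each block's backing page size when the block
   * is added (e.g. in qemu_rdma_init_one_block):
   *
   *     block->pagesize = qemu_ram_pagesize(rb);
   *
   * then choose the registration flags per block instead of matching
   * the "pc.ram" name:
   */
  for (i = 0; i < local->nb_blocks; i++) {
      int access = IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE;

      if (local->block[i].pagesize != qemu_real_host_page_size) {
          /* Huge-page backed: skip 4K-granularity pin/unpin. */
          access |= IBV_ACCESS_ON_DEMAND | IBV_ACCESS_HUGETLB;
      }
      local->block[i].mr = ibv_reg_mr(rdma->pd,
                                      local->block[i].local_host_addr,
                                      local->block[i].length,
                                      access);
      if (!local->block[i].mr) {
          perror("Failed to register local dest ram block!\n");
          break;
      }
  }

Tracking the page size per block would also cope with a mix of huge page
sizes, since each block carries its own value.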

Dave

> > +            local->block[i].mr =
> > +                ibv_reg_mr(rdma->pd,
> > +                        local->block[i].local_host_addr,
> > +                        local->block[i].length,
> > +                        IBV_ACCESS_LOCAL_WRITE |
> > +                        IBV_ACCESS_REMOTE_WRITE |
> > +                        IBV_ACCESS_ON_DEMAND |
> > +                        IBV_ACCESS_HUGETLB
> > +                        );
> > +        } else {
> > +            local->block[i].mr =
> > +                ibv_reg_mr(rdma->pd,
> > +                        local->block[i].local_host_addr,
> > +                        local->block[i].length,
> > +                        IBV_ACCESS_LOCAL_WRITE |
> > +                        IBV_ACCESS_REMOTE_WRITE
> > +                        );
> > +        }
> > +
> >          if (!local->block[i].mr) {
> >              perror("Failed to register local dest ram block!\n");
> >              break;
> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK




Thread overview: 5+ messages
2021-06-07 13:57 [PATCH] migration/rdma: Use huge page register VM memory LIZHAOXIN1 [李照鑫]
2021-06-07 14:17 ` Daniel P. Berrangé
2021-06-07 15:00   ` Dr. David Alan Gilbert [this message]
2021-06-10 15:35     ` Re: " LIZHAOXIN1 [李照鑫]
2021-06-10 15:33   ` LIZHAOXIN1 [李照鑫]
