From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: fengzhimin <fengzhimin1@huawei.com>
Cc: Zhanghailiang <zhang.zhanghailiang@huawei.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"armbru@redhat.com" <armbru@redhat.com>,
	"jemmy858585@gmail.com" <jemmy858585@gmail.com>
Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
Date: Mon, 20 Jan 2020 09:05:15 +0000	[thread overview]
Message-ID: <20200120090515.GA2827@work-vm> (raw)
In-Reply-To: <03C2A65461456D4EBE9E6D4D0D96C583FC1465@DGGEMI529-MBX.china.huawei.com>

* fengzhimin (fengzhimin1@huawei.com) wrote:
> OK, I will modify it.
> 
> Since mach-virt.ram is sent by the multiRDMA channels instead of the main channel, it doesn't need to be registered on the main channel.

You might be OK if, instead of using the name, you use a size
threshold; e.g. use the multiRDMA threads for any RAM block larger
than, say, 128MB.
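
Roughly (an untested sketch; the helper name and the exact threshold
are illustrative, not something from this series):

    /* Hypothetical: route big RAM blocks to the multiRDMA threads and
     * everything else down the main channel. */
    #define MULTIRDMA_BLOCK_THRESHOLD (128 * MiB)

    static bool block_uses_multirdma(RDMALocalBlock *block)
    {
        return migrate_use_multiRDMA() &&
               block->length >= MULTIRDMA_BLOCK_THRESHOLD;
    }

Then the registration/delivery paths test that predicate rather than
strcmp()ing against a machine-specific block name.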

> It takes a long time to register mach-virt.ram for a VM with a large amount of memory, so we should try our best not to register it.

I'm curious why; I know it's expensive to register RAM blocks with the
rdma code, but I thought that cost would only be paid the first time;
it surprises me that registering the block with a 2nd RDMA channel is
just as expensive.

But then that makes me ask a 2nd question: is your performance increase
due to the multiple threads, or is it due to the multiple RDMA channels?
Could you have multiple threads but still a single RDMA channel
(with sufficient locking) and still get the performance?
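
i.e. something like this (a hypothetical sketch just to illustrate the
question; the wrapper and the lock are not part of this series):

    /* Many sender threads sharing one RDMA channel, with the writes
     * serialised by a mutex. */
    static QemuMutex rdma_channel_lock;

    static int shared_channel_write(QEMUFile *f, RDMAContext *rdma,
                                    uint64_t block_offset,
                                    uint64_t offset, uint64_t size)
    {
        int ret;

        qemu_mutex_lock(&rdma_channel_lock);
        ret = qemu_rdma_write(f, rdma, block_offset, offset, size);
        qemu_mutex_unlock(&rdma_channel_lock);
        return ret;
    }

If that still gives you the speedup, the win is in the threading; if
it doesn't, the win is in the separate channels/completion queues.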

Dave

> Thanks for your review.
> 
> Zhimin Feng
> 
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
> Sent: Saturday, January 18, 2020 2:52 AM
> To: fengzhimin <fengzhimin1@huawei.com>
> Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
> Subject: Re: [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA
> 
> * Zhimin Feng (fengzhimin1@huawei.com) wrote:
> > From: fengzhimin <fengzhimin1@huawei.com>
> > 
> > The virt-ram block is sent by MultiRDMA, so we only register it for
> > the MultiRDMA channels; the main channel doesn't register the
> > virt-ram block.
> > 
> > Signed-off-by: fengzhimin <fengzhimin1@huawei.com>
> 
> You can't specialise on the name of the RAMBlock like that.
> 'mach-virt.ram' is the name specific to just the main RAM on just
> aarch64's machine type; for example, the name on x86 is completely
> different, and if you use NUMA or hotplug etc. it would also be
> different on aarch64.
> 
> Is there a downside to also registering the mach-virt.ram on the main channel?
> 
> Dave
> 
> > ---
> >  migration/rdma.c | 140 +++++++++++++++++++++++++++++++++++++----------
> >  1 file changed, 112 insertions(+), 28 deletions(-)
> > 
> > diff --git a/migration/rdma.c b/migration/rdma.c
> > index 0a150099e2..1477fd509b 100644
> > --- a/migration/rdma.c
> > +++ b/migration/rdma.c
> > @@ -618,7 +618,9 @@ const char *print_wrid(int wrid);
> >  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >                                     uint8_t *data, RDMAControlHeader *resp,
> >                                     int *resp_idx,
> > -                                   int (*callback)(RDMAContext *rdma));
> > +                                   int (*callback)(RDMAContext *rdma,
> > +                                   uint8_t id),
> > +                                   uint8_t id);
> >  
> >  static inline uint64_t ram_chunk_index(const uint8_t *start,
> >                                         const uint8_t *host)
> > @@ -1198,24 +1200,81 @@ static int qemu_rdma_alloc_qp(RDMAContext *rdma)
> >      return 0;
> >  }
> >  
> > -static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> > +/*
> > + * Parameters:
> > + *    @id == UNUSED_ID :
> > + *    This means that we register memory for the main RDMA channel;
> > + *    the main RDMA channel doesn't register the mach-virt.ram block
> > + *    when we use the multiRDMA method to migrate.
> > + *
> > + *    @id == 0 or id == 1 or ... :
> > + *    This means that we register memory for the multiRDMA channels;
> > + *    the multiRDMA channels only register memory for the mach-virt.ram
> > + *    block when we use the multiRDMA method to migrate.
> > + */
> > +static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma, uint8_t id)
> >  {
> >      int i;
> >      RDMALocalBlocks *local = &rdma->local_ram_blocks;
> >  
> > -    for (i = 0; i < local->nb_blocks; i++) {
> > -        local->block[i].mr =
> > -            ibv_reg_mr(rdma->pd,
> > -                    local->block[i].local_host_addr,
> > -                    local->block[i].length,
> > -                    IBV_ACCESS_LOCAL_WRITE |
> > -                    IBV_ACCESS_REMOTE_WRITE
> > -                    );
> > -        if (!local->block[i].mr) {
> > -            perror("Failed to register local dest ram block!\n");
> > -            break;
> > +    if (migrate_use_multiRDMA()) {
> > +        if (id == UNUSED_ID) {
> > +            for (i = 0; i < local->nb_blocks; i++) {
> > +                /* main RDMA channel doesn't register the mach-virt.ram block */
> > +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                    continue;
> > +                }
> > +
> > +                local->block[i].mr =
> > +                    ibv_reg_mr(rdma->pd,
> > +                            local->block[i].local_host_addr,
> > +                            local->block[i].length,
> > +                            IBV_ACCESS_LOCAL_WRITE |
> > +                            IBV_ACCESS_REMOTE_WRITE
> > +                            );
> > +                if (!local->block[i].mr) {
> > +                    perror("Failed to register local dest ram block!\n");
> > +                    break;
> > +                }
> > +                rdma->total_registrations++;
> > +            }
> > +        } else {
> > +            for (i = 0; i < local->nb_blocks; i++) {
> > +                /*
> > +                 * The multiRDMA channels only register
> > +                 * the mach-virt.ram block
> > +                 */
> > +                if (strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                    local->block[i].mr =
> > +                        ibv_reg_mr(rdma->pd,
> > +                                local->block[i].local_host_addr,
> > +                                local->block[i].length,
> > +                                IBV_ACCESS_LOCAL_WRITE |
> > +                                IBV_ACCESS_REMOTE_WRITE
> > +                                );
> > +                    if (!local->block[i].mr) {
> > +                        perror("Failed to register local dest ram block!\n");
> > +                        break;
> > +                    }
> > +                    rdma->total_registrations++;
> > +                }
> > +            }
> > +        }
> > +    } else {
> > +        for (i = 0; i < local->nb_blocks; i++) {
> > +            local->block[i].mr =
> > +                ibv_reg_mr(rdma->pd,
> > +                        local->block[i].local_host_addr,
> > +                        local->block[i].length,
> > +                        IBV_ACCESS_LOCAL_WRITE |
> > +                        IBV_ACCESS_REMOTE_WRITE
> > +                        );
> > +            if (!local->block[i].mr) {
> > +                perror("Failed to register local dest ram block!\n");
> > +                break;
> > +            }
> > +            rdma->total_registrations++;
> >          }
> > -        rdma->total_registrations++;
> >      }
> >  
> >      if (i >= local->nb_blocks) {
> > @@ -1223,8 +1282,10 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext *rdma)
> >      }
> >  
> >      for (i--; i >= 0; i--) {
> > -        ibv_dereg_mr(local->block[i].mr);
> > -        rdma->total_registrations--;
> > +        if (local->block[i].mr) {
> > +            ibv_dereg_mr(local->block[i].mr);
> > +            rdma->total_registrations--;
> > +        }
> >      }
> >  
> >      return -1;
> > @@ -1446,7 +1507,7 @@ static int qemu_rdma_unregister_waiting(RDMAContext *rdma)
> >          reg.key.chunk = chunk;
> >          register_to_network(rdma, &reg);
> >          ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> > -                                &resp, NULL, NULL);
> > +                                      &resp, NULL, NULL, UNUSED_ID);
> >          if (ret < 0) {
> >              return ret;
> >          }
> > @@ -1915,11 +1976,17 @@ static void qemu_rdma_move_header(RDMAContext *rdma, int idx,
> >   * The extra (optional) response is used during registration to save us from
> >   * having to perform an *additional* exchange of messages just to provide a
> >   * response, by instead piggy-backing on the acknowledgement.
> > + *
> > + * Parameters:
> > + *    @id : the callback function takes two parameters; id is the second one.
> > + *
> >   */
> >  static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >                                     uint8_t *data, RDMAControlHeader *resp,
> >                                     int *resp_idx,
> > -                                   int (*callback)(RDMAContext *rdma))
> > +                                   int (*callback)(RDMAContext *rdma,
> > +                                   uint8_t id),
> > +                                   uint8_t id)
> >  {
> >      int ret = 0;
> >  
> > @@ -1973,7 +2040,7 @@ static int qemu_rdma_exchange_send(RDMAContext *rdma, RDMAControlHeader *head,
> >      if (resp) {
> >          if (callback) {
> >              trace_qemu_rdma_exchange_send_issue_callback();
> > -            ret = callback(rdma);
> > +            ret = callback(rdma, id);
> >              if (ret < 0) {
> >                  return ret;
> >              }
> > @@ -2168,7 +2235,7 @@ retry:
> >  
> >                  compress_to_network(rdma, &comp);
> >                  ret = qemu_rdma_exchange_send(rdma, &head,
> > -                                (uint8_t *) &comp, NULL, NULL, NULL);
> > +                                (uint8_t *) &comp, NULL, NULL, NULL, UNUSED_ID);
> >  
> >                  if (ret < 0) {
> >                      return -EIO;
> > @@ -2195,7 +2262,7 @@ retry:
> >  
> >              register_to_network(rdma, &reg);
> >              ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> > -                                    &resp, &reg_result_idx, NULL);
> > +                                    &resp, &reg_result_idx, NULL, UNUSED_ID);
> >              if (ret < 0) {
> >                  return ret;
> >              }
> > @@ -2828,7 +2895,8 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
> >              head.len = len;
> >              head.type = RDMA_CONTROL_QEMU_FILE;
> >  
> > -            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, NULL);
> > +            ret = qemu_rdma_exchange_send(rdma, &head, data, NULL,
> > +                                          NULL, NULL, UNUSED_ID);
> >  
> >              if (ret < 0) {
> >                  rdma->error_state = ret;
> > @@ -3660,7 +3728,7 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >              }
> >  
> >              if (rdma->pin_all) {
> > -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> > +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, UNUSED_ID);
> >                  if (ret) {
> >                      error_report("rdma migration: error dest "
> >                                      "registering ram blocks");
> > @@ -3675,6 +3743,15 @@ static int qemu_rdma_registration_handle(QEMUFile *f, void *opaque)
> >               * their "local" descriptions with what was sent.
> >               */
> >              for (i = 0; i < local->nb_blocks; i++) {
> > +                /*
> > +                 * Use the main RDMA channel to deliver the device blocks;
> > +                 * use the multiRDMA channels to deliver the RAM block.
> > +                 */
> > +                if (migrate_use_multiRDMA() &&
> > +                    strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                        continue;
> > +                }
> > +
> >                  rdma->dest_blocks[i].remote_host_addr =
> >                      (uintptr_t)(local->block[i].local_host_addr);
> >  
> > @@ -3992,7 +4069,7 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >           */
> >          ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
> >                      &reg_result_idx, rdma->pin_all ?
> > -                    qemu_rdma_reg_whole_ram_blocks : NULL);
> > +                    qemu_rdma_reg_whole_ram_blocks : NULL, UNUSED_ID);
> >          if (ret < 0) {
> >              ERROR(errp, "receiving remote info!");
> >              return ret;
> > @@ -4025,6 +4102,11 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >          memcpy(rdma->dest_blocks,
> >              rdma->wr_data[reg_result_idx].control_curr, resp.len);
> >          for (i = 0; i < nb_dest_blocks; i++) {
> > +            if (migrate_use_multiRDMA() &&
> > +                strcmp(local->block[i].block_name, "mach-virt.ram") == 0) {
> > +                continue;
> > +            }
> > +
> >              network_to_dest_block(&rdma->dest_blocks[i]);
> >  
> >              /* We require that the blocks are in the same order */
> > @@ -4050,7 +4132,8 @@ static int qemu_rdma_registration_stop(QEMUFile *f, void *opaque,
> >      trace_qemu_rdma_registration_stop(flags);
> >  
> >      head.type = RDMA_CONTROL_REGISTER_FINISHED;
> > -    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> > +    ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> > +                                  NULL, NULL, UNUSED_ID);
> >  
> >      if (migrate_use_multiRDMA()) {
> >          /*
> > @@ -4298,7 +4381,7 @@ static int qemu_multiRDMA_registration_handle(void *opaque)
> >                    sizeof(RDMALocalBlock), dest_ram_sort_func);
> >  
> >              if (rdma->pin_all) {
> > -                ret = qemu_rdma_reg_whole_ram_blocks(rdma);
> > +                ret = qemu_rdma_reg_whole_ram_blocks(rdma, p->id);
> >                  if (ret) {
> >                      error_report("rdma migration: error dest "
> >                                   "registering ram blocks");
> > @@ -4680,7 +4763,7 @@ static void *multiRDMA_send_thread(void *opaque)
> >  
> >      ret = qemu_rdma_exchange_send(rdma, &head, NULL, &resp,
> >              &reg_result_idx, rdma->pin_all ?
> > -            qemu_rdma_reg_whole_ram_blocks : NULL);
> > +            qemu_rdma_reg_whole_ram_blocks : NULL, p->id);
> >      if (ret < 0) {
> >          return NULL;
> >      }
> > @@ -4749,7 +4832,8 @@ static void *multiRDMA_send_thread(void *opaque)
> >  
> >          /* Send FINISHED to the destination */
> >          head.type = RDMA_CONTROL_REGISTER_FINISHED;
> > -        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL, NULL, NULL);
> > +        ret = qemu_rdma_exchange_send(rdma, &head, NULL, NULL,
> > +                                      NULL, NULL, p->id);
> >          if (ret < 0) {
> >              return NULL;
> >          }
> > --
> > 2.19.1
> > 
> > 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



Thread overview: 37+ messages
2020-01-09  4:59 [PATCH RFC 00/12] *** multiple RDMA channels for migration *** Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 01/12] migration: Add multiRDMA capability support Zhimin Feng
2020-01-13 15:30   ` Markus Armbruster
2020-01-15  1:55     ` fengzhimin
2020-01-13 16:26   ` Eric Blake
2020-01-15  2:04     ` fengzhimin
2020-01-15 18:09   ` Dr. David Alan Gilbert
2020-01-16 13:18     ` Juan Quintela
2020-01-17  1:30       ` fengzhimin
2020-01-09  4:59 ` [PATCH RFC 02/12] migration: Export the 'migration_incoming_setup' function and add the 'migrate_use_rdma_pin_all' function Zhimin Feng
2020-01-15 18:57   ` Dr. David Alan Gilbert
2020-01-16 13:19   ` Juan Quintela
2020-01-09  4:59 ` [PATCH RFC 03/12] migration: Create the multi-rdma-channels parameter Zhimin Feng
2020-01-13 15:34   ` Markus Armbruster
2020-01-15  1:57     ` fengzhimin
2020-01-16 13:20   ` Juan Quintela
2020-01-09  4:59 ` [PATCH RFC 04/12] migration/rdma: Create multiRDMA migration threads Zhimin Feng
2020-01-16 13:25   ` Juan Quintela
2020-01-17  1:32     ` fengzhimin
2020-01-09  4:59 ` [PATCH RFC 05/12] migration/rdma: Create the multiRDMA channels Zhimin Feng
2020-01-15 19:54   ` Dr. David Alan Gilbert
2020-01-16 13:30   ` Juan Quintela
2020-01-09  4:59 ` [PATCH RFC 06/12] migration/rdma: Transmit initial package Zhimin Feng
2020-01-15 18:36   ` Dr. David Alan Gilbert
2020-01-09  4:59 ` [PATCH RFC 07/12] migration/rdma: Be sure all channels are created Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 08/12] migration/rdma: register memory for multiRDMA channels Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 09/12] migration/rdma: Wait for all multiRDMA to complete registration Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 10/12] migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 11/12] migration/rdma: use multiRDMA to send RAM block for NOT " Zhimin Feng
2020-01-09  4:59 ` [PATCH RFC 12/12] migration/rdma: only register the virt-ram block for MultiRDMA Zhimin Feng
2020-01-17 18:52   ` Dr. David Alan Gilbert
2020-01-19  1:44     ` fengzhimin
2020-01-20  9:05       ` Dr. David Alan Gilbert [this message]
2020-01-21  1:30         ` fengzhimin
2020-01-09 10:38 ` [PATCH RFC 00/12] *** multiple RDMA channels for migration *** no-reply
2020-01-15 19:57 ` Dr. David Alan Gilbert
2020-01-16  1:37   ` fengzhimin
