All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael R. Hines" <mrhines@linux.vnet.ibm.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: aliguori@us.ibm.com, quintela@redhat.com, qemu-devel@nongnu.org,
	owasserm@redhat.com, abali@us.ibm.com, mrhines@us.ibm.com,
	gokul@us.ibm.com, chegu_vinod@hp.com, knoel@redhat.com
Subject: Re: [Qemu-devel] [PATCH 3/6] rdma: core logic
Date: Fri, 28 Jun 2013 10:11:01 -0400	[thread overview]
Message-ID: <51CD9975.90806@linux.vnet.ibm.com> (raw)
In-Reply-To: <51CD3C3F.6080207@redhat.com>

On 06/28/2013 03:33 AM, Paolo Bonzini wrote:
> Il 28/06/2013 00:44, mrhines@linux.vnet.ibm.com ha scritto:
>> + *
>> + * This will have a terrible impact on migration performance, so until future
>> + * workload information or LRU information is available, do not attempt to use
>> + * this feature except for basic testing.
>> + */
>> +//#define RDMA_ALLOW_REGISTRATION_WINDOW
> Out of curiosity, how terrible is the terrible impact?

(I'll be sure to put this in the cover letter next time - sorry about that).

Here's the breakdown for a 14 GB (proprietary) memory intensive
micro-benchmark that was use here in IBM. (14 is the largest I can
create on my older hardware):

These numbers are *after* the bulk phase:

1. x-rdma-pin-all enabled:                           ~26 gbps (PCI 
limit, hardware is a few years old).
2. x-rdma-pin-all disabled, unpin disabled:  ~15 gbps (fluctuates as 
high as 20, but usually 15)
3. x-rdma-pin-all disabled, unpin enabled:   ~13 gbps (very stable at 
13, little or no deviation)

I have to admit that the difference between #2 and #3 is much smaller 
than I expected.

I suspect this is because of the very large chunk sizes (1MB).

The larger the chunk size, the less of a difference in unpinning costs.


>> + * Perform a non-optimized memory registration after every transfer
>> + * for demonsration purposes, only if pin-all is not requested.
>> + */
>> +static int qemu_rdma_unregister_waiting(RDMAContext *rdma) {
>> +#ifdef RDMA_ALLOW_REGISTRATION_WINDOW
> IIRC  function can be always enabled, it will just do nothing if
> no one registers anything.  It is just this bit that needs the
> #ifdef:

No, this is an entirely different unpinning function. It only
has one user, which is the new RDMA_ALLOW_REGISTRATION_WINDOW.

But adding an extra #ifdef doesn't hurt anyway.

> +
> +        if (!rdma->pin_all) {
> +            /*
> +             * FYI: If one wanted to signal a specific chunk to be unregistered
> +             * using LRU or workload-specific information, this is the function
> +             * you would call to do so. That chunk would then get asynchronously
> +             * unregistered later.
> +             */
>     >> #ifdef RDMA_ALLOW_REGISTRATION_WINDOW
> +            qemu_rdma_signal_unregister(rdma, index, chunk, wc.wr_id);
>     >> #endif
> +        }
>
> Then qemu_rdma_signal_unregister would be unused for
> !RDMA_ALLOW_REGISTRATION_WINDOW, so you would need to put that
> function within an #ifdef (the whole function, not just the
> body).
>
> At this point RDMA_ALLOW_REGISTRATION_WINDOW becomes a slightly
> wrong name.  What about RDMA_UNREGISTRATION_EXAMPLE?

Sounds fine to me. Will change it.

>> +        if(test_bit(chunk, block->transit_bitmap)) {
>> +            DDPRINTF("Cannot unregister inflight chunk: %" PRIu64 "\n", chunk);
>> +            continue;
>> +        }
> Nobody will process this unregistration after this.
>

Good catch - the clear_bit() was too late.


> Should you move this
>
>
> +        // perhaps test_and_clear_bit here?  and exit if the bit is zero
> +        clear_bit(chunk, block->unregister_bitmap);
> +
> +        ret = ibv_dereg_mr(block->pmr[chunk]);
> +        block->pmr[chunk] = NULL;
> +        block->remote_keys[chunk] = 0;
> +
> +        if(ret != 0) {
> +            perror("unregistration chunk failed");
> +            return -ret;
> +        }
> +        rdma->total_registrations--;
> +
> +        reg.key.chunk = chunk;
> +        ret = qemu_rdma_exchange_send(rdma, &head, (uint8_t *) &reg,
> +                                &resp, &resp_idx, NULL);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        DDPRINTF("Unregister for chunk: %" PRIu64 " complete.\n", chunk);
>
> to a separate function and check the unregister_bitmap when you get a
> transfer completed message from IB?

We would need a new thread for that. The migration is currently
single threaded. The new thread would listen asynchronously
for the unregistration complete messages. Based on the last couple
years of emails I've seen, introducing a new thread is not
a simple review task.

With respect, now that the implementation is available, I'd like
to move on, as we do not have an LRU for this migration and
I do not have the authorized manpower to implementing one.

There is a sufficient body of (compile-time-enabled) code here
for anyone to take up should overcommitment become a problem
for someone in the middle of an RDMA migration.

>
> Also, you do not need to receive the unregistration completed response
> now, do you?  You can have another bitmap for "unregistration in progress"
> and only wait for completion before re-pinning.  This way, the impact
> of unregistration is low if the page is not going to be changed soon.

Same answer above applies to this question.

> _However_, please make this new functionality a separate patch so that we
> can discuss it better.

Will do.

- Michael

  reply	other threads:[~2013-06-28 14:11 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-27 22:44 [Qemu-devel] [PATCH 0/6] rdma: core logic and unpin support mrhines
2013-06-27 22:44 ` [Qemu-devel] [PATCH 1/6] rdma: update documentation to reflect new " mrhines
2013-06-27 23:09   ` Eric Blake
2013-06-28 13:17     ` Michael R. Hines
2013-06-27 22:44 ` [Qemu-devel] [PATCH 2/6] rdma: introduce ram_handle_compressed() mrhines
2013-06-27 22:56   ` Peter Maydell
2013-06-28  7:17     ` Paolo Bonzini
2013-06-27 22:44 ` [Qemu-devel] [PATCH 3/6] rdma: core logic mrhines
2013-06-27 23:16   ` Peter Maydell
2013-06-28 13:23     ` Michael R. Hines
2013-06-28 13:28       ` Peter Maydell
2013-06-28 14:00         ` Michael R. Hines
2013-06-28 14:07           ` Peter Maydell
2013-06-28 14:22             ` Michael R. Hines
2013-06-28  7:33   ` Paolo Bonzini
2013-06-28 14:11     ` Michael R. Hines [this message]
2013-06-27 22:44 ` [Qemu-devel] [PATCH 4/6] rdma: allow state transitions between other states besides ACTIVE mrhines
2013-06-27 23:10   ` Eric Blake
2013-06-28  7:34     ` Paolo Bonzini
2013-06-27 22:44 ` [Qemu-devel] [PATCH 5/6] rdma: introduce MIG_STATE_NONE and change MIG_STATE_SETUP state transition mrhines
2013-06-27 22:44 ` [Qemu-devel] [PATCH 6/6] rdma: account for the time spent in MIG_STATE_SETUP through QMP mrhines

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=51CD9975.90806@linux.vnet.ibm.com \
    --to=mrhines@linux.vnet.ibm.com \
    --cc=abali@us.ibm.com \
    --cc=aliguori@us.ibm.com \
    --cc=chegu_vinod@hp.com \
    --cc=gokul@us.ibm.com \
    --cc=knoel@redhat.com \
    --cc=mrhines@us.ibm.com \
    --cc=owasserm@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.