From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 3 Feb 2016 11:37:27 +0000
From: "Daniel P. Berrange" <berrange@redhat.com>
Message-ID: <20160203113727.GG30222@redhat.com>
References: <1452599056-27357-1-git-send-email-berrange@redhat.com>
 <1452599056-27357-14-git-send-email-berrange@redhat.com>
 <20160202200136.GF4498@work-vm>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20160202200136.GF4498@work-vm>
Subject: Re: [Qemu-devel] [PATCH v1 13/22] migration: convert RDMA to use QIOChannel interface
Reply-To: "Daniel P. Berrange" <berrange@redhat.com>
To: "Dr. David Alan Gilbert"
Cc: Amit Shah, qemu-devel@nongnu.org, Juan Quintela

On Tue, Feb 02, 2016 at 08:01:36PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrange (berrange@redhat.com) wrote:
> > This converts the RDMA code to provide a subclass of
> > QIOChannel that uses RDMA for the data transport.
> >
> > The RDMA code would be much better off if it could
> > be split up into a generic RDMA layer, a QIOChannel
> > impl based on RDMA, and then the RDMA migration
> > glue. This is left as a future exercise for the brave.
> >
> > Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
> > ---
> >  migration/rdma.c | 260 ++++++++++++++++++++++++++++++++++---------------------
> >  1 file changed, 161 insertions(+), 99 deletions(-)
> >
> > +static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
> > +                                      const struct iovec *iov,
> > +                                      size_t niov,
> > +                                      int **fds,
> > +                                      size_t *nfds,
> > +                                      Error **errp)
> > +{
> > +    QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
> > +    RDMAContext *rdma = rioc->rdma;
> >      RDMAControlHeader head;
> >      int ret = 0;
> > +    ssize_t i;
> > +    size_t done = 0;
> >
> >      CHECK_ERROR_STATE();
> >
> > -    /*
> > -     * First, we hold on to the last SEND message we
> > -     * were given and dish out the bytes until we run
> > -     * out of bytes.
> > -     */
> > -    r->len = qemu_rdma_fill(r->rdma, buf, size, 0);
> > -    if (r->len) {
> > -        return r->len;
> > -    }
> > +    for (i = 0; i < niov; i++) {
> > +        size_t want = iov[i].iov_len;
> > +        uint8_t *data = (void *)iov[i].iov_base;
> >
> > -    /*
> > -     * Once we run out, we block and wait for another
> > -     * SEND message to arrive.
> > -     */
> > -    ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
> > +        /*
> > +         * First, we hold on to the last SEND message we
> > +         * were given and dish out the bytes until we run
> > +         * out of bytes.
> > +         */
> > +        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> > +        if (ret > 0) {
> > +            done += ret;
> > +            if (ret < want) {
> > +                break;
> > +            } else {
> > +                continue;
> > +            }
> > +        }
> >
> > -    if (ret < 0) {
> > -        rdma->error_state = ret;
> > -        return ret;
> > -    }
> > +        /*
> > +         * Once we run out, we block and wait for another
> > +         * SEND message to arrive.
> > +         */
> > +        ret = qemu_rdma_exchange_recv(rdma, &head, RDMA_CONTROL_QEMU_FILE);
> >
> > -    /*
> > -     * SEND was received with new bytes, now try again.
> > -     */
> > -    return qemu_rdma_fill(r->rdma, buf, size, 0);
> > +        if (ret < 0) {
> > +            rdma->error_state = ret;
> > +            return ret;
> > +        }
> > +
> > +        /*
> > +         * SEND was received with new bytes, now try again.
> > +         */
> > +        ret = qemu_rdma_fill(rioc->rdma, data, want, 0);
> > +        if (ret > 0) {
> > +            done += ret;
> > +            if (ret < want) {
> > +                break;
> > +            }
> > +        }
>
> I don't quite understand the behaviour of this loop.
> If rdma_fill returns less than you wanted for the first iov we break.
> If it returns 0 then we try and get some more.
> The weird thing to me is if we have two iov entries; if the
> amount returned by the qemu_rdma_fill happens to match the size of
> the 1st iov then I think we end up doing the exchange_recv and
> waiting for more. Is that what we want? Why?

No, it isn't quite what we want. If we have successfully received
some data in a preceding iov, then we shouldn't wait for more data
for any following iov. I'll rework this bit of code to work better.

In fact, technically we should not block for data at all when the
channel is in non-blocking mode. Fixing that would require some
refactoring of qemu_rdma_block_for_wrid() so that we could tell it
to only look for an already-completed work request and not block.

Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
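[Editor's note: a minimal standalone sketch of the reworked loop discussed above. The toy_fill()/toy_readv() helpers and ToyChannel struct are hypothetical stand-ins for qemu_rdma_fill() and the QIOChannel readv implementation, not the actual QEMU code; the point is only the control flow: once any bytes have been delivered, return a partial read rather than blocking for another SEND.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Toy stand-in for the channel state: a bounded buffer of bytes that
 * have already arrived.  In the real code this is the cached SEND
 * message that qemu_rdma_fill() drains. */
typedef struct {
    const uint8_t *buf;
    size_t len;      /* total bytes buffered */
    size_t pos;      /* bytes already consumed */
    int blocked;     /* set if we would have had to block */
} ToyChannel;

/* Stand-in for qemu_rdma_fill(): copy up to 'want' buffered bytes. */
static size_t toy_fill(ToyChannel *c, uint8_t *data, size_t want)
{
    size_t avail = c->len - c->pos;
    size_t n = avail < want ? avail : want;
    memcpy(data, c->buf + c->pos, n);
    c->pos += n;
    return n;
}

/* Reworked readv loop: if the buffer runs dry after some bytes were
 * already delivered (done > 0), return the partial read immediately
 * instead of waiting for another SEND message. */
static ssize_t toy_readv(ToyChannel *c, const struct iovec *iov, size_t niov)
{
    size_t done = 0;
    for (size_t i = 0; i < niov; i++) {
        size_t want = iov[i].iov_len;
        uint8_t *data = iov[i].iov_base;
        size_t got = toy_fill(c, data, want);
        done += got;
        if (got < want) {
            if (done > 0) {
                return done;   /* partial read: don't block */
            }
            c->blocked = 1;    /* real code: qemu_rdma_exchange_recv() */
            return -1;
        }
    }
    return done;
}
```

With two 4-byte iovs and exactly 4 bytes buffered, the scenario Dave raises, this version returns 4 without waiting for more data, whereas the patch as posted would fall through to exchange_recv() and block.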