From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 28 Feb 2018 18:26:32 +0000
From: "Dr. David Alan Gilbert"
To: "Michael S. Tsirkin"
Cc: qemu-devel@nongnu.org, maxime.coquelin@redhat.com, marcandre.lureau@redhat.com, peterx@redhat.com, imammedo@redhat.com, quintela@redhat.com, aarcange@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 15/29] vhost+postcopy: Send address back to qemu
Message-ID: <20180228182631.GK2981@work-vm>
References: <20180216131625.9639-1-dgilbert@redhat.com> <20180216131625.9639-16-dgilbert@redhat.com> <20180227162211-mutt-send-email-mst@kernel.org> <20180227195418.GK2847@work-vm> <20180227222336-mutt-send-email-mst@kernel.org>
In-Reply-To: <20180227222336-mutt-send-email-mst@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Tue, Feb 27, 2018 at 07:54:18PM +0000, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Fri, Feb 16, 2018 at 01:16:11PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert"
> > > >
> > > > We need a better way, but at the moment we need the address of the
> > > > mappings sent back to qemu so it can interpret the messages on the
> > > > userfaultfd it reads.
> > > >
> > > > This is done as a three-stage handshake:
> > > >     QEMU -> client
> > > >        set_mem_table
> > > >
> > > >        mmap stuff, get addresses
> > > >
> > > >     client -> qemu
> > > >        here are the addresses
> > > >
> > > >     qemu -> client
> > > >        OK - now you can use them
> > > >
> > > > That ensures that qemu has registered the new addresses in its
> > > > userfault code before the client starts accessing them.
> > > >
> > > > Note: We don't ask for the default 'ack' reply since we've got our own.
> > > >
> > > > Signed-off-by: Dr. David Alan Gilbert
> > > > ---
> > > >  contrib/libvhost-user/libvhost-user.c | 24 ++++++++++++-
> > > >  docs/interop/vhost-user.txt           |  9 +++++
> > > >  hw/virtio/trace-events                |  1 +
> > > >  hw/virtio/vhost-user.c                | 67 +++++++++++++++++++++++++++++++++--
> > > >  4 files changed, 98 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
> > > > index a18bc74a7c..e02e5d6f46 100644
> > > > --- a/contrib/libvhost-user/libvhost-user.c
> > > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > > @@ -491,10 +491,32 @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
> > > >                         dev_region->mmap_addr);
> > > >          }
> > > >
> > > > +        /* Return the address to QEMU so that it can translate the ufd
> > > > +         * fault addresses back.
> > > > +         */
> > > > +        msg_region->userspace_addr = (uintptr_t)(mmap_addr +
> > > > +                                                 dev_region->mmap_offset);
> > > >          close(vmsg->fds[i]);
> > > >      }
> > > >
> > > > -    /* TODO: Get address back to QEMU */
> > > > +    /* Send the message back to qemu with the addresses filled in */
> > > > +    vmsg->fd_num = 0;
> > > > +    if (!vu_message_write(dev, dev->sock, vmsg)) {
> > > > +        vu_panic(dev, "failed to respond to set-mem-table for postcopy");
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    /* Wait for QEMU to confirm that it's registered the handler for the
> > > > +     * faults.
> > > > +     */
> > > > +    if (!vu_message_read(dev, dev->sock, vmsg) ||
> > > > +        vmsg->size != sizeof(vmsg->payload.u64) ||
> > > > +        vmsg->payload.u64 != 0) {
> > > > +        vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
> > > > +        return false;
> > > > +    }
> > > > +
> > > > +    /* OK, now we can go and register the memory and generate faults */
> > > >      for (i = 0; i < dev->nregions; i++) {
> > > >          VuDevRegion *dev_region = &dev->regions[i];
> > > >  #ifdef UFFDIO_REGISTER
> > > > diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> > > > index bdec9ec0e8..5bbcab2cc4 100644
> > > > --- a/docs/interop/vhost-user.txt
> > > > +++ b/docs/interop/vhost-user.txt
> > > > @@ -454,12 +454,21 @@ Master message types
> > > >        Id: 5
> > > >        Equivalent ioctl: VHOST_SET_MEM_TABLE
> > > >        Master payload: memory regions description
> > > > +      Slave payload: (postcopy only) memory regions description
> > > >
> > > >        Sets the memory map regions on the slave so it can translate the vring
> > > >        addresses. In the ancillary data there is an array of file descriptors
> > > >        for each memory mapped region. The size and ordering of the fds matches
> > > >        the number and ordering of memory regions.
> > > >
> > > > +      When postcopy-listening has been received,
> > >
> > > Which message is this?
> >
> > VHOST_USER_POSTCOPY_LISTEN
> >
> > Do you want me just to change that to, 'When VHOST_USER_POSTCOPY_LISTEN
> > has been received'?
>
> I think it's better this way, yes.

Done.

> > > > SET_MEM_TABLE replies with
> > > > +      the bases of the memory mapped regions to the master. It must have mmap'd
> > > > +      the regions but not yet accessed them and should not yet generate a userfault
> > > > +      event. Note NEED_REPLY_MASK is not set in this case.
> > > > +      QEMU will then reply back to the list of mappings with an empty
> > > > +      VHOST_USER_SET_MEM_TABLE as an acknowledgment; only upon reception of this
> > > > +      message may the guest start accessing the memory and generating faults.
> > > > +
> > > >  * VHOST_USER_SET_LOG_BASE
> > > >
> > > >        Id: 6
> > >
> > > As you say yourself, this is probably the best we can do for now,
> > > but it's not ideal. So I think it's a good idea to isolate this
> > > behind a separate protocol feature bit. For now it will be required
> > > for postcopy, when it's fixed in kernel we can drop it
> > > cleanly.
> >
> > While we've talked about ways of avoiding the exact addresses being
> > known by the slave, I'm not sure we've talked about a way of removing
> > this handshake; although it's doable if we move more of the work to the
> > QEMU side.
> >
> > Dave
>
> Some kernel changes might conceivably remove the need for use of the
> address with userfaultfd, too.
>
> > > > diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> > > > index 06ec03d6e7..05d18ada77 100644
> > > > --- a/hw/virtio/trace-events
> > > > +++ b/hw/virtio/trace-events
> > > > @@ -8,6 +8,7 @@ vhost_section(const char *name, int r) "%s:%d"
> > > >
> > > >  # hw/virtio/vhost-user.c
> > > >  vhost_user_postcopy_listen(void) ""
> > > > +vhost_user_set_mem_table_postcopy(uint64_t client_addr, uint64_t qhva, int reply_i, int region_i) "client:0x%"PRIx64" for hva: 0x%"PRIx64" reply %d region %d"
> > > >
> > > >  # hw/virtio/virtio.c
> > > >  virtqueue_alloc_element(void *elem, size_t sz, unsigned in_num, unsigned out_num) "elem %p size %zd in_num %u out_num %u"
> > > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > > > index 64f4b3b3f9..a060442cb9 100644
> > > > --- a/hw/virtio/vhost-user.c
> > > > +++ b/hw/virtio/vhost-user.c
> > > > @@ -159,6 +159,7 @@ struct vhost_user {
> > > >      int slave_fd;
> > > >      NotifierWithReturn postcopy_notifier;
> > > >      struct PostCopyFD postcopy_fd;
> > > > +    uint64_t postcopy_client_bases[VHOST_MEMORY_MAX_NREGIONS];
> > > >      /* True once we've entered postcopy_listen */
> > > >      bool postcopy_listen;
> > > >  };
> > > > @@ -328,12 +329,15 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
> > > >  static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> > > >                                               struct vhost_memory *mem)
> > > >  {
> > > > +    struct vhost_user *u = dev->opaque;
> > > >      int fds[VHOST_MEMORY_MAX_NREGIONS];
> > > >      int i, fd;
> > > >      size_t fd_num = 0;
> > > >      bool reply_supported = virtio_has_feature(dev->protocol_features,
> > > >                                                VHOST_USER_PROTOCOL_F_REPLY_ACK);
> > > > -    /* TODO: Add actual postcopy differences */
> > > > +    VhostUserMsg msg_reply;
> > > > +    int region_i, msg_i;
> > > > +
> > > >      VhostUserMsg msg = {
> > > >          .hdr.request = VHOST_USER_SET_MEM_TABLE,
> > > >          .hdr.flags = VHOST_USER_VERSION,
> > > > @@ -380,6 +384,64 @@ static int vhost_user_set_mem_table_postcopy(struct vhost_dev *dev,
> > > >          return -1;
> > > >      }
> > > >
> > > > +    if (vhost_user_read(dev, &msg_reply) < 0) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    if (msg_reply.hdr.request != VHOST_USER_SET_MEM_TABLE) {
> > > > +        error_report("%s: Received unexpected msg type."
> > > > +                     "Expected %d received %d", __func__,
> > > > +                     VHOST_USER_SET_MEM_TABLE, msg_reply.hdr.request);
> > > > +        return -1;
> > > > +    }
> > > > +    /* We're using the same structure, just reusing one of the
> > > > +     * fields, so it should be the same size.
> > > > +     */
> > > > +    if (msg_reply.hdr.size != msg.hdr.size) {
> > > > +        error_report("%s: Unexpected size for postcopy reply "
> > > > +                     "%d vs %d", __func__, msg_reply.hdr.size, msg.hdr.size);
> > > > +        return -1;
> > > > +    }
> > > > +
> > > > +    memset(u->postcopy_client_bases, 0,
> > > > +           sizeof(uint64_t) * VHOST_MEMORY_MAX_NREGIONS);
> > > > +
> > > > +    /* They're in the same order as the regions that were sent
> > > > +     * but some of the regions were skipped (above) if they
> > > > +     * didn't have fd's
> > > > +     */
> > > > +    for (msg_i = 0, region_i = 0;
> > > > +         region_i < dev->mem->nregions;
> > > > +         region_i++) {
> > > > +        if (msg_i < fd_num &&
> > > > +            msg_reply.payload.memory.regions[msg_i].guest_phys_addr ==
> > > > +            dev->mem->regions[region_i].guest_phys_addr) {
> > > > +            u->postcopy_client_bases[region_i] =
> > > > +                msg_reply.payload.memory.regions[msg_i].userspace_addr;
> > > > +            trace_vhost_user_set_mem_table_postcopy(
> > > > +                msg_reply.payload.memory.regions[msg_i].userspace_addr,
> > > > +                msg.payload.memory.regions[msg_i].userspace_addr,
> > > > +                msg_i, region_i);
> > > > +            msg_i++;
> > > > +        }
> > > > +    }
> > > > +    if (msg_i != fd_num) {
> > > > +        error_report("%s: postcopy reply not fully consumed "
> > > > +                     "%d vs %zd",
> > > > +                     __func__, msg_i, fd_num);
> > > > +        return -1;
> > > > +    }
> > > > +    /* Now we've registered this with the postcopy code, we ack to the client,
> > > > +     * because now we're in the position to be able to deal with any faults
> > > > +     * it generates.
> > > > +     */
> > > > +    /* TODO: Use this for failure cases as well with a bad value */
> > > > +    msg.hdr.size = sizeof(msg.payload.u64);
> > > > +    msg.payload.u64 = 0; /* OK */
> > > > +    if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
> > > > +        return -1;
> > > > +    }
> > > > +
> > > >      if (reply_supported) {
> > > >          return process_message_reply(dev, &msg);
> > > >      }
> > > > @@ -396,7 +458,8 @@ static int vhost_user_set_mem_table(struct vhost_dev *dev,
> > > >      size_t fd_num = 0;
> > > >      bool do_postcopy = u->postcopy_listen && u->postcopy_fd.handler;
> > > >      bool reply_supported = virtio_has_feature(dev->protocol_features,
> > > > -                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
> > > > +                                              VHOST_USER_PROTOCOL_F_REPLY_ACK) &&
> > > > +                           !do_postcopy;
> > > >
> > > >      if (do_postcopy) {
> > > >          /* Postcopy has enough differences that it's best done in its own
> > > > --
> > > > 2.14.3
> >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK