From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752688AbdJKNtZ (ORCPT ); Wed, 11 Oct 2017 09:49:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:39180 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752050AbdJKNtX (ORCPT ); Wed, 11 Oct 2017 09:49:23 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com DF354C059B6A
Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com;
	spf=fail smtp.mailfrom=mst@redhat.com
Date: Wed, 11 Oct 2017 16:49:16 +0300
From: "Michael S. Tsirkin"
To: Wei Wang
Cc: "virtio-dev@lists.oasis-open.org",
	"linux-kernel@vger.kernel.org",
	"qemu-devel@nongnu.org",
	"virtualization@lists.linux-foundation.org",
	"kvm@vger.kernel.org",
	"linux-mm@kvack.org",
	"mhocko@kernel.org",
	"akpm@linux-foundation.org",
	"mawilcox@microsoft.com",
	"david@redhat.com",
	"cornelia.huck@de.ibm.com",
	"mgorman@techsingularity.net",
	"aarcange@redhat.com",
	"amit.shah@redhat.com",
	"pbonzini@redhat.com",
	"willy@infradead.org",
	"liliang.opensource@gmail.com",
	"yang.zhang.wz@gmail.com",
	"quan.xu@aliyun.com"
Subject: Re: [PATCH v16 5/5] virtio-balloon: VIRTIO_BALLOON_F_CTRL_VQ
Message-ID: <20171011161912-mutt-send-email-mst@kernel.org>
References: <1506744354-20979-1-git-send-email-wei.w.wang@intel.com>
	<1506744354-20979-6-git-send-email-wei.w.wang@intel.com>
	<20171001060305-mutt-send-email-mst@kernel.org>
	<286AC319A985734F985F78AFA26841F73932025A@shsmsx102.ccr.corp.intel.com>
	<20171010180636-mutt-send-email-mst@kernel.org>
	<59DDB428.4020208@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <59DDB428.4020208@intel.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.32]); Wed, 11 Oct 2017 13:49:23 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 11, 2017 at 02:03:20PM +0800, Wei Wang wrote:
> On 10/10/2017 11:15 PM, Michael S. Tsirkin wrote:
> > On Mon, Oct 02, 2017 at 04:38:01PM +0000, Wang, Wei W wrote:
> > > On Sunday, October 1, 2017 11:19 AM, Michael S. Tsirkin wrote:
> > > > On Sat, Sep 30, 2017 at 12:05:54PM +0800, Wei Wang wrote:
> > > > > +static void ctrlq_send_cmd(struct virtio_balloon *vb,
> > > > > +			   struct virtio_balloon_ctrlq_cmd *cmd,
> > > > > +			   bool inbuf)
> > > > > +{
> > > > > +	struct virtqueue *vq = vb->ctrl_vq;
> > > > > +
> > > > > +	ctrlq_add_cmd(vq, cmd, inbuf);
> > > > > +	if (!inbuf) {
> > > > > +		/*
> > > > > +		 * All the input cmd buffers are replenished here.
> > > > > +		 * This is necessary because the input cmd buffers are lost
> > > > > +		 * after live migration. The device needs to rewind all of
> > > > > +		 * them from the ctrl_vq.
> > > > Confused. Live migration somehow loses state? Why is that and why is it a good
> > > > idea? And how do you know this is migration even?
> > > > Looks like all you know is you got free page end. Could be any reason for this.
> > >
> > > I think this would be something that the current live migration lacks - what the
> > > device read from the vq is not transferred during live migration, an example is the
> > > stat_vq_elem:
> > > Line 476 at https://github.com/qemu/qemu/blob/master/hw/virtio/virtio-balloon.c
> > This does not touch guest memory though it just manipulates
> > internal state to make it easier to migrate.
> > It's transparent to guest as migration should be.
> >
> > > For all the things that are added to the vq and need to be held by the device
> > > to use later need to consider the situation that live migration might happen at any
> > > time and they need to be re-taken from the vq by the device on the destination
> > > machine.
> > >
> > > So, even without this live migration optimization feature, I think all the things that are
> > > added to the vq for the device to hold, need a way for the device to rewind back from
> > > the vq - re-adding all the elements to the vq is a trick to keep a record of all of them
> > > on the vq so that the device side rewinding can work.
> > >
> > > Please let me know if anything is missed or if you have other suggestions.
> > IMO migration should pass enough data source to destination for
> > destination to continue where source left off without guest help.
> >
>
> I'm afraid it would be difficult to pass the entire VirtQueueElement to the
> destination. I think
> that would also be the reason that stats_vq_elem chose to rewind from the
> guest vq, which re-do the
> virtqueue_pop() --> virtqueue_map_desc() steps (the QEMU virtual address to
> the guest physical
> address relationship may be changed on the destination).

Yes but note how that rewind does not involve modifying the ring.
It just rolls back some indices.

>
> How about another direction which would be easier - using two 32-bit device
> specific configuration registers,
> Host2Guest and Guest2Host command registers, to replace the ctrlq for
> command exchange:
>
> The flow can be as follows:
>
> 1) Before Host sending a StartCMD, it flushes the free_page_vq in case any
> old free page hint is left there;
> 2) Host writes StartCMD to the Host2Guest register, and notifies the guest;
>
> 3) Upon receiving a configuration notification, Guest reads the Host2Guest
> register, and detaches all the used buffers from free_page_vq;
> (then for each StartCMD, the free_page_vq will always have no obsolete free
> page hints, right? )
>
> 4) Guest start report free pages:
> 4.1) Host may actively write StopCMD to the Host2Guest register before
> the guest finishes; or
> 4.2) Guest finishes reporting, write StopCMD the Guest2HOST register,
> which traps to QEMU, to stop.
>
>
> Best,
> Wei

I am not sure it matters whether a VQ or the config are used to start/stop.

But I think flushing is very fragile. You will easily run into races
if one of the actors gets out of sync and keeps adding data.
I think adding an ID in the free vq stream is a more robust approach.
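
The closing suggestion above, tagging the free page stream with an ID instead of flushing the queue, can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the struct layouts, the field names (session_id, pfn) and the helper functions are hypothetical and are not taken from the patch series, the virtio specification, or QEMU. The idea shown is only that each start command carries a fresh ID, the guest copies that ID into every hint it reports, and the host drops any hint whose ID does not match the current one, so hints left over from an earlier round are harmless even if one side is out of sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hint record; NOT the virtio-balloon ABI. */
struct free_page_hint {
	uint32_t session_id;	/* ID of the start command this hint answers */
	uint64_t pfn;		/* guest page frame number reported as free */
};

/* Hypothetical host-side bookkeeping for one reporting stream. */
struct hint_consumer {
	uint32_t current_id;	/* ID carried by the most recent start command */
	size_t accepted;	/* hints that matched current_id */
	size_t dropped;		/* stale hints from an earlier round */
};

/* Host issues a new start command: later hints must carry this ID. */
static void consumer_start(struct hint_consumer *c, uint32_t id)
{
	assert(c != NULL);
	c->current_id = id;
}

/*
 * Host processes one hint taken from the queue. A hint whose ID does not
 * match the current round is counted and ignored, so no flush is needed.
 * Returns true if the hint was accepted.
 */
static bool consumer_handle(struct hint_consumer *c,
			    const struct free_page_hint *h)
{
	assert(c != NULL && h != NULL);
	if (h->session_id != c->current_id) {
		c->dropped++;
		return false;
	}
	c->accepted++;
	return true;
}
```

With this shape, a guest that keeps adding hints for round 1 after the host has started round 2 only produces dropped entries. Correctness does not depend on the queue being empty when a new round begins, which is the race that the flushing approach is exposed to.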