From: Jason Wang <jasowang@redhat.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: kvm list <kvm@vger.kernel.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
qemu-level <qemu-devel@nongnu.org>,
Daniel Daly <dandaly0@gmail.com>,
virtualization@lists.linux-foundation.org,
Liran Alon <liralon@gmail.com>, Eli Cohen <eli@mellanox.com>,
Nitin Shrivastav <nitin.shrivastav@broadcom.com>,
Alex Barba <alex.barba@broadcom.com>,
Christophe Fontaine <cfontain@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Lee Ballard <ballle98@gmail.com>,
Lars Ganrot <lars.ganrot@gmail.com>,
Rob Miller <rob.miller@broadcom.com>,
Stefano Garzarella <sgarzare@redhat.com>,
Howard Cai <howard.cai@gmail.com>,
Parav Pandit <parav@mellanox.com>, vm <vmireyno@marvell.com>,
Salil Mehta <mehta.salil.lnk@gmail.com>,
Stephen Finucane <stephenfin@redhat.com>,
Xiao W Wang <xiao.w.wang@intel.com>,
Sean Mooney <smooney@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Jim Harford <jim.harford@broadcom.com>,
Dmytro Kazantsev <dmytro.kazantsev@gmail.com>,
Siwei Liu <loseweigh@gmail.com>,
Harpreet Singh Anand <hanand@xilinx.com>,
Michael Lilja <ml@napatech.com>, Max Gurtovoy <maxgu14@gmail.com>
Subject: Re: [RFC PATCH 00/27] vDPA software assisted live migration
Date: Thu, 26 Nov 2020 11:07:03 +0800 [thread overview]
Message-ID: <9edb2df1-dec0-8aad-4fdd-93c3b3be9ff6@redhat.com> (raw)
In-Reply-To: <CAJaqyWf+6yoMHJuLv=QGLMP4egmdm722=V2kKJ_aiQAfCCQOFw@mail.gmail.com>
On 2020/11/25 8:03 PM, Eugenio Perez Martin wrote:
> On Wed, Nov 25, 2020 at 8:09 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2020/11/21 2:50 AM, Eugenio Pérez wrote:
>>> This series enables vDPA software assisted live migration for
>>> vhost-net devices. This is a new method of vhost device migration:
>>> instead of relying on the vDPA device's dirty logging capability, SW
>>> assisted LM intercepts the dataplane, forwarding the descriptors
>>> between VM and device.
>>>
>>> In this migration mode, qemu offers a new vring to the device to
>>> read from and write into, and disables vhost notifiers, processing
>>> guest and vhost notifications in qemu. On used buffer relay, qemu
>>> marks the dirty memory just as it does for plain virtio-net devices.
>>> This way, devices do not need a dirty page logging capability.
>>>
>>> This series is a POC doing SW LM for vhost-net devices, which already
>>> have dirty page logging capabilities. None of the changes has any
>>> actual effect on current devices until the last two patches (26 and
>>> 27) are applied, but they can be rebased on top of any other. These
>>> check that the device meets all the requirements, and disable
>>> vhost-net device logging so migration goes through SW LM. The last
>>> patch is not meant to be applied in the final revision; it is in the
>>> series just for testing purposes.
>>>
>>> To use SW assisted LM, these vhost-net devices need to be instantiated:
>>> * With IOMMU (iommu_platform=on,ats=on)
>>> * Without event_idx (event_idx=off)
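For reference, an instantiation meeting those constraints might look like the following sketch (the netdev id and tap backend are placeholders; note that `iommu_platform=on,ats=on` additionally assumes a vIOMMU such as `intel-iommu` on a q35 machine with split irqchip):

```shell
qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
    -device intel-iommu,device-iotlb=on \
    -netdev tap,id=net0,vhost=on \
    -device virtio-net-pci,netdev=net0,iommu_platform=on,ats=on,event_idx=off \
    ...
```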
>>
>> So a question is at what level we want to implement qemu assisted
>> live migration. To me it could be done at two levels:
>>
>> 1) the generic vhost level, which makes it work for vhost-net,
>> vhost-user and vhost-vDPA
>> 2) a specific type of vhost
>>
>> To me, having a generic one looks better, but it would be much more
>> complicated. So what I read from this series is that it is a vhost
>> kernel specific software assisted live migration, which is a good
>> start. Actually it may even have a real use case, e.g. it can save
>> dirty bitmaps for guests with large memory. But we need to address
>> the above limitations first.
>>
>> So I would like to know the reason for mandating the iommu platform
>> and ats. And I think we need to fix the case of event idx support.
>>
> There is no specific reason for mandating iommu & ats, it was just
> started that way.
>
> I will extend the patch to support those cases too.
>
>>> Just the notification forwarding (with no descriptor relay) can be
>>> achieved with patches 7 and 9, and starting migration. Partially
>>> applying patches 13 through 24 will not work while migrating on the
>>> source, and patch 25 is needed for the destination to resume network
>>> activity.
>>>
>>> It is based on the ideas of DPDK SW assisted LM, in the series of
>>
>> Actually we're better than that, since there's no need for tricks
>> like a hardcoded IOVA for the mediated (shadow) virtqueue.
>>
>>
>>> DPDK's https://patchwork.dpdk.org/cover/48370/ .
>>
>> I notice that you do GPA->VA translations and try to establish a
>> VA->VA (use VA as IOVA) mapping via device IOTLB. This shortcut
>> should work for vhost-kernel/user but not for vhost-vDPA. The reason
>> is that there's no guarantee that the whole 64-bit address range can
>> be used as IOVA. For example, a hardware IOMMU like Intel's usually
>> has only 47 or 52 bits of address width.
>>
>> So we probably need an IOVA allocator that can make sure the IOVAs
>> do not overlap and fit for [1]. We can probably build the IOVA for
>> guest VA via memory listeners. Then we have
>>
>> 1) IOVA for GPA
>> 2) IOVA for shadow VQ
>>
>> And advertise IOVA to VA mapping to vhost.
>>
>> [1]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b48dc03e575a872404f33b04cd237953c5d7498
>>
> Got it, will control it too.
>
> Maybe for vhost-net we could directly send iotlb miss for [0,~0ULL].
It works but it means vhost-net needs some special care. To me a generic
IOVA allocator looks better.
>
>>> Comments are welcome.
>>>
>>> Thanks!
>>>
>>> Eugenio Pérez (27):
>>> vhost: Add vhost_dev_can_log
>>> vhost: Add device callback in vhost_migration_log
>>> vhost: Move log resize/put to vhost_dev_set_log
>>> vhost: add vhost_kernel_set_vring_enable
>>> vhost: Add hdev->dev.sw_lm_vq_handler
>>> virtio: Add virtio_queue_get_used_notify_split
>>> vhost: Route guest->host notification through qemu
>>> vhost: Add a flag for software assisted Live Migration
>>> vhost: Route host->guest notification through qemu
>>> vhost: Allocate shadow vring
>>> virtio: const-ify all virtio_tswap* functions
>>> virtio: Add virtio_queue_full
>>> vhost: Send buffers to device
>>> virtio: Remove virtio_queue_get_used_notify_split
>>> vhost: Do not invalidate signalled used
>>> virtio: Expose virtqueue_alloc_element
>>> vhost: add vhost_vring_set_notification_rcu
>>> vhost: add vhost_vring_poll_rcu
>>> vhost: add vhost_vring_get_buf_rcu
>>> vhost: Return used buffers
>>> vhost: Add vhost_virtqueue_memory_unmap
>>> vhost: Add vhost_virtqueue_memory_map
>>> vhost: unmap qemu's shadow virtqueues on sw live migration
>>> vhost: iommu changes
>>> vhost: Do not commit vhost used idx on vhost_virtqueue_stop
>>> vhost: Add vhost_hdev_can_sw_lm
>>> vhost: forbid vhost devices logging
>>>
>>> hw/virtio/vhost-sw-lm-ring.h | 39 +++
>>> include/hw/virtio/vhost.h | 5 +
>>> include/hw/virtio/virtio-access.h | 8 +-
>>> include/hw/virtio/virtio.h | 4 +
>>> hw/net/virtio-net.c | 39 ++-
>>> hw/virtio/vhost-backend.c | 29 ++
>>> hw/virtio/vhost-sw-lm-ring.c | 268 +++++++++++++++++++
>>> hw/virtio/vhost.c | 431 +++++++++++++++++++++++++-----
>>> hw/virtio/virtio.c | 18 +-
>>> hw/virtio/meson.build | 2 +-
>>> 10 files changed, 758 insertions(+), 85 deletions(-)
>>> create mode 100644 hw/virtio/vhost-sw-lm-ring.h
>>> create mode 100644 hw/virtio/vhost-sw-lm-ring.c
>>
>> So this looks like a pretty huge patchset, which I'm trying to think
>> of ways to split. An idea is to do this in two steps:
>>
>> 1) implement a shadow virtqueue mode for vhost first (w/o live
>> migration). Then we can test descriptor relay, IOVA allocation, etc.
> How would that mode be activated if it is not tied to live migration?
> New backend/command line switch?
Either a new cli option or even a qmp command can work.
>
> Maybe it is better to also start with no iommu & ats support and add it on top.
Yes.
>
>> 2) add live migration support on top
>>
>> And it looks to me it's better to split the shadow virtqueue (the
>> virtio driver part) into an independent file, and to use a generic
>> name (w/o "shadow") so it can be reused by other use cases as well.
>>
> I think the same.
>
> Thanks!
>
>> Thoughts?
>>
>