From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20201120185105.279030-1-eperezma@redhat.com>
 <5a4d0b7a-fb62-9e78-9e85-9262dca57f1c@redhat.com>
From: Eugenio Perez Martin
Date: Wed, 25 Nov 2020 13:14:05 +0100
Subject: Re: [RFC PATCH 00/27] vDPA software assisted live migration
To: Jason Wang
Cc: qemu-level, Lars Ganrot, virtualization@lists.linux-foundation.org, Salil Mehta, "Michael S.
Tsirkin", Liran Alon, Rob Miller, Max Gurtovoy, Alex Barba, Stefan Hajnoczi, Jim Harford, Harpreet Singh Anand, Christophe Fontaine, vm, Daniel Daly, Michael Lilja, Stefano Garzarella, Nitin Shrivastav, Lee Ballard, Dmytro Kazantsev, Juan Quintela, kvm list, Howard Cai, Xiao W Wang, Sean Mooney, Parav Pandit, Eli Cohen, Siwei Liu, Stephen Finucane
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: kvm@vger.kernel.org

On Wed, Nov 25, 2020 at 1:03 PM Eugenio Perez Martin wrote:
>
> On Wed, Nov 25, 2020 at 8:09 AM Jason Wang wrote:
> >
> >
> > On 2020/11/21 2:50 AM, Eugenio Pérez wrote:
> > > This series enables vDPA software assisted live migration for
> > > vhost-net devices. This is a new method of vhost device migration:
> > > instead of relying on the vDPA device's dirty logging capability,
> > > SW assisted LM intercepts the dataplane, forwarding the descriptors
> > > between VM and device.
> > >
> > > In this migration mode, qemu offers a new vring to the device to
> > > read and write into, and disables vhost notifiers, processing guest
> > > and vhost notifications in qemu. On used buffer relay, qemu marks
> > > the dirty memory just as with plain virtio-net devices. This way,
> > > devices do not need to have dirty page logging capability.
> > >
> > > This series is a POC doing SW LM for vhost-net devices, which
> > > already have dirty page logging capabilities. None of the changes
> > > has any actual effect with current devices until the last two
> > > patches (26 and 27) are applied, but they can be rebased on top of
> > > any other. These check that the device meets all requirements, and
> > > disable vhost-net devices' logging so migration goes through SW LM.
> > > This last patch is not meant to be applied in the final revision;
> > > it is in the series just for testing purposes.
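[Editor's note: the dirty marking on used-buffer relay described above amounts to setting bits in a page-granularity bitmap for every guest range a relayed buffer touched. A toy sketch of that page math follows; the names are hypothetical, not QEMU's actual dirty-log API.]

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* One bit per guest page; sized for 1 MiB of guest memory in this sketch. */
static uint8_t dirty_bitmap[((1u << 20) >> PAGE_SHIFT) >> 3];

/* Mark every page overlapped by [gpa, gpa + len) as dirty, the way the
 * used-buffer relay would after a buffer is returned to the guest. */
static void log_dirty_range(uint64_t gpa, uint64_t len)
{
    uint64_t first = gpa >> PAGE_SHIFT;
    uint64_t last = (gpa + len - 1) >> PAGE_SHIFT;
    for (uint64_t pfn = first; pfn <= last; pfn++) {
        dirty_bitmap[pfn >> 3] |= 1u << (pfn & 7);
    }
}

static int page_is_dirty(uint64_t pfn)
{
    return (dirty_bitmap[pfn >> 3] >> (pfn & 7)) & 1;
}
```

[In real QEMU the relay would go through the memory API so migration sees the same bitmap as for emulated devices; this only illustrates why the device itself needs no logging capability.]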
> > >
> > > To use SW assisted LM, these vhost-net devices need to be
> > > instantiated:
> > > * With IOMMU (iommu_platform=on,ats=on)
> > > * Without event_idx (event_idx=off)
> >
> > So a question is at what level do we want to implement qemu assisted
> > live migration. To me it could be done at two levels:
> >
> > 1) generic vhost level which makes it work for both vhost-net/vhost-user
> > and vhost-vDPA
> > 2) a specific type of vhost
> >
> > To me, having a generic one looks better but it would be much more
> > complicated. So what I read from this series is that it is a vhost
> > kernel specific software assisted live migration, which is a good
> > start. Actually it may even have a real use case, e.g. it can save
> > dirty bitmaps for guests with large memory. But we need to address
> > the above limitations first.
> >
> > So I would like to know what's the reason for mandating iommu
> > platform and ats? And I think we need to fix the case of event idx
> > support.
> >
>
> There is no specific reason for mandating iommu & ats, it was just
> started that way.
>
> I will extend the patch to support those cases too.
>
> >
> > >
> > > Just the notification forwarding (with no descriptor relay) can be
> > > achieved with patches 7 and 9, and starting migration. Partial
> > > applies between 13 and 24 will not work while migrating on the
> > > source, and patch 25 is needed for the destination to resume
> > > network activity.
> > >
> > > It is based on the ideas of DPDK SW assisted LM, in the series of
> >
> >
> > Actually we're better than that since there's no need for tricks like
> > a hardcoded IOVA for the mediated (shadow) virtqueue.
> >
> >
> > > DPDK's https://patchwork.dpdk.org/cover/48370/ .
> >
> >
> > I notice that you do GPA->VA translations and try to establish a
> > VA->VA (use VA as IOVA) mapping via device IOTLB. This shortcut
> > should work for vhost-kernel/user but not vhost-vDPA.
> > The reason is that there's no
> > guarantee that the whole 64bit address range could be used as IOVA.
> > One example is that a hardware IOMMU like Intel's usually has 47 or
> > 52 bits of address width.
> >
> > So we probably need an IOVA allocator that can make sure the IOVA is
> > not overlapped and fits [1]. We can probably build the IOVA for guest
> > VA via memory listeners. Then we have
> >
> > 1) IOVA for GPA
> > 2) IOVA for shadow VQ
> >
> > And advertise the IOVA to VA mapping to vhost.
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b48dc03e575a872404f33b04cd237953c5d7498
> >
>
> Got it, will control it too.
>
> Maybe for vhost-net we could directly send an iotlb miss for [0,~0ULL].
>

Sorry, this was intended to be a question :).

Given a vhost-* device IOVA usable range, is it ok to expose all of
qemu's overlapping VA to it? With the iotlb miss, for example. Would it
be acceptable from a security point of view? The device would have
access to all of qemu's VA, but on the other hand devices like
vhost-net already have it.

> > >
> > > Comments are welcome.
> > >
> > > Thanks!
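[Editor's note: the simplest possible shape of the IOVA allocator discussed above is a bump allocator bounded by the IOMMU address width, so GPA mappings and the shadow VQ never overlap. The names below are hypothetical; a real allocator in QEMU would also need to track holes and handle unmapping.]

```c
#include <assert.h>
#include <stdint.h>

/* Toy IOVA allocator: hands out non-overlapping IOVA ranges inside a
 * fixed address width (e.g. 47 bits for some Intel IOMMUs). */
typedef struct {
    uint64_t next;  /* first free IOVA */
    uint64_t limit; /* one past the last usable IOVA */
} IOVAAllocator;

static void iova_init(IOVAAllocator *a, unsigned addr_width)
{
    a->next = 0;
    a->limit = 1ULL << addr_width;
}

/* Returns the allocated base IOVA, or UINT64_MAX if the range cannot fit
 * below the address-width limit. */
static uint64_t iova_alloc(IOVAAllocator *a, uint64_t size)
{
    if (size == 0 || size > a->limit - a->next) {
        return UINT64_MAX;
    }
    uint64_t base = a->next;
    a->next += size;
    return base;
}
```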
> > >
> > > Eugenio Pérez (27):
> > >   vhost: Add vhost_dev_can_log
> > >   vhost: Add device callback in vhost_migration_log
> > >   vhost: Move log resize/put to vhost_dev_set_log
> > >   vhost: add vhost_kernel_set_vring_enable
> > >   vhost: Add hdev->dev.sw_lm_vq_handler
> > >   virtio: Add virtio_queue_get_used_notify_split
> > >   vhost: Route guest->host notification through qemu
> > >   vhost: Add a flag for software assisted Live Migration
> > >   vhost: Route host->guest notification through qemu
> > >   vhost: Allocate shadow vring
> > >   virtio: const-ify all virtio_tswap* functions
> > >   virtio: Add virtio_queue_full
> > >   vhost: Send buffers to device
> > >   virtio: Remove virtio_queue_get_used_notify_split
> > >   vhost: Do not invalidate signalled used
> > >   virtio: Expose virtqueue_alloc_element
> > >   vhost: add vhost_vring_set_notification_rcu
> > >   vhost: add vhost_vring_poll_rcu
> > >   vhost: add vhost_vring_get_buf_rcu
> > >   vhost: Return used buffers
> > >   vhost: Add vhost_virtqueue_memory_unmap
> > >   vhost: Add vhost_virtqueue_memory_map
> > >   vhost: unmap qemu's shadow virtqueues on sw live migration
> > >   vhost: iommu changes
> > >   vhost: Do not commit vhost used idx on vhost_virtqueue_stop
> > >   vhost: Add vhost_hdev_can_sw_lm
> > >   vhost: forbid vhost devices logging
> > >
> > >  hw/virtio/vhost-sw-lm-ring.h      |  39 +++
> > >  include/hw/virtio/vhost.h         |   5 +
> > >  include/hw/virtio/virtio-access.h |   8 +-
> > >  include/hw/virtio/virtio.h        |   4 +
> > >  hw/net/virtio-net.c               |  39 ++-
> > >  hw/virtio/vhost-backend.c         |  29 ++
> > >  hw/virtio/vhost-sw-lm-ring.c      | 268 +++++++++++++++++++
> > >  hw/virtio/vhost.c                 | 431 +++++++++++++++++++++++++-----
> > >  hw/virtio/virtio.c                |  18 +-
> > >  hw/virtio/meson.build             |   2 +-
> > >  10 files changed, 758 insertions(+), 85 deletions(-)
> > >  create mode 100644 hw/virtio/vhost-sw-lm-ring.h
> > >  create mode 100644 hw/virtio/vhost-sw-lm-ring.c
> >
> >
> > So this looks like a pretty huge patchset which I'm trying to think
> > of ways to
split. An idea is to do this in two steps:
> >
> > 1) implement a shadow virtqueue mode for vhost first (w/o live
> > migration). Then we can test descriptor relay, IOVA allocation, etc.
>
> How would that mode be activated if it is not tied to live migration?
> A new backend/command line switch?
>
> Maybe it is better to also start with no iommu & ats support and add
> them on top.
>
> > 2) add live migration support on top
> >
> > And it looks to me it's better to split the shadow virtqueue (virtio
> > driver part) into an independent file. And use a generic name (w/o
> > "shadow") so it can be reused by other use cases as well.
> >
>
> I think the same.
>
> Thanks!
>
> > Thoughts?
> >