From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: cohuck@redhat.com, cjia@nvidia.com, aik@ozlabs.ru,
	Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com,
	qemu-devel@nongnu.org, peterx@redhat.com, eauger@redhat.com,
	yi.l.liu@intel.com, quintela@redhat.com, ziye.yang@intel.com,
	armbru@redhat.com, mlevitsk@redhat.com, pasic@linux.ibm.com,
	felipe@nutanix.com, zhi.a.wang@intel.com, kevin.tian@intel.com,
	yan.y.zhao@intel.com, dgilbert@redhat.com,
	changpeng.liu@intel.com, eskultet@redhat.com, Ken.Xue@amd.com,
	jonathan.davies@nutanix.com, pbonzini@redhat.com
Subject: Re: [PATCH v26 13/17] vfio: create mapped iova list when vIOMMU is enabled
Date: Mon, 19 Oct 2020 14:07:17 -0600
Message-ID: <20201019140717.705368dd@w520.home>
In-Reply-To: <52361a71-f812-5f69-be57-93b732e96ed1@nvidia.com>

On Tue, 20 Oct 2020 00:45:28 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 10/19/2020 10:54 PM, Alex Williamson wrote:
> > On Mon, 19 Oct 2020 11:31:03 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 9/26/2020 3:53 AM, Alex Williamson wrote:  
> >>> On Wed, 23 Sep 2020 04:54:15 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>      
> >>>> Create mapped iova list when vIOMMU is enabled. For each mapped iova
> >>>> save translated address. Add node to list on MAP and remove node from
> >>>> list on UNMAP.
> >>>> This list is used to track dirty pages during migration.
> >>>>
> >>>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> >>>> ---
> >>>>    hw/vfio/common.c              | 58 ++++++++++++++++++++++++++++++++++++++-----
> >>>>    include/hw/vfio/vfio-common.h |  8 ++++++
> >>>>    2 files changed, 60 insertions(+), 6 deletions(-)
> >>>>
> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >>>> index d4959c036dd1..dc56cded2d95 100644
> >>>> --- a/hw/vfio/common.c
> >>>> +++ b/hw/vfio/common.c
> >>>> @@ -407,8 +407,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> >>>>    }
> >>>>    
> >>>>    /* Called with rcu_read_lock held.  */
> >>>> -static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
> >>>> -                           bool *read_only)
> >>>> +static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> >>>> +                               ram_addr_t *ram_addr, bool *read_only)
> >>>>    {
> >>>>        MemoryRegion *mr;
> >>>>        hwaddr xlat;
> >>>> @@ -439,8 +439,17 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
> >>>>            return false;
> >>>>        }
> >>>>    
> >>>> -    *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> >>>> -    *read_only = !writable || mr->readonly;
> >>>> +    if (vaddr) {
> >>>> +        *vaddr = memory_region_get_ram_ptr(mr) + xlat;
> >>>> +    }
> >>>> +
> >>>> +    if (ram_addr) {
> >>>> +        *ram_addr = memory_region_get_ram_addr(mr) + xlat;
> >>>> +    }
> >>>> +
> >>>> +    if (read_only) {
> >>>> +        *read_only = !writable || mr->readonly;
> >>>> +    }
> >>>>    
> >>>>        return true;
> >>>>    }
> >>>> @@ -450,7 +459,6 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >>>>        VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> >>>>        VFIOContainer *container = giommu->container;
> >>>>        hwaddr iova = iotlb->iova + giommu->iommu_offset;
> >>>> -    bool read_only;
> >>>>        void *vaddr;
> >>>>        int ret;
> >>>>    
> >>>> @@ -466,7 +474,10 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >>>>        rcu_read_lock();
> >>>>    
> >>>>        if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> >>>> -        if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
> >>>> +        ram_addr_t ram_addr;
> >>>> +        bool read_only;
> >>>> +
> >>>> +        if (!vfio_get_xlat_addr(iotlb, &vaddr, &ram_addr, &read_only)) {
> >>>>                goto out;
> >>>>            }
> >>>>            /*
> >>>> @@ -484,8 +495,28 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >>>>                             "0x%"HWADDR_PRIx", %p) = %d (%m)",
> >>>>                             container, iova,
> >>>>                             iotlb->addr_mask + 1, vaddr, ret);
> >>>> +        } else {
> >>>> +            VFIOIovaRange *iova_range;
> >>>> +
> >>>> +            iova_range = g_malloc0(sizeof(*iova_range));
> >>>> +            iova_range->iova = iova;
> >>>> +            iova_range->size = iotlb->addr_mask + 1;
> >>>> +            iova_range->ram_addr = ram_addr;
> >>>> +
> >>>> +            QLIST_INSERT_HEAD(&giommu->iova_list, iova_range, next);
> >>>>            }
> >>>>        } else {
> >>>> +        VFIOIovaRange *iova_range, *tmp;
> >>>> +
> >>>> +        QLIST_FOREACH_SAFE(iova_range, &giommu->iova_list, next, tmp) {
> >>>> +            if (iova_range->iova >= iova &&
> >>>> +                iova_range->iova + iova_range->size <= iova +
> >>>> +                                                       iotlb->addr_mask + 1) {
> >>>> +                QLIST_REMOVE(iova_range, next);
> >>>> +                g_free(iova_range);
> >>>> +            }
> >>>> +        }
> >>>> +  
> >>>
> >>>
> >>> This is some pretty serious overhead... can't we trigger a replay when
> >>> migration is enabled to build this information then?  
> >>
> >> Are you suggesting to call memory_region_iommu_replay() before
> >> vfio_sync_dirty_bitmap(), which would call vfio_iommu_map_notify() where
> >> iova list of mapping is maintained? Then in the notifier check if
> >> migration_is_running() and container->dirty_pages_supported == true,
> >> then only create iova mapping tree? In this case how would we know that
> >> this is triggered by
> >> vfio_sync_dirty_bitmap()  
> >>    -> memory_region_iommu_replay()  
> >> and we don't have to call vfio_dma_map()?  
> > 
> > memory_region_iommu_replay() calls a notifier of our choice, so we
> > could create a notifier specifically for creating this tree when dirty
> > logging is enabled.  Thanks,
> >   
> 
> This would also mean changes in intel_iommu.c such that it would walk 
> through the iova_tree and call notifier for each entry in iova_tree.

I think we already have that in vtd_iommu_replay().  AIUI, an
IOMMUMemoryRegionClass.replay callback is rather a requirement of any
vIOMMU intending to support vfio.
 
> What about other platforms? We will have to handle such cases for
> AMD, ARM, PPC etc...?

A working replay callback is already required for a vIOMMU to work in
any reasonable way with vfio; this is just an additional use case of a
callback we already need and use.
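
To make the idea concrete, here is a minimal standalone sketch of a
replay that drives a notifier of the caller's choice.  It is a model
only: Mapping, model_replay(), record_range() and RangeList are
hypothetical names invented for illustration and are not QEMU types or
functions.  The point it shows is that a notifier registered purely for
dirty logging can record each range without issuing any DMA map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are QEMU types or functions. */
typedef struct Mapping {
    uint64_t iova;      /* guest IO virtual address */
    uint64_t size;      /* length of the mapping in bytes */
    uint64_t ram_addr;  /* translated address to mark dirty later */
} Mapping;

typedef void (*replay_fn)(const Mapping *m, void *opaque);

/* Model of a vIOMMU replay: call the chosen notifier once per mapping. */
static void model_replay(const Mapping *maps, size_t n,
                         replay_fn fn, void *opaque)
{
    for (size_t i = 0; i < n; i++) {
        fn(&maps[i], opaque);
    }
}

typedef struct RangeNode {
    Mapping m;
    struct RangeNode *next;
} RangeNode;

typedef struct RangeList {
    RangeNode *head;
    size_t count;
} RangeList;

/* Notifier used only for dirty logging: records the range, maps nothing. */
static void record_range(const Mapping *m, void *opaque)
{
    RangeList *list = opaque;
    RangeNode *node = calloc(1, sizeof(*node));

    assert(node != NULL);
    node->m = *m;
    node->next = list->head;   /* insert at head, as the patch does */
    list->head = node;
    list->count++;
}

/* Release every recorded range, e.g. when dirty logging stops. */
static void range_list_free(RangeList *list)
{
    while (list->head) {
        RangeNode *node = list->head;

        list->head = node->next;
        free(node);
    }
    list->count = 0;
}
```

In this model the list exists only between a model_replay() call made
when logging starts and range_list_free() when it stops, so nothing is
allocated while migration is not in use.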

> I don't see a replay callback for AMD; that would result in a walk at
> the minimum IOMMU supported page size granularity, which is similar
> to what I tried to implement 2-3 versions back.

Patch 1/3:
https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg00545.html
Patch 5/10:
https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg02196.html

> Does that mean doing such change would improve performance for Intel 
> IOMMU but worsen for AMD/PPC?

We're not adding a new requirement; we already call replay, and PPC
doesn't use type1.  What exactly regresses if we introduce another
replay user?

> I'm changing list to tree as first level of improvement in this patch.
> 
> Can we do the change you suggested above later as next level of
> improvement?

AIUI above, we're allocating an object and adding it to a list (soon to
be tree) for every vIOMMU mapping, on the off chance that migration
might be used, regardless of devices even supporting migration.  I can
only see that as a runtime performance and size regression.  Thanks,
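
As a rough illustration of that cost, the sketch below models the
per-mapping bookkeeping with a gate on whether dirty logging is active.
Tracker, tracker_map() and tracker_unmap() are hypothetical names for
this example only, not code from the patch or from QEMU; the
containment test in tracker_unmap() mirrors the condition used in the
quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model; names do not correspond to QEMU symbols. */
typedef struct Node {
    uint64_t iova;
    uint64_t size;
    struct Node *next;
} Node;

typedef struct Tracker {
    bool logging;   /* true only while dirty logging is active */
    Node *head;     /* recorded ranges, most recent first */
    size_t allocs;  /* number of nodes allocated so far */
} Tracker;

/* MAP event: allocate a node only if something will consume the list. */
static void tracker_map(Tracker *t, uint64_t iova, uint64_t size)
{
    if (!t->logging) {
        return;     /* no allocation when migration is not in use */
    }

    Node *n = calloc(1, sizeof(*n));

    assert(n != NULL);
    n->iova = iova;
    n->size = size;
    n->next = t->head;
    t->head = n;
    t->allocs++;
}

/*
 * UNMAP event: free every node fully contained in [iova, iova + size).
 * This walks the whole list, so the cost grows with the number of
 * live mappings.  Returns the number of nodes removed.
 */
static size_t tracker_unmap(Tracker *t, uint64_t iova, uint64_t size)
{
    size_t removed = 0;
    Node **pp = &t->head;

    while (*pp) {
        Node *n = *pp;

        if (n->iova >= iova && n->iova + n->size <= iova + size) {
            *pp = n->next;
            free(n);
            removed++;
        } else {
            pp = &n->next;
        }
    }
    return removed;
}
```

With the gate in place, mappings created before logging was enabled are
not in the list, which is why they would have to be recovered another
way, such as the replay discussed earlier in this thread.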

Alex


