From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44063) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b68mD-0005JV-DF for qemu-devel@nongnu.org; Thu, 26 May 2016 23:49:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b68m9-0002Bb-5B for qemu-devel@nongnu.org; Thu, 26 May 2016 23:49:32 -0400 Received: from mail-pa0-x241.google.com ([2607:f8b0:400e:c03::241]:33063) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b68m8-0002A7-Op for qemu-devel@nongnu.org; Thu, 26 May 2016 23:49:29 -0400 Received: by mail-pa0-x241.google.com with SMTP id f8so11156129pag.0 for ; Thu, 26 May 2016 20:49:27 -0700 (PDT) References: <1462344751-28281-1-git-send-email-aik@ozlabs.ru> <1462344751-28281-19-git-send-email-aik@ozlabs.ru> <20160513162634.278ad267@t450s.home> <95076137-0493-224b-54e8-71023e6da230@ozlabs.ru> <20160516142002.3b47a874@t450s.home> From: Alexey Kardashevskiy Message-ID: Date: Fri, 27 May 2016 13:49:19 +1000 MIME-Version: 1.0 In-Reply-To: <20160516142002.3b47a874@t450s.home> Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH qemu v16 18/19] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alexander Graf , David Gibson , Paolo Bonzini On 17/05/16 06:20, Alex Williamson wrote: > On Mon, 16 May 2016 14:52:41 +1000 > Alexey Kardashevskiy wrote: > >> On 05/14/2016 08:26 AM, Alex Williamson wrote: >>> On Wed, 4 May 2016 16:52:30 +1000 >>> Alexey Kardashevskiy wrote: >>> >>>> New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. >>>> This adds ability to VFIO common code to dynamically allocate/remove >>>> DMA windows in the host kernel when new VFIO container is added/removed. >>>> >>>> This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_add >>>> and adds just created IOMMU into the host IOMMU list; the opposite >>>> action is taken in vfio_listener_region_del. >>>> >>>> When creating a new window, this uses heuristic to decide on the TCE table >>>> levels number. >>>> >>>> This should cause no guest visible change in behavior. >>>> >>>> Signed-off-by: Alexey Kardashevskiy >>>> --- >>>> Changes: >>>> v16: >>>> * used memory_region_iommu_get_page_sizes() in vfio_listener_region_add() >>>> * enforced no intersections between windows >>>> >>>> v14: >>>> * new to the series >>>> --- >>>> hw/vfio/common.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++----- >>>> trace-events | 2 + >>>> 2 files changed, 125 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c >>>> index 03daf88..bd2dee8 100644 >>>> --- a/hw/vfio/common.c >>>> +++ b/hw/vfio/common.c >>>> @@ -240,6 +240,18 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova, >>>> return -errno; >>>> } >>>> >>>> +static bool range_contains(hwaddr start, hwaddr end, hwaddr addr) >>>> +{ >>>> + return start <= addr && addr <= end; >>>> +} >>> >>> a) If you want a "range_foo" function then put it in range.h >>> b) I suspect there are already range.h functions that can do this. >>> >>>> + >>>> +static bool vfio_host_win_intersects(VFIOHostDMAWindow *hostwin, >>>> + hwaddr min_iova, hwaddr max_iova) >>>> +{ >>>> + return range_contains(hostwin->min_iova, hostwin->min_iova, min_iova) || >>>> + range_contains(min_iova, max_iova, hostwin->min_iova); >>>> +} >>> >>> How is this different than ranges_overlap()? >>>> + >>>> static VFIOHostDMAWindow *vfio_host_win_lookup(VFIOContainer *container, >>>> hwaddr min_iova, hwaddr max_iova) >>>> { >>>> @@ -279,6 +291,14 @@ static int vfio_host_win_add(VFIOContainer *container, >>>> return 0; >>>> } >>>> >>>> +static void vfio_host_win_del(VFIOContainer *container, hwaddr min_iova) >>>> +{ >>>> + VFIOHostDMAWindow *hostwin = vfio_host_win_lookup(container, min_iova, 1); >>>> + >>>> + g_assert(hostwin); >>> >>> Handle the error please. >> >> Will this be enough? >> >> if (!hostwin) { >> error_report("%s: Cannot delete missing window at %"HWADDR_PRIx, >> __func__, min_iova); >> return; >> } > > Better. I was really thinking to return error to the caller, but if > the caller has no return path, perhaps this is as good as we can do. > Expect that I will push back on any assert() calls added to vfio. > > >>>> + QLIST_REMOVE(hostwin, hostwin_next); >>>> +} >>>> + >>>> static bool vfio_listener_skipped_section(MemoryRegionSection *section) >>>> { >>>> return (!memory_region_is_ram(section->mr) && >>>> @@ -392,6 +412,69 @@ static void vfio_listener_region_add(MemoryListener *listener, >>>> } >>>> end = int128_get64(int128_sub(llend, int128_one())); >>>> >>>> + if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { >>>> + VFIOHostDMAWindow *hostwin; >>>> + unsigned pagesizes = memory_region_iommu_get_page_sizes(section->mr); >>>> + unsigned pagesize = (hwaddr)1 << ctz64(pagesizes); >>>> + unsigned entries, pages; >>>> + struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) }; >>>> + >>>> + trace_vfio_listener_region_add_iommu(iova, end); >>>> + /* >>>> + * FIXME: For VFIO iommu types which have KVM acceleration to >>>> + * avoid bouncing all map/unmaps through qemu this way, this >>>> + * would be the right place to wire that up (tell the KVM >>>> + * device emulation the VFIO iommu handles to use). >>>> + */ >>>> + create.window_size = int128_get64(section->size); >>>> + create.page_shift = ctz64(pagesize); >>>> + /* >>>> + * SPAPR host supports multilevel TCE tables, there is some >>>> + * heuristic to decide how many levels we want for our table: >>>> + * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4 >>>> + */ >>>> + entries = create.window_size >> create.page_shift; >>>> + pages = MAX((entries * sizeof(uint64_t)) / getpagesize(), 1); >>>> + pages = MAX(pow2ceil(pages) - 1, 1); /* Round up */ >>>> + create.levels = ctz64(pages) / 6 + 1; >>>> + >>>> + /* For now intersections are not allowed, we may relax this later */ >>>> + QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) { >>>> + if (vfio_host_win_intersects(hostwin, >>>> + section->offset_within_address_space, >>>> + section->offset_within_address_space + >>>> + create.window_size - 1)) { >>>> + goto fail; >>>> + } >>>> + } >>>> + >>>> + ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create); >>>> + if (ret) { >>>> + error_report("Failed to create a window, ret = %d (%m)", ret); >>>> + goto fail; >>>> + } >>>> + >>>> + if (create.start_addr != section->offset_within_address_space) { >>>> + struct vfio_iommu_spapr_tce_remove remove = { >>>> + .argsz = sizeof(remove), >>>> + .start_addr = create.start_addr >>>> + }; >>>> + error_report("Host doesn't support DMA window at %"HWADDR_PRIx", must be %"PRIx64, >>>> + section->offset_within_address_space, >>>> + create.start_addr); >>>> + ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); >>>> + ret = -EINVAL; >>>> + goto fail; >>>> + } >>>> + trace_vfio_spapr_create_window(create.page_shift, >>>> + create.window_size, >>>> + create.start_addr); >>>> + >>>> + vfio_host_win_add(container, create.start_addr, >>>> + create.start_addr + create.window_size - 1, >>>> + 1ULL << create.page_shift); >>>> + } >>> >>> This is a function on its own, split it out and why not stop pretending >>> prereg is some sort of generic interface and let's just make a spapr >>> support file. >> >> >> Yet another new file - spapr.c, or rename prereg.c to spapr.c and add this >> stuff there? > > prereg.c is already spapr specific, so I'd rename it and potentially > add this to it. It would help if you two decided what I should do about prereg.c vs. spapr.c - merge or keep 2 files. Thanks. -- Alexey