On Fri, May 27, 2016 at 01:49:19PM +1000, Alexey Kardashevskiy wrote: > On 17/05/16 06:20, Alex Williamson wrote: > > On Mon, 16 May 2016 14:52:41 +1000 > > Alexey Kardashevskiy wrote: > > > >> On 05/14/2016 08:26 AM, Alex Williamson wrote: > >>> On Wed, 4 May 2016 16:52:30 +1000 > >>> Alexey Kardashevskiy wrote: > >>> > >>>> New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. > >>>> This adds ability to VFIO common code to dynamically allocate/remove > >>>> DMA windows in the host kernel when new VFIO container is added/removed. > >>>> > >>>> This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_add > >>>> and adds just created IOMMU into the host IOMMU list; the opposite > >>>> action is taken in vfio_listener_region_del. > >>>> > >>>> When creating a new window, this uses heuristic to decide on the TCE table > >>>> levels number. > >>>> > >>>> This should cause no guest visible change in behavior. > >>>> > >>>> Signed-off-by: Alexey Kardashevskiy > >>>> --- > >>>> Changes: > >>>> v16: > >>>> * used memory_region_iommu_get_page_sizes() in vfio_listener_region_add() > >>>> * enforced no intersections between windows > >>>> > >>>> v14: > >>>> * new to the series > >>>> --- > >>>> hw/vfio/common.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++----- > >>>> trace-events | 2 + > >>>> 2 files changed, 125 insertions(+), 10 deletions(-) > >>>> > >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>>> index 03daf88..bd2dee8 100644 > >>>> --- a/hw/vfio/common.c > >>>> +++ b/hw/vfio/common.c > >>>> @@ -240,6 +240,18 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova, > >>>> return -errno; > >>>> } > >>>> > >>>> +static bool range_contains(hwaddr start, hwaddr end, hwaddr addr) > >>>> +{ > >>>> + return start <= addr && addr <= end; > >>>> +} > >>> > >>> a) If you want a "range_foo" function then put it in range.h > >>> b) I suspect there are already range.h functions that can do this. > >>> > >>>> + > >>>> +static bool vfio_host_win_intersects(VFIOHostDMAWindow *hostwin, > >>>> + hwaddr min_iova, hwaddr max_iova) > >>>> +{ > >>>> + return range_contains(hostwin->min_iova, hostwin->min_iova, min_iova) || > >>>> + range_contains(min_iova, max_iova, hostwin->min_iova); > >>>> +} > >>> > >>> How is this different than ranges_overlap()? > >>>> + > >>>> static VFIOHostDMAWindow *vfio_host_win_lookup(VFIOContainer *container, > >>>> hwaddr min_iova, hwaddr max_iova) > >>>> { > >>>> @@ -279,6 +291,14 @@ static int vfio_host_win_add(VFIOContainer *container, > >>>> return 0; > >>>> } > >>>> > >>>> +static void vfio_host_win_del(VFIOContainer *container, hwaddr min_iova) > >>>> +{ > >>>> + VFIOHostDMAWindow *hostwin = vfio_host_win_lookup(container, min_iova, 1); > >>>> + > >>>> + g_assert(hostwin); > >>> > >>> Handle the error please. > >> > >> Will this be enough? > >> > >> if (!hostwin) { > >> error_report("%s: Cannot delete missing window at %"HWADDR_PRIx, > >> __func__, min_iova); > >> return; > >> } > > > > Better. I was really thinking to return error to the caller, but if > > the caller has no return path, perhaps this is as good as we can do. > > Expect that I will push back on any assert() calls added to vfio. > > > > > >>>> + QLIST_REMOVE(hostwin, hostwin_next); > >>>> +} > >>>> + > >>>> static bool vfio_listener_skipped_section(MemoryRegionSection *section) > >>>> { > >>>> return (!memory_region_is_ram(section->mr) && > >>>> @@ -392,6 +412,69 @@ static void vfio_listener_region_add(MemoryListener *listener, > >>>> } > >>>> end = int128_get64(int128_sub(llend, int128_one())); > >>>> > >>>> + if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { > >>>> + VFIOHostDMAWindow *hostwin; > >>>> + unsigned pagesizes = memory_region_iommu_get_page_sizes(section->mr); > >>>> + unsigned pagesize = (hwaddr)1 << ctz64(pagesizes); > >>>> + unsigned entries, pages; > >>>> + struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) }; > >>>> + > >>>> + trace_vfio_listener_region_add_iommu(iova, end); > >>>> + /* > >>>> + * FIXME: For VFIO iommu types which have KVM acceleration to > >>>> + * avoid bouncing all map/unmaps through qemu this way, this > >>>> + * would be the right place to wire that up (tell the KVM > >>>> + * device emulation the VFIO iommu handles to use). > >>>> + */ > >>>> + create.window_size = int128_get64(section->size); > >>>> + create.page_shift = ctz64(pagesize); > >>>> + /* > >>>> + * SPAPR host supports multilevel TCE tables, there is some > >>>> + * heuristic to decide how many levels we want for our table: > >>>> + * 0..64 = 1; 65..4096 = 2; 4097..262144 = 3; 262145.. = 4 > >>>> + */ > >>>> + entries = create.window_size >> create.page_shift; > >>>> + pages = MAX((entries * sizeof(uint64_t)) / getpagesize(), 1); > >>>> + pages = MAX(pow2ceil(pages) - 1, 1); /* Round up */ > >>>> + create.levels = ctz64(pages) / 6 + 1; > >>>> + > >>>> + /* For now intersections are not allowed, we may relax this later */ > >>>> + QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) { > >>>> + if (vfio_host_win_intersects(hostwin, > >>>> + section->offset_within_address_space, > >>>> + section->offset_within_address_space + > >>>> + create.window_size - 1)) { > >>>> + goto fail; > >>>> + } > >>>> + } > >>>> + > >>>> + ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create); > >>>> + if (ret) { > >>>> + error_report("Failed to create a window, ret = %d (%m)", ret); > >>>> + goto fail; > >>>> + } > >>>> + > >>>> + if (create.start_addr != section->offset_within_address_space) { > >>>> + struct vfio_iommu_spapr_tce_remove remove = { > >>>> + .argsz = sizeof(remove), > >>>> + .start_addr = create.start_addr > >>>> + }; > >>>> + error_report("Host doesn't support DMA window at %"HWADDR_PRIx", must be %"PRIx64, > >>>> + section->offset_within_address_space, > >>>> + create.start_addr); > >>>> + ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove); > >>>> + ret = -EINVAL; > >>>> + goto fail; > >>>> + } > >>>> + trace_vfio_spapr_create_window(create.page_shift, > >>>> + create.window_size, > >>>> + create.start_addr); > >>>> + > >>>> + vfio_host_win_add(container, create.start_addr, > >>>> + create.start_addr + create.window_size - 1, > >>>> + 1ULL << create.page_shift); > >>>> + } > >>> > >>> This is a function on its own, split it out and why not stop pretending > >>> prereg is some sort of generic interface and let's just make a spapr > >>> support file. > >> > >> > >> Yet another new file - spapr.c, or rename prereg.c to spapr.c and add this > >> stuff there? > > > > prereg.c is already spapr specific, so I'd rename it and potentially > > add this to it. > > > It would help if you two decided what I should do about prereg.c vs. > spapr.c - merge or keep 2 files. Thanks. I'm not particularly fussed either way. So I suggest putting the prereg stuff in spapr.c for now, and we can move it out if/when another platform starts using the mechanism. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson