From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46439) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b691H-0004Qz-7V for qemu-devel@nongnu.org; Fri, 27 May 2016 00:05:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b691F-0005pv-J0 for qemu-devel@nongnu.org; Fri, 27 May 2016 00:05:07 -0400 Date: Fri, 27 May 2016 14:05:00 +1000 From: David Gibson Message-ID: <20160527040500.GW17226@voom.fritz.box> References: <1462344751-28281-1-git-send-email-aik@ozlabs.ru> <1462344751-28281-19-git-send-email-aik@ozlabs.ru> <20160513162634.278ad267@t450s.home> <95076137-0493-224b-54e8-71023e6da230@ozlabs.ru> <20160516142002.3b47a874@t450s.home> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6ti9qQdT93fLKTkp" Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH qemu v16 18/19] vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: Alex Williamson , qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alexander Graf , Paolo Bonzini --6ti9qQdT93fLKTkp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 27, 2016 at 01:49:19PM +1000, Alexey Kardashevskiy wrote: > On 17/05/16 06:20, Alex Williamson wrote: > > On Mon, 16 May 2016 14:52:41 +1000 > > Alexey Kardashevskiy wrote: > >=20 > >> On 05/14/2016 08:26 AM, Alex Williamson wrote: > >>> On Wed, 4 May 2016 16:52:30 +1000 > >>> Alexey Kardashevskiy wrote: > >>> =20 > >>>> New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window managem= ent. > >>>> This adds ability to VFIO common code to dynamically allocate/remove > >>>> DMA windows in the host kernel when new VFIO container is added/remo= ved. > >>>> > >>>> This adds VFIO_IOMMU_SPAPR_TCE_CREATE ioctl to vfio_listener_region_= add > >>>> and adds just created IOMMU into the host IOMMU list; the opposite > >>>> action is taken in vfio_listener_region_del. > >>>> > >>>> When creating a new window, this uses heuristic to decide on the TCE= table > >>>> levels number. > >>>> > >>>> This should cause no guest visible change in behavior. > >>>> > >>>> Signed-off-by: Alexey Kardashevskiy > >>>> --- > >>>> Changes: > >>>> v16: > >>>> * used memory_region_iommu_get_page_sizes() in vfio_listener_region_= add() > >>>> * enforced no intersections between windows > >>>> > >>>> v14: > >>>> * new to the series > >>>> --- > >>>> hw/vfio/common.c | 133 ++++++++++++++++++++++++++++++++++++++++++++= ++++++----- > >>>> trace-events | 2 + > >>>> 2 files changed, 125 insertions(+), 10 deletions(-) > >>>> > >>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c > >>>> index 03daf88..bd2dee8 100644 > >>>> --- a/hw/vfio/common.c > >>>> +++ b/hw/vfio/common.c > >>>> @@ -240,6 +240,18 @@ static int vfio_dma_map(VFIOContainer *containe= r, hwaddr iova, > >>>> return -errno; > >>>> } > >>>> > >>>> +static bool range_contains(hwaddr start, hwaddr end, hwaddr addr) > >>>> +{ > >>>> + return start <=3D addr && addr <=3D end; > >>>> +} =20 > >>> > >>> a) If you want a "range_foo" function then put it in range.h > >>> b) I suspect there are already range.h functions that can do this. > >>> =20 > >>>> + > >>>> +static bool vfio_host_win_intersects(VFIOHostDMAWindow *hostwin, > >>>> + hwaddr min_iova, hwaddr max_io= va) > >>>> +{ > >>>> + return range_contains(hostwin->min_iova, hostwin->min_iova, min= _iova) || > >>>> + range_contains(min_iova, max_iova, hostwin->min_iova); > >>>> +} =20 > >>> > >>> How is this different than ranges_overlap()? =20 > >>>> + > >>>> static VFIOHostDMAWindow *vfio_host_win_lookup(VFIOContainer *conta= iner, > >>>> hwaddr min_iova, hwa= ddr max_iova) > >>>> { > >>>> @@ -279,6 +291,14 @@ static int vfio_host_win_add(VFIOContainer *con= tainer, > >>>> return 0; > >>>> } > >>>> > >>>> +static void vfio_host_win_del(VFIOContainer *container, hwaddr min_= iova) > >>>> +{ > >>>> + VFIOHostDMAWindow *hostwin =3D vfio_host_win_lookup(container, = min_iova, 1); > >>>> + > >>>> + g_assert(hostwin); =20 > >>> > >>> Handle the error please. =20 > >> > >> Will this be enough? > >> > >> if (!hostwin) { > >> error_report("%s: Cannot delete missing window at %"HWADDR_PR= Ix, > >> __func__, min_iova); > >> return; > >> } > >=20 > > Better. I was really thinking to return error to the caller, but if > > the caller has no return path, perhaps this is as good as we can do. > > Expect that I will push back on any assert() calls added to vfio. > >=20 > >=20 > >>>> + QLIST_REMOVE(hostwin, hostwin_next); > >>>> +} > >>>> + > >>>> static bool vfio_listener_skipped_section(MemoryRegionSection *sect= ion) > >>>> { > >>>> return (!memory_region_is_ram(section->mr) && > >>>> @@ -392,6 +412,69 @@ static void vfio_listener_region_add(MemoryList= ener *listener, > >>>> } > >>>> end =3D int128_get64(int128_sub(llend, int128_one())); > >>>> > >>>> + if (container->iommu_type =3D=3D VFIO_SPAPR_TCE_v2_IOMMU) { > >>>> + VFIOHostDMAWindow *hostwin; > >>>> + unsigned pagesizes =3D memory_region_iommu_get_page_sizes(s= ection->mr); > >>>> + unsigned pagesize =3D (hwaddr)1 << ctz64(pagesizes); > >>>> + unsigned entries, pages; > >>>> + struct vfio_iommu_spapr_tce_create create =3D { .argsz =3D = sizeof(create) }; > >>>> + > >>>> + trace_vfio_listener_region_add_iommu(iova, end); > >>>> + /* > >>>> + * FIXME: For VFIO iommu types which have KVM acceleration = to > >>>> + * avoid bouncing all map/unmaps through qemu this way, this > >>>> + * would be the right place to wire that up (tell the KVM > >>>> + * device emulation the VFIO iommu handles to use). > >>>> + */ > >>>> + create.window_size =3D int128_get64(section->size); > >>>> + create.page_shift =3D ctz64(pagesize); > >>>> + /* > >>>> + * SPAPR host supports multilevel TCE tables, there is some > >>>> + * heuristic to decide how many levels we want for our tabl= e: > >>>> + * 0..64 =3D 1; 65..4096 =3D 2; 4097..262144 =3D 3; 262145.= =2E =3D 4 > >>>> + */ > >>>> + entries =3D create.window_size >> create.page_shift; > >>>> + pages =3D MAX((entries * sizeof(uint64_t)) / getpagesize(),= 1); > >>>> + pages =3D MAX(pow2ceil(pages) - 1, 1); /* Round up */ > >>>> + create.levels =3D ctz64(pages) / 6 + 1; > >>>> + > >>>> + /* For now intersections are not allowed, we may relax this= later */ > >>>> + QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_ne= xt) { > >>>> + if (vfio_host_win_intersects(hostwin, > >>>> + section->offset_within_address_space, > >>>> + section->offset_within_address_space + > >>>> + create.window_size - 1)) { > >>>> + goto fail; > >>>> + } > >>>> + } > >>>> + > >>>> + ret =3D ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &= create); > >>>> + if (ret) { > >>>> + error_report("Failed to create a window, ret =3D %d (%m= )", ret); > >>>> + goto fail; > >>>> + } > >>>> + > >>>> + if (create.start_addr !=3D section->offset_within_address_s= pace) { > >>>> + struct vfio_iommu_spapr_tce_remove remove =3D { > >>>> + .argsz =3D sizeof(remove), > >>>> + .start_addr =3D create.start_addr > >>>> + }; > >>>> + error_report("Host doesn't support DMA window at %"HWAD= DR_PRIx", must be %"PRIx64, > >>>> + section->offset_within_address_space, > >>>> + create.start_addr); > >>>> + ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remo= ve); > >>>> + ret =3D -EINVAL; > >>>> + goto fail; > >>>> + } > >>>> + trace_vfio_spapr_create_window(create.page_shift, > >>>> + create.window_size, > >>>> + create.start_addr); > >>>> + > >>>> + vfio_host_win_add(container, create.start_addr, > >>>> + create.start_addr + create.window_size - = 1, > >>>> + 1ULL << create.page_shift); > >>>> + } =20 > >>> > >>> This is a function on its own, split it out and why not stop pretendi= ng > >>> prereg is some sort of generic interface and let's just make a spapr > >>> support file. =20 > >> > >> > >> Yet another new file - spapr.c, or rename prereg.c to spapr.c and add = this=20 > >> stuff there? > >=20 > > prereg.c is already spapr specific, so I'd rename it and potentially > > add this to it. >=20 >=20 > It would help if you two decided what I should do about prereg.c vs. > spapr.c - merge or keep 2 files. Thanks. I'm not particularly fussed either way. So I suggest putting the prereg stuff in spapr.c for now, and we can move it out if/when another platform starts using the mechanism. --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --6ti9qQdT93fLKTkp Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBAgAGBQJXR8dsAAoJEGw4ysog2bOSe+YP/0aKC2CIkGGYVuenZ54LJ5LV j4q7PlTJK05QIq7fvsdvQVrdkJUEwkqqOBLaj7dC+3ldyt94dvj+aGwq8iwQNfpf B8zORXBMrBJ/w6D7yxd1CZkP42OgjeZWRKU392ZqVmK+Dg15LvHVb22IOna9EVBa sJkAJVkVnEa4C4PfOeVFyKg+kzu16MKC7n97Q8LVruEXIS4nO4oqIwMyWsfYWImg kdtMf6Z9jhQP7UmwBoc4PySKv/XYx7J0x1eWRywlKfqYzMgCfeSWKq/arzPgN60g OMIj9FGpwPcaJHTRzQB7oAp0eWGdZvfHZ9GEEC9Sv+mdmWmFlliPEZZlzT7ztlvB 9sUyTSznsDQ+DDqng3vEPIomOEtnGUr97fAnGngJT+GTw7HJcB+zOR/sgALNzp8K 3gzrU+CSEmv3Qa8sXbMdBy5obiFmcZ7eNmI3iBSm7EFIsnAUYW6xdxU+MQ618Y3b FhNQ+NZEMvejm1Tg1mxmm1aZnovTaOn9cPOhwJ8oekhyRcA2pvly9P+djdhMLTNA G7EpOdaBVd2Ch0ejEP/kyLpPI6o5/sPhyckTshr3nOpiCMD27imlJrhaMiZx4n7l zMVQlhVs5/lBJIsWIIp/W96D3USOru8tamG7wiSCZbmLe+4zjQMy/gWfJ/tyUfqX NCLP/AY53y4r5B+sb29w =78ZK -----END PGP SIGNATURE----- --6ti9qQdT93fLKTkp--