From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43422) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agVw9-0000KI-8A for qemu-devel@nongnu.org; Thu, 17 Mar 2016 07:17:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1agVw7-00016T-FX for qemu-devel@nongnu.org; Thu, 17 Mar 2016 07:17:53 -0400 Received: from mail-lf0-x230.google.com ([2a00:1450:4010:c07::230]:33515) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1agVw6-00016F-Uz for qemu-devel@nongnu.org; Thu, 17 Mar 2016 07:17:51 -0400 Received: by mail-lf0-x230.google.com with SMTP id h198so39836131lfh.0 for ; Thu, 17 Mar 2016 04:17:50 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20160315085236.GA23495@pxdev.xzpeter.org> References: <56E70871.3050305@gmail.com> <20160315085236.GA23495@pxdev.xzpeter.org> From: "Aviv B.D." Date: Thu, 17 Mar 2016 13:17:30 +0200 Message-ID: Content-Type: multipart/alternative; boundary=001a114116bce06be0052e3cc78d Subject: Re: [Qemu-devel] [PATCH][RFC] IOMMU: Add Support to VFIO devices with vIOMMU present List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Xu Cc: marcel@redhat.com, Jan Kiszka , Alex Williamson , qemu-devel@nongnu.org, "Michael S. Tsirkin" --001a114116bce06be0052e3cc78d Content-Type: text/plain; charset=UTF-8 On Tue, Mar 15, 2016 at 10:52 AM, Peter Xu wrote: > On Mon, Mar 14, 2016 at 08:52:33PM +0200, Marcel Apfelbaum wrote: > > On 03/12/2016 06:13 PM, Aviv B.D. wrote: > > Adding (possibly) interested developers to the thread. > > Thanks CC. > > Hi, Aviv, several questions inline. > > [...] > > > >@@ -1020,14 +1037,53 @@ static void > vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, > > > hwaddr addr, uint8_t am) > > > { > > > VTDIOTLBPageInvInfo info; > > >+ VFIOGuestIOMMU * giommu; > > >+ bool flag = false; > > > assert(am <= VTD_MAMV); > > > info.domain_id = domain_id; > > > info.addr = addr; > > > info.mask = ~((1 << am) - 1); > > >+ > > >+ QLIST_FOREACH(giommu, &(s->giommu_list), iommu_next){ > > >+ VTDAddressSpace *vtd_as = container_of(giommu->iommu, > VTDAddressSpace, iommu); > > >+ uint16_t vfio_source_id = > vtd_make_source_id(pci_bus_num(vtd_as->bus), vtd_as->devfn); > > >+ uint16_t vfio_domain_id = vtd_get_did_dev(s, > pci_bus_num(vtd_as->bus), vtd_as->devfn); > > >+ if (vfio_domain_id != (uint16_t)-1 && > > Could you (or anyone) help explain what does vfio_domain_id != -1 > mean? vtd_get_did_dev returns -1 if the device is not mapped to any domain (generally, the CE is not present). probably a better interface is to return whether the device has a domain or not and returns the domain_id via the pointer argument. > > > >+ domain_id == vfio_domain_id){ > > >+ VTDIOTLBEntry *iotlb_entry = vtd_lookup_iotlb(s, > vfio_source_id, addr); > > >+ if (iotlb_entry != NULL){ > > Here, shall we notify VFIO even if the address is not cached in > IOTLB? Anyway, we need to do the unmap() of the address, am I > correct? > With this code I do a unmap operation if the address was cached in the IOTLB, if not I'm assuming that the current invalidation invalidate an (previously) non present address and I should do a map operation (during the map operation I'm calling s->iommu_ops.translate that caches the address). > > > >+ IOMMUTLBEntry entry; > > >+ VTD_DPRINTF(GENERAL, "Remove addr 0x%"PRIx64 " mask > %d", addr, am); > > >+ entry.iova = addr & VTD_PAGE_MASK_4K; > > >+ entry.translated_addr = > vtd_get_slpte_addr(iotlb_entry->slpte) & VTD_PAGE_MASK_4K; > > >+ entry.addr_mask = ~VTD_PAGE_MASK_4K; > > >+ entry.perm = IOMMU_NONE; > > >+ memory_region_notify_iommu(giommu->iommu, entry); > > >+ flag = true; > > >+ > > >+ } > > >+ } > > >+ > > >+ } > > >+ > > > g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, > &info); > > >-} > > >+ QLIST_FOREACH(giommu, &(s->giommu_list), iommu_next){ > > >+ VTDAddressSpace *vtd_as = container_of(giommu->iommu, > VTDAddressSpace, iommu); > > >+ uint16_t vfio_domain_id = vtd_get_did_dev(s, > pci_bus_num(vtd_as->bus), vtd_as->devfn); > > >+ if (vfio_domain_id != (uint16_t)-1 && > > >+ domain_id == vfio_domain_id && !flag){ > > >+ /* do vfio map */ > > >+ VTD_DPRINTF(GENERAL, "add addr 0x%"PRIx64 " mask %d", > addr, am); > > >+ /* call to vtd_iommu_translate */ > > >+ IOMMUTLBEntry entry = > s->iommu_ops.translate(giommu->iommu, addr, 0); > > >+ entry.perm = IOMMU_RW; > > >+ memory_region_notify_iommu(giommu->iommu, entry); > > >+ //g_vfio_iommu->n.notify(&g_vfio_iommu->n, &entry); > > >+ } > > >+ } > > >+} > > I see that we did handled all the page invalidations. Would it > possible that guest kernel send domain/global invalidations? Should > we handle them too? > You are correct, currently this code is pretty much at POC level, and i support only the page invalidation because this is what linux is using. The final version should support also those invalidation ops. > > [...] > > > > static void vfio_listener_region_add(MemoryListener *listener, > > > MemoryRegionSection *section) > > > { > > >@@ -344,6 +347,7 @@ static void vfio_listener_region_add(MemoryListener > *listener, > > > iova = TARGET_PAGE_ALIGN(section->offset_within_address_space); > > > llend = int128_make64(section->offset_within_address_space); > > > llend = int128_add(llend, section->size); > > >+ llend = int128_add(llend, int128_exts64(-1)); > > Here, -1 should fix the assertion core dump. However, shall we also > handle all the rest places that used "llend" (possibly with variable > "end") too? For example, at the end of current function, when we map > dma regions: > > ret = vfio_dma_map(container, iova, end - iova, vaddr, > section->readonly); > > To: > > ret = vfio_dma_map(container, iova, end + 1 - iova, vaddr, > section->readonly); > > I will add this to the next version of the patch, thanks! Thanks. > Peter > --001a114116bce06be0052e3cc78d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


On Tue, Mar 15, 2016 at 10:52 AM, Peter Xu <peterx@redhat.com><= /span> wrote:
On Mon, Mar 14, 2016 at = 08:52:33PM +0200, Marcel Apfelbaum wrote:
> On 03/12/2016 06:13 PM, Aviv B.D. wrote:
> Adding (possibly) interested developers to the= thread.

Thanks CC.

Hi, Aviv, several questions inline.

[...]

> >@@ -1020,14 +1037,53 @@ static void vtd_iotlb_page_invalidate(Inte= lIOMMUState *s, uint16_t domain_id,
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 h= waddr addr, uint8_t am)
> >=C2=A0 {
> >=C2=A0 =C2=A0 =C2=A0 VTDIOTLBPageInvInfo info;
> >+=C2=A0 =C2=A0 VFIOGuestIOMMU * giommu;
> >+=C2=A0 =C2=A0 bool flag =3D false;
> >=C2=A0 =C2=A0 =C2=A0 assert(am <=3D VTD_MAMV);
> >=C2=A0 =C2=A0 =C2=A0 info.domain_id =3D domain_id;
> >=C2=A0 =C2=A0 =C2=A0 info.addr =3D addr;
> >=C2=A0 =C2=A0 =C2=A0 info.mask =3D ~((1 << am) - 1);
> >+
> >+=C2=A0 =C2=A0 QLIST_FOREACH(giommu, &(s->giommu_list), iom= mu_next){
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 VTDAddressSpace *vtd_as =3D container= _of(giommu->iommu, VTDAddressSpace, iommu);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 uint16_t vfio_source_id =3D vtd_make_= source_id(pci_bus_num(vtd_as->bus), vtd_as->devfn);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 uint16_t vfio_domain_id =3D vtd_get_d= id_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (vfio_domain_id !=3D (uint16_t)-1 = &&

Could you (or anyone) help explain what does vfio_domain_id !=3D -1<= br> mean?

vtd_get_did_dev returns -1 if the dev= ice is not mapped to any domain (generally, the CE is not present).
probably a better interface is to return whether the device has a do= main or not and returns the domain_id via the pointer argument.
= =C2=A0

> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 domain_id= =3D=3D vfio_domain_id){
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VTDIOTLBEntry *iotlb_en= try =3D vtd_lookup_iotlb(s, vfio_source_id, addr);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (iotlb_entry !=3D NU= LL){

Here, shall we notify VFIO even if the address is not cached in
IOTLB? Anyway, we need to do the unmap() of the address, am I
correct?
With this code I do a unmap operation if the = address was cached in the IOTLB, if not I'm assuming that the current i= nvalidation invalidate an (previously) non present address and I should do = a map operation (during the map operation I'm calling=C2=A0s->iommu_= ops.translate that caches the address). =C2=A0

> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 IOMMUTLBE= ntry entry;
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VTD_DPRIN= TF(GENERAL, "Remove addr 0x%"PRIx64 " mask %d", addr, a= m);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 entry.iov= a =3D addr & VTD_PAGE_MASK_4K;
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 entry.tra= nslated_addr =3D vtd_get_slpte_addr(iotlb_entry->slpte) & VTD_PAGE_M= ASK_4K;
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 entry.add= r_mask =3D ~VTD_PAGE_MASK_4K;
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 entry.per= m =3D IOMMU_NONE;
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 memory_re= gion_notify_iommu(giommu->iommu, entry);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 flag =3D = true;
> >+
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
> >+
> >+=C2=A0 =C2=A0 }
> >+
> >=C2=A0 =C2=A0 =C2=A0 g_hash_table_foreach_remove(s->iotlb, vtd_= hash_remove_by_page, &info);
> >-}
> >+=C2=A0 =C2=A0 QLIST_FOREACH(giommu, &(s->giommu_list), iom= mu_next){
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 VTDAddressSpace *vtd_as =3D container= _of(giommu->iommu, VTDAddressSpace, iommu);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 uint16_t vfio_domain_id =3D vtd_get_d= id_dev(s, pci_bus_num(vtd_as->bus), vtd_as->devfn);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (vfio_domain_id !=3D (uint16_t)-1 = &&
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 domain_id= =3D=3D vfio_domain_id && !flag){
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* do vfio map */
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 VTD_DPRINTF(GENERAL, &q= uot;add addr 0x%"PRIx64 " mask %d", addr, am);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 /* call to vtd_iommu_tr= anslate */
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 IOMMUTLBEntry entry =3D= s->iommu_ops.translate(giommu->iommu, addr, 0);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 entry.perm =3D IOMMU_RW= ;
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 memory_region_notify_io= mmu(giommu->iommu, entry);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 //g_vfio_iommu->n.no= tify(&g_vfio_iommu->n, &entry);
> >+=C2=A0 =C2=A0 =C2=A0 =C2=A0 }
> >+=C2=A0 =C2=A0 }
> >+}

I see that we did handled all the page invalidations. Would it<= br> possible that guest kernel send domain/global invalidations? Should
we handle them too?

You are correct, cu= rrently this code is pretty much at POC level, and i support only the page = invalidation because this is what linux is using. The final version should = support also those invalidation ops.

[...]

> >=C2=A0 static void vfio_listener_region_add(MemoryListener *listen= er,
> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Me= moryRegionSection *section)
> >=C2=A0 {
> >@@ -344,6 +347,7 @@ static void vfio_listener_region_add(MemoryLis= tener *listener,
> >=C2=A0 =C2=A0 =C2=A0 iova =3D TARGET_PAGE_ALIGN(section->offset= _within_address_space);
> >=C2=A0 =C2=A0 =C2=A0 llend =3D int128_make64(section->offset_wi= thin_address_space);
> >=C2=A0 =C2=A0 =C2=A0 llend =3D int128_add(llend, section->size)= ;
> >+=C2=A0 =C2=A0 llend =3D int128_add(llend, int128_exts64(-1));

Here, -1 should fix the assertion core dump. However, shall we also<= br> handle all the rest places that used "llend" (possibly with varia= ble
"end") too? For example, at the end of current function, when we = map
dma regions:

=C2=A0 =C2=A0 ret =3D vfio_dma_map(container, iova, end - iova, vaddr, sect= ion->readonly);

To:

=C2=A0 =C2=A0 ret =3D vfio_dma_map(container, iova, end + 1 - iova, vaddr, = section->readonly);

I will add this to the next version of the patch, tha= nks!=C2=A0

Thanks.
Peter

--001a114116bce06be0052e3cc78d--