From: Jason Wang
Date: Wed, 18 Jan 2017 16:36:05 +0800
Subject: Re: [Qemu-devel] [PATCH RFC v3 14/14] intel_iommu: enable vfio devices
To: Peter Xu
Cc: qemu-devel@nongnu.org, tianyu.lan@intel.com, kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com, alex.williamson@redhat.com, bd.aviv@gmail.com
In-Reply-To: <20170118081137.GS30108@pxdev.xzpeter.org>
References: <1484276800-26814-1-git-send-email-peterx@redhat.com> <1484276800-26814-15-git-send-email-peterx@redhat.com> <20170116091801.GL30108@pxdev.xzpeter.org> <0eedf780-16b2-ca5c-8eea-5d41e6837f23@redhat.com> <20170117144514.GO30108@pxdev.xzpeter.org> <99e185dc-e3c9-678f-7a69-fc006658d07f@redhat.com> <20170118081137.GS30108@pxdev.xzpeter.org>

On 2017-01-18 16:11, Peter Xu wrote:
> On Wed, Jan 18, 2017 at 11:10:53AM +0800, Jason Wang wrote:
>>
>> On 2017-01-17 22:45, Peter Xu wrote:
>>> On Mon, Jan 16, 2017 at 05:54:55PM +0800, Jason Wang wrote:
>>>> On 2017-01-16 17:18, Peter Xu wrote:
>>>>>>> static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>>>>>                                       hwaddr addr, uint8_t am)
>>>>>>> {
>>>>>>> @@ -1222,6 +1251,7 @@ static void vtd_iotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>>>>>      info.addr = addr;
>>>>>>>      info.mask = ~((1 << am) - 1);
>>>>>>>      g_hash_table_foreach_remove(s->iotlb, vtd_hash_remove_by_page, &info);
>>>>>>> +    vtd_iotlb_page_invalidate_notify(s, domain_id, addr, am);
>>>>>> Is the case of a GLOBAL or DSI flush missed, or do we not care about it at all?
>>>>> IMHO we don't. For device assignment, since we have CM=1 here,
>>>>> we should see explicit page invalidations even if the guest sends
>>>>> global/domain invalidations.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -- peterx
>>>> Is this required by the spec?
>>> I think not. IMO the spec is very coarse-grained in describing cache
>>> mode...
>>>
>>>> Btw, it looks to me that both DSI and GLOBAL are
>>>> indeed explicit flushes.
>>> Actually, when cache mode is on, it is unclear to me how we should
>>> treat domain/global invalidations, at least from the spec (as
>>> mentioned earlier). My understanding is that they are not "explicit
>>> flushes", which IMHO should only mean page-selective IOTLB
>>> invalidations.
>> Probably not, at least from the point of view of performance. DSI and global
>> should be more efficient in some cases.
> I agree with you that DSI/GLOBAL flushes are more efficient in some
> ways. But IMHO that does not mean these invalidations are "explicit
> invalidations", and I doubt whether cache mode has to cope with them.

Well, the spec does not forbid DSI/GLOBAL with CM, and the driver code
has used them for almost ten years. I can hardly believe it's wrong.

>
> But here I should add one more thing besides PSI - context entry
> invalidation should be one of "the explicit invalidations" as well,
> which we need to handle just like PSI when cache mode is on.
>
>>>> I just had a quick look through the driver code and found something
>>>> interesting in intel_iommu_flush_iotlb_psi():
>>>>
>>>> ...
>>>>     /*
>>>>      * Fallback to domain selective flush if no PSI support or the size is
>>>>      * too big.
>>>>      * PSI requires page size to be 2 ^ x, and the base address is naturally
>>>>      * aligned to the size
>>>>      */
>>>>     if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
>>>>         iommu->flush.flush_iotlb(iommu, did, 0, 0,
>>>>                                  DMA_TLB_DSI_FLUSH);
>>>>     else
>>>>         iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
>>>>                                  DMA_TLB_PSI_FLUSH);
>>>> ...
>>> I think this is interesting... and I doubt its correctness with
>>> cache mode enabled.
>>>
>>> If so (sending a domain invalidation instead of a big range of page
>>> invalidations), how should we capture which pages are unmapped in
>>> the emulated IOMMU?
>> We don't need to track individual pages here, since all pages for a specific
>> domain were unmapped, I believe?
> IMHO this might not be the correct behavior.
>
> If we receive one domain-specific invalidation, I agree that we should
> invalidate the IOTLB cache for all the devices inside the domain.
> However, when cache mode is on, we should be depending on the PSIs to
> unmap each page (unless we want to unmap the whole address space; in
> that case it's very possible that the guest is just unmapping a range,
> not the entire space). If we convert several PSIs into one big DSI,
> IMHO we will leave those pages mapped/unmapped while we should
> unmap/map them.

I'm confused; do you have an example of this? (I fail to understand why DSI
can't work; at least the implementation can convert a DSI into several PSIs
internally.)

Thanks

>
>>>> It looks like DSI_FLUSH is possible even with CM on.
>>>>
>>>> And in flush_unmaps():
>>>>
>>>> ...
>>>>     /* In caching mode, global flushes turn emulation expensive */
>>>>     if (!cap_caching_mode(iommu->cap))
>>>>         iommu->flush.flush_iotlb(iommu, 0, 0, 0,
>>>>                                  DMA_TLB_GLOBAL_FLUSH);
>>>> ...
>>>>
>>>> If I understand the comments correctly, GLOBAL is OK for CM too (though the
>>>> code does not do it, for performance reasons).
>>> I think it should be okay to send a global flush for CM, but I'm not sure
>>> whether we should notify anything when we receive one. Hmm, anyway, I
>>> think I need some more reading to make sure I understand the whole
>>> thing correctly. :)
>>>
>>> For example, when I see this commit:
>>>
>>> commit 78d5f0f500e6ba8f6cfd0673475ff4d941d705a2
>>> Author: Nadav Amit
>>> Date:   Thu Apr 8 23:00:41 2010 +0300
>>>
>>>     intel-iommu: Avoid global flushes with caching mode.
>>>     While it may be efficient on real hardware, emulation of global
>>>     invalidations is very expensive as all shadow entries must be examined.
>>>     This patch changes the behaviour when caching mode is enabled (which is
>>>     the case when IOMMU emulation takes place). In this case, page specific
>>>     invalidation is used instead.
>>>
>>> Before I ask someone else besides qemu-devel, I am curious whether
>>> there is existing VT-d emulation code (outside QEMU, of
>>> course) that I can use as a reference?
>> Yes, there is. The author of this patch - Nadav - has done lots of research on
>> emulated IOMMUs. See the following papers:
>>
>> https://hal.inria.fr/inria-00493752/document
>> http://www.cse.iitd.ac.in/~sbansal/csl862-virt/readings/vIOMMU.pdf
> Thanks for these good materials. I will google the author for sure
> next time. :)
>
> -- peterx