Date: Tue, 18 Oct 2016 02:03:48 -0600
From: Alex Williamson
To: David Gibson
Cc: "Aviv B.D", Jan Kiszka, qemu-devel@nongnu.org, Peter Xu, "Michael S. Tsirkin"
Subject: Re: [Qemu-devel] [PATCH v4 RESEND 0/3] IOMMU: intel_iommu support map and unmap notifications
Message-ID: <20161018020348.776d3014@t450s.home>
In-Reply-To: <20161018055204.GH25390@umbus.fritz.box>
References: <1476719064-9242-1-git-send-email-bd.aviv@gmail.com> <20161017100736.68a56fd9@t450s.home> <20161018040655.GG25390@umbus.fritz.box> <20161017224702.53301858@t450s.home> <20161018055204.GH25390@umbus.fritz.box>

On Tue, 18 Oct 2016 16:52:04 +1100
David Gibson wrote:

> On Mon, Oct 17, 2016 at 10:47:02PM -0600, Alex Williamson wrote:
> > On Tue, 18 Oct 2016 15:06:55 +1100
> > David Gibson wrote:
> > 
> > > On Mon, Oct 17, 2016 at 10:07:36AM -0600, Alex Williamson wrote:
> > > > On Mon, 17 Oct 2016 18:44:21 +0300
> > > > "Aviv B.D" wrote:
> > > > 
> > > > > From: "Aviv Ben-David"
> > > > > 
> > > > > * Advertise the Cache Mode capability in the IOMMU capability register.
> > > > >   This capability is controlled by the "cache-mode" property of the
> > > > >   intel-iommu device. To enable it, call QEMU with
> > > > >   "-device intel-iommu,cache-mode=true".
> > > > > 
> > > > > * On page cache invalidation in the Intel vIOMMU, check whether the
> > > > >   domain belongs to a registered notifier, and notify accordingly.
> > > > > 
> > > > > Currently this patch still doesn't enable VFIO device support with a
> > > > > vIOMMU present. Current problems:
> > > > > * vfio_iommu_map_notify is not aware of the memory range belonging to
> > > > >   a specific VFIOGuestIOMMU.
> > > > 
> > > > Could you elaborate on why this is an issue?
> > > > 
> > > > > * memory_region_iommu_replay hangs QEMU on startup while it iterates
> > > > >   over the 64-bit address space. Commenting out the call to this
> > > > >   function enables a workable VFIO device while a vIOMMU is present.
> > > > 
> > > > This has been discussed previously; it would be incorrect for vfio not
> > > > to call the replay function. The solution is to add an iommu driver
> > > > callback to efficiently walk the mappings within a MemoryRegion.
> > > 
> > > Right, replay is a bit of a hack. There are a couple of other
> > > approaches that might be adequate without a new callback:
> > >   - Make the VFIOGuestIOMMU aware of the guest address range mapped
> > >     by the vIOMMU. Intel currently advertises that as a full 64-bit
> > >     address space, but I bet that's not actually true in practice.
> > >   - Have the IOMMU MR advertise a (minimum) page size for vIOMMU
> > >     mappings. That may let you step through the range with greater
> > >     strides.
> > Hmm, VT-d supports at least a 39-bit address width and always supports
> > a minimum 4k page size, so yes, that does reduce us from 2^52 steps down
> > to 2^27,
> 
> Right, which is probably doable, if not ideal.
> 
> > but it's still absurd to walk through the raw address space.
> 
> Well.. it depends on the internal structure of the IOMMU. For Power,
> it's traditionally just a 1-level page table, so we can't actually do
> any better than stepping through each IOMMU page.

Intel always has at least a 3-level page table, AIUI.

> > It does, however, seem correct to create the MemoryRegion with a width
> > that actually matches the IOMMU capability, but I don't think that's a
> > sufficient fix by itself. Thanks,
> 
> I suspect it would actually make it workable in the short term.
> 
> But I don't disagree that a "traverse" or "replay" callback of some
> sort in the iommu_ops is a better idea long term. Having a fallback
> to the current replay implementation if the callback isn't supplied
> seems pretty reasonable though.

Exactly, the callback could be optional, where IOMMUs that supply a
relatively small IOVA window could fall back to the code we have today.
Thanks,

Alex
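
For concreteness, a minimal sketch of the optional replay callback idea
discussed above might look like the following. The structures, fields, and
the iommu_replay() helper here are simplified, hypothetical stand-ins
rather than QEMU's actual memory API; the fallback loop corresponds to the
page-by-page walk that replay performs today, while an IOMMU that knows its
own page tables can supply the cheaper hook.

/*
 * Hypothetical sketch only -- simplified stand-ins, not QEMU's real API.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr;

typedef struct IOMMUTLBEntry {
    hwaddr iova;             /* guest IOVA of the mapping             */
    hwaddr translated_addr;  /* address the IOVA maps to              */
    hwaddr addr_mask;        /* page mask, e.g. 0xfff for 4k pages    */
    bool   valid;            /* whether a mapping exists at this IOVA */
} IOMMUTLBEntry;

typedef struct IOMMUNotifier IOMMUNotifier;
struct IOMMUNotifier {
    void (*notify)(IOMMUNotifier *n, IOMMUTLBEntry *entry);
};

typedef struct IOMMUMemoryRegion IOMMUMemoryRegion;

typedef struct IOMMUOps {
    /* Mandatory: translate a single IOVA (what today's replay relies on). */
    IOMMUTLBEntry (*translate)(IOMMUMemoryRegion *mr, hwaddr iova);
    /*
     * Optional: walk only the mappings that actually exist and fire the
     * notifier for each.  An IOMMU with multi-level page tables (e.g.
     * VT-d) can implement this far more cheaply than a linear scan of
     * the IOVA space.
     */
    void (*replay)(IOMMUMemoryRegion *mr, IOMMUNotifier *n);
} IOMMUOps;

struct IOMMUMemoryRegion {
    const IOMMUOps *ops;
    hwaddr size;             /* IOVA width actually advertised, e.g. 1ULL << 39 */
    hwaddr min_page_size;    /* e.g. 4096 for VT-d                              */
};

/*
 * Generic replay: use the efficient per-IOMMU callback when provided,
 * otherwise fall back to stepping through the whole advertised IOVA range
 * one minimum-sized page at a time.
 */
static void iommu_replay(IOMMUMemoryRegion *mr, IOMMUNotifier *n)
{
    hwaddr iova;

    if (mr->ops->replay) {
        mr->ops->replay(mr, n);
        return;
    }

    for (iova = 0; iova < mr->size; iova += mr->min_page_size) {
        IOMMUTLBEntry entry = mr->ops->translate(mr, iova);

        if (entry.valid) {
            n->notify(n, &entry);
        }
    }
}

Keeping the hook optional matches the fallback behaviour both sides agree
on above: IOMMUs that only expose a small IOVA window keep working with the
existing linear walk, while VT-d-style IOMMUs can walk their own page
tables instead.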