From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38495)
 by lists.gnu.org with esmtp (Exim 4.71) id 1ctos8-00008D-UK
 for qemu-devel@nongnu.org; Fri, 31 Mar 2017 01:13:17 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 id 1ctos5-0006NX-Nr
 for qemu-devel@nongnu.org; Fri, 31 Mar 2017 01:13:16 -0400
Received: from mx1.redhat.com ([209.132.183.28]:36422)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) id 1ctos5-0006Ld-Eo
 for qemu-devel@nongnu.org; Fri, 31 Mar 2017 01:13:13 -0400
References: <1486456099-7345-1-git-send-email-peterx@redhat.com>
 <1486456099-7345-15-git-send-email-peterx@redhat.com>
 <20170327091208.GG11497@pxdev.xzpeter.org>
 <9c3cda64-b4a3-6f9b-f951-bf73f6613faa@redhat.com>
 <20170331025618.GB3981@pxdev.xzpeter.org>
 <23619b9a-b671-75ff-ffc5-01a61ea9d8c5@redhat.com>
 <20170331050101.GF3981@pxdev.xzpeter.org>
From: Jason Wang
Message-ID: <9a5bfd93-2d61-96c8-7a95-bccb5a0c819d@redhat.com>
Date: Fri, 31 Mar 2017 13:12:56 +0800
MIME-Version: 1.0
In-Reply-To: <20170331050101.GF3981@pxdev.xzpeter.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Subject: Re: [Qemu-devel] [PATCH v7 14/17] memory: add MemoryRegionIOMMUOps.replay() callback
To: Peter Xu
Cc: "Liu, Yi L", "'alex.williamson@redhat.com'", "Lan, Tianyu",
 "Tian, Kevin", "'mst@redhat.com'", "'jan.kiszka@siemens.com'",
 "'bd.aviv@gmail.com'", 'David Gibson', "'qemu-devel@nongnu.org'"

On 2017-03-31 13:01, Peter Xu wrote:
> On Fri, Mar 31, 2017 at 12:21:23PM +0800, Jason Wang wrote:
>>
>> On 2017-03-31 10:56, Peter Xu wrote:
>>>>> Just come to mind that there may be a corner case here.
>>>>>
>>>>> Intel VT-d actually has a "pt" mode which allows device use physical address
>>>>> even when VT-d is enabled. In kernel, there is a iommu_identity_mapping.
>>>>> If a device is in this map, then it would use "pt" mode. So that IOMMU driver
>>>>> would not build second-level page table for it.
>>>> Yes, but qemu does not support ECAP_PT now, so guest will still have a page
>>>> table in this case.
>>>>
>>>>> Back to the virtual IOVA implementation, if an assigned device is in the
>>>>> iommu_identity_mapping(e.g. VGA controller), it uses GPA directly to do DMA.
>>>>> So it demands a GPA->HPA mapping in host. However, the iommu->ops.replay
>>>>> is not able to build it when guest SL page table is empty.
>>>>>
>>>>> So I think building an entire guest PA->HPA mapping before guest kernel boot
>>>>> would be recommended. Any thoughts?
>>>> We plan to add PT in 2.10, a possible rough idea is disabled iommu dmar
>>>> region and use another region without iommu_ops. Then
>>>> vfio_listener_region_add() will just do the correct mappings.
>>> Even without any new region. With the patch 16/17 ("intel_iommu: allow
>>> dynamic switch of IOMMU region"), we can just turn the IOMMU region
>>> on/off, following the device's PT bit, maybe using the new
>>> vtd_switch_address_space() interface. That should be enough.
>> Right. For vhost it was probably need more works, e.g setting up static
>> mappings during region_add().
> Do we need to?

Not a must if we don't care about performance.

>
> VFIO will need it for building up shadow page table, even without a
> vIOMMU. But imho that should not be needed by vhost, right?

Device IOTLB will be enabled unconditionally if iommu_platform is
specified. If we don't set up static mappings, vhost will send an IOTLB
miss request for every translation it does not already have, and
performance will be horrible in that case.

Thanks

>
> -- peterx
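
To illustrate the point about static mappings, here is a toy Python model, not QEMU or vhost code. Every name in it (ToyDeviceIOTLB, prefill, access, run) is hypothetical, and the page and region sizes are arbitrary assumptions. It only counts how many times a device-side translation cache has to fall back to a miss request, with and without mappings installed up front.

```python
# Toy model only: none of these names are QEMU or vhost APIs.
PAGE_SIZE = 4096  # assumed page size for the sketch


class ToyDeviceIOTLB:
    """Caches IOVA page -> target page translations and counts misses."""

    def __init__(self, translate):
        self.translate = translate  # slow path: ask the other side to translate
        self.cache = {}
        self.misses = 0

    def prefill(self, start, size):
        """Install static mappings for [start, start + size) up front."""
        first = start // PAGE_SIZE
        last = (start + size - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.cache[page] = self.translate(page)

    def access(self, iova):
        """Translate one address, taking the slow path on a cache miss."""
        page = iova // PAGE_SIZE
        if page not in self.cache:
            self.misses += 1  # stands in for one IOTLB miss request
            self.cache[page] = self.translate(page)
        return self.cache[page] * PAGE_SIZE + iova % PAGE_SIZE


def identity(page):
    # Passthrough-style mapping: the translated page equals the input page.
    return page


def run(prefill):
    """Touch every 512-byte chunk of a 64-page region; return the miss count."""
    tlb = ToyDeviceIOTLB(identity)
    region_start, region_size = 0, 64 * PAGE_SIZE
    if prefill:
        tlb.prefill(region_start, region_size)
    for iova in range(region_start, region_start + region_size, 512):
        assert tlb.access(iova) == iova
    return tlb.misses


misses_on_demand = run(prefill=False)
misses_static = run(prefill=True)
```

In this sketch the on-demand run takes one miss per page touched, while the prefilled run takes none. In the real setup each miss would be a round trip between vhost and QEMU, which is where the cost comes from.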