> On Mar 18, 2021, at 2:25 AM, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>
>> -----Original Message-----
>> From: Tian, Kevin [mailto:kevin.tian@intel.com]
>> Sent: Thursday, March 18, 2021 4:56 PM
>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.); Nadav Amit
>> Cc: chenjiashang; David Woodhouse; iommu@lists.linux-foundation.org; LKML;
>> alex.williamson@redhat.com; Gonglei (Arei); will@kernel.org
>> Subject: RE: A problem of Intel IOMMU hardware ?
>>
>>> From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
>>>
>>>> -----Original Message-----
>>>> From: Tian, Kevin [mailto:kevin.tian@intel.com]
>>>> Sent: Thursday, March 18, 2021 4:27 PM
>>>> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.); Nadav Amit
>>>> Cc: chenjiashang; David Woodhouse; iommu@lists.linux-foundation.org; LKML;
>>>> alex.williamson@redhat.com; Gonglei (Arei); will@kernel.org
>>>> Subject: RE: A problem of Intel IOMMU hardware ?
>>>>
>>>>> From: iommu On Behalf Of Longpeng (Mike, Cloud Infrastructure Service
>>>>> Product Dept.)
>>>>>
>>>>>> 2. Consider ensuring that the problem is not somehow related to
>>>>>> queued invalidations. Try to use __iommu_flush_iotlb() instead of
>>>>>> qi_flush_iotlb().
>>>>>
>>>>> I tried to force to use __iommu_flush_iotlb(), but maybe something
>>>>> wrong, the system crashed, so I prefer to lower the priority of this
>>>>> operation.
>>>>
>>>> The VT-d spec clearly says that register-based invalidation can be used
>>>> only when queued-invalidations are not enabled. Intel-IOMMU driver doesn't
>>>> provide an option to disable queued-invalidation though, when the hardware
>>>> is capable. If you really want to try, tweak the code in
>>>> intel_iommu_init_qi.
>>>
>>> Hi Kevin,
>>>
>>> Thanks to point out this. Do you have any ideas about this problem ?
>>> I tried to descript the problem much clear in my reply to Alex, hope you
>>> could have a look if you're interested.
>>
>> btw I saw you used 4.18 kernel in this test. What about latest kernel?
>
> Not test yet. It's hard to upgrade kernel in our environment.
>
>> Also one way to separate sw/hw bug is to trace the low level interface (e.g.,
>> qi_flush_iotlb) which actually sends invalidation descriptors to the IOMMU
>> hardware. Check the window between b) and c) and see whether the software does
>> the right thing as expected there.
>
> We add some log in iommu driver these days, the software seems fine. But we
> didn't look inside the qi_submit_sync yet, I'll try it tonight.

So here is my guess:

Intel probably based the IOTLB on an implementation of some other
(regular) TLB design.

Intel SDM says regarding TLBs (4.10.4.2 "Recommended Invalidation"):

"Software wishing to prevent this uncertainty should not write to a
paging-structure entry in a way that would change, for any linear
address, both the page size and either the page frame, access rights,
or other attributes."

Now, the uncertainty the SDM refers to is a bit different (multiple
*valid* translations of a single address). Still, perhaps a stale
translation is another thing that might happen in this situation.

From a brief look at the handling of MMU (not IOMMU) hugepages in
Linux, the PMD is indeed first cleared and flushed before a new valid
PMD is set. This is possible for MMUs since they allow the software to
handle spurious page-faults gracefully. This is not the case for the
IOMMU though (without PRI).

Not sure this explains everything though. If that is the problem, then
a mapping that changes the page size needs a TLB flush, similar to the
one Longpeng did manually.
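
To make the ordering argument concrete, here is a toy model in plain Python. It is only a sketch under loud assumptions: `ToyIOMMU`, `remap_in_place`, `remap_break_before_make` and `demo` are hypothetical names made up for illustration, the "IOTLB" is just a dictionary that keeps cached entries until it is flushed, and none of this is taken from the VT-d spec or the intel-iommu driver. It merely shows why replacing a 4K entry with a 2M entry in place could leave a stale translation visible, while clear + flush + set would not.

```python
PAGE_4K = 4096
PAGE_2M = 2 * 1024 * 1024


class ToyIOMMU:
    """Hypothetical model: a page table plus a cache that is never
    invalidated unless flush() is called explicitly."""

    def __init__(self):
        self.page_table = {}  # base IOVA -> (page size, physical base)
        self.iotlb = {}       # (base IOVA, page size) -> physical base

    def map(self, iova, size, phys):
        self.page_table[iova] = (size, phys)

    def unmap(self, iova):
        self.page_table.pop(iova, None)

    def flush(self):
        self.iotlb.clear()

    def translate(self, iova):
        # Cached entries are consulted first, even if they are stale.
        for (base, size), phys in self.iotlb.items():
            if base <= iova < base + size:
                return phys + (iova - base)
        # On a miss, walk the page table and cache the result.
        for base, (size, phys) in self.page_table.items():
            if base <= iova < base + size:
                self.iotlb[(base, size)] = phys
                return phys + (iova - base)
        return None  # would be a fault on real hardware


def remap_in_place(iommu, iova, size, phys):
    # Overwrite the entry without any invalidation in between.
    iommu.map(iova, size, phys)


def remap_break_before_make(iommu, iova, size, phys):
    # Clear the old entry, flush, and only then install the new one.
    iommu.unmap(iova)
    iommu.flush()
    iommu.map(iova, size, phys)


def demo(remap):
    iommu = ToyIOMMU()
    iommu.map(0x200000, PAGE_4K, 0x1000)
    iommu.translate(0x200000)  # device access caches the 4K entry
    remap(iommu, 0x200000, PAGE_2M, 0x40000000)
    return iommu.translate(0x200000)


if __name__ == "__main__":
    print(hex(demo(remap_in_place)))           # stale 4K translation
    print(hex(demo(remap_break_before_make)))  # new 2M translation
```

In this model the in-place remap keeps returning the old physical address, which is the kind of symptom a manual flush would hide. Whether the real hardware behaves like this is exactly the open question.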