From mboxrd@z Thu Jan 1 00:00:00 1970
From: Quan Xu
Subject: [PATCH v3 0/2] VT-d flush issue
Date: Sat, 12 Dec 2015 21:21:46 +0800
Message-ID: <1449926508-72058-1-git-send-email-quan.xu@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: jbeulich@suse.com, kevin.tian@intel.com
Cc: feng.wu@intel.com, eddie.dong@intel.com, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, jun.nakajima@intel.com, Quan Xu, keir@xen.org
List-Id: xen-devel@lists.xenproject.org

These patches are based on Kevin Tian's earlier discussion 'Revisit VT-d
asynchronous flush issue'. They fix the current timeout concern and also
allow limited ATS support in a lightweight way:

1. Reduce the spin timeout to 1ms, which can be changed at boot time with
   'iommu_qi_timeout_ms'. For example:

       multiboot /boot/xen.gz ats=1 iommu_qi_timeout_ms=100

2. Fix the VT-d flush timeout issue.

   If an IOTLB/Context/IEC flush times out, we should assume that no device
   under this IOMMU can function correctly. So for each device under this
   IOMMU we mark it as unassignable and kill the domain owning the device.

   If a Device-TLB flush times out, we mark the target ATS device as
   unassignable and kill the domain owning this device.

   If the impacted domain is the hardware domain, we just print a warning.
   It is an open question whether we want to kill the hardware domain (or
   directly panic the hypervisor). Comments are welcome.

   A device marked as unassignable can no longer be assigned to any domain.

--

 * DMAR_OPERATION_TIMEOUT should also be chopped down to a low number of
   milliseconds. As Kevin Tian mentioned in 'Revisit VT-d asynchronous flush
   issue', we also confirmed with the hardware team that 1ms is large enough
   for an IOMMU internal flush. So I can change DMAR_OPERATION_TIMEOUT from
   1000 ms to 1 ms.
   IOMMU_WAIT_OP() is only used for VT-d register reads/writes, and it also
   contains a panic. We need further discussion on whether or how to remove
   this panic in the next patch set.

 * Kevin Tian did a basic functional review.

--
Changes in v3:
 1. Once the device is found, break out of the for loop immediately, since
    the BDF is unique.
 2. Split invalidate_timeout(struct iommu *iommu, int type, u16 did, u16 seg,
    u8 bus, u8 devfn) into invalidate_timeout(struct iommu *iommu) and
    device_tlb_invalidate_timeout(struct iommu *iommu, u16 did, u16 seg,
    u8 bus, u8 devfn).
    invalidate_timeout() handles iotlb/iec/context flush errors.
    device_tlb_invalidate_timeout() handles Device-TLB flush errors.
    The INVALID_* parameters can then be ignored.
 3. Change the macros (mark_pdev_unassignable/IS_PDEV_UNASSIGNABLE) into
    static inline functions.

Quan Xu (2):
  VT-d: Reduce spin timeout to 1ms, which can be boot-time changed.
  VT-d: Fix vt-d flush timeout issue.

 xen/drivers/passthrough/vtd/extern.h  |  5 ++
 xen/drivers/passthrough/vtd/iommu.c   |  6 +++
 xen/drivers/passthrough/vtd/qinval.c  | 89 +++++++++++++++++++++++++++++++++--
 xen/drivers/passthrough/vtd/x86/ats.c | 16 +++++++
 xen/include/xen/pci.h                 | 11 +++++
 5 files changed, 123 insertions(+), 4 deletions(-)

--
1.8.1.2
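
For readers following the timeout policy described above, here is a minimal,
self-contained sketch of the two marking paths. It is not the code from these
patches: struct sketch_pdev and every sketch_* name are hypothetical stand-ins
invented for illustration, and the domain-kill / hardware-domain warning step
is deliberately left out. It only shows the difference in scope (all devices
under the IOMMU versus one ATS device matched by its unique BDF) and the
early exit once the match is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device record; NOT Xen's struct pci_dev. */
struct sketch_pdev {
    uint16_t seg;       /* PCI segment */
    uint8_t bus;        /* bus number */
    uint8_t devfn;      /* device/function */
    bool unassignable;  /* set once a flush involving this device timed out */
};

/* Static inline helpers in place of macros (names are assumptions). */
static inline void sketch_mark_unassignable(struct sketch_pdev *pdev)
{
    pdev->unassignable = true;
}

static inline bool sketch_is_unassignable(const struct sketch_pdev *pdev)
{
    return pdev->unassignable;
}

/*
 * IOTLB/context/IEC flush timeout: every device behind the IOMMU is
 * treated as suspect. Returns how many devices this call newly marked.
 */
static size_t sketch_invalidate_timeout(struct sketch_pdev *devs, size_t n)
{
    size_t i, marked = 0;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (!sketch_is_unassignable(&devs[i])) {
            sketch_mark_unassignable(&devs[i]);
            marked++;
        }
    }
    return marked;
}

/*
 * Device-TLB flush timeout: only the target device is marked. A BDF is
 * unique, so the search stops at the first match. Returns true if the
 * device was found.
 */
static bool sketch_device_tlb_invalidate_timeout(struct sketch_pdev *devs,
                                                 size_t n, uint16_t seg,
                                                 uint8_t bus, uint8_t devfn)
{
    size_t i;

    assert(devs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (devs[i].seg == seg && devs[i].bus == bus &&
            devs[i].devfn == devfn) {
            sketch_mark_unassignable(&devs[i]);
            return true;
        }
    }
    return false;
}
```

In this sketch an assignment path would call sketch_is_unassignable() and
refuse the device when it returns true. How the owning domain is killed or
warned about is outside what the sketch covers.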