linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device
@ 2024-01-29  3:49 Ethan Zhao
  2024-01-29  3:49 ` [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
                   ` (5 more replies)
  0 siblings, 6 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  3:49 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel,
	linux-pci, Ethan Zhao

This patchset is used to fix vt-d hard lockup reported when surprise
unplug ATS capable endpoint device connects to system via PCIe switch
as following topology.

     +-[0000:15]-+-00.0  Intel Corporation Ice Lake Memory Map/VT-d
     |           +-00.1  Intel Corporation Ice Lake Mesh 2 PCIe
     |           +-00.2  Intel Corporation Ice Lake RAS
     |           +-00.4  Intel Corporation Device 0b23
     |           \-01.0-[16-1b]----00.0-[17-1b]--+-00.0-[18]----00.0
                                           NVIDIA Corporation Device 2324
     |                                           +-01.0-[19]----00.0
                          Mellanox Technologies MT2910 Family [ConnectX-7]

User brought endpoint device 19:00.0's link down by flapping it's hotplug
capable slot 17:01.0 link control register, as sequence DLLSC response,
pciehp_ist() will unload device driver and power it off, durning device
driver is unloading an iommu device-TLB invalidation (Intel VT-d spec, or
'ATS Invalidation' in PCIe spec) request issued to that link down device,
thus a long time completion/timeout waiting in interrupt context causes
continuous hard lockup warnning and system hang.

Other detail, see every patch commit log.

patch [1&2] were tested by yehaorong@bytedance.com on stable v6.7-rc4.
patch [1-5] passed compiling on stable v6.8-rc2.

change log:
v12: 
- use base-commit tag to format patch.
- fix building issue on v6.8-rc2 repported by lkp@intel.com.
v11:
- update per latest comment and suggestion from Baolu and YiLiu.
- split refactoring patch into two patches, [3/5] for simplify parameters
  and [4/5] for pdev parameter passing.
- re-order patches.
- fold target device presence check into qi_check_fault().
- combine patch[2][5] in v10 into one patch[5].
- some commit description correctness.
- add fixes tag to patch[2/5].
- rebased on 6.8rc1
v10:
- refactor qi_submit_sync() and its callers to get pci_dev instance, as
  Kevin pointed out add target_flush_dev to iommu is not right.
v9:
- unify all spelling of ATS Invalidation adhere to PCIe spec per Bjorn's
  suggestion.
v8:
- add a patch to break the loop for timeout device-TLB invalidation, as
  Bjorn said there is possibility device just no response but not gone.
v7:
- reorder patches and revise commit log per Bjorn's guide.
- other code and commit log revise per Lukas' suggestion.
- rebased to stable v6.7-rc6.
v6:
- add two patches to break out device-TLB invalidation if device is gone.
v5:
- add a patch try to fix the rare case (surprise remove a device in
  safe removal process). not work because surprise removal handling can't
  re-enter when another safe removal is in process.
v4:
- move the PCI device state checking after ATS per Baolu's suggestion.
v3:
- fix commit description typo.
v2:
- revise commit[1] description part according to Lukas' suggestion.
- revise commit[2] description to clarify the issue's impact.
v1:
- https://lore.kernel.org/lkml/20231213034637.2603013-1-haifeng.zhao@
linux.intel.com/T/


Thanks,
Ethan

 


Ethan Zhao (5):
  PCI: make pci_dev_is_disconnected() helper public for other drivers
  iommu/vt-d: don't issue ATS Invalidation request when device is
    disconnected
  iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation
    callers
  iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor
    callers
  iommu/vt-d: improve ITE fault handling if target device isn't present

 drivers/iommu/intel/dmar.c          | 71 +++++++++++++++++++++++------
 drivers/iommu/intel/iommu.c         | 29 +++---------
 drivers/iommu/intel/iommu.h         | 20 ++++----
 drivers/iommu/intel/irq_remapping.c |  2 +-
 drivers/iommu/intel/nested.c        |  9 +---
 drivers/iommu/intel/pasid.c         | 12 ++---
 drivers/iommu/intel/svm.c           | 23 +++++-----
 drivers/pci/pci.h                   |  5 --
 include/linux/pci.h                 |  5 ++
 9 files changed, 98 insertions(+), 78 deletions(-)


base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3
-- 
2.31.1


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
@ 2024-01-29  3:49 ` Ethan Zhao
  2024-01-29  8:50   ` Tian, Kevin
  2024-01-29  3:49 ` [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  3:49 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel,
	linux-pci, Ethan Zhao, Haorong Ye

Make pci_dev_is_disconnected() public so that it can be called from
Intel VT-d driver to quickly fix/workaround the surprise removal
unplug hang issue for those ATS capable devices on PCIe switch downstream
hotplug capable ports.

Beside pci_device_is_present() function, this one has no config space
space access, so is light enough to optimize the normal pure surprise
removal and safe removal flow.

Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/pci/pci.h   | 5 -----
 include/linux/pci.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2336a8d1edab..94118c4cff54 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -368,11 +368,6 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 	return 0;
 }
 
-static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
-{
-	return dev->error_state == pci_channel_io_perm_failure;
-}
-
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DPC_RECOVERED 1
diff --git a/include/linux/pci.h b/include/linux/pci.h
index add9368e6314..19dfe5f16fc9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2512,6 +2512,11 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
 	return NULL;
 }
 
+static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
+{
+	return dev->error_state == pci_channel_io_perm_failure;
+}
+
 void pci_request_acs(void);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
  2024-01-29  3:49 ` [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
@ 2024-01-29  3:49 ` Ethan Zhao
  2024-01-29  8:53   ` Tian, Kevin
  2024-01-29  9:32   ` Yi Liu
  2024-01-29  3:49 ` [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers Ethan Zhao
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  3:49 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel,
	linux-pci, Ethan Zhao, Haorong Ye

For those endpoint devices connect to system via hotplug capable ports,
users could request a hot reset to the device by flapping device's link
through setting the slot's link control register, as pciehp_ist() DLLSC
interrupt sequence response, pciehp will unload the device driver and
then power it off. thus cause an IOMMU device-TLB invalidation (Intel
VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
target device to be sent and deadly loop to retry that request after ITE
fault triggered in interrupt context.

That would cause following continuous hard lockup warning and system hang

[ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
[ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present
[ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
[ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S
         OE    kernel version xxxx
[ 4223.822623] Hardware name: vendorname xxxx 666-106,
BIOS 01.01.02.03.01 05/15/2023
[ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b
 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 1
0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822626] FS:  0000000000000000(0000) GS:ffffa237ae400000(0000)
knlGS:0000000000000000
[ 4223.822627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0
[ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 4223.822628] PKRU: 55555554
[ 4223.822628] Call Trace:
[ 4223.822628]  qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822628]  __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822629]  dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822629]  intel_iommu_release_device+0x1f/0x30
[ 4223.822629]  iommu_release_device+0x33/0x60
[ 4223.822629]  iommu_bus_notifier+0x7f/0x90
[ 4223.822630]  blocking_notifier_call_chain+0x60/0x90
[ 4223.822630]  device_del+0x2e5/0x420
[ 4223.822630]  pci_remove_bus_device+0x70/0x110
[ 4223.822630]  pciehp_unconfigure_device+0x7c/0x130
[ 4223.822631]  pciehp_disable_slot+0x6b/0x100
[ 4223.822631]  pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822631]  pciehp_ist+0x176/0x180
[ 4223.822631]  ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822632]  irq_thread_fn+0x19/0x50
[ 4223.822632]  irq_thread+0x104/0x190
[ 4223.822632]  ? irq_forced_thread_fn+0x90/0x90
[ 4223.822632]  ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822633]  kthread+0x114/0x130
[ 4223.822633]  ? __kthread_cancel_work+0x40/0x40
[ 4223.822633]  ret_from_fork+0x1f/0x30
[ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
[ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S
         OE     kernel version xxxx
[ 4223.822634] Hardware name: vendorname xxxx 666-106,
BIOS 01.01.02.03.01 05/15/2023
[ 4223.822634] Call Trace:
[ 4223.822634]  <NMI>
[ 4223.822635]  dump_stack+0x6d/0x88
[ 4223.822635]  panic+0x101/0x2d0
[ 4223.822635]  ? ret_from_fork+0x11/0x30
[ 4223.822635]  nmi_panic.cold.14+0xc/0xc
[ 4223.822636]  watchdog_overflow_callback.cold.8+0x6d/0x81
[ 4223.822636]  __perf_event_overflow+0x4f/0xf0
[ 4223.822636]  handle_pmi_common+0x1ef/0x290
[ 4223.822636]  ? __set_pte_vaddr+0x28/0x40
[ 4223.822637]  ? flush_tlb_one_kernel+0xa/0x20
[ 4223.822637]  ? __native_set_fixmap+0x24/0x30
[ 4223.822637]  ? ghes_copy_tofrom_phys+0x70/0x100
[ 4223.822637]  ? __ghes_peek_estatus.isra.16+0x49/0xa0
[ 4223.822637]  intel_pmu_handle_irq+0xba/0x2b0
[ 4223.822638]  perf_event_nmi_handler+0x24/0x40
[ 4223.822638]  nmi_handle+0x4d/0xf0
[ 4223.822638]  default_do_nmi+0x49/0x100
[ 4223.822638]  exc_nmi+0x134/0x180
[ 4223.822639]  end_repeat_nmi+0x16/0x67
[ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
[ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b
 57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10
 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
[ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
[ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
[ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
[ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
[ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
[ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
[ 4223.822641]  ? qi_submit_sync+0x2c0/0x490
[ 4223.822642]  ? qi_submit_sync+0x2c0/0x490
[ 4223.822642]  </NMI>
[ 4223.822642]  qi_flush_dev_iotlb+0xb1/0xd0
[ 4223.822642]  __dmar_remove_one_dev_info+0x224/0x250
[ 4223.822643]  dmar_remove_one_dev_info+0x3e/0x50
[ 4223.822643]  intel_iommu_release_device+0x1f/0x30
[ 4223.822643]  iommu_release_device+0x33/0x60
[ 4223.822643]  iommu_bus_notifier+0x7f/0x90
[ 4223.822644]  blocking_notifier_call_chain+0x60/0x90
[ 4223.822644]  device_del+0x2e5/0x420
[ 4223.822644]  pci_remove_bus_device+0x70/0x110
[ 4223.822644]  pciehp_unconfigure_device+0x7c/0x130
[ 4223.822644]  pciehp_disable_slot+0x6b/0x100
[ 4223.822645]  pciehp_handle_presence_or_link_change+0xd8/0x320
[ 4223.822645]  pciehp_ist+0x176/0x180
[ 4223.822645]  ? irq_finalize_oneshot.part.50+0x110/0x110
[ 4223.822645]  irq_thread_fn+0x19/0x50
[ 4223.822646]  irq_thread+0x104/0x190
[ 4223.822646]  ? irq_forced_thread_fn+0x90/0x90
[ 4223.822646]  ? irq_thread_check_affinity+0xe0/0xe0
[ 4223.822646]  kthread+0x114/0x130
[ 4223.822647]  ? __kthread_cancel_work+0x40/0x40
[ 4223.822647]  ret_from_fork+0x1f/0x30
[ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffffbfffffff)

Such issue could be triggered by all kinds of regular surprise removal
hotplug operation. like:

1. pull EP(endpoint device) out directly.
2. turn off EP's power.
3. bring the link down.
etc.

this patch aims to work for regular safe removal and surprise removal
unplug. these hot unplug handling process could be optimized for fix the
ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
function devtlb_invalidation_with_pasid() to check target device state to
avoid sending meaningless ATS Invalidation request to iommu when device is
gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)

For safe removal, device wouldn't be removed until the whole software
handling process is done, it wouldn't trigger the hard lock up issue
caused by too long ATS Invalidation timeout wait. In safe removal path,
device state isn't set to pci_channel_io_perm_failure in
pciehp_unconfigure_device() by checking 'presence' parameter, calling
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
false there, wouldn't break the function.

For surprise removal, device state is set to pci_channel_io_perm_failure in
pciehp_unconfigure_device(), means device is already gone (disconnected)
call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
return true to break the function not to send ATS Invalidation request to
the disconnected device blindly, thus avoid to trigger further ITE fault,
and ITE fault will block all invalidation request to be handled.
furthermore retry the timeout request could trigger hard lockup.

safe removal (present) & surprise removal (not present)

pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_disable_slot()
       remove_board()
         pciehp_unconfigure_device(presence) {
           if (!presence)
                pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
           }

this patch works for regular safe removal and surprise removal of ATS
capable endpoint on PCIe switch downstream ports.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/pasid.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 3239cefa4c33..953592125e4a 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	if (!info || !info->ats_enabled)
 		return;
 
+	if (pci_dev_is_disconnected(to_pci_dev(dev)))
+		return;
+
 	sid = info->bus << 8 | info->devfn;
 	qdep = info->ats_qdep;
 	pfsid = info->pfsid;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers
  2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
  2024-01-29  3:49 ` [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
  2024-01-29  3:49 ` [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
@ 2024-01-29  3:49 ` Ethan Zhao
  2024-01-29  9:37   ` Yi Liu
  2024-01-29  3:49 ` [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers Ethan Zhao
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  3:49 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel,
	linux-pci, Ethan Zhao

fold parameters back to struct device_domain_info *info instead of extract
and pass them, thus decrease the number of the parameter passed for
following functions

qi_flush_dev_iotlb()
qi_flush_dev_iotlb_pasid()
quirk_extra_dev_tlb_flush()

no function change.

Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/dmar.c   | 26 ++++++++++++++++++++++----
 drivers/iommu/intel/iommu.c  | 29 +++++++----------------------
 drivers/iommu/intel/iommu.h  | 17 ++++++++---------
 drivers/iommu/intel/nested.c |  9 ++-------
 drivers/iommu/intel/pasid.c  |  9 ++-------
 drivers/iommu/intel/svm.c    | 17 ++++++++---------
 6 files changed, 49 insertions(+), 58 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 23cb80d62a9a..ab5e1760bd59 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1517,11 +1517,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	qi_submit_sync(iommu, &desc, 1, 0);
 }
 
-void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
-			u16 qdep, u64 addr, unsigned mask)
+void qi_flush_dev_iotlb(struct intel_iommu *iommu,
+			struct device_domain_info *info, u64 addr,
+			unsigned int mask)
 {
+	u16 sid, qdep, pfsid;
 	struct qi_desc desc;
 
+	if (!info || !info->ats_enabled)
+		return;
+
+	sid = info->bus << 8 | info->devfn;
+	qdep = info->ats_qdep;
+	pfsid = info->pfsid;
+
 	/*
 	 * VT-d spec, section 4.3:
 	 *
@@ -1590,11 +1599,20 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 }
 
 /* PASID-based device IOTLB Invalidate */
-void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
-			      u32 pasid,  u16 qdep, u64 addr, unsigned int size_order)
+void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
+			      struct device_domain_info *info, u64 addr, u32 pasid,
+			      unsigned int size_order)
 {
 	unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
 	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
+	u16 sid, qdep, pfsid;
+
+	if (!info || !dev_is_pci(info->dev))
+		return;
+
+	sid = info->bus << 8 | info->devfn;
+	qdep = info->ats_qdep;
+	pfsid = info->pfsid;
 
 	/*
 	 * VT-d spec, section 4.3:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6fb5f6fceea1..e5902944b3db 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1310,16 +1310,11 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
 static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
 				    u64 addr, unsigned int mask)
 {
-	u16 sid, qdep;
-
 	if (!info || !info->ats_enabled)
 		return;
 
-	sid = info->bus << 8 | info->devfn;
-	qdep = info->ats_qdep;
-	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
-			   qdep, addr, mask);
-	quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_NO_PASID, qdep);
+	qi_flush_dev_iotlb(info->iommu, info, addr, mask);
+	quirk_extra_dev_tlb_flush(info, addr, IOMMU_NO_PASID, mask);
 }
 
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
@@ -1342,11 +1337,7 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
 		if (!info->ats_enabled)
 			continue;
 
-		qi_flush_dev_iotlb_pasid(info->iommu,
-					 PCI_DEVID(info->bus, info->devfn),
-					 info->pfsid, dev_pasid->pasid,
-					 info->ats_qdep, addr,
-					 mask);
+		qi_flush_dev_iotlb_pasid(info->iommu, info, addr, dev_pasid->pasid, mask);
 	}
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
@@ -4990,22 +4981,16 @@ static void __init check_tylersburg_isoch(void)
  *
  * As a reminder, #6 will *NEED* this quirk as we enable nested translation.
  */
-void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
-			       unsigned long address, unsigned long mask,
-			       u32 pasid, u16 qdep)
+void quirk_extra_dev_tlb_flush(struct device_domain_info *info, u32 pasid,
+			       unsigned long address, unsigned long mask)
 {
-	u16 sid;
-
 	if (likely(!info->dtlb_extra_inval))
 		return;
 
-	sid = PCI_DEVID(info->bus, info->devfn);
 	if (pasid == IOMMU_NO_PASID) {
-		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
-				   qdep, address, mask);
+		qi_flush_dev_iotlb(info->iommu, info, address, mask);
 	} else {
-		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
-					 pasid, qdep, address, mask);
+		qi_flush_dev_iotlb_pasid(info->iommu, info, address, pasid, mask);
 	}
 }
 
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d02f916d8e59..f68f17476d85 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1037,18 +1037,17 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did,
 		      u16 sid, u8 fm, u64 type);
 void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 		    unsigned int size_order, u64 type);
-void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
-			u16 qdep, u64 addr, unsigned mask);
-
+void qi_flush_dev_iotlb(struct intel_iommu *iommu,
+			struct device_domain_info *info, u64 addr,
+			unsigned int mask);
 void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 		     unsigned long npages, bool ih);
 
-void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
-			      u32 pasid, u16 qdep, u64 addr,
-			      unsigned int size_order);
-void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
-			       unsigned long address, unsigned long pages,
-			       u32 pasid, u16 qdep);
+void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
+			      struct device_domain_info *info, u64 addr,
+			      u32 pasid, unsigned int size_order);
+void quirk_extra_dev_tlb_flush(struct device_domain_info *info, u32 pasid,
+			       unsigned long address, unsigned long mask);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  u32 pasid);
 
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index f26c7f1c46cc..d15f72b55940 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -78,18 +78,13 @@ static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
 {
 	struct device_domain_info *info;
 	unsigned long flags;
-	u16 sid, qdep;
 
 	spin_lock_irqsave(&domain->lock, flags);
 	list_for_each_entry(info, &domain->devices, link) {
 		if (!info->ats_enabled)
 			continue;
-		sid = info->bus << 8 | info->devfn;
-		qdep = info->ats_qdep;
-		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
-				   qdep, addr, mask);
-		quirk_extra_dev_tlb_flush(info, addr, mask,
-					  IOMMU_NO_PASID, qdep);
+		qi_flush_dev_iotlb(info->iommu, info, addr, mask);
+		quirk_extra_dev_tlb_flush(info, IOMMU_NO_PASID, addr, mask);
 	}
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 953592125e4a..5dacdea3cab7 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -208,7 +208,6 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 			       struct device *dev, u32 pasid)
 {
 	struct device_domain_info *info;
-	u16 sid, qdep, pfsid;
 
 	info = dev_iommu_priv_get(dev);
 	if (!info || !info->ats_enabled)
@@ -217,10 +216,6 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	if (pci_dev_is_disconnected(to_pci_dev(dev)))
 		return;
 
-	sid = info->bus << 8 | info->devfn;
-	qdep = info->ats_qdep;
-	pfsid = info->pfsid;
-
 	/*
 	 * When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
 	 * devTLB flush w/o PASID should be used. For non-zero PASID under
@@ -228,9 +223,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	 * efficient to flush devTLB specific to the PASID.
 	 */
 	if (pasid == IOMMU_NO_PASID)
-		qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+		qi_flush_dev_iotlb(iommu, info, 0, 64 - VTD_PAGE_SHIFT);
 	else
-		qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+		qi_flush_dev_iotlb_pasid(iommu, info, 0, pasid, 64 - VTD_PAGE_SHIFT);
 }
 
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 40edd282903f..89168b31bf31 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -181,11 +181,10 @@ static void __flush_svm_range_dev(struct intel_svm *svm,
 
 	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
 	if (info->ats_enabled) {
-		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
-					 svm->pasid, sdev->qdep, address,
+		qi_flush_dev_iotlb_pasid(sdev->iommu, info, address, svm->pasid,
 					 order_base_2(pages));
-		quirk_extra_dev_tlb_flush(info, address, order_base_2(pages),
-					  svm->pasid, sdev->qdep);
+		quirk_extra_dev_tlb_flush(info, svm->pasid, address,
+					  order_base_2(pages));
 	}
 }
 
@@ -227,11 +226,11 @@ static void intel_flush_svm_all(struct intel_svm *svm)
 
 		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0);
 		if (info->ats_enabled) {
-			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
-						 svm->pasid, sdev->qdep,
-						 0, 64 - VTD_PAGE_SHIFT);
-			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
-						  svm->pasid, sdev->qdep);
+			qi_flush_dev_iotlb_pasid(sdev->iommu, info, 0,
+						 svm->pasid,
+						 64 - VTD_PAGE_SHIFT);
+			quirk_extra_dev_tlb_flush(info, svm->pasid, 0,
+						  64 - VTD_PAGE_SHIFT);
 		}
 	}
 	rcu_read_unlock();
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
  2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
                   ` (2 preceding siblings ...)
  2024-01-29  3:49 ` [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers Ethan Zhao
@ 2024-01-29  3:49 ` Ethan Zhao
  2024-01-29  8:58   ` Tian, Kevin
  2024-02-08  7:15   ` Dan Carpenter
  2024-01-29  3:49 ` [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present Ethan Zhao
  2024-01-29  5:16 ` [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
  5 siblings, 2 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  3:49 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel,
	linux-pci, Ethan Zhao

to check state of ATS capable pci device in qi_check_fault() for surprise
removal case, we need to pass the target pci device of ATS invalidation
request to qi_check_fault(). if pdev is valid, means current request is for
ATS invalidation, vice vesa.

no function change.

Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/dmar.c          | 31 ++++++++++++++++++-----------
 drivers/iommu/intel/iommu.h         |  3 ++-
 drivers/iommu/intel/irq_remapping.c |  2 +-
 drivers/iommu/intel/pasid.c         |  2 +-
 drivers/iommu/intel/svm.c           |  6 +++---
 5 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index ab5e1760bd59..814134e9aa5a 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1267,7 +1267,8 @@ static void qi_dump_fault(struct intel_iommu *iommu, u32 fault)
 	       (unsigned long long)desc->qw1);
 }
 
-static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
+static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
+			  struct pci_dev *pdev)
 {
 	u32 fault;
 	int head, tail;
@@ -1344,7 +1345,8 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index)
  * can be part of the submission but it will not be polled for completion.
  */
 int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
-		   unsigned int count, unsigned long options)
+		   unsigned int count, unsigned long options,
+		   struct pci_dev *pdev)
 {
 	struct q_inval *qi = iommu->qi;
 	s64 devtlb_start_ktime = 0;
@@ -1430,7 +1432,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		 * a deadlock where the interrupt context can wait indefinitely
 		 * for free slots in the queue.
 		 */
-		rc = qi_check_fault(iommu, index, wait_index);
+		rc = qi_check_fault(iommu, index, wait_index, pdev);
 		if (rc)
 			break;
 
@@ -1476,7 +1478,7 @@ void qi_global_iec(struct intel_iommu *iommu)
 	desc.qw3 = 0;
 
 	/* should never fail */
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
@@ -1490,7 +1492,7 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
@@ -1514,23 +1516,27 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 void qi_flush_dev_iotlb(struct intel_iommu *iommu,
 			struct device_domain_info *info, u64 addr,
 			unsigned int mask)
 {
+	struct pci_dev *pdev = NULL;
 	u16 sid, qdep, pfsid;
 	struct qi_desc desc;
 
 	if (!info || !info->ats_enabled)
 		return;
 
+	if (info->dev || !dev_is_pci(info->dev))
+		return;
+
+	pdev = to_pci_dev(info->dev);
 	sid = info->bus << 8 | info->devfn;
 	qdep = info->ats_qdep;
 	pfsid = info->pfsid;
-
 	/*
 	 * VT-d spec, section 4.3:
 	 *
@@ -1554,7 +1560,7 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, pdev);
 }
 
 /* PASID-based IOTLB invalidation */
@@ -1595,7 +1601,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 				QI_EIOTLB_AM(mask);
 	}
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 /* PASID-based device IOTLB Invalidate */
@@ -1605,15 +1611,16 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
 {
 	unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
 	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
+	struct pci_dev *pdev = NULL;
 	u16 sid, qdep, pfsid;
 
 	if (!info || !dev_is_pci(info->dev))
 		return;
 
+	pdev = to_pci_dev(info->dev);
 	sid = info->bus << 8 | info->devfn;
 	qdep = info->ats_qdep;
 	pfsid = info->pfsid;
-
 	/*
 	 * VT-d spec, section 4.3:
 	 *
@@ -1657,7 +1664,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
 		desc.qw1 |= QI_DEV_EIOTLB_SIZE;
 	}
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, pdev);
 }
 
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
@@ -1667,7 +1674,7 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
 
 	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
 			QI_PC_GRAN(granu) | QI_PC_TYPE;
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 /*
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index f68f17476d85..72e1d5fb2114 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1052,7 +1052,8 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  u32 pasid);
 
 int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
-		   unsigned int count, unsigned long options);
+		   unsigned int count, unsigned long options,
+		   struct pci_dev *pdev);
 /*
  * Options used in qi_submit_sync:
  * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c
index 566297bc87dd..09276bfa127d 100644
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -153,7 +153,7 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	return qi_submit_sync(iommu, &desc, 1, 0);
+	return qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 static int modify_irte(struct irq_2_iommu *irq_iommu,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 5dacdea3cab7..8bba5721aeba 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -200,7 +200,7 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 	desc.qw2 = 0;
 	desc.qw3 = 0;
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 static void
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 89168b31bf31..c2f4f3822191 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -538,7 +538,7 @@ void intel_drain_pasid_prq(struct device *dev, u32 pasid)
 			QI_DEV_IOTLB_PFSID(info->pfsid);
 qi_retry:
 	reinit_completion(&iommu->prq_complete);
-	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN);
+	qi_submit_sync(iommu, desc, 3, QI_OPT_WAIT_DRAIN, NULL);
 	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
 		wait_for_completion(&iommu->prq_complete);
 		goto qi_retry;
@@ -641,7 +641,7 @@ static void handle_bad_prq_event(struct intel_iommu *iommu,
 		desc.qw3 = 0;
 	}
 
-	qi_submit_sync(iommu, &desc, 1, 0);
+	qi_submit_sync(iommu, &desc, 1, 0, NULL);
 }
 
 static irqreturn_t prq_event_thread(int irq, void *d)
@@ -797,7 +797,7 @@ int intel_svm_page_response(struct device *dev,
 				ktime_to_ns(ktime_get()) - prm->private_data[0]);
 		}
 
-		qi_submit_sync(iommu, &desc, 1, 0);
+		qi_submit_sync(iommu, &desc, 1, 0, NULL);
 	}
 out:
 	return ret;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
                   ` (3 preceding siblings ...)
  2024-01-29  3:49 ` [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers Ethan Zhao
@ 2024-01-29  3:49 ` Ethan Zhao
  2024-01-29  9:06   ` Tian, Kevin
  2024-01-29  9:33   ` Yi Liu
  2024-01-29  5:16 ` [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
  5 siblings, 2 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  3:49 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel,
	linux-pci, Ethan Zhao

Because surprise removal could happen anytime, e.g. user could request safe
removal to EP(endpoint device) via sysfs and brings its link down to do
surprise removal cocurrently. such aggressive cases would cause ATS
invalidation request issued to non-existence target device, then deadly
loop to retry that request after ITE fault triggered in interrupt context.
this patch aims to optimize the ITE handling by checking the target device
presence state to avoid retrying the timeout request blindly, thus avoid
hard lockup or system hang.

Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 814134e9aa5a..2e214b43725c 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
 {
 	u32 fault;
 	int head, tail;
+	u64 iqe_err, ite_sid;
 	struct q_inval *qi = iommu->qi;
 	int shift = qi_shift(iommu);
 
@@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
 		tail = readl(iommu->reg + DMAR_IQT_REG);
 		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
 
+		/*
+		 * SID field is valid only when the ITE field is Set in FSTS_REG
+		 * see Intel VT-d spec r4.1, section 11.4.9.9
+		 */
+		iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
+		ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
+
 		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
 		pr_info("Invalidation Time-out Error (ITE) cleared\n");
 
@@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
 			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
 		} while (head != tail);
 
+		/*
+		 * If got ITE, we need to check if the sid of ITE is the same as
+		 * current ATS invalidation target device, if yes, don't try this
+		 * request anymore if the target device isn't present.
+		 * 0 value of ite_sid means old VT-d device, no ite_sid value.
+		 */
+		if (pdev && ite_sid && !pci_device_is_present(pdev) &&
+			ite_sid == pci_dev_id(pci_physfn(pdev)))
+			return -ETIMEDOUT;
+
 		if (qi->desc_status[wait_index] == QI_ABORT)
 			return -EAGAIN;
 	}
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device
  2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
                   ` (4 preceding siblings ...)
  2024-01-29  3:49 ` [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present Ethan Zhao
@ 2024-01-29  5:16 ` Ethan Zhao
  5 siblings, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-29  5:16 UTC (permalink / raw)
  To: baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, yi.l.liu, iommu, linux-kernel, linux-pci

Bjorn,

On 1/29/2024 11:49 AM, Ethan Zhao wrote:
> This patchset is used to fix vt-d hard lockup reported when surprise
> unplug ATS capable endpoint device connects to system via PCIe switch
> as following topology.
>
>       +-[0000:15]-+-00.0  Intel Corporation Ice Lake Memory Map/VT-d
>       |           +-00.1  Intel Corporation Ice Lake Mesh 2 PCIe
>       |           +-00.2  Intel Corporation Ice Lake RAS
>       |           +-00.4  Intel Corporation Device 0b23
>       |           \-01.0-[16-1b]----00.0-[17-1b]--+-00.0-[18]----00.0
>                                             NVIDIA Corporation Device 2324
>       |                                           +-01.0-[19]----00.0
>                            Mellanox Technologies MT2910 Family [ConnectX-7]
>
> User brought endpoint device 19:00.0's link down by flapping it's hotplug
> capable slot 17:01.0 link control register, as sequence DLLSC response,
> pciehp_ist() will unload device driver and power it off, durning device
> driver is unloading an iommu device-TLB invalidation (Intel VT-d spec, or
> 'ATS Invalidation' in PCIe spec) request issued to that link down device,
> thus a long time completion/timeout waiting in interrupt context causes
> continuous hard lockup warnning and system hang.
>
> Other detail, see every patch commit log.
>
> patch [1&2] were tested by yehaorong@bytedance.com on stable v6.7-rc4.
> patch [1-5] passed compiling on stable v6.8-rc2.
>
> change log:
> v12:
> - use base-commit tag to format patch.
> - fix building issue on v6.8-rc2 repported by lkp@intel.com.
> v11:
> - update per latest comment and suggestion from Baolu and YiLiu.
> - split refactoring patch into two patches, [3/5] for simplify parameters
>    and [4/5] for pdev parameter passing.
> - re-order patches.
> - fold target device presence check into qi_check_fault().
> - combine patch[2][5] in v10 into one patch[5].
> - some commit description correctness.
> - add fixes tag to patch[2/5].
> - rebased on 6.8rc1
> v10:
> - refactor qi_submit_sync() and its callers to get pci_dev instance, as
>    Kevin pointed out add target_flush_dev to iommu is not right.
> v9:
> - unify all spelling of ATS Invalidation adhere to PCIe spec per Bjorn's
>    suggestion.
> v8:
> - add a patch to break the loop for timeout device-TLB invalidation, as
>    Bjorn said there is possibility device just no response but not gone.
> v7:
> - reorder patches and revise commit log per Bjorn's guide.
> - other code and commit log revise per Lukas' suggestion.
> - rebased to stable v6.7-rc6.
> v6:
> - add two patches to break out device-TLB invalidation if device is gone.
> v5:
> - add a patch try to fix the rare case (surprise remove a device in
>    safe removal process). not work because surprise removal handling can't
>    re-enter when another safe removal is in process.
> v4:
> - move the PCI device state checking after ATS per Baolu's suggestion.
> v3:
> - fix commit description typo.
> v2:
> - revise commit[1] description part according to Lukas' suggestion.
> - revise commit[2] description to clarify the issue's impact.
> v1:
> - https://lore.kernel.org/lkml/20231213034637.2603013-1-haifeng.zhao@
> linux.intel.com/T/
>
>
> Thanks,
> Ethan
>
>   
>
>
> Ethan Zhao (5):
>    PCI: make pci_dev_is_disconnected() helper public for other drivers
>    iommu/vt-d: don't issue ATS Invalidation request when device is
>      disconnected
>    iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation
>      callers
>    iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor
>      callers
>    iommu/vt-d: improve ITE fault handling if target device isn't present
>
>   drivers/iommu/intel/dmar.c          | 71 +++++++++++++++++++++++------
>   drivers/iommu/intel/iommu.c         | 29 +++---------
>   drivers/iommu/intel/iommu.h         | 20 ++++----
>   drivers/iommu/intel/irq_remapping.c |  2 +-
>   drivers/iommu/intel/nested.c        |  9 +---
>   drivers/iommu/intel/pasid.c         | 12 ++---
>   drivers/iommu/intel/svm.c           | 23 +++++-----
>   drivers/pci/pci.h                   |  5 --
>   include/linux/pci.h                 |  5 ++
>   9 files changed, 98 insertions(+), 78 deletions(-)
>
>
> base-commit: 41bccc98fb7931d63d03f326a746ac4d429c1dd3

I got mail delivery failure message from Gmail for "unsolicited mail", I 
don't

know if you could read it by linux-pci mail list. the vt-d maintainer is 
going

to pick up this patchset, may I ask your 'ack' for the pci change if no 
more

comment from you ?

Thanks,

Ethan


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2024-01-29  3:49 ` [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
@ 2024-01-29  8:50   ` Tian, Kevin
  2024-01-30  5:23     ` Ethan Zhao
  2024-01-30  5:25     ` Ethan Zhao
  0 siblings, 2 replies; 39+ messages in thread
From: Tian, Kevin @ 2024-01-29  8:50 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci,
	Haorong Ye

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Monday, January 29, 2024 11:49 AM
> 
> Make pci_dev_is_disconnected() public so that it can be called from
> Intel VT-d driver to quickly fix/workaround the surprise removal
> unplug hang issue for those ATS capable devices on PCIe switch downstream
> hotplug capable ports.
> 
> Beside pci_device_is_present() function, this one has no config space

s/Beside/Unlike/

> space access, so is light enough to optimize the normal pure surprise
> removal and safe removal flow.

somehow this paragraph belongs to the patch using it. The 1st paragraph
is sufficient for the reason of exposure.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-29  3:49 ` [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
@ 2024-01-29  8:53   ` Tian, Kevin
  2024-01-29  9:32   ` Yi Liu
  1 sibling, 0 replies; 39+ messages in thread
From: Tian, Kevin @ 2024-01-29  8:53 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci,
	Haorong Ye

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Monday, January 29, 2024 11:49 AM
> 
[snip]
> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
> Tested-by: Haorong Ye <yehaorong@bytedance.com>
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>  drivers/iommu/intel/pasid.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 3239cefa4c33..953592125e4a 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu
> *iommu,
>  	if (!info || !info->ats_enabled)
>  		return;
> 
> +	if (pci_dev_is_disconnected(to_pci_dev(dev)))
> +		return;
> +

the long commit msg doesn't tell the fact that this is not a full fix.
surprise removal could happen after that check so there could still
be chance seeing timeout.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
  2024-01-29  3:49 ` [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers Ethan Zhao
@ 2024-01-29  8:58   ` Tian, Kevin
  2024-01-30  7:30     ` Ethan Zhao
  2024-02-08  7:15   ` Dan Carpenter
  1 sibling, 1 reply; 39+ messages in thread
From: Tian, Kevin @ 2024-01-29  8:58 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Monday, January 29, 2024 11:49 AM
> 
> to check state of ATS capable pci device in qi_check_fault() for surprise
> removal case, we need to pass the target pci device of ATS invalidation
> request to qi_check_fault(). if pdev is valid, means current request is for
> ATS invalidation, vice vesa.
> 
> no function change.

qi_submit_sync() is used for all kinds of iotlb/cache/devtlb invalidations.
it's a bit weird to see a device pointer (even being NULL) in places where
a device doesn't even matter.

having a new qi_submit_sync_devtlb() wrapper sounds cleaner to me,
with an internal __qi_submit_sync() helper to accept a device pointer.

qi_submit_sync() calls __qi_submit_sync() with a null device pointer then
non-devtlb paths are intact.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29  3:49 ` [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present Ethan Zhao
@ 2024-01-29  9:06   ` Tian, Kevin
  2024-01-29  9:21     ` Yi Liu
  2024-01-29 14:48     ` Baolu Lu
  2024-01-29  9:33   ` Yi Liu
  1 sibling, 2 replies; 39+ messages in thread
From: Tian, Kevin @ 2024-01-29  9:06 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Monday, January 29, 2024 11:49 AM
> 
> Because surprise removal could happen anytime, e.g. user could request safe
> removal to EP(endpoint device) via sysfs and brings its link down to do
> surprise removal cocurrently. such aggressive cases would cause ATS
> invalidation request issued to non-existence target device, then deadly
> loop to retry that request after ITE fault triggered in interrupt context.
> this patch aims to optimize the ITE handling by checking the target device
> presence state to avoid retrying the timeout request blindly, thus avoid
> hard lockup or system hang.
> 
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>  drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 814134e9aa5a..2e214b43725c 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index, int wait_index,
>  {
>  	u32 fault;
>  	int head, tail;
> +	u64 iqe_err, ite_sid;
>  	struct q_inval *qi = iommu->qi;
>  	int shift = qi_shift(iommu);
> 
> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index, int wait_index,
>  		tail = readl(iommu->reg + DMAR_IQT_REG);
>  		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
> 
> +		/*
> +		 * SID field is valid only when the ITE field is Set in FSTS_REG
> +		 * see Intel VT-d spec r4.1, section 11.4.9.9
> +		 */
> +		iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
> +		ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
> +
>  		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>  		pr_info("Invalidation Time-out Error (ITE) cleared\n");
> 
> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index, int wait_index,
>  			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>  		} while (head != tail);
> 
> +		/*
> +		 * If got ITE, we need to check if the sid of ITE is the same as
> +		 * current ATS invalidation target device, if yes, don't try this
> +		 * request anymore if the target device isn't present.
> +		 * 0 value of ite_sid means old VT-d device, no ite_sid value.
> +		 */
> +		if (pdev && ite_sid && !pci_device_is_present(pdev) &&
> +			ite_sid == pci_dev_id(pci_physfn(pdev)))
> +			return -ETIMEDOUT;
> +

since the hardware already reports source id leading to timeout, can't we
just find the pci_dev according to reported ite_sid? this is a slow path (either
due to device in bad state or removed) hence it's not necessary to add more
intelligence to pass the pci_dev in, leading to only a partial fix can be backported.

It's also more future-proof, say if one day the driver allows batching invalidation
requests for multiple devices then no need to pass in a list of devices.

Then it's easier to backport a full fix.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29  9:06   ` Tian, Kevin
@ 2024-01-29  9:21     ` Yi Liu
  2024-01-30  5:12       ` Ethan Zhao
  2024-01-29 14:48     ` Baolu Lu
  1 sibling, 1 reply; 39+ messages in thread
From: Yi Liu @ 2024-01-29  9:21 UTC (permalink / raw)
  To: Tian, Kevin, Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci

On 2024/1/29 17:06, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Monday, January 29, 2024 11:49 AM
>>
>> Because surprise removal could happen anytime, e.g. user could request safe
>> removal to EP(endpoint device) via sysfs and brings its link down to do
>> surprise removal cocurrently. such aggressive cases would cause ATS
>> invalidation request issued to non-existence target device, then deadly
>> loop to retry that request after ITE fault triggered in interrupt context.
>> this patch aims to optimize the ITE handling by checking the target device
>> presence state to avoid retrying the timeout request blindly, thus avoid
>> hard lockup or system hang.
>>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>> index 814134e9aa5a..2e214b43725c 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index, int wait_index,
>>   {
>>   	u32 fault;
>>   	int head, tail;
>> +	u64 iqe_err, ite_sid;
>>   	struct q_inval *qi = iommu->qi;
>>   	int shift = qi_shift(iommu);
>>
>> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index, int wait_index,
>>   		tail = readl(iommu->reg + DMAR_IQT_REG);
>>   		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>
>> +		/*
>> +		 * SID field is valid only when the ITE field is Set in FSTS_REG
>> +		 * see Intel VT-d spec r4.1, section 11.4.9.9
>> +		 */
>> +		iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>> +		ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
>> +
>>   		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>   		pr_info("Invalidation Time-out Error (ITE) cleared\n");
>>
>> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index, int wait_index,
>>   			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>   		} while (head != tail);
>>
>> +		/*
>> +		 * If got ITE, we need to check if the sid of ITE is the same as
>> +		 * current ATS invalidation target device, if yes, don't try this
>> +		 * request anymore if the target device isn't present.
>> +		 * 0 value of ite_sid means old VT-d device, no ite_sid value.
>> +		 */
>> +		if (pdev && ite_sid && !pci_device_is_present(pdev) &&
>> +			ite_sid == pci_dev_id(pci_physfn(pdev)))
>> +			return -ETIMEDOUT;
>> +
> 
> since the hardware already reports source id leading to timeout, can't we
> just find the pci_dev according to reported ite_sid? this is a slow path (either
> due to device in bad state or removed) hence it's not necessary to add more
> intelligence to pass the pci_dev in, leading to only a partial fix can be backported.
> 
> It's also more future-proof, say if one day the driver allows batching invalidation
> requests for multiple devices then no need to pass in a list of devices.
> 
> Then it's easier to backport a full fix.

May consider pci_get_domain_bus_and_slot() or
pci_find_bus()/pci_get_slot(). But I doubt if the pci_dev is still tracked
in the bus or a kind of dev list in the device hot removal case. So Ethan
may need to check.

Regards,
Yi Liu

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-29  3:49 ` [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
  2024-01-29  8:53   ` Tian, Kevin
@ 2024-01-29  9:32   ` Yi Liu
  2024-01-30  5:37     ` Ethan Zhao
  1 sibling, 1 reply; 39+ messages in thread
From: Yi Liu @ 2024-01-29  9:32 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci,
	Haorong Ye

On 2024/1/29 11:49, Ethan Zhao wrote:
> For those endpoint devices connect to system via hotplug capable ports,
> users could request a hot reset to the device by flapping device's link
> through setting the slot's link control register, as pciehp_ist() DLLSC
> interrupt sequence response, pciehp will unload the device driver and
> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
> target device to be sent and deadly loop to retry that request after ITE
> fault triggered in interrupt context.
> 
> That would cause following continuous hard lockup warning and system hang
> 
> [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
> [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present
> [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
> [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S
>           OE    kernel version xxxx
> [ 4223.822623] Hardware name: vendorname xxxx 666-106,
> BIOS 01.01.02.03.01 05/15/2023
> [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
> [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b
>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 1
> 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
> [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
> [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
> [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
> [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
> [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
> [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
> [ 4223.822626] FS:  0000000000000000(0000) GS:ffffa237ae400000(0000)
> knlGS:0000000000000000
> [ 4223.822627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 0000000000770ee0
> [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [ 4223.822628] PKRU: 55555554
> [ 4223.822628] Call Trace:
> [ 4223.822628]  qi_flush_dev_iotlb+0xb1/0xd0
> [ 4223.822628]  __dmar_remove_one_dev_info+0x224/0x250
> [ 4223.822629]  dmar_remove_one_dev_info+0x3e/0x50
> [ 4223.822629]  intel_iommu_release_device+0x1f/0x30
> [ 4223.822629]  iommu_release_device+0x33/0x60
> [ 4223.822629]  iommu_bus_notifier+0x7f/0x90
> [ 4223.822630]  blocking_notifier_call_chain+0x60/0x90
> [ 4223.822630]  device_del+0x2e5/0x420
> [ 4223.822630]  pci_remove_bus_device+0x70/0x110
> [ 4223.822630]  pciehp_unconfigure_device+0x7c/0x130
> [ 4223.822631]  pciehp_disable_slot+0x6b/0x100
> [ 4223.822631]  pciehp_handle_presence_or_link_change+0xd8/0x320
> [ 4223.822631]  pciehp_ist+0x176/0x180
> [ 4223.822631]  ? irq_finalize_oneshot.part.50+0x110/0x110
> [ 4223.822632]  irq_thread_fn+0x19/0x50
> [ 4223.822632]  irq_thread+0x104/0x190
> [ 4223.822632]  ? irq_forced_thread_fn+0x90/0x90
> [ 4223.822632]  ? irq_thread_check_affinity+0xe0/0xe0
> [ 4223.822633]  kthread+0x114/0x130
> [ 4223.822633]  ? __kthread_cancel_work+0x40/0x40
> [ 4223.822633]  ret_from_fork+0x1f/0x30
> [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
> [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded Tainted: G S
>           OE     kernel version xxxx
> [ 4223.822634] Hardware name: vendorname xxxx 666-106,
> BIOS 01.01.02.03.01 05/15/2023
> [ 4223.822634] Call Trace:
> [ 4223.822634]  <NMI>
> [ 4223.822635]  dump_stack+0x6d/0x88
> [ 4223.822635]  panic+0x101/0x2d0
> [ 4223.822635]  ? ret_from_fork+0x11/0x30
> [ 4223.822635]  nmi_panic.cold.14+0xc/0xc
> [ 4223.822636]  watchdog_overflow_callback.cold.8+0x6d/0x81
> [ 4223.822636]  __perf_event_overflow+0x4f/0xf0
> [ 4223.822636]  handle_pmi_common+0x1ef/0x290
> [ 4223.822636]  ? __set_pte_vaddr+0x28/0x40
> [ 4223.822637]  ? flush_tlb_one_kernel+0xa/0x20
> [ 4223.822637]  ? __native_set_fixmap+0x24/0x30
> [ 4223.822637]  ? ghes_copy_tofrom_phys+0x70/0x100
> [ 4223.822637]  ? __ghes_peek_estatus.isra.16+0x49/0xa0
> [ 4223.822637]  intel_pmu_handle_irq+0xba/0x2b0
> [ 4223.822638]  perf_event_nmi_handler+0x24/0x40
> [ 4223.822638]  nmi_handle+0x4d/0xf0
> [ 4223.822638]  default_do_nmi+0x49/0x100
> [ 4223.822638]  exc_nmi+0x134/0x180
> [ 4223.822639]  end_repeat_nmi+0x16/0x67
> [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
> [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 c1 48 8b
>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> f6 c6 10
>   74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
> [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
> [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 0000000000000005
> [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: ffff9f38401a8340
> [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 0000000000000000
> [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: ffff9f384005e200
> [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 0000000000000004
> [ 4223.822641]  ? qi_submit_sync+0x2c0/0x490
> [ 4223.822642]  ? qi_submit_sync+0x2c0/0x490
> [ 4223.822642]  </NMI>
> [ 4223.822642]  qi_flush_dev_iotlb+0xb1/0xd0
> [ 4223.822642]  __dmar_remove_one_dev_info+0x224/0x250
> [ 4223.822643]  dmar_remove_one_dev_info+0x3e/0x50
> [ 4223.822643]  intel_iommu_release_device+0x1f/0x30
> [ 4223.822643]  iommu_release_device+0x33/0x60
> [ 4223.822643]  iommu_bus_notifier+0x7f/0x90
> [ 4223.822644]  blocking_notifier_call_chain+0x60/0x90
> [ 4223.822644]  device_del+0x2e5/0x420
> [ 4223.822644]  pci_remove_bus_device+0x70/0x110
> [ 4223.822644]  pciehp_unconfigure_device+0x7c/0x130
> [ 4223.822644]  pciehp_disable_slot+0x6b/0x100
> [ 4223.822645]  pciehp_handle_presence_or_link_change+0xd8/0x320
> [ 4223.822645]  pciehp_ist+0x176/0x180
> [ 4223.822645]  ? irq_finalize_oneshot.part.50+0x110/0x110
> [ 4223.822645]  irq_thread_fn+0x19/0x50
> [ 4223.822646]  irq_thread+0x104/0x190
> [ 4223.822646]  ? irq_forced_thread_fn+0x90/0x90
> [ 4223.822646]  ? irq_thread_check_affinity+0xe0/0xe0
> [ 4223.822646]  kthread+0x114/0x130
> [ 4223.822647]  ? __kthread_cancel_work+0x40/0x40
> [ 4223.822647]  ret_from_fork+0x1f/0x30
> [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation
> range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> Such issue could be triggered by all kinds of regular surprise removal
> hotplug operation. like:
> 
> 1. pull EP(endpoint device) out directly.
> 2. turn off EP's power.
> 3. bring the link down.
> etc.
> 
> this patch aims to work for regular safe removal and surprise removal
> unplug. these hot unplug handling process could be optimized for fix the
> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
> function devtlb_invalidation_with_pasid() to check target device state to
> avoid sending meaningless ATS Invalidation request to iommu when device is
> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
> 
> For safe removal, device wouldn't be removed until the whole software
> handling process is done, it wouldn't trigger the hard lock up issue
> caused by too long ATS Invalidation timeout wait. In safe removal path,
> device state isn't set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device() by checking 'presence' parameter, calling
> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
> false there, wouldn't break the function.
> 
> For surprise removal, device state is set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device(), means device is already gone (disconnected)
> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
> return true to break the function not to send ATS Invalidation request to
> the disconnected device blindly, thus avoid to trigger further ITE fault,
> and ITE fault will block all invalidation request to be handled.
> furthermore retry the timeout request could trigger hard lockup.
> 
> safe removal (present) & surprise removal (not present)
> 
> pciehp_ist()
>     pciehp_handle_presence_or_link_change()
>       pciehp_disable_slot()
>         remove_board()
>           pciehp_unconfigure_device(presence) {
>             if (!presence)
>                  pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>             }
> 
> this patch works for regular safe removal and surprise removal of ATS
> capable endpoint on PCIe switch downstream ports.

this is not the real fix. So this series may focus on the real fix (avoid
dead loop in intel iommu driver when ITE happens), and in the end add this
patch as an optimization.

> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
> Tested-by: Haorong Ye <yehaorong@bytedance.com>
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/pasid.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 3239cefa4c33..953592125e4a 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   	if (!info || !info->ats_enabled)
>   		return;
>   
> +	if (pci_dev_is_disconnected(to_pci_dev(dev)))
> +		return;
> +
>   	sid = info->bus << 8 | info->devfn;
>   	qdep = info->ats_qdep;
>   	pfsid = info->pfsid;

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29  3:49 ` [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present Ethan Zhao
  2024-01-29  9:06   ` Tian, Kevin
@ 2024-01-29  9:33   ` Yi Liu
  1 sibling, 0 replies; 39+ messages in thread
From: Yi Liu @ 2024-01-29  9:33 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci

On 2024/1/29 11:49, Ethan Zhao wrote:
> Because surprise removal could happen anytime, e.g. user could request safe
> removal to EP(endpoint device) via sysfs and brings its link down to do
> surprise removal cocurrently. such aggressive cases would cause ATS
> invalidation request issued to non-existence target device, then deadly
> loop to retry that request after ITE fault triggered in interrupt context.
> this patch aims to optimize the ITE handling by checking the target device
> presence state to avoid retrying the timeout request blindly, thus avoid
> hard lockup or system hang.
>

should include a Fix tag here?

> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 814134e9aa5a..2e214b43725c 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
>   {
>   	u32 fault;
>   	int head, tail;
> +	u64 iqe_err, ite_sid;
>   	struct q_inval *qi = iommu->qi;
>   	int shift = qi_shift(iommu);
>   
> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
>   		tail = readl(iommu->reg + DMAR_IQT_REG);
>   		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>   
> +		/*
> +		 * SID field is valid only when the ITE field is Set in FSTS_REG
> +		 * see Intel VT-d spec r4.1, section 11.4.9.9
> +		 */
> +		iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
> +		ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
> +
>   		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>   		pr_info("Invalidation Time-out Error (ITE) cleared\n");
>   
> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu *iommu, int index, int wait_index,
>   			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>   		} while (head != tail);
>   
> +		/*
> +		 * If got ITE, we need to check if the sid of ITE is the same as
> +		 * current ATS invalidation target device, if yes, don't try this
> +		 * request anymore if the target device isn't present.
> +		 * 0 value of ite_sid means old VT-d device, no ite_sid value.
> +		 */
> +		if (pdev && ite_sid && !pci_device_is_present(pdev) &&
> +			ite_sid == pci_dev_id(pci_physfn(pdev)))
> +			return -ETIMEDOUT;
> +
>   		if (qi->desc_status[wait_index] == QI_ABORT)
>   			return -EAGAIN;
>   	}

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers
  2024-01-29  3:49 ` [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers Ethan Zhao
@ 2024-01-29  9:37   ` Yi Liu
  2024-01-30  5:43     ` Ethan Zhao
  0 siblings, 1 reply; 39+ messages in thread
From: Yi Liu @ 2024-01-29  9:37 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci

On 2024/1/29 11:49, Ethan Zhao wrote:
> fold parameters back to struct device_domain_info *info instead of extract
> and pass them, thus decrease the number of the parameter passed for
> following functions
> 
> qi_flush_dev_iotlb()
> qi_flush_dev_iotlb_pasid()
> quirk_extra_dev_tlb_flush()
> 
> no function change.
> 
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/dmar.c   | 26 ++++++++++++++++++++++----
>   drivers/iommu/intel/iommu.c  | 29 +++++++----------------------
>   drivers/iommu/intel/iommu.h  | 17 ++++++++---------
>   drivers/iommu/intel/nested.c |  9 ++-------
>   drivers/iommu/intel/pasid.c  |  9 ++-------
>   drivers/iommu/intel/svm.c    | 17 ++++++++---------
>   6 files changed, 49 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 23cb80d62a9a..ab5e1760bd59 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1517,11 +1517,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>   	qi_submit_sync(iommu, &desc, 1, 0);
>   }
>   
> -void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> -			u16 qdep, u64 addr, unsigned mask)
> +void qi_flush_dev_iotlb(struct intel_iommu *iommu,
> +			struct device_domain_info *info, u64 addr,

If you want to fold the parameters, why iommu is left? info also includes
iommu pointer.

> +			unsigned int mask)
>   {
> +	u16 sid, qdep, pfsid;
>   	struct qi_desc desc;
>   
> +	if (!info || !info->ats_enabled)
> +		return;
> +
> +	sid = info->bus << 8 | info->devfn;
> +	qdep = info->ats_qdep;
> +	pfsid = info->pfsid;
> +
>   	/*
>   	 * VT-d spec, section 4.3:
>   	 *
> @@ -1590,11 +1599,20 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
>   }
>   
>   /* PASID-based device IOTLB Invalidate */
> -void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> -			      u32 pasid,  u16 qdep, u64 addr, unsigned int size_order)
> +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
> +			      struct device_domain_info *info, u64 addr, u32 pasid,
> +			      unsigned int size_order)
>   {
>   	unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
>   	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> +	u16 sid, qdep, pfsid;
> +
> +	if (!info || !dev_is_pci(info->dev))
> +		return;
> +
> +	sid = info->bus << 8 | info->devfn;
> +	qdep = info->ats_qdep;
> +	pfsid = info->pfsid;
>   
>   	/*
>   	 * VT-d spec, section 4.3:
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6fb5f6fceea1..e5902944b3db 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1310,16 +1310,11 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
>   static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
>   				    u64 addr, unsigned int mask)
>   {
> -	u16 sid, qdep;
> -
>   	if (!info || !info->ats_enabled)
>   		return;
>   
> -	sid = info->bus << 8 | info->devfn;
> -	qdep = info->ats_qdep;
> -	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> -			   qdep, addr, mask);
> -	quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_NO_PASID, qdep);
> +	qi_flush_dev_iotlb(info->iommu, info, addr, mask);
> +	quirk_extra_dev_tlb_flush(info, addr, IOMMU_NO_PASID, mask);
>   }
>   
>   static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
> @@ -1342,11 +1337,7 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
>   		if (!info->ats_enabled)
>   			continue;
>   
> -		qi_flush_dev_iotlb_pasid(info->iommu,
> -					 PCI_DEVID(info->bus, info->devfn),
> -					 info->pfsid, dev_pasid->pasid,
> -					 info->ats_qdep, addr,
> -					 mask);
> +		qi_flush_dev_iotlb_pasid(info->iommu, info, addr, dev_pasid->pasid, mask);
>   	}
>   	spin_unlock_irqrestore(&domain->lock, flags);
>   }
> @@ -4990,22 +4981,16 @@ static void __init check_tylersburg_isoch(void)
>    *
>    * As a reminder, #6 will *NEED* this quirk as we enable nested translation.
>    */
> -void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
> -			       unsigned long address, unsigned long mask,
> -			       u32 pasid, u16 qdep)
> +void quirk_extra_dev_tlb_flush(struct device_domain_info *info, u32 pasid,
> +			       unsigned long address, unsigned long mask)
>   {
> -	u16 sid;
> -
>   	if (likely(!info->dtlb_extra_inval))
>   		return;
>   
> -	sid = PCI_DEVID(info->bus, info->devfn);
>   	if (pasid == IOMMU_NO_PASID) {
> -		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> -				   qdep, address, mask);
> +		qi_flush_dev_iotlb(info->iommu, info, address, mask);
>   	} else {
> -		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
> -					 pasid, qdep, address, mask);
> +		qi_flush_dev_iotlb_pasid(info->iommu, info, address, pasid, mask);
>   	}
>   }
>   
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index d02f916d8e59..f68f17476d85 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -1037,18 +1037,17 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did,
>   		      u16 sid, u8 fm, u64 type);
>   void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>   		    unsigned int size_order, u64 type);
> -void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> -			u16 qdep, u64 addr, unsigned mask);
> -
> +void qi_flush_dev_iotlb(struct intel_iommu *iommu,
> +			struct device_domain_info *info, u64 addr,
> +			unsigned int mask);
>   void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
>   		     unsigned long npages, bool ih);
>   
> -void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> -			      u32 pasid, u16 qdep, u64 addr,
> -			      unsigned int size_order);
> -void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
> -			       unsigned long address, unsigned long pages,
> -			       u32 pasid, u16 qdep);
> +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
> +			      struct device_domain_info *info, u64 addr,
> +			      u32 pasid, unsigned int size_order);
> +void quirk_extra_dev_tlb_flush(struct device_domain_info *info, u32 pasid,
> +			       unsigned long address, unsigned long mask);
>   void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
>   			  u32 pasid);
>   
> diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
> index f26c7f1c46cc..d15f72b55940 100644
> --- a/drivers/iommu/intel/nested.c
> +++ b/drivers/iommu/intel/nested.c
> @@ -78,18 +78,13 @@ static void nested_flush_dev_iotlb(struct dmar_domain *domain, u64 addr,
>   {
>   	struct device_domain_info *info;
>   	unsigned long flags;
> -	u16 sid, qdep;
>   
>   	spin_lock_irqsave(&domain->lock, flags);
>   	list_for_each_entry(info, &domain->devices, link) {
>   		if (!info->ats_enabled)
>   			continue;
> -		sid = info->bus << 8 | info->devfn;
> -		qdep = info->ats_qdep;
> -		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> -				   qdep, addr, mask);
> -		quirk_extra_dev_tlb_flush(info, addr, mask,
> -					  IOMMU_NO_PASID, qdep);
> +		qi_flush_dev_iotlb(info->iommu, info, addr, mask);
> +		quirk_extra_dev_tlb_flush(info, IOMMU_NO_PASID, addr, mask);
>   	}
>   	spin_unlock_irqrestore(&domain->lock, flags);
>   }
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 953592125e4a..5dacdea3cab7 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -208,7 +208,6 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   			       struct device *dev, u32 pasid)
>   {
>   	struct device_domain_info *info;
> -	u16 sid, qdep, pfsid;
>   
>   	info = dev_iommu_priv_get(dev);
>   	if (!info || !info->ats_enabled)
> @@ -217,10 +216,6 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   	if (pci_dev_is_disconnected(to_pci_dev(dev)))
>   		return;
>   
> -	sid = info->bus << 8 | info->devfn;
> -	qdep = info->ats_qdep;
> -	pfsid = info->pfsid;
> -
>   	/*
>   	 * When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
>   	 * devTLB flush w/o PASID should be used. For non-zero PASID under
> @@ -228,9 +223,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   	 * efficient to flush devTLB specific to the PASID.
>   	 */
>   	if (pasid == IOMMU_NO_PASID)
> -		qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
> +		qi_flush_dev_iotlb(iommu, info, 0, 64 - VTD_PAGE_SHIFT);
>   	else
> -		qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
> +		qi_flush_dev_iotlb_pasid(iommu, info, 0, pasid, 64 - VTD_PAGE_SHIFT);
>   }
>   
>   void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 40edd282903f..89168b31bf31 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -181,11 +181,10 @@ static void __flush_svm_range_dev(struct intel_svm *svm,
>   
>   	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
>   	if (info->ats_enabled) {
> -		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
> -					 svm->pasid, sdev->qdep, address,
> +		qi_flush_dev_iotlb_pasid(sdev->iommu, info, address, svm->pasid,
>   					 order_base_2(pages));
> -		quirk_extra_dev_tlb_flush(info, address, order_base_2(pages),
> -					  svm->pasid, sdev->qdep);
> +		quirk_extra_dev_tlb_flush(info, svm->pasid, address,
> +					  order_base_2(pages));
>   	}
>   }
>   
> @@ -227,11 +226,11 @@ static void intel_flush_svm_all(struct intel_svm *svm)
>   
>   		qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, -1UL, 0);
>   		if (info->ats_enabled) {
> -			qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
> -						 svm->pasid, sdev->qdep,
> -						 0, 64 - VTD_PAGE_SHIFT);
> -			quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
> -						  svm->pasid, sdev->qdep);
> +			qi_flush_dev_iotlb_pasid(sdev->iommu, info, 0,
> +						 svm->pasid,
> +						 64 - VTD_PAGE_SHIFT);
> +			quirk_extra_dev_tlb_flush(info, svm->pasid, 0,
> +						  64 - VTD_PAGE_SHIFT);
>   		}
>   	}
>   	rcu_read_unlock();

-- 
Regards,
Yi Liu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29  9:06   ` Tian, Kevin
  2024-01-29  9:21     ` Yi Liu
@ 2024-01-29 14:48     ` Baolu Lu
  2024-01-30  3:28       ` Tian, Kevin
  2024-01-30  8:43       ` Ethan Zhao
  1 sibling, 2 replies; 39+ messages in thread
From: Baolu Lu @ 2024-01-29 14:48 UTC (permalink / raw)
  To: Tian, Kevin, Ethan Zhao, bhelgaas, robin.murphy, jgg
  Cc: baolu.lu, dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci

On 2024/1/29 17:06, Tian, Kevin wrote:
>> From: Ethan Zhao<haifeng.zhao@linux.intel.com>
>> Sent: Monday, January 29, 2024 11:49 AM
>>
>> Because surprise removal could happen anytime, e.g. user could request safe
>> removal to EP(endpoint device) via sysfs and brings its link down to do
>> surprise removal cocurrently. such aggressive cases would cause ATS
>> invalidation request issued to non-existence target device, then deadly
>> loop to retry that request after ITE fault triggered in interrupt context.
>> this patch aims to optimize the ITE handling by checking the target device
>> presence state to avoid retrying the timeout request blindly, thus avoid
>> hard lockup or system hang.
>>
>> Signed-off-by: Ethan Zhao<haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>> index 814134e9aa5a..2e214b43725c 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index, int wait_index,
>>   {
>>   	u32 fault;
>>   	int head, tail;
>> +	u64 iqe_err, ite_sid;
>>   	struct q_inval *qi = iommu->qi;
>>   	int shift = qi_shift(iommu);
>>
>> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index, int wait_index,
>>   		tail = readl(iommu->reg + DMAR_IQT_REG);
>>   		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>
>> +		/*
>> +		 * SID field is valid only when the ITE field is Set in FSTS_REG
>> +		 * see Intel VT-d spec r4.1, section 11.4.9.9
>> +		 */
>> +		iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>> +		ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
>> +
>>   		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>   		pr_info("Invalidation Time-out Error (ITE) cleared\n");
>>
>> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index, int wait_index,
>>   			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>   		} while (head != tail);
>>
>> +		/*
>> +		 * If got ITE, we need to check if the sid of ITE is the same as
>> +		 * current ATS invalidation target device, if yes, don't try this
>> +		 * request anymore if the target device isn't present.
>> +		 * 0 value of ite_sid means old VT-d device, no ite_sid value.
>> +		 */
>> +		if (pdev && ite_sid && !pci_device_is_present(pdev) &&
>> +			ite_sid == pci_dev_id(pci_physfn(pdev)))
>> +			return -ETIMEDOUT;
>> +
> since the hardware already reports source id leading to timeout, can't we
> just find the pci_dev according to reported ite_sid? this is a slow path (either
> due to device in bad state or removed) hence it's not necessary to add more
> intelligence to pass the pci_dev in, leading to only a partial fix can be backported.
> 
> It's also more future-proof, say if one day the driver allows batching invalidation
> requests for multiple devices then no need to pass in a list of devices.

I have ever thought about this solution and gave up in the end due to
the locking issue.

A batch of qi requests must be handled in the spin lock critical region
to enforce that only one batch of requests is submitted at a time.
Searching pci_dev in this locking region might result in nested locking
issues, and I haven't found a good solution for this yet.

Unless someone can bring up a better solution, perhaps we have to live
in a world where only single device TLB invalidation request in a batch
could be submitted to the queue.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29 14:48     ` Baolu Lu
@ 2024-01-30  3:28       ` Tian, Kevin
  2024-01-30  8:43       ` Ethan Zhao
  1 sibling, 0 replies; 39+ messages in thread
From: Tian, Kevin @ 2024-01-30  3:28 UTC (permalink / raw)
  To: Baolu Lu, Ethan Zhao, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Monday, January 29, 2024 10:49 PM
> 
> On 2024/1/29 17:06, Tian, Kevin wrote:
> >> From: Ethan Zhao<haifeng.zhao@linux.intel.com>
> >> Sent: Monday, January 29, 2024 11:49 AM
> >>
> >> Because surprise removal could happen anytime, e.g. user could request
> safe
> >> removal to EP(endpoint device) via sysfs and brings its link down to do
> >> surprise removal cocurrently. such aggressive cases would cause ATS
> >> invalidation request issued to non-existence target device, then deadly
> >> loop to retry that request after ITE fault triggered in interrupt context.
> >> this patch aims to optimize the ITE handling by checking the target device
> >> presence state to avoid retrying the timeout request blindly, thus avoid
> >> hard lockup or system hang.
> >>
> >> Signed-off-by: Ethan Zhao<haifeng.zhao@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
> >>   1 file changed, 18 insertions(+)
> >>
> >> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >> index 814134e9aa5a..2e214b43725c 100644
> >> --- a/drivers/iommu/intel/dmar.c
> >> +++ b/drivers/iommu/intel/dmar.c
> >> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
> >> *iommu, int index, int wait_index,
> >>   {
> >>   	u32 fault;
> >>   	int head, tail;
> >> +	u64 iqe_err, ite_sid;
> >>   	struct q_inval *qi = iommu->qi;
> >>   	int shift = qi_shift(iommu);
> >>
> >> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
> >> *iommu, int index, int wait_index,
> >>   		tail = readl(iommu->reg + DMAR_IQT_REG);
> >>   		tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
> >>
> >> +		/*
> >> +		 * SID field is valid only when the ITE field is Set in FSTS_REG
> >> +		 * see Intel VT-d spec r4.1, section 11.4.9.9
> >> +		 */
> >> +		iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
> >> +		ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
> >> +
> >>   		writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
> >>   		pr_info("Invalidation Time-out Error (ITE) cleared\n");
> >>
> >> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
> >> *iommu, int index, int wait_index,
> >>   			head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> >>   		} while (head != tail);
> >>
> >> +		/*
> >> +		 * If got ITE, we need to check if the sid of ITE is the same as
> >> +		 * current ATS invalidation target device, if yes, don't try this
> >> +		 * request anymore if the target device isn't present.
> >> +		 * 0 value of ite_sid means old VT-d device, no ite_sid value.
> >> +		 */
> >> +		if (pdev && ite_sid && !pci_device_is_present(pdev) &&
> >> +			ite_sid == pci_dev_id(pci_physfn(pdev)))
> >> +			return -ETIMEDOUT;
> >> +
> > since the hardware already reports source id leading to timeout, can't we
> > just find the pci_dev according to reported ite_sid? this is a slow path
> (either
> > due to device in bad state or removed) hence it's not necessary to add
> more
> > intelligence to pass the pci_dev in, leading to only a partial fix can be
> backported.
> >
> > It's also more future-proof, say if one day the driver allows batching
> invalidation
> > requests for multiple devices then no need to pass in a list of devices.
> 
> I have ever thought about this solution and gave up in the end due to
> the locking issue.
> 
> A batch of qi requests must be handled in the spin lock critical region
> to enforce that only one batch of requests is submitted at a time.
> Searching pci_dev in this locking region might result in nested locking
> issues, and I haven't found a good solution for this yet.
> 

in spin-lock you just get the sid.

searching pci_dev can be done outside of the critical region. anyway
the handling of -EAGAIN is already after spin_unlock.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29  9:21     ` Yi Liu
@ 2024-01-30  5:12       ` Ethan Zhao
  2024-01-30  6:22         ` Tian, Kevin
  0 siblings, 1 reply; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  5:12 UTC (permalink / raw)
  To: Yi Liu, Tian, Kevin, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci


On 1/29/2024 5:21 PM, Yi Liu wrote:
> On 2024/1/29 17:06, Tian, Kevin wrote:
>>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>> Sent: Monday, January 29, 2024 11:49 AM
>>>
>>> Because surprise removal could happen anytime, e.g. user could 
>>> request safe
>>> removal to EP(endpoint device) via sysfs and brings its link down to do
>>> surprise removal cocurrently. such aggressive cases would cause ATS
>>> invalidation request issued to non-existence target device, then deadly
>>> loop to retry that request after ITE fault triggered in interrupt 
>>> context.
>>> this patch aims to optimize the ITE handling by checking the target 
>>> device
>>> presence state to avoid retrying the timeout request blindly, thus 
>>> avoid
>>> hard lockup or system hang.
>>>
>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>> ---
>>>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>>>   1 file changed, 18 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>> index 814134e9aa5a..2e214b43725c 100644
>>> --- a/drivers/iommu/intel/dmar.c
>>> +++ b/drivers/iommu/intel/dmar.c
>>> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
>>> *iommu, int index, int wait_index,
>>>   {
>>>       u32 fault;
>>>       int head, tail;
>>> +    u64 iqe_err, ite_sid;
>>>       struct q_inval *qi = iommu->qi;
>>>       int shift = qi_shift(iommu);
>>>
>>> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
>>> *iommu, int index, int wait_index,
>>>           tail = readl(iommu->reg + DMAR_IQT_REG);
>>>           tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>>
>>> +        /*
>>> +         * SID field is valid only when the ITE field is Set in 
>>> FSTS_REG
>>> +         * see Intel VT-d spec r4.1, section 11.4.9.9
>>> +         */
>>> +        iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>>> +        ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
>>> +
>>>           writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>>           pr_info("Invalidation Time-out Error (ITE) cleared\n");
>>>
>>> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
>>> *iommu, int index, int wait_index,
>>>               head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>           } while (head != tail);
>>>
>>> +        /*
>>> +         * If got ITE, we need to check if the sid of ITE is the 
>>> same as
>>> +         * current ATS invalidation target device, if yes, don't 
>>> try this
>>> +         * request anymore if the target device isn't present.
>>> +         * 0 value of ite_sid means old VT-d device, no ite_sid value.
>>> +         */
>>> +        if (pdev && ite_sid && !pci_device_is_present(pdev) &&
>>> +            ite_sid == pci_dev_id(pci_physfn(pdev)))
>>> +            return -ETIMEDOUT;
>>> +
>>
>> since the hardware already reports source id leading to timeout, 
>> can't we
>> just find the pci_dev according to reported ite_sid? this is a slow 
>> path (either
>> due to device in bad state or removed) hence it's not necessary to 
>> add more
>> intelligence to pass the pci_dev in, leading to only a partial fix 
>> can be backported.
>>
>> It's also more future-proof, say if one day the driver allows 
>> batching invalidation
>> requests for multiple devices then no need to pass in a list of devices.
>>
>> Then it's easier to backport a full fix.
>
> May consider pci_get_domain_bus_and_slot() or
> pci_find_bus()/pci_get_slot(). But I doubt if the pci_dev is still 
> tracked
> in the bus or a kind of dev list in the device hot removal case. So Ethan
> may need to check.

Perhaps it is too late to call pci_find_bus() or 
pci_get_domain_bus_and_slot() to get the

device instance from this notifier registered as BUS_NOTIFY_REMOVED_DEVICE

action. if the device is still there in bus list, *must* be a bug of 
device subsystem as

*removed* device.

but if we call iommu_release_device() in iommu_bus_notifier() for 
BUS_NOTIFY_DEL_DEVICE

action, there should be opportuniy to get the device instance, but that 
change need

more evaluation about side effect.

furthermore, iommu never cross domain number per context table 
defination in VT-d

spec, not mean, domain number in a system never will be !0, per my 
understanding.


Thanks,

Ethan

>
> Regards,
> Yi Liu
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2024-01-29  8:50   ` Tian, Kevin
@ 2024-01-30  5:23     ` Ethan Zhao
  2024-01-30  5:25     ` Ethan Zhao
  1 sibling, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  5:23 UTC (permalink / raw)
  To: Tian, Kevin, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci,
	Haorong Ye


On 1/29/2024 4:50 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Monday, January 29, 2024 11:49 AM
>>
>> Make pci_dev_is_disconnected() public so that it can be called from
>> Intel VT-d driver to quickly fix/workaround the surprise removal
>> unplug hang issue for those ATS capable devices on PCIe switch downstream
>> hotplug capable ports.
>>
>> Beside pci_device_is_present() function, this one has no config space
> s/Beside/Unlike/

Beside means 'comparing with' here, perhaps it is not clear enough

or there is syntax error if used here ?



Thanks,

Ethan

>
>> space access, so is light enough to optimize the normal pure surprise
>> removal and safe removal flow.
> somehow this paragraph belongs to the patch using it. The 1st paragraph
> is sufficient for the reason of exposure.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2024-01-29  8:50   ` Tian, Kevin
  2024-01-30  5:23     ` Ethan Zhao
@ 2024-01-30  5:25     ` Ethan Zhao
  2024-01-30  6:23       ` Tian, Kevin
  1 sibling, 1 reply; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  5:25 UTC (permalink / raw)
  To: Tian, Kevin, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci,
	Haorong Ye


On 1/29/2024 4:50 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Monday, January 29, 2024 11:49 AM
>>
>> Make pci_dev_is_disconnected() public so that it can be called from
>> Intel VT-d driver to quickly fix/workaround the surprise removal
>> unplug hang issue for those ATS capable devices on PCIe switch downstream
>> hotplug capable ports.
>>
>> Beside pci_device_is_present() function, this one has no config space
> s/Beside/Unlike/
>
>> space access, so is light enough to optimize the normal pure surprise
>> removal and safe removal flow.
> somehow this paragraph belongs to the patch using it. The 1st paragraph
> is sufficient for the reason of exposure.

pci_device_is_present() is aleardy exported symbol, why not just use it ?

Also try to make it clear, the difference and justification.


Thanks,
Ethan


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-29  9:32   ` Yi Liu
@ 2024-01-30  5:37     ` Ethan Zhao
  2024-01-31  4:25       ` Yi Liu
  0 siblings, 1 reply; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  5:37 UTC (permalink / raw)
  To: Yi Liu, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci,
	Haorong Ye


On 1/29/2024 5:32 PM, Yi Liu wrote:
> On 2024/1/29 11:49, Ethan Zhao wrote:
>> For those endpoint devices connect to system via hotplug capable ports,
>> users could request a hot reset to the device by flapping device's link
>> through setting the slot's link control register, as pciehp_ist() DLLSC
>> interrupt sequence response, pciehp will unload the device driver and
>> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
>> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for 
>> non-existence
>> target device to be sent and deadly loop to retry that request after ITE
>> fault triggered in interrupt context.
>>
>> That would cause following continuous hard lockup warning and system 
>> hang
>>
>> [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
>> [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not 
>> present
>> [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
>> [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded 
>> Tainted: G S
>>           OE    kernel version xxxx
>> [ 4223.822623] Hardware name: vendorname xxxx 666-106,
>> BIOS 01.01.02.03.01 05/15/2023
>> [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
>> [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 
>> 95 c1 48 8b
>>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 
>> <40> f6 c6 1
>> 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
>> [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
>> [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 
>> 0000000000000005
>> [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: 
>> ffff9f38401a8340
>> [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 
>> 0000000000000000
>> [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: 
>> ffff9f384005e200
>> [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 
>> 0000000000000004
>> [ 4223.822626] FS:  0000000000000000(0000) GS:ffffa237ae400000(0000)
>> knlGS:0000000000000000
>> [ 4223.822627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 
>> 0000000000770ee0
>> [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>> 0000000000000000
>> [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 
>> 0000000000000400
>> [ 4223.822628] PKRU: 55555554
>> [ 4223.822628] Call Trace:
>> [ 4223.822628]  qi_flush_dev_iotlb+0xb1/0xd0
>> [ 4223.822628]  __dmar_remove_one_dev_info+0x224/0x250
>> [ 4223.822629]  dmar_remove_one_dev_info+0x3e/0x50
>> [ 4223.822629]  intel_iommu_release_device+0x1f/0x30
>> [ 4223.822629]  iommu_release_device+0x33/0x60
>> [ 4223.822629]  iommu_bus_notifier+0x7f/0x90
>> [ 4223.822630]  blocking_notifier_call_chain+0x60/0x90
>> [ 4223.822630]  device_del+0x2e5/0x420
>> [ 4223.822630]  pci_remove_bus_device+0x70/0x110
>> [ 4223.822630]  pciehp_unconfigure_device+0x7c/0x130
>> [ 4223.822631]  pciehp_disable_slot+0x6b/0x100
>> [ 4223.822631]  pciehp_handle_presence_or_link_change+0xd8/0x320
>> [ 4223.822631]  pciehp_ist+0x176/0x180
>> [ 4223.822631]  ? irq_finalize_oneshot.part.50+0x110/0x110
>> [ 4223.822632]  irq_thread_fn+0x19/0x50
>> [ 4223.822632]  irq_thread+0x104/0x190
>> [ 4223.822632]  ? irq_forced_thread_fn+0x90/0x90
>> [ 4223.822632]  ? irq_thread_check_affinity+0xe0/0xe0
>> [ 4223.822633]  kthread+0x114/0x130
>> [ 4223.822633]  ? __kthread_cancel_work+0x40/0x40
>> [ 4223.822633]  ret_from_fork+0x1f/0x30
>> [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
>> [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded 
>> Tainted: G S
>>           OE     kernel version xxxx
>> [ 4223.822634] Hardware name: vendorname xxxx 666-106,
>> BIOS 01.01.02.03.01 05/15/2023
>> [ 4223.822634] Call Trace:
>> [ 4223.822634]  <NMI>
>> [ 4223.822635]  dump_stack+0x6d/0x88
>> [ 4223.822635]  panic+0x101/0x2d0
>> [ 4223.822635]  ? ret_from_fork+0x11/0x30
>> [ 4223.822635]  nmi_panic.cold.14+0xc/0xc
>> [ 4223.822636]  watchdog_overflow_callback.cold.8+0x6d/0x81
>> [ 4223.822636]  __perf_event_overflow+0x4f/0xf0
>> [ 4223.822636]  handle_pmi_common+0x1ef/0x290
>> [ 4223.822636]  ? __set_pte_vaddr+0x28/0x40
>> [ 4223.822637]  ? flush_tlb_one_kernel+0xa/0x20
>> [ 4223.822637]  ? __native_set_fixmap+0x24/0x30
>> [ 4223.822637]  ? ghes_copy_tofrom_phys+0x70/0x100
>> [ 4223.822637]  ? __ghes_peek_estatus.isra.16+0x49/0xa0
>> [ 4223.822637]  intel_pmu_handle_irq+0xba/0x2b0
>> [ 4223.822638]  perf_event_nmi_handler+0x24/0x40
>> [ 4223.822638]  nmi_handle+0x4d/0xf0
>> [ 4223.822638]  default_do_nmi+0x49/0x100
>> [ 4223.822638]  exc_nmi+0x134/0x180
>> [ 4223.822639]  end_repeat_nmi+0x16/0x67
>> [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
>> [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 
>> 95 c1 48 8b
>>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 
>> <40> f6 c6 10
>>   74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
>> [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
>> [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 
>> 0000000000000005
>> [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: 
>> ffff9f38401a8340
>> [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 
>> 0000000000000000
>> [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: 
>> ffff9f384005e200
>> [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 
>> 0000000000000004
>> [ 4223.822641]  ? qi_submit_sync+0x2c0/0x490
>> [ 4223.822642]  ? qi_submit_sync+0x2c0/0x490
>> [ 4223.822642]  </NMI>
>> [ 4223.822642]  qi_flush_dev_iotlb+0xb1/0xd0
>> [ 4223.822642]  __dmar_remove_one_dev_info+0x224/0x250
>> [ 4223.822643]  dmar_remove_one_dev_info+0x3e/0x50
>> [ 4223.822643]  intel_iommu_release_device+0x1f/0x30
>> [ 4223.822643]  iommu_release_device+0x33/0x60
>> [ 4223.822643]  iommu_bus_notifier+0x7f/0x90
>> [ 4223.822644]  blocking_notifier_call_chain+0x60/0x90
>> [ 4223.822644]  device_del+0x2e5/0x420
>> [ 4223.822644]  pci_remove_bus_device+0x70/0x110
>> [ 4223.822644]  pciehp_unconfigure_device+0x7c/0x130
>> [ 4223.822644]  pciehp_disable_slot+0x6b/0x100
>> [ 4223.822645]  pciehp_handle_presence_or_link_change+0xd8/0x320
>> [ 4223.822645]  pciehp_ist+0x176/0x180
>> [ 4223.822645]  ? irq_finalize_oneshot.part.50+0x110/0x110
>> [ 4223.822645]  irq_thread_fn+0x19/0x50
>> [ 4223.822646]  irq_thread+0x104/0x190
>> [ 4223.822646]  ? irq_forced_thread_fn+0x90/0x90
>> [ 4223.822646]  ? irq_thread_check_affinity+0xe0/0xe0
>> [ 4223.822646]  kthread+0x114/0x130
>> [ 4223.822647]  ? __kthread_cancel_work+0x40/0x40
>> [ 4223.822647]  ret_from_fork+0x1f/0x30
>> [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 
>> (relocation
>> range: 0xffffffff80000000-0xffffffffbfffffff)
>>
>> Such issue could be triggered by all kinds of regular surprise removal
>> hotplug operation. like:
>>
>> 1. pull EP(endpoint device) out directly.
>> 2. turn off EP's power.
>> 3. bring the link down.
>> etc.
>>
>> this patch aims to work for regular safe removal and surprise removal
>> unplug. these hot unplug handling process could be optimized for fix the
>> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
>> function devtlb_invalidation_with_pasid() to check target device 
>> state to
>> avoid sending meaningless ATS Invalidation request to iommu when 
>> device is
>> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>
>> For safe removal, device wouldn't be removed until the whole software
>> handling process is done, it wouldn't trigger the hard lock up issue
>> caused by too long ATS Invalidation timeout wait. In safe removal path,
>> device state isn't set to pci_channel_io_perm_failure in
>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will 
>> return
>> false there, wouldn't break the function.
>>
>> For surprise removal, device state is set to 
>> pci_channel_io_perm_failure in
>> pciehp_unconfigure_device(), means device is already gone (disconnected)
>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
>> return true to break the function not to send ATS Invalidation 
>> request to
>> the disconnected device blindly, thus avoid to trigger further ITE 
>> fault,
>> and ITE fault will block all invalidation request to be handled.
>> furthermore retry the timeout request could trigger hard lockup.
>>
>> safe removal (present) & surprise removal (not present)
>>
>> pciehp_ist()
>>     pciehp_handle_presence_or_link_change()
>>       pciehp_disable_slot()
>>         remove_board()
>>           pciehp_unconfigure_device(presence) {
>>             if (!presence)
>>                  pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>>             }
>>
>> this patch works for regular safe removal and surprise removal of ATS
>> capable endpoint on PCIe switch downstream ports.
>
> this is not the real fix. So this series may focus on the real fix (avoid
> dead loop in intel iommu driver when ITE happens), and in the end add 
> this
> patch as an optimization.

This is the second time I brought it on top of other patches as Baolu 
perfers

Bjorn also suggested to take this one as optimization addition to others.

Anyway, just the order in this patch list, the same result after applied.

to solve customer issue, this one is needed.


Thanks,

Ethan

>
>> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table 
>> interface")
>> Tested-by: Haorong Ye <yehaorong@bytedance.com>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/pasid.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>> index 3239cefa4c33..953592125e4a 100644
>> --- a/drivers/iommu/intel/pasid.c
>> +++ b/drivers/iommu/intel/pasid.c
>> @@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>> *iommu,
>>       if (!info || !info->ats_enabled)
>>           return;
>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>> +        return;
>> +
>>       sid = info->bus << 8 | info->devfn;
>>       qdep = info->ats_qdep;
>>       pfsid = info->pfsid;
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers
  2024-01-29  9:37   ` Yi Liu
@ 2024-01-30  5:43     ` Ethan Zhao
  0 siblings, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  5:43 UTC (permalink / raw)
  To: Yi Liu, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci


On 1/29/2024 5:37 PM, Yi Liu wrote:
> On 2024/1/29 11:49, Ethan Zhao wrote:
>> fold parameters back to struct device_domain_info *info instead of 
>> extract
>> and pass them, thus decrease the number of the parameter passed for
>> following functions
>>
>> qi_flush_dev_iotlb()
>> qi_flush_dev_iotlb_pasid()
>> quirk_extra_dev_tlb_flush()
>>
>> no function change.
>>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/dmar.c   | 26 ++++++++++++++++++++++----
>>   drivers/iommu/intel/iommu.c  | 29 +++++++----------------------
>>   drivers/iommu/intel/iommu.h  | 17 ++++++++---------
>>   drivers/iommu/intel/nested.c |  9 ++-------
>>   drivers/iommu/intel/pasid.c  |  9 ++-------
>>   drivers/iommu/intel/svm.c    | 17 ++++++++---------
>>   6 files changed, 49 insertions(+), 58 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>> index 23cb80d62a9a..ab5e1760bd59 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -1517,11 +1517,20 @@ void qi_flush_iotlb(struct intel_iommu 
>> *iommu, u16 did, u64 addr,
>>       qi_submit_sync(iommu, &desc, 1, 0);
>>   }
>>   -void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 
>> pfsid,
>> -            u16 qdep, u64 addr, unsigned mask)
>> +void qi_flush_dev_iotlb(struct intel_iommu *iommu,
>> +            struct device_domain_info *info, u64 addr,
>
> If you want to fold the parameters, why iommu is left? info also includes
> iommu pointer.

Good catch.

No reason to leave it there.


Thanks,

Ethan

>
>> +            unsigned int mask)
>>   {
>> +    u16 sid, qdep, pfsid;
>>       struct qi_desc desc;
>>   +    if (!info || !info->ats_enabled)
>> +        return;
>> +
>> +    sid = info->bus << 8 | info->devfn;
>> +    qdep = info->ats_qdep;
>> +    pfsid = info->pfsid;
>> +
>>       /*
>>        * VT-d spec, section 4.3:
>>        *
>> @@ -1590,11 +1599,20 @@ void qi_flush_piotlb(struct intel_iommu 
>> *iommu, u16 did, u32 pasid, u64 addr,
>>   }
>>     /* PASID-based device IOTLB Invalidate */
>> -void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, 
>> u16 pfsid,
>> -                  u32 pasid,  u16 qdep, u64 addr, unsigned int 
>> size_order)
>> +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
>> +                  struct device_domain_info *info, u64 addr, u32 pasid,
>> +                  unsigned int size_order)
>>   {
>>       unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
>>       struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
>> +    u16 sid, qdep, pfsid;
>> +
>> +    if (!info || !dev_is_pci(info->dev))
>> +        return;
>> +
>> +    sid = info->bus << 8 | info->devfn;
>> +    qdep = info->ats_qdep;
>> +    pfsid = info->pfsid;
>>         /*
>>        * VT-d spec, section 4.3:
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 6fb5f6fceea1..e5902944b3db 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1310,16 +1310,11 @@ static void iommu_disable_pci_caps(struct 
>> device_domain_info *info)
>>   static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
>>                       u64 addr, unsigned int mask)
>>   {
>> -    u16 sid, qdep;
>> -
>>       if (!info || !info->ats_enabled)
>>           return;
>>   -    sid = info->bus << 8 | info->devfn;
>> -    qdep = info->ats_qdep;
>> -    qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
>> -               qdep, addr, mask);
>> -    quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_NO_PASID, qdep);
>> +    qi_flush_dev_iotlb(info->iommu, info, addr, mask);
>> +    quirk_extra_dev_tlb_flush(info, addr, IOMMU_NO_PASID, mask);
>>   }
>>     static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
>> @@ -1342,11 +1337,7 @@ static void iommu_flush_dev_iotlb(struct 
>> dmar_domain *domain,
>>           if (!info->ats_enabled)
>>               continue;
>>   -        qi_flush_dev_iotlb_pasid(info->iommu,
>> -                     PCI_DEVID(info->bus, info->devfn),
>> -                     info->pfsid, dev_pasid->pasid,
>> -                     info->ats_qdep, addr,
>> -                     mask);
>> +        qi_flush_dev_iotlb_pasid(info->iommu, info, addr, 
>> dev_pasid->pasid, mask);
>>       }
>>       spin_unlock_irqrestore(&domain->lock, flags);
>>   }
>> @@ -4990,22 +4981,16 @@ static void __init check_tylersburg_isoch(void)
>>    *
>>    * As a reminder, #6 will *NEED* this quirk as we enable nested 
>> translation.
>>    */
>> -void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
>> -                   unsigned long address, unsigned long mask,
>> -                   u32 pasid, u16 qdep)
>> +void quirk_extra_dev_tlb_flush(struct device_domain_info *info, u32 
>> pasid,
>> +                   unsigned long address, unsigned long mask)
>>   {
>> -    u16 sid;
>> -
>>       if (likely(!info->dtlb_extra_inval))
>>           return;
>>   -    sid = PCI_DEVID(info->bus, info->devfn);
>>       if (pasid == IOMMU_NO_PASID) {
>> -        qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
>> -                   qdep, address, mask);
>> +        qi_flush_dev_iotlb(info->iommu, info, address, mask);
>>       } else {
>> -        qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
>> -                     pasid, qdep, address, mask);
>> +        qi_flush_dev_iotlb_pasid(info->iommu, info, address, pasid, 
>> mask);
>>       }
>>   }
>>   diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>> index d02f916d8e59..f68f17476d85 100644
>> --- a/drivers/iommu/intel/iommu.h
>> +++ b/drivers/iommu/intel/iommu.h
>> @@ -1037,18 +1037,17 @@ void qi_flush_context(struct intel_iommu 
>> *iommu, u16 did,
>>                 u16 sid, u8 fm, u64 type);
>>   void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
>>               unsigned int size_order, u64 type);
>> -void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>> -            u16 qdep, u64 addr, unsigned mask);
>> -
>> +void qi_flush_dev_iotlb(struct intel_iommu *iommu,
>> +            struct device_domain_info *info, u64 addr,
>> +            unsigned int mask);
>>   void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, 
>> u64 addr,
>>                unsigned long npages, bool ih);
>>   -void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, 
>> u16 pfsid,
>> -                  u32 pasid, u16 qdep, u64 addr,
>> -                  unsigned int size_order);
>> -void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
>> -                   unsigned long address, unsigned long pages,
>> -                   u32 pasid, u16 qdep);
>> +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
>> +                  struct device_domain_info *info, u64 addr,
>> +                  u32 pasid, unsigned int size_order);
>> +void quirk_extra_dev_tlb_flush(struct device_domain_info *info, u32 
>> pasid,
>> +                   unsigned long address, unsigned long mask);
>>   void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 
>> granu,
>>                 u32 pasid);
>>   diff --git a/drivers/iommu/intel/nested.c 
>> b/drivers/iommu/intel/nested.c
>> index f26c7f1c46cc..d15f72b55940 100644
>> --- a/drivers/iommu/intel/nested.c
>> +++ b/drivers/iommu/intel/nested.c
>> @@ -78,18 +78,13 @@ static void nested_flush_dev_iotlb(struct 
>> dmar_domain *domain, u64 addr,
>>   {
>>       struct device_domain_info *info;
>>       unsigned long flags;
>> -    u16 sid, qdep;
>>         spin_lock_irqsave(&domain->lock, flags);
>>       list_for_each_entry(info, &domain->devices, link) {
>>           if (!info->ats_enabled)
>>               continue;
>> -        sid = info->bus << 8 | info->devfn;
>> -        qdep = info->ats_qdep;
>> -        qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
>> -                   qdep, addr, mask);
>> -        quirk_extra_dev_tlb_flush(info, addr, mask,
>> -                      IOMMU_NO_PASID, qdep);
>> +        qi_flush_dev_iotlb(info->iommu, info, addr, mask);
>> +        quirk_extra_dev_tlb_flush(info, IOMMU_NO_PASID, addr, mask);
>>       }
>>       spin_unlock_irqrestore(&domain->lock, flags);
>>   }
>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>> index 953592125e4a..5dacdea3cab7 100644
>> --- a/drivers/iommu/intel/pasid.c
>> +++ b/drivers/iommu/intel/pasid.c
>> @@ -208,7 +208,6 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>> *iommu,
>>                      struct device *dev, u32 pasid)
>>   {
>>       struct device_domain_info *info;
>> -    u16 sid, qdep, pfsid;
>>         info = dev_iommu_priv_get(dev);
>>       if (!info || !info->ats_enabled)
>> @@ -217,10 +216,6 @@ devtlb_invalidation_with_pasid(struct 
>> intel_iommu *iommu,
>>       if (pci_dev_is_disconnected(to_pci_dev(dev)))
>>           return;
>>   -    sid = info->bus << 8 | info->devfn;
>> -    qdep = info->ats_qdep;
>> -    pfsid = info->pfsid;
>> -
>>       /*
>>        * When PASID 0 is used, it indicates RID2PASID(DMA request w/o 
>> PASID),
>>        * devTLB flush w/o PASID should be used. For non-zero PASID under
>> @@ -228,9 +223,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>> *iommu,
>>        * efficient to flush devTLB specific to the PASID.
>>        */
>>       if (pasid == IOMMU_NO_PASID)
>> -        qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - 
>> VTD_PAGE_SHIFT);
>> +        qi_flush_dev_iotlb(iommu, info, 0, 64 - VTD_PAGE_SHIFT);
>>       else
>> -        qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 
>> 64 - VTD_PAGE_SHIFT);
>> +        qi_flush_dev_iotlb_pasid(iommu, info, 0, pasid, 64 - 
>> VTD_PAGE_SHIFT);
>>   }
>>     void intel_pasid_tear_down_entry(struct intel_iommu *iommu, 
>> struct device *dev,
>> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
>> index 40edd282903f..89168b31bf31 100644
>> --- a/drivers/iommu/intel/svm.c
>> +++ b/drivers/iommu/intel/svm.c
>> @@ -181,11 +181,10 @@ static void __flush_svm_range_dev(struct 
>> intel_svm *svm,
>>         qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, 
>> pages, ih);
>>       if (info->ats_enabled) {
>> -        qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
>> -                     svm->pasid, sdev->qdep, address,
>> +        qi_flush_dev_iotlb_pasid(sdev->iommu, info, address, 
>> svm->pasid,
>>                        order_base_2(pages));
>> -        quirk_extra_dev_tlb_flush(info, address, order_base_2(pages),
>> -                      svm->pasid, sdev->qdep);
>> +        quirk_extra_dev_tlb_flush(info, svm->pasid, address,
>> +                      order_base_2(pages));
>>       }
>>   }
>>   @@ -227,11 +226,11 @@ static void intel_flush_svm_all(struct 
>> intel_svm *svm)
>>             qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, 0, 
>> -1UL, 0);
>>           if (info->ats_enabled) {
>> -            qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, 
>> info->pfsid,
>> -                         svm->pasid, sdev->qdep,
>> -                         0, 64 - VTD_PAGE_SHIFT);
>> -            quirk_extra_dev_tlb_flush(info, 0, 64 - VTD_PAGE_SHIFT,
>> -                          svm->pasid, sdev->qdep);
>> +            qi_flush_dev_iotlb_pasid(sdev->iommu, info, 0,
>> +                         svm->pasid,
>> +                         64 - VTD_PAGE_SHIFT);
>> +            quirk_extra_dev_tlb_flush(info, svm->pasid, 0,
>> +                          64 - VTD_PAGE_SHIFT);
>>           }
>>       }
>>       rcu_read_unlock();
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  5:12       ` Ethan Zhao
@ 2024-01-30  6:22         ` Tian, Kevin
  2024-01-30  8:15           ` Ethan Zhao
  0 siblings, 1 reply; 39+ messages in thread
From: Tian, Kevin @ 2024-01-30  6:22 UTC (permalink / raw)
  To: Ethan Zhao, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Tuesday, January 30, 2024 1:13 PM
> 
> On 1/29/2024 5:21 PM, Yi Liu wrote:
> > On 2024/1/29 17:06, Tian, Kevin wrote:
> >>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> >>> Sent: Monday, January 29, 2024 11:49 AM
> >>>
> >>> Because surprise removal could happen anytime, e.g. user could
> >>> request safe
> >>> removal to EP(endpoint device) via sysfs and brings its link down to do
> >>> surprise removal cocurrently. such aggressive cases would cause ATS
> >>> invalidation request issued to non-existence target device, then deadly
> >>> loop to retry that request after ITE fault triggered in interrupt
> >>> context.
> >>> this patch aims to optimize the ITE handling by checking the target
> >>> device
> >>> presence state to avoid retrying the timeout request blindly, thus
> >>> avoid
> >>> hard lockup or system hang.
> >>>
> >>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> >>> ---
> >>>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
> >>>   1 file changed, 18 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> >>> index 814134e9aa5a..2e214b43725c 100644
> >>> --- a/drivers/iommu/intel/dmar.c
> >>> +++ b/drivers/iommu/intel/dmar.c
> >>> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
> >>> *iommu, int index, int wait_index,
> >>>   {
> >>>       u32 fault;
> >>>       int head, tail;
> >>> +    u64 iqe_err, ite_sid;
> >>>       struct q_inval *qi = iommu->qi;
> >>>       int shift = qi_shift(iommu);
> >>>
> >>> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
> >>> *iommu, int index, int wait_index,
> >>>           tail = readl(iommu->reg + DMAR_IQT_REG);
> >>>           tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
> >>>
> >>> +        /*
> >>> +         * SID field is valid only when the ITE field is Set in
> >>> FSTS_REG
> >>> +         * see Intel VT-d spec r4.1, section 11.4.9.9
> >>> +         */
> >>> +        iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
> >>> +        ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
> >>> +
> >>>           writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
> >>>           pr_info("Invalidation Time-out Error (ITE) cleared\n");
> >>>
> >>> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
> >>> *iommu, int index, int wait_index,
> >>>               head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> >>>           } while (head != tail);
> >>>
> >>> +        /*
> >>> +         * If got ITE, we need to check if the sid of ITE is the
> >>> same as
> >>> +         * current ATS invalidation target device, if yes, don't
> >>> try this
> >>> +         * request anymore if the target device isn't present.
> >>> +         * 0 value of ite_sid means old VT-d device, no ite_sid value.
> >>> +         */
> >>> +        if (pdev && ite_sid && !pci_device_is_present(pdev) &&
> >>> +            ite_sid == pci_dev_id(pci_physfn(pdev)))
> >>> +            return -ETIMEDOUT;
> >>> +
> >>
> >> since the hardware already reports source id leading to timeout,
> >> can't we
> >> just find the pci_dev according to reported ite_sid? this is a slow
> >> path (either
> >> due to device in bad state or removed) hence it's not necessary to
> >> add more
> >> intelligence to pass the pci_dev in, leading to only a partial fix
> >> can be backported.
> >>
> >> It's also more future-proof, say if one day the driver allows
> >> batching invalidation
> >> requests for multiple devices then no need to pass in a list of devices.
> >>
> >> Then it's easier to backport a full fix.
> >
> > May consider pci_get_domain_bus_and_slot() or
> > pci_find_bus()/pci_get_slot(). But I doubt if the pci_dev is still
> > tracked
> > in the bus or a kind of dev list in the device hot removal case. So Ethan
> > may need to check.
> 
> Perhaps it is too late to call pci_find_bus() or
> pci_get_domain_bus_and_slot() to get the
> 
> device instance from this notifier registered as
> BUS_NOTIFY_REMOVED_DEVICE
> 
> action. if the device is still there in bus list, *must* be a bug of
> device subsystem as
> 
> *removed* device.
> 

Ethan, looks your reply is not formatted well. Can you fix your mail
client like how you write the commit msg?

Here we need consider two situations.

One is that the device is not bound to a driver or bound to a driver
which doesn't do active work to the device when it's removed. In
that case one may observe the timeout situation only in the removal
path as the stack dump in your patch02 shows.

patch02 can fix that case by checking whether the device is present
to skip sending the invalidation requests. So the logic being discussed
here doesn't matter.

The 2nd situation is more tricky. The device might be bound to
a driver which is doing active work to the device with in-fly
ATS invalidation requests. In this case in-fly requests must be aborted
before the driver can be detached from the removed device. Conceptually
a device is removed from the bus only after its driver is detached.

From this angle you can still find the pci_dev from the bus when handling
timeout error for in-fly invalidation requests. But I'm not a PCI person
so let's wait for the inputs from Bjorn  and Lukas.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2024-01-30  5:25     ` Ethan Zhao
@ 2024-01-30  6:23       ` Tian, Kevin
  0 siblings, 0 replies; 39+ messages in thread
From: Tian, Kevin @ 2024-01-30  6:23 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci,
	Haorong Ye

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Tuesday, January 30, 2024 1:26 PM
> 
> On 1/29/2024 4:50 PM, Tian, Kevin wrote:
> >> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> >> Sent: Monday, January 29, 2024 11:49 AM
> >>
> >> Make pci_dev_is_disconnected() public so that it can be called from
> >> Intel VT-d driver to quickly fix/workaround the surprise removal
> >> unplug hang issue for those ATS capable devices on PCIe switch
> downstream
> >> hotplug capable ports.
> >>
> >> Beside pci_device_is_present() function, this one has no config space
> > s/Beside/Unlike/
> >
> >> space access, so is light enough to optimize the normal pure surprise
> >> removal and safe removal flow.
> > somehow this paragraph belongs to the patch using it. The 1st paragraph
> > is sufficient for the reason of exposure.
> 
> pci_device_is_present() is aleardy exported symbol, why not just use it ?
> 
> Also try to make it clear, the difference and justification.
> 

I misread it. 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
  2024-01-29  8:58   ` Tian, Kevin
@ 2024-01-30  7:30     ` Ethan Zhao
  0 siblings, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  7:30 UTC (permalink / raw)
  To: Tian, Kevin, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci


On 1/29/2024 4:58 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Monday, January 29, 2024 11:49 AM
>>
>> to check state of ATS capable pci device in qi_check_fault() for surprise
>> removal case, we need to pass the target pci device of ATS invalidation
>> request to qi_check_fault(). if pdev is valid, means current request is for
>> ATS invalidation, vice vesa.
>>
>> no function change.
> qi_submit_sync() is used for all kinds of iotlb/cache/devtlb invalidations.
> it's a bit weird to see a device pointer (even being NULL) in places where
> a device doesn't even matter.
>
> having a new qi_submit_sync_devtlb() wrapper sounds cleaner to me,
> with an internal __qi_submit_sync() helper to accept a device pointer.
>
> qi_submit_sync() calls __qi_submit_sync() with a null device pointer then
> non-devtlb paths are intact.

Make sense !

That way, could keep about 10 qi_submit_sync() calling intact, while

only 2-3 qi_submit_sync_devtlb() wrapper calling needed.

Thanks,

Ethan


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  6:22         ` Tian, Kevin
@ 2024-01-30  8:15           ` Ethan Zhao
  2024-01-30  8:43             ` Tian, Kevin
  2024-01-30 16:29             ` Jason Gunthorpe
  0 siblings, 2 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  8:15 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci


On 1/30/2024 2:22 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Tuesday, January 30, 2024 1:13 PM
>>
>> On 1/29/2024 5:21 PM, Yi Liu wrote:
>>> On 2024/1/29 17:06, Tian, Kevin wrote:
>>>>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>>> Sent: Monday, January 29, 2024 11:49 AM
>>>>>
>>>>> Because surprise removal could happen anytime, e.g. user could
>>>>> request safe
>>>>> removal to EP(endpoint device) via sysfs and brings its link down to do
>>>>> surprise removal cocurrently. such aggressive cases would cause ATS
>>>>> invalidation request issued to non-existence target device, then deadly
>>>>> loop to retry that request after ITE fault triggered in interrupt
>>>>> context.
>>>>> this patch aims to optimize the ITE handling by checking the target
>>>>> device
>>>>> presence state to avoid retrying the timeout request blindly, thus
>>>>> avoid
>>>>> hard lockup or system hang.
>>>>>
>>>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>>> ---
>>>>>    drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>>>>>    1 file changed, 18 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>>> index 814134e9aa5a..2e214b43725c 100644
>>>>> --- a/drivers/iommu/intel/dmar.c
>>>>> +++ b/drivers/iommu/intel/dmar.c
>>>>> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
>>>>> *iommu, int index, int wait_index,
>>>>>    {
>>>>>        u32 fault;
>>>>>        int head, tail;
>>>>> +    u64 iqe_err, ite_sid;
>>>>>        struct q_inval *qi = iommu->qi;
>>>>>        int shift = qi_shift(iommu);
>>>>>
>>>>> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
>>>>> *iommu, int index, int wait_index,
>>>>>            tail = readl(iommu->reg + DMAR_IQT_REG);
>>>>>            tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>>>>
>>>>> +        /*
>>>>> +         * SID field is valid only when the ITE field is Set in
>>>>> FSTS_REG
>>>>> +         * see Intel VT-d spec r4.1, section 11.4.9.9
>>>>> +         */
>>>>> +        iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>>>>> +        ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
>>>>> +
>>>>>            writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>>>>            pr_info("Invalidation Time-out Error (ITE) cleared\n");
>>>>>
>>>>> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
>>>>> *iommu, int index, int wait_index,
>>>>>                head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>>>            } while (head != tail);
>>>>>
>>>>> +        /*
>>>>> +         * If got ITE, we need to check if the sid of ITE is the
>>>>> same as
>>>>> +         * current ATS invalidation target device, if yes, don't
>>>>> try this
>>>>> +         * request anymore if the target device isn't present.
>>>>> +         * 0 value of ite_sid means old VT-d device, no ite_sid value.
>>>>> +         */
>>>>> +        if (pdev && ite_sid && !pci_device_is_present(pdev) &&
>>>>> +            ite_sid == pci_dev_id(pci_physfn(pdev)))
>>>>> +            return -ETIMEDOUT;
>>>>> +
>>>> since the hardware already reports source id leading to timeout,
>>>> can't we
>>>> just find the pci_dev according to reported ite_sid? this is a slow
>>>> path (either
>>>> due to device in bad state or removed) hence it's not necessary to
>>>> add more
>>>> intelligence to pass the pci_dev in, leading to only a partial fix
>>>> can be backported.
>>>>
>>>> It's also more future-proof, say if one day the driver allows
>>>> batching invalidation
>>>> requests for multiple devices then no need to pass in a list of devices.
>>>>
>>>> Then it's easier to backport a full fix.
>>> May consider pci_get_domain_bus_and_slot() or
>>> pci_find_bus()/pci_get_slot(). But I doubt if the pci_dev is still
>>> tracked
>>> in the bus or a kind of dev list in the device hot removal case. So Ethan
>>> may need to check.
>> Perhaps it is too late to call pci_find_bus() or
>> pci_get_domain_bus_and_slot() to get the
>>
>> device instance from this notifier registered as
>> BUS_NOTIFY_REMOVED_DEVICE
>>
>> action. if the device is still there in bus list, *must* be a bug of
>> device subsystem as
>>
>> *removed* device.
>>
> Ethan, looks your reply is not formatted well. Can you fix your mail
> client like how you write the commit msg?
Sorry for the mail format.
> Here we need consider two situations.
>
> One is that the device is not bound to a driver or bound to a driver
> which doesn't do active work to the device when it's removed. In
> that case one may observe the timeout situation only in the removal
> path as the stack dump in your patch02 shows.

When iommu_bus_notifier() got called for hotplug removal cases to
flush devTLB (ATS invalidation), driver was already unloaded.
whatever safe removal or surprise removal. so in theory no active
driver working there.

pciehp_ist()
  pciehp_disable_slot()
   remove_board()
    pciehp_unconfigure_device()
     pci_stop_and_remove_bus_device()
      pci_stop_bus_device()--->here unload driver
      pci_remove_bus_device()->here qi_flush_dev_iotlb() got called.

>
> patch02 can fix that case by checking whether the device is present
> to skip sending the invalidation requests. So the logic being discussed
> here doesn't matter.
>
> The 2nd situation is more tricky. The device might be bound to
> a driver which is doing active work to the device with in-fly
> ATS invalidation requests. In this case in-fly requests must be aborted
> before the driver can be detached from the removed device. Conceptually
> a device is removed from the bus only after its driver is detached.

Some tricky situations:

1. The ATS invalidation request is issued from driver driver, while it is
in handling, device is removed. this momment, the device instance still
exists in the bus list. yes, if searching it by BDF, could get it.

2. The ATS invalidation request is issued from iommu_bus_notifier()
for surprise removal reason, as shown in above calltrace, device was
already removed from bus list. if searching it by BDF, return NULL.

3. The ATS invlidation request is issued from iommu_bus_notifier()
for safe removal, when is in handling, device is removed or link
is down. also as #2, device was already removed from bus list.
if searching it by BDF. got NULL.
...

so, searching device by BDF, only works for the ATS invalidation
request is from device driver.

Thanks,
Ethan

>
>  From this angle you can still find the pci_dev from the bus when handling
> timeout error for in-fly invalidation requests. But I'm not a PCI person
> so let's wait for the inputs from Bjorn  and Lukas.
>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-29 14:48     ` Baolu Lu
  2024-01-30  3:28       ` Tian, Kevin
@ 2024-01-30  8:43       ` Ethan Zhao
  1 sibling, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  8:43 UTC (permalink / raw)
  To: Baolu Lu, Tian, Kevin, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, Liu, Yi L, iommu, linux-kernel, linux-pci

On 1/29/2024 10:48 PM, Baolu Lu wrote:
> On 2024/1/29 17:06, Tian, Kevin wrote:
>>> From: Ethan Zhao<haifeng.zhao@linux.intel.com>
>>> Sent: Monday, January 29, 2024 11:49 AM
>>>
>>> Because surprise removal could happen anytime, e.g. user could 
>>> request safe
>>> removal to EP(endpoint device) via sysfs and brings its link down to do
>>> surprise removal cocurrently. such aggressive cases would cause ATS
>>> invalidation request issued to non-existence target device, then deadly
>>> loop to retry that request after ITE fault triggered in interrupt 
>>> context.
>>> this patch aims to optimize the ITE handling by checking the target 
>>> device
>>> presence state to avoid retrying the timeout request blindly, thus 
>>> avoid
>>> hard lockup or system hang.
>>>
>>> Signed-off-by: Ethan Zhao<haifeng.zhao@linux.intel.com>
>>> ---
>>>   drivers/iommu/intel/dmar.c | 18 ++++++++++++++++++
>>>   1 file changed, 18 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>> index 814134e9aa5a..2e214b43725c 100644
>>> --- a/drivers/iommu/intel/dmar.c
>>> +++ b/drivers/iommu/intel/dmar.c
>>> @@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu
>>> *iommu, int index, int wait_index,
>>>   {
>>>       u32 fault;
>>>       int head, tail;
>>> +    u64 iqe_err, ite_sid;
>>>       struct q_inval *qi = iommu->qi;
>>>       int shift = qi_shift(iommu);
>>>
>>> @@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu
>>> *iommu, int index, int wait_index,
>>>           tail = readl(iommu->reg + DMAR_IQT_REG);
>>>           tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>>
>>> +        /*
>>> +         * SID field is valid only when the ITE field is Set in 
>>> FSTS_REG
>>> +         * see Intel VT-d spec r4.1, section 11.4.9.9
>>> +         */
>>> +        iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
>>> +        ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
>>> +
>>>           writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>>           pr_info("Invalidation Time-out Error (ITE) cleared\n");
>>>
>>> @@ -1325,6 +1333,16 @@ static int qi_check_fault(struct intel_iommu
>>> *iommu, int index, int wait_index,
>>>               head = (head - 2 + QI_LENGTH) % QI_LENGTH;
>>>           } while (head != tail);
>>>
>>> +        /*
>>> +         * If got ITE, we need to check if the sid of ITE is the 
>>> same as
>>> +         * current ATS invalidation target device, if yes, don't 
>>> try this
>>> +         * request anymore if the target device isn't present.
>>> +         * 0 value of ite_sid means old VT-d device, no ite_sid value.
>>> +         */
>>> +        if (pdev && ite_sid && !pci_device_is_present(pdev) &&
>>> +            ite_sid == pci_dev_id(pci_physfn(pdev)))
>>> +            return -ETIMEDOUT;
>>> +
>> since the hardware already reports source id leading to timeout, 
>> can't we
>> just find the pci_dev according to reported ite_sid? this is a slow 
>> path (either
>> due to device in bad state or removed) hence it's not necessary to 
>> add more
>> intelligence to pass the pci_dev in, leading to only a partial fix 
>> can be backported.
>>
>> It's also more future-proof, say if one day the driver allows 
>> batching invalidation
>> requests for multiple devices then no need to pass in a list of devices.
>
> I have ever thought about this solution and gave up in the end due to
> the locking issue.
>
> A batch of qi requests must be handled in the spin lock critical region
> to enforce that only one batch of requests is submitted at a time.
> Searching pci_dev in this locking region might result in nested locking
> issues, and I haven't found a good solution for this yet.
>
You said async-interrupt model is a bad idea, how bad is it ? I wonder if
the hardware and VT-d spec definition could support it pefectly or not.
at least, would never get in trouble about balance timeout & wakeup
watchdog.

Yes, the VT-d DMAR driver wasn't inited as async-interrupt model from
begnining...

Thanks,
Ethan
  

> Unless someone can bring up a better solution, perhaps we have to live
> in a world where only single device TLB invalidation request in a batch
> could be submitted to the queue.
>
> Best regards,
> baolu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  8:15           ` Ethan Zhao
@ 2024-01-30  8:43             ` Tian, Kevin
  2024-01-30  9:13               ` Ethan Zhao
  2024-01-30 16:29             ` Jason Gunthorpe
  1 sibling, 1 reply; 39+ messages in thread
From: Tian, Kevin @ 2024-01-30  8:43 UTC (permalink / raw)
  To: Ethan Zhao, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Tuesday, January 30, 2024 4:16 PM
> 
> On 1/30/2024 2:22 PM, Tian, Kevin wrote:
> > Here we need consider two situations.
> >
> > One is that the device is not bound to a driver or bound to a driver
> > which doesn't do active work to the device when it's removed. In
> > that case one may observe the timeout situation only in the removal
> > path as the stack dump in your patch02 shows.
> 
> When iommu_bus_notifier() got called for hotplug removal cases to
> flush devTLB (ATS invalidation), driver was already unloaded.
> whatever safe removal or surprise removal. so in theory no active
> driver working there.
> 
> pciehp_ist()
>   pciehp_disable_slot()
>    remove_board()
>     pciehp_unconfigure_device()
>      pci_stop_and_remove_bus_device()
>       pci_stop_bus_device()--->here unload driver
>       pci_remove_bus_device()->here qi_flush_dev_iotlb() got called.

yes, so patch02 can fix this case.

> 
> >
> > patch02 can fix that case by checking whether the device is present
> > to skip sending the invalidation requests. So the logic being discussed
> > here doesn't matter.
> >
> > The 2nd situation is more tricky. The device might be bound to
> > a driver which is doing active work to the device with in-fly
> > ATS invalidation requests. In this case in-fly requests must be aborted
> > before the driver can be detached from the removed device. Conceptually
> > a device is removed from the bus only after its driver is detached.
> 
> Some tricky situations:
> 
> 1. The ATS invalidation request is issued from driver driver, while it is
> in handling, device is removed. this momment, the device instance still
> exists in the bus list. yes, if searching it by BDF, could get it.

it's searchable between the point where the device is removed and the
point where the driver is unloaded:

        CPU0                                CPU1
  (Driver is active)                    (pciehp handler)
  qi_submit_sync()                      pciehp_ist()
    ...                                   ...
    loop for completion() {               pciehp_unconfigure_device()
      ...                                   pci_dev_set_disconnected()
      if (ITE) {                            ...
        //find pci_dev from sid             pci_remove_bus_device()
        if (pci_dev_is_connected())           device_del()
          break;                                bus_remove_device()
      }                                           device_remove_driver()
      ..                                            //wait for driver unload
    }                                               
    ..
    return;

                                                  BUS_NOTIFY_REMOVED_DEVICE;
                                              list_del(&dev->bus_list);

(I didn’t draw the full calling stack on the right hand side)

> 
> 2. The ATS invalidation request is issued from iommu_bus_notifier()
> for surprise removal reason, as shown in above calltrace, device was
> already removed from bus list. if searching it by BDF, return NULL.
> 
> 3. The ATS invlidation request is issued from iommu_bus_notifier()
> for safe removal, when is in handling, device is removed or link
> is down. also as #2, device was already removed from bus list.
> if searching it by BDF. got NULL.
> ...
> 
> so, searching device by BDF, only works for the ATS invalidation
> request is from device driver.
> 

anything related to bus notifier has been fixed by patch02. 

the remaining logic is really for fixing the race invalidation from
device driver. 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  8:43             ` Tian, Kevin
@ 2024-01-30  9:13               ` Ethan Zhao
  2024-01-30  9:24                 ` Tian, Kevin
  0 siblings, 1 reply; 39+ messages in thread
From: Ethan Zhao @ 2024-01-30  9:13 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci


On 1/30/2024 4:43 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Tuesday, January 30, 2024 4:16 PM
>>
>> On 1/30/2024 2:22 PM, Tian, Kevin wrote:
>>> Here we need consider two situations.
>>>
>>> One is that the device is not bound to a driver or bound to a driver
>>> which doesn't do active work to the device when it's removed. In
>>> that case one may observe the timeout situation only in the removal
>>> path as the stack dump in your patch02 shows.
>> When iommu_bus_notifier() got called for hotplug removal cases to
>> flush devTLB (ATS invalidation), driver was already unloaded.
>> whatever safe removal or surprise removal. so in theory no active
>> driver working there.
>>
>> pciehp_ist()
>>    pciehp_disable_slot()
>>     remove_board()
>>      pciehp_unconfigure_device()
>>       pci_stop_and_remove_bus_device()
>>        pci_stop_bus_device()--->here unload driver
>>        pci_remove_bus_device()->here qi_flush_dev_iotlb() got called.
> yes, so patch02 can fix this case.
>
>>> patch02 can fix that case by checking whether the device is present
>>> to skip sending the invalidation requests. So the logic being discussed
>>> here doesn't matter.
>>>
>>> The 2nd situation is more tricky. The device might be bound to
>>> a driver which is doing active work to the device with in-fly
>>> ATS invalidation requests. In this case in-fly requests must be aborted
>>> before the driver can be detached from the removed device. Conceptually
>>> a device is removed from the bus only after its driver is detached.
>> Some tricky situations:
>>
>> 1. The ATS invalidation request is issued from driver driver, while it is
>> in handling, device is removed. this momment, the device instance still
>> exists in the bus list. yes, if searching it by BDF, could get it.
> it's searchable between the point where the device is removed and the
> point where the driver is unloaded:
>
>          CPU0                                CPU1
>    (Driver is active)                    (pciehp handler)
>    qi_submit_sync()                      pciehp_ist()
>      ...                                   ...
>      loop for completion() {               pciehp_unconfigure_device()
>        ...                                   pci_dev_set_disconnected()
>        if (ITE) {                            ...
>          //find pci_dev from sid             pci_remove_bus_device()
>          if (pci_dev_is_connected())           device_del()
>            break;                                bus_remove_device()
>        }                                           device_remove_driver()

If the device was hot plugin or re-scanned, the device has a PCI_DEV_ADDED flag,
if so the driver unloading work isn't defered to the tail of device_del(), it
is unloaded before pci_remove_bus_device()->device_del(), in pci_stop_dev

pci_stop_bus_device()
  pci_stop_dev()
  {
   if (pci_dev_is_added(dev)) {
       device_release_driver(&dev->dev);
  }

So the interval the device is searchable, only applied to those devices
not hot plugged, or never be scanned.


Thanks,
Ethan

>        ..                                            //wait for driver unload
>      }
>      ..
>      return;
>
>                                                    BUS_NOTIFY_REMOVED_DEVICE;
>                                                list_del(&dev->bus_list);
>
> (I didn’t draw the full calling stack on the right hand side)

>
>> 2. The ATS invalidation request is issued from iommu_bus_notifier()
>> for surprise removal reason, as shown in above calltrace, device was
>> already removed from bus list. if searching it by BDF, return NULL.
>>
>> 3. The ATS invlidation request is issued from iommu_bus_notifier()
>> for safe removal, when is in handling, device is removed or link
>> is down. also as #2, device was already removed from bus list.
>> if searching it by BDF. got NULL.
>> ...
>>
>> so, searching device by BDF, only works for the ATS invalidation
>> request is from device driver.
>>
> anything related to bus notifier has been fixed by patch02.
>
> the remaining logic is really for fixing the race invalidation from
> device driver.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  9:13               ` Ethan Zhao
@ 2024-01-30  9:24                 ` Tian, Kevin
  2024-01-31  5:42                   ` Ethan Zhao
  0 siblings, 1 reply; 39+ messages in thread
From: Tian, Kevin @ 2024-01-30  9:24 UTC (permalink / raw)
  To: Ethan Zhao, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci

> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Sent: Tuesday, January 30, 2024 5:13 PM
> 
> On 1/30/2024 4:43 PM, Tian, Kevin wrote:
> >> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
> >> Sent: Tuesday, January 30, 2024 4:16 PM
> >>
> >> On 1/30/2024 2:22 PM, Tian, Kevin wrote:
> >>> Here we need consider two situations.
> >>>
> >>> One is that the device is not bound to a driver or bound to a driver
> >>> which doesn't do active work to the device when it's removed. In
> >>> that case one may observe the timeout situation only in the removal
> >>> path as the stack dump in your patch02 shows.
> >> When iommu_bus_notifier() got called for hotplug removal cases to
> >> flush devTLB (ATS invalidation), driver was already unloaded.
> >> whatever safe removal or surprise removal. so in theory no active
> >> driver working there.
> >>
> >> pciehp_ist()
> >>    pciehp_disable_slot()
> >>     remove_board()
> >>      pciehp_unconfigure_device()
> >>       pci_stop_and_remove_bus_device()
> >>        pci_stop_bus_device()--->here unload driver
> >>        pci_remove_bus_device()->here qi_flush_dev_iotlb() got called.
> > yes, so patch02 can fix this case.
> >
> >>> patch02 can fix that case by checking whether the device is present
> >>> to skip sending the invalidation requests. So the logic being discussed
> >>> here doesn't matter.
> >>>
> >>> The 2nd situation is more tricky. The device might be bound to
> >>> a driver which is doing active work to the device with in-fly
> >>> ATS invalidation requests. In this case in-fly requests must be aborted
> >>> before the driver can be detached from the removed device.
> Conceptually
> >>> a device is removed from the bus only after its driver is detached.
> >> Some tricky situations:
> >>
> >> 1. The ATS invalidation request is issued from driver driver, while it is
> >> in handling, device is removed. this momment, the device instance still
> >> exists in the bus list. yes, if searching it by BDF, could get it.
> > it's searchable between the point where the device is removed and the
> > point where the driver is unloaded:
> >
> >          CPU0                                CPU1
> >    (Driver is active)                    (pciehp handler)
> >    qi_submit_sync()                      pciehp_ist()
> >      ...                                   ...
> >      loop for completion() {               pciehp_unconfigure_device()
> >        ...                                   pci_dev_set_disconnected()
> >        if (ITE) {                            ...
> >          //find pci_dev from sid             pci_remove_bus_device()
> >          if (pci_dev_is_connected())           device_del()
> >            break;                                bus_remove_device()
> >        }                                           device_remove_driver()
> 
> If the device was hot plugin or re-scanned, the device has a PCI_DEV_ADDED
> flag,

in this case is pci_dev_is_disconnected() true or false? 

how is this patch supposed to work with it?

> if so the driver unloading work isn't defered to the tail of device_del(), it
> is unloaded before pci_remove_bus_device()->device_del(), in pci_stop_dev
> 
> pci_stop_bus_device()
>   pci_stop_dev()
>   {
>    if (pci_dev_is_added(dev)) {
>        device_release_driver(&dev->dev);
>   }

no matter where driver unload is requested, it needs to wait for aborting
in-fly request on CPU0.

> 
> So the interval the device is searchable, only applied to those devices
> not hot plugged, or never be scanned.
> 

and in the worst case even if pci_dev is not searchable, isn't it already
an indicator that the device is absent then qi_submit_sync() should
just exit upon ITE?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  8:15           ` Ethan Zhao
  2024-01-30  8:43             ` Tian, Kevin
@ 2024-01-30 16:29             ` Jason Gunthorpe
  2024-01-31  6:21               ` Baolu Lu
  1 sibling, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2024-01-30 16:29 UTC (permalink / raw)
  To: Ethan Zhao
  Cc: Tian, Kevin, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, dwmw2,
	will, lukas, iommu, linux-kernel, linux-pci

On Tue, Jan 30, 2024 at 04:15:33PM +0800, Ethan Zhao wrote:
> Some tricky situations:
> 
> 1. The ATS invalidation request is issued from driver driver, while it is
> in handling, device is removed. this momment, the device instance still
> exists in the bus list. yes, if searching it by BDF, could get it.
> 
> 2. The ATS invalidation request is issued from iommu_bus_notifier()
> for surprise removal reason, as shown in above calltrace, device was
> already removed from bus list. if searching it by BDF, return NULL.
> 
> 3. The ATS invlidation request is issued from iommu_bus_notifier()
> for safe removal, when is in handling, device is removed or link
> is down. also as #2, device was already removed from bus list.
> if searching it by BDF. got NULL.
> ...
> 
> so, searching device by BDF, only works for the ATS invalidation
> request is from device driver.

In the good path, where the hot removal is expected and this is about
coordinating, the IOMMU driver should do an orderly shutdown of the
ATS mechanism:

 1 Write to PCI config space to disable the ATS
 2 Make the IOMMU respond to ATS requests with UR and set it to BLOCKED
 3 Issue a flush of the ATC
 4 Wait for all outstanding ATC flushes to complete

If it is a bad/surprise path where the device is already gone then:

 1 should automatically not do anything, possibly timing out
 2 must succeed
 3 should time out
 4 should "complete" in that the ATC flushes are all timed out

IMHO all you need to do is not crash/lockup while processing the ATC
timeouts. If this is a surprise path then the ATC timeout might
already happened before the iommu driver remove notifier event happens.

If the driver needs to translate from the IOMMU device table index
into a struct device it is probably best to do that inside the driver.

eg ARM maintains a rbtree in the iommu dev data. (see
arm_smmu_insert_master)

Jason

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-30  5:37     ` Ethan Zhao
@ 2024-01-31  4:25       ` Yi Liu
  2024-01-31  5:25         ` Ethan Zhao
  0 siblings, 1 reply; 39+ messages in thread
From: Yi Liu @ 2024-01-31  4:25 UTC (permalink / raw)
  To: Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci,
	Haorong Ye

On 2024/1/30 13:37, Ethan Zhao wrote:
> 
> On 1/29/2024 5:32 PM, Yi Liu wrote:
>> On 2024/1/29 11:49, Ethan Zhao wrote:
>>> For those endpoint devices connect to system via hotplug capable ports,
>>> users could request a hot reset to the device by flapping device's link
>>> through setting the slot's link control register, as pciehp_ist() DLLSC
>>> interrupt sequence response, pciehp will unload the device driver and
>>> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
>>> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for non-existence
>>> target device to be sent and deadly loop to retry that request after ITE
>>> fault triggered in interrupt context.
>>>
>>> That would cause following continuous hard lockup warning and system hang
>>>
>>> [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
>>> [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not present
>>> [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
>>> [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded 
>>> Tainted: G S
>>>           OE    kernel version xxxx
>>> [ 4223.822623] Hardware name: vendorname xxxx 666-106,
>>> BIOS 01.01.02.03.01 05/15/2023
>>> [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
>>> [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 
>>> c1 48 8b
>>>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> 
>>> f6 c6 1
>>> 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
>>> [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
>>> [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 
>>> 0000000000000005
>>> [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: 
>>> ffff9f38401a8340
>>> [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 
>>> 0000000000000000
>>> [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: 
>>> ffff9f384005e200
>>> [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 
>>> 0000000000000004
>>> [ 4223.822626] FS:  0000000000000000(0000) GS:ffffa237ae400000(0000)
>>> knlGS:0000000000000000
>>> [ 4223.822627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 
>>> 0000000000770ee0
>>> [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>>> 0000000000000000
>>> [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 
>>> 0000000000000400
>>> [ 4223.822628] PKRU: 55555554
>>> [ 4223.822628] Call Trace:
>>> [ 4223.822628]  qi_flush_dev_iotlb+0xb1/0xd0
>>> [ 4223.822628]  __dmar_remove_one_dev_info+0x224/0x250
>>> [ 4223.822629]  dmar_remove_one_dev_info+0x3e/0x50
>>> [ 4223.822629]  intel_iommu_release_device+0x1f/0x30
>>> [ 4223.822629]  iommu_release_device+0x33/0x60
>>> [ 4223.822629]  iommu_bus_notifier+0x7f/0x90
>>> [ 4223.822630]  blocking_notifier_call_chain+0x60/0x90
>>> [ 4223.822630]  device_del+0x2e5/0x420
>>> [ 4223.822630]  pci_remove_bus_device+0x70/0x110
>>> [ 4223.822630]  pciehp_unconfigure_device+0x7c/0x130
>>> [ 4223.822631]  pciehp_disable_slot+0x6b/0x100
>>> [ 4223.822631]  pciehp_handle_presence_or_link_change+0xd8/0x320
>>> [ 4223.822631]  pciehp_ist+0x176/0x180
>>> [ 4223.822631]  ? irq_finalize_oneshot.part.50+0x110/0x110
>>> [ 4223.822632]  irq_thread_fn+0x19/0x50
>>> [ 4223.822632]  irq_thread+0x104/0x190
>>> [ 4223.822632]  ? irq_forced_thread_fn+0x90/0x90
>>> [ 4223.822632]  ? irq_thread_check_affinity+0xe0/0xe0
>>> [ 4223.822633]  kthread+0x114/0x130
>>> [ 4223.822633]  ? __kthread_cancel_work+0x40/0x40
>>> [ 4223.822633]  ret_from_fork+0x1f/0x30
>>> [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
>>> [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded 
>>> Tainted: G S
>>>           OE     kernel version xxxx
>>> [ 4223.822634] Hardware name: vendorname xxxx 666-106,
>>> BIOS 01.01.02.03.01 05/15/2023
>>> [ 4223.822634] Call Trace:
>>> [ 4223.822634]  <NMI>
>>> [ 4223.822635]  dump_stack+0x6d/0x88
>>> [ 4223.822635]  panic+0x101/0x2d0
>>> [ 4223.822635]  ? ret_from_fork+0x11/0x30
>>> [ 4223.822635]  nmi_panic.cold.14+0xc/0xc
>>> [ 4223.822636]  watchdog_overflow_callback.cold.8+0x6d/0x81
>>> [ 4223.822636]  __perf_event_overflow+0x4f/0xf0
>>> [ 4223.822636]  handle_pmi_common+0x1ef/0x290
>>> [ 4223.822636]  ? __set_pte_vaddr+0x28/0x40
>>> [ 4223.822637]  ? flush_tlb_one_kernel+0xa/0x20
>>> [ 4223.822637]  ? __native_set_fixmap+0x24/0x30
>>> [ 4223.822637]  ? ghes_copy_tofrom_phys+0x70/0x100
>>> [ 4223.822637]  ? __ghes_peek_estatus.isra.16+0x49/0xa0
>>> [ 4223.822637]  intel_pmu_handle_irq+0xba/0x2b0
>>> [ 4223.822638]  perf_event_nmi_handler+0x24/0x40
>>> [ 4223.822638]  nmi_handle+0x4d/0xf0
>>> [ 4223.822638]  default_do_nmi+0x49/0x100
>>> [ 4223.822638]  exc_nmi+0x134/0x180
>>> [ 4223.822639]  end_repeat_nmi+0x16/0x67
>>> [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
>>> [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 0f 95 
>>> c1 48 8b
>>>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 <40> 
>>> f6 c6 10
>>>   74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
>>> [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
>>> [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 
>>> 0000000000000005
>>> [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: 
>>> ffff9f38401a8340
>>> [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 
>>> 0000000000000000
>>> [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: 
>>> ffff9f384005e200
>>> [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 
>>> 0000000000000004
>>> [ 4223.822641]  ? qi_submit_sync+0x2c0/0x490
>>> [ 4223.822642]  ? qi_submit_sync+0x2c0/0x490
>>> [ 4223.822642]  </NMI>
>>> [ 4223.822642]  qi_flush_dev_iotlb+0xb1/0xd0
>>> [ 4223.822642]  __dmar_remove_one_dev_info+0x224/0x250
>>> [ 4223.822643]  dmar_remove_one_dev_info+0x3e/0x50
>>> [ 4223.822643]  intel_iommu_release_device+0x1f/0x30
>>> [ 4223.822643]  iommu_release_device+0x33/0x60
>>> [ 4223.822643]  iommu_bus_notifier+0x7f/0x90
>>> [ 4223.822644]  blocking_notifier_call_chain+0x60/0x90
>>> [ 4223.822644]  device_del+0x2e5/0x420
>>> [ 4223.822644]  pci_remove_bus_device+0x70/0x110
>>> [ 4223.822644]  pciehp_unconfigure_device+0x7c/0x130
>>> [ 4223.822644]  pciehp_disable_slot+0x6b/0x100
>>> [ 4223.822645]  pciehp_handle_presence_or_link_change+0xd8/0x320
>>> [ 4223.822645]  pciehp_ist+0x176/0x180
>>> [ 4223.822645]  ? irq_finalize_oneshot.part.50+0x110/0x110
>>> [ 4223.822645]  irq_thread_fn+0x19/0x50
>>> [ 4223.822646]  irq_thread+0x104/0x190
>>> [ 4223.822646]  ? irq_forced_thread_fn+0x90/0x90
>>> [ 4223.822646]  ? irq_thread_check_affinity+0xe0/0xe0
>>> [ 4223.822646]  kthread+0x114/0x130
>>> [ 4223.822647]  ? __kthread_cancel_work+0x40/0x40
>>> [ 4223.822647]  ret_from_fork+0x1f/0x30
>>> [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 (relocation
>>> range: 0xffffffff80000000-0xffffffffbfffffff)
>>>
>>> Such issue could be triggered by all kinds of regular surprise removal
>>> hotplug operation. like:
>>>
>>> 1. pull EP(endpoint device) out directly.
>>> 2. turn off EP's power.
>>> 3. bring the link down.
>>> etc.
>>>
>>> this patch aims to work for regular safe removal and surprise removal
>>> unplug. these hot unplug handling process could be optimized for fix the
>>> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
>>> function devtlb_invalidation_with_pasid() to check target device state to
>>> avoid sending meaningless ATS Invalidation request to iommu when device is
>>> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>>
>>> For safe removal, device wouldn't be removed until the whole software
>>> handling process is done, it wouldn't trigger the hard lock up issue
>>> caused by too long ATS Invalidation timeout wait. In safe removal path,
>>> device state isn't set to pci_channel_io_perm_failure in
>>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
>>> false there, wouldn't break the function.
>>>
>>> For surprise removal, device state is set to pci_channel_io_perm_failure in
>>> pciehp_unconfigure_device(), means device is already gone (disconnected)
>>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
>>> return true to break the function not to send ATS Invalidation request to
>>> the disconnected device blindly, thus avoid to trigger further ITE fault,
>>> and ITE fault will block all invalidation request to be handled.
>>> furthermore retry the timeout request could trigger hard lockup.
>>>
>>> safe removal (present) & surprise removal (not present)
>>>
>>> pciehp_ist()
>>>     pciehp_handle_presence_or_link_change()
>>>       pciehp_disable_slot()
>>>         remove_board()
>>>           pciehp_unconfigure_device(presence) {
>>>             if (!presence)
>>>                  pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>>>             }
>>>
>>> this patch works for regular safe removal and surprise removal of ATS
>>> capable endpoint on PCIe switch downstream ports.
>>
>> this is not the real fix. So this series may focus on the real fix (avoid
>> dead loop in intel iommu driver when ITE happens), and in the end add this
>> patch as an optimization.
> 
> This is the second time I brought it on top of other patches as Baolu perfers
> 
> Bjorn also suggested to take this one as optimization addition to others.
> 
> Anyway, just the order in this patch list, the same result after applied.
> 
> to solve customer issue, this one is needed.

I think even without this patch, customer's issue can be fixed by the last
3 patches of this series. is it? So this patch is not the real fix customer
wants, but nice to have. That's why I think it is an optimization. The
result is the same after applying in mainline. It's fine to keep it the
first two of this series, but need to tell customer what kind of patches
need to be back-ported.

Regards,
Yi Liu

> 
>>
>>> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
>>> Tested-by: Haorong Ye <yehaorong@bytedance.com>
>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>> ---
>>>   drivers/iommu/intel/pasid.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>>> index 3239cefa4c33..953592125e4a 100644
>>> --- a/drivers/iommu/intel/pasid.c
>>> +++ b/drivers/iommu/intel/pasid.c
>>> @@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>>> *iommu,
>>>       if (!info || !info->ats_enabled)
>>>           return;
>>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>>> +        return;
>>> +
>>>       sid = info->bus << 8 | info->devfn;
>>>       qdep = info->ats_qdep;
>>>       pfsid = info->pfsid;
>>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-31  4:25       ` Yi Liu
@ 2024-01-31  5:25         ` Ethan Zhao
  0 siblings, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-31  5:25 UTC (permalink / raw)
  To: Yi Liu, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: kevin.tian, dwmw2, will, lukas, iommu, linux-kernel, linux-pci,
	Haorong Ye

On 1/31/2024 12:25 PM, Yi Liu wrote:
> On 2024/1/30 13:37, Ethan Zhao wrote:
>>
>> On 1/29/2024 5:32 PM, Yi Liu wrote:
>>> On 2024/1/29 11:49, Ethan Zhao wrote:
>>>> For those endpoint devices connect to system via hotplug capable 
>>>> ports,
>>>> users could request a hot reset to the device by flapping device's 
>>>> link
>>>> through setting the slot's link control register, as pciehp_ist() 
>>>> DLLSC
>>>> interrupt sequence response, pciehp will unload the device driver and
>>>> then power it off. thus cause an IOMMU device-TLB invalidation (Intel
>>>> VT-d spec, or ATS Invalidation in PCIe spec r6.1) request for 
>>>> non-existence
>>>> target device to be sent and deadly loop to retry that request 
>>>> after ITE
>>>> fault triggered in interrupt context.
>>>>
>>>> That would cause following continuous hard lockup warning and 
>>>> system hang
>>>>
>>>> [ 4211.433662] pcieport 0000:17:01.0: pciehp: Slot(108): Link Down
>>>> [ 4211.433664] pcieport 0000:17:01.0: pciehp: Slot(108): Card not 
>>>> present
>>>> [ 4223.822591] NMI watchdog: Watchdog detected hard LOCKUP on cpu 144
>>>> [ 4223.822622] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded 
>>>> Tainted: G S
>>>>           OE    kernel version xxxx
>>>> [ 4223.822623] Hardware name: vendorname xxxx 666-106,
>>>> BIOS 01.01.02.03.01 05/15/2023
>>>> [ 4223.822623] RIP: 0010:qi_submit_sync+0x2c0/0x490
>>>> [ 4223.822624] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 
>>>> 0f 95 c1 48 8b
>>>>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 
>>>> <40> f6 c6 1
>>>> 0 74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
>>>> [ 4223.822624] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
>>>> [ 4223.822625] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 
>>>> 0000000000000005
>>>> [ 4223.822625] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: 
>>>> ffff9f38401a8340
>>>> [ 4223.822625] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 
>>>> 0000000000000000
>>>> [ 4223.822626] R10: 0000000000000010 R11: 0000000000000018 R12: 
>>>> ffff9f384005e200
>>>> [ 4223.822626] R13: 0000000000000004 R14: 0000000000000046 R15: 
>>>> 0000000000000004
>>>> [ 4223.822626] FS:  0000000000000000(0000) GS:ffffa237ae400000(0000)
>>>> knlGS:0000000000000000
>>>> [ 4223.822627] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 4223.822627] CR2: 00007ffe86515d80 CR3: 000002fd3000a001 CR4: 
>>>> 0000000000770ee0
>>>> [ 4223.822627] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
>>>> 0000000000000000
>>>> [ 4223.822628] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 
>>>> 0000000000000400
>>>> [ 4223.822628] PKRU: 55555554
>>>> [ 4223.822628] Call Trace:
>>>> [ 4223.822628]  qi_flush_dev_iotlb+0xb1/0xd0
>>>> [ 4223.822628]  __dmar_remove_one_dev_info+0x224/0x250
>>>> [ 4223.822629]  dmar_remove_one_dev_info+0x3e/0x50
>>>> [ 4223.822629]  intel_iommu_release_device+0x1f/0x30
>>>> [ 4223.822629]  iommu_release_device+0x33/0x60
>>>> [ 4223.822629]  iommu_bus_notifier+0x7f/0x90
>>>> [ 4223.822630]  blocking_notifier_call_chain+0x60/0x90
>>>> [ 4223.822630]  device_del+0x2e5/0x420
>>>> [ 4223.822630]  pci_remove_bus_device+0x70/0x110
>>>> [ 4223.822630]  pciehp_unconfigure_device+0x7c/0x130
>>>> [ 4223.822631]  pciehp_disable_slot+0x6b/0x100
>>>> [ 4223.822631] pciehp_handle_presence_or_link_change+0xd8/0x320
>>>> [ 4223.822631]  pciehp_ist+0x176/0x180
>>>> [ 4223.822631]  ? irq_finalize_oneshot.part.50+0x110/0x110
>>>> [ 4223.822632]  irq_thread_fn+0x19/0x50
>>>> [ 4223.822632]  irq_thread+0x104/0x190
>>>> [ 4223.822632]  ? irq_forced_thread_fn+0x90/0x90
>>>> [ 4223.822632]  ? irq_thread_check_affinity+0xe0/0xe0
>>>> [ 4223.822633]  kthread+0x114/0x130
>>>> [ 4223.822633]  ? __kthread_cancel_work+0x40/0x40
>>>> [ 4223.822633]  ret_from_fork+0x1f/0x30
>>>> [ 4223.822633] Kernel panic - not syncing: Hard LOCKUP
>>>> [ 4223.822634] CPU: 144 PID: 1422 Comm: irq/57-pciehp Kdump: loaded 
>>>> Tainted: G S
>>>>           OE     kernel version xxxx
>>>> [ 4223.822634] Hardware name: vendorname xxxx 666-106,
>>>> BIOS 01.01.02.03.01 05/15/2023
>>>> [ 4223.822634] Call Trace:
>>>> [ 4223.822634]  <NMI>
>>>> [ 4223.822635]  dump_stack+0x6d/0x88
>>>> [ 4223.822635]  panic+0x101/0x2d0
>>>> [ 4223.822635]  ? ret_from_fork+0x11/0x30
>>>> [ 4223.822635]  nmi_panic.cold.14+0xc/0xc
>>>> [ 4223.822636]  watchdog_overflow_callback.cold.8+0x6d/0x81
>>>> [ 4223.822636]  __perf_event_overflow+0x4f/0xf0
>>>> [ 4223.822636]  handle_pmi_common+0x1ef/0x290
>>>> [ 4223.822636]  ? __set_pte_vaddr+0x28/0x40
>>>> [ 4223.822637]  ? flush_tlb_one_kernel+0xa/0x20
>>>> [ 4223.822637]  ? __native_set_fixmap+0x24/0x30
>>>> [ 4223.822637]  ? ghes_copy_tofrom_phys+0x70/0x100
>>>> [ 4223.822637]  ? __ghes_peek_estatus.isra.16+0x49/0xa0
>>>> [ 4223.822637]  intel_pmu_handle_irq+0xba/0x2b0
>>>> [ 4223.822638]  perf_event_nmi_handler+0x24/0x40
>>>> [ 4223.822638]  nmi_handle+0x4d/0xf0
>>>> [ 4223.822638]  default_do_nmi+0x49/0x100
>>>> [ 4223.822638]  exc_nmi+0x134/0x180
>>>> [ 4223.822639]  end_repeat_nmi+0x16/0x67
>>>> [ 4223.822639] RIP: 0010:qi_submit_sync+0x2c0/0x490
>>>> [ 4223.822639] Code: 48 be 00 00 00 00 00 08 00 00 49 85 74 24 20 
>>>> 0f 95 c1 48 8b
>>>>   57 10 83 c1 04 83 3c 1a 03 0f 84 a2 01 00 00 49 8b 04 24 8b 70 34 
>>>> <40> f6 c6 10
>>>>   74 17 49 8b 04 24 8b 80 80 00 00 00 89 c2 d3 fa 41 39
>>>> [ 4223.822640] RSP: 0018:ffffc4f074f0bbb8 EFLAGS: 00000093
>>>> [ 4223.822640] RAX: ffffc4f040059000 RBX: 0000000000000014 RCX: 
>>>> 0000000000000005
>>>> [ 4223.822640] RDX: ffff9f3841315800 RSI: 0000000000000000 RDI: 
>>>> ffff9f38401a8340
>>>> [ 4223.822641] RBP: ffff9f38401a8340 R08: ffffc4f074f0bc00 R09: 
>>>> 0000000000000000
>>>> [ 4223.822641] R10: 0000000000000010 R11: 0000000000000018 R12: 
>>>> ffff9f384005e200
>>>> [ 4223.822641] R13: 0000000000000004 R14: 0000000000000046 R15: 
>>>> 0000000000000004
>>>> [ 4223.822641]  ? qi_submit_sync+0x2c0/0x490
>>>> [ 4223.822642]  ? qi_submit_sync+0x2c0/0x490
>>>> [ 4223.822642]  </NMI>
>>>> [ 4223.822642]  qi_flush_dev_iotlb+0xb1/0xd0
>>>> [ 4223.822642]  __dmar_remove_one_dev_info+0x224/0x250
>>>> [ 4223.822643]  dmar_remove_one_dev_info+0x3e/0x50
>>>> [ 4223.822643]  intel_iommu_release_device+0x1f/0x30
>>>> [ 4223.822643]  iommu_release_device+0x33/0x60
>>>> [ 4223.822643]  iommu_bus_notifier+0x7f/0x90
>>>> [ 4223.822644]  blocking_notifier_call_chain+0x60/0x90
>>>> [ 4223.822644]  device_del+0x2e5/0x420
>>>> [ 4223.822644]  pci_remove_bus_device+0x70/0x110
>>>> [ 4223.822644]  pciehp_unconfigure_device+0x7c/0x130
>>>> [ 4223.822644]  pciehp_disable_slot+0x6b/0x100
>>>> [ 4223.822645] pciehp_handle_presence_or_link_change+0xd8/0x320
>>>> [ 4223.822645]  pciehp_ist+0x176/0x180
>>>> [ 4223.822645]  ? irq_finalize_oneshot.part.50+0x110/0x110
>>>> [ 4223.822645]  irq_thread_fn+0x19/0x50
>>>> [ 4223.822646]  irq_thread+0x104/0x190
>>>> [ 4223.822646]  ? irq_forced_thread_fn+0x90/0x90
>>>> [ 4223.822646]  ? irq_thread_check_affinity+0xe0/0xe0
>>>> [ 4223.822646]  kthread+0x114/0x130
>>>> [ 4223.822647]  ? __kthread_cancel_work+0x40/0x40
>>>> [ 4223.822647]  ret_from_fork+0x1f/0x30
>>>> [ 4223.822647] Kernel Offset: 0x6400000 from 0xffffffff81000000 
>>>> (relocation
>>>> range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>
>>>> Such issue could be triggered by all kinds of regular surprise removal
>>>> hotplug operation. like:
>>>>
>>>> 1. pull EP(endpoint device) out directly.
>>>> 2. turn off EP's power.
>>>> 3. bring the link down.
>>>> etc.
>>>>
>>>> this patch aims to work for regular safe removal and surprise removal
>>>> unplug. these hot unplug handling process could be optimized for 
>>>> fix the
>>>> ATS Invalidation hang issue by calling pci_dev_is_disconnected() in
>>>> function devtlb_invalidation_with_pasid() to check target device 
>>>> state to
>>>> avoid sending meaningless ATS Invalidation request to iommu when 
>>>> device is
>>>> gone. (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>>>
>>>> For safe removal, device wouldn't be removed until the whole software
>>>> handling process is done, it wouldn't trigger the hard lock up issue
>>>> caused by too long ATS Invalidation timeout wait. In safe removal 
>>>> path,
>>>> device state isn't set to pci_channel_io_perm_failure in
>>>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>>>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will 
>>>> return
>>>> false there, wouldn't break the function.
>>>>
>>>> For surprise removal, device state is set to 
>>>> pci_channel_io_perm_failure in
>>>> pciehp_unconfigure_device(), means device is already gone 
>>>> (disconnected)
>>>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() 
>>>> will
>>>> return true to break the function not to send ATS Invalidation 
>>>> request to
>>>> the disconnected device blindly, thus avoid to trigger further ITE 
>>>> fault,
>>>> and ITE fault will block all invalidation request to be handled.
>>>> furthermore retry the timeout request could trigger hard lockup.
>>>>
>>>> safe removal (present) & surprise removal (not present)
>>>>
>>>> pciehp_ist()
>>>>     pciehp_handle_presence_or_link_change()
>>>>       pciehp_disable_slot()
>>>>         remove_board()
>>>>           pciehp_unconfigure_device(presence) {
>>>>             if (!presence)
>>>>                  pci_walk_bus(parent, pci_dev_set_disconnected, NULL);
>>>>             }
>>>>
>>>> this patch works for regular safe removal and surprise removal of ATS
>>>> capable endpoint on PCIe switch downstream ports.
>>>
>>> this is not the real fix. So this series may focus on the real fix 
>>> (avoid
>>> dead loop in intel iommu driver when ITE happens), and in the end 
>>> add this
>>> patch as an optimization.
>>
>> This is the second time I brought it on top of other patches as Baolu 
>> perfers
>>
>> Bjorn also suggested to take this one as optimization addition to 
>> others.
>>
>> Anyway, just the order in this patch list, the same result after 
>> applied.
>>
>> to solve customer issue, this one is needed.
>
> I think even without this patch, customer's issue can be fixed by the 
> last
> 3 patches of this series. is it? So this patch is not the real fix 
> customer
> wants, but nice to have. That's why I think it is an optimization. The
> result is the same after applying in mainline. It's fine to keep it the
> first two of this series, but need to tell customer what kind of patches
> need to be back-ported.

The timeout threshold of hard lockup watchdog could be configured by user via
sysctl sysfs etc, I am not 100% sure the handling after ITE could suppress all
user's hard lockup warning.

So avoid to be trapped into fault handling is still the first choice in my
opinion.

Thanks,
Ethan

>
> Regards,
> Yi Liu
>
>>
>>>
>>>> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table 
>>>> interface")
>>>> Tested-by: Haorong Ye <yehaorong@bytedance.com>
>>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>> ---
>>>>   drivers/iommu/intel/pasid.c | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>>>> index 3239cefa4c33..953592125e4a 100644
>>>> --- a/drivers/iommu/intel/pasid.c
>>>> +++ b/drivers/iommu/intel/pasid.c
>>>> @@ -214,6 +214,9 @@ devtlb_invalidation_with_pasid(struct 
>>>> intel_iommu *iommu,
>>>>       if (!info || !info->ats_enabled)
>>>>           return;
>>>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>>>> +        return;
>>>> +
>>>>       sid = info->bus << 8 | info->devfn;
>>>>       qdep = info->ats_qdep;
>>>>       pfsid = info->pfsid;
>>>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30  9:24                 ` Tian, Kevin
@ 2024-01-31  5:42                   ` Ethan Zhao
  0 siblings, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-01-31  5:42 UTC (permalink / raw)
  To: Tian, Kevin, Liu, Yi L, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: dwmw2, will, lukas, iommu, linux-kernel, linux-pci

On 1/30/2024 5:24 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Tuesday, January 30, 2024 5:13 PM
>>
>> On 1/30/2024 4:43 PM, Tian, Kevin wrote:
>>>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>> Sent: Tuesday, January 30, 2024 4:16 PM
>>>>
>>>> On 1/30/2024 2:22 PM, Tian, Kevin wrote:
>>>>> Here we need consider two situations.
>>>>>
>>>>> One is that the device is not bound to a driver or bound to a driver
>>>>> which doesn't do active work to the device when it's removed. In
>>>>> that case one may observe the timeout situation only in the removal
>>>>> path as the stack dump in your patch02 shows.
>>>> When iommu_bus_notifier() got called for hotplug removal cases to
>>>> flush devTLB (ATS invalidation), driver was already unloaded.
>>>> whatever safe removal or surprise removal. so in theory no active
>>>> driver working there.
>>>>
>>>> pciehp_ist()
>>>>     pciehp_disable_slot()
>>>>      remove_board()
>>>>       pciehp_unconfigure_device()
>>>>        pci_stop_and_remove_bus_device()
>>>>         pci_stop_bus_device()--->here unload driver
>>>>         pci_remove_bus_device()->here qi_flush_dev_iotlb() got called.
>>> yes, so patch02 can fix this case.
>>>
>>>>> patch02 can fix that case by checking whether the device is present
>>>>> to skip sending the invalidation requests. So the logic being discussed
>>>>> here doesn't matter.
>>>>>
>>>>> The 2nd situation is more tricky. The device might be bound to
>>>>> a driver which is doing active work to the device with in-fly
>>>>> ATS invalidation requests. In this case in-fly requests must be aborted
>>>>> before the driver can be detached from the removed device.
>> Conceptually
>>>>> a device is removed from the bus only after its driver is detached.
>>>> Some tricky situations:
>>>>
>>>> 1. The ATS invalidation request is issued from driver driver, while it is
>>>> in handling, device is removed. this momment, the device instance still
>>>> exists in the bus list. yes, if searching it by BDF, could get it.
>>> it's searchable between the point where the device is removed and the
>>> point where the driver is unloaded:
>>>
>>>           CPU0                                CPU1
>>>     (Driver is active)                    (pciehp handler)
>>>     qi_submit_sync()                      pciehp_ist()
>>>       ...                                   ...
>>>       loop for completion() {               pciehp_unconfigure_device()
>>>         ...                                   pci_dev_set_disconnected()
>>>         if (ITE) {                            ...
>>>           //find pci_dev from sid             pci_remove_bus_device()
>>>           if (pci_dev_is_connected())           device_del()
>>>             break;                                bus_remove_device()
>>>         }                                           device_remove_driver()
>> If the device was hot plugin or re-scanned, the device has a PCI_DEV_ADDED
>> flag,
> in this case is pci_dev_is_disconnected() true or false?
>
> how is this patch supposed to work with it?

pci_dev_is_disconnected() is true for safe removal, false for surprise 
removal, but it not called in this patch, is used in patch[2/5], 
explained in its commit log. This patch use the pci_device_is_present() 
to check device present or not. if pci_dev_is_disconnected() returns true, then check its presence by pci vendor
configuration reading (a specific protocal in PCIe spec).

>
>> if so the driver unloading work isn't defered to the tail of device_del(), it
>> is unloaded before pci_remove_bus_device()->device_del(), in pci_stop_dev
>>
>> pci_stop_bus_device()
>>    pci_stop_dev()
>>    {
>>     if (pci_dev_is_added(dev)) {
>>         device_release_driver(&dev->dev);
>>    }
> no matter where driver unload is requested, it needs to wait for aborting
> in-fly request on CPU0.

yes, the progress of driver unloading has complex sync mechanism in
  __device_release_driver() to do that.

>
>> So the interval the device is searchable, only applied to those devices
>> not hot plugged, or never be scanned.
>>
> and in the worst case even if pci_dev is not searchable, isn't it already
> an indicator that the device is absent then qi_submit_sync() should
> just exit upon ITE?

Hmmm, pci_dev is not searchable, but that pci_dev instance is just not in
the bus list or device list, not mean is disconnected or not present that
moment. :)


Thanks,
Ethan


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-30 16:29             ` Jason Gunthorpe
@ 2024-01-31  6:21               ` Baolu Lu
  2024-02-01 19:34                 ` Jason Gunthorpe
  0 siblings, 1 reply; 39+ messages in thread
From: Baolu Lu @ 2024-01-31  6:21 UTC (permalink / raw)
  To: Jason Gunthorpe, Ethan Zhao
  Cc: baolu.lu, Tian, Kevin, Liu, Yi L, bhelgaas, robin.murphy, dwmw2,
	will, lukas, iommu, linux-kernel, linux-pci

On 2024/1/31 0:29, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2024 at 04:15:33PM +0800, Ethan Zhao wrote:
>> Some tricky situations:
>>
>> 1. The ATS invalidation request is issued from driver driver, while it is
>> in handling, device is removed. this momment, the device instance still
>> exists in the bus list. yes, if searching it by BDF, could get it.
>>
>> 2. The ATS invalidation request is issued from iommu_bus_notifier()
>> for surprise removal reason, as shown in above calltrace, device was
>> already removed from bus list. if searching it by BDF, return NULL.
>>
>> 3. The ATS invlidation request is issued from iommu_bus_notifier()
>> for safe removal, when is in handling, device is removed or link
>> is down. also as #2, device was already removed from bus list.
>> if searching it by BDF. got NULL.
>> ...
>>
>> so, searching device by BDF, only works for the ATS invalidation
>> request is from device driver.
> In the good path, where the hot removal is expected and this is about
> coordinating, the IOMMU driver should do an orderly shutdown of the
> ATS mechanism:
> 
>   1 Write to PCI config space to disable the ATS
>   2 Make the IOMMU respond to ATS requests with UR and set it to BLOCKED
>   3 Issue a flush of the ATC
>   4 Wait for all outstanding ATC flushes to complete
> 
> If it is a bad/surprise path where the device is already gone then:
> 
>   1 should automatically not do anything, possibly timing out
>   2 must succeed
>   3 should time out
>   4 should "complete" in that the ATC flushes are all timed out
> 
> IMHO all you need to do is not crash/lockup while processing the ATC
> timeouts. If this is a surprise path then the ATC timeout might
> already happened before the iommu driver remove notifier event happens.
> 
> If the driver needs to translate from the IOMMU device table index
> into a struct device it is probably best to do that inside the driver.
> 
> eg ARM maintains a rbtree in the iommu dev data. (see
> arm_smmu_insert_master)

An rbtree for IOMMU device data for the VT-d driver would be beneficial.
It also benefits other paths of fault handling, such as the I/O page
fault handling path, where it currently still relies on the PCI
subsystem to convert a RID value into a pci_device structure.

Given that such an rbtree would be helpful for multiple individual
drivers that handle PCI devices, it seems valuable to implement it in
the core?

Best regards,
baolu

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-01-31  6:21               ` Baolu Lu
@ 2024-02-01 19:34                 ` Jason Gunthorpe
  2024-02-15  7:37                   ` Baolu Lu
  0 siblings, 1 reply; 39+ messages in thread
From: Jason Gunthorpe @ 2024-02-01 19:34 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Ethan Zhao, Tian, Kevin, Liu, Yi L, bhelgaas, robin.murphy,
	dwmw2, will, lukas, iommu, linux-kernel, linux-pci

On Wed, Jan 31, 2024 at 02:21:20PM +0800, Baolu Lu wrote:
> An rbtree for IOMMU device data for the VT-d driver would be beneficial.
> It also benefits other paths of fault handling, such as the I/O page
> fault handling path, where it currently still relies on the PCI
> subsystem to convert a RID value into a pci_device structure.
> 
> Given that such an rbtree would be helpful for multiple individual
> drivers that handle PCI devices, it seems valuable to implement it in
> the core?

rbtree is already supposed to be a re-usable library.

There is already good helper support in rbtree to make things easy to
implement. I see arm hasn't used them yet, it should look something
like this:

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f58e77b99d476b..ebf86c6a8787c4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1673,26 +1673,37 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 	return 0;
 }
 
+static int arm_smmu_streams_cmp_key(const void *lhs, const struct rb_node *rhs)
+{
+	struct arm_smmu_stream *stream_rhs =
+		rb_entry(rhs, struct arm_smmu_stream, node);
+	const u32 *sid_lhs = lhs;
+
+	if (*sid_lhs < stream_rhs->id)
+		return -1;
+	if (*sid_lhs > stream_rhs->id)
+		return 1;
+	return 0;
+}
+
+static int arm_smmu_streams_cmp_node(struct rb_node *lhs,
+				     const struct rb_node *rhs)
+{
+	return arm_smmu_streams_cmp_key(
+		&rb_entry(lhs, struct arm_smmu_stream, node)->id, rhs);
+}
+
 static struct arm_smmu_master *
 arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid)
 {
 	struct rb_node *node;
-	struct arm_smmu_stream *stream;
 
 	lockdep_assert_held(&smmu->streams_mutex);
 
-	node = smmu->streams.rb_node;
-	while (node) {
-		stream = rb_entry(node, struct arm_smmu_stream, node);
-		if (stream->id < sid)
-			node = node->rb_right;
-		else if (stream->id > sid)
-			node = node->rb_left;
-		else
-			return stream->master;
-	}
-
-	return NULL;
+	node = rb_find(&sid, &smmu->streams, arm_smmu_streams_cmp_key);
+	if (!node)
+		return NULL;
+	return rb_entry(node, struct arm_smmu_stream, node)->master;
 }
 
 /* IRQ and event handlers */
@@ -3324,8 +3335,6 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 {
 	int i;
 	int ret = 0;
-	struct arm_smmu_stream *new_stream, *cur_stream;
-	struct rb_node **new_node, *parent_node = NULL;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
 
 	master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams),
@@ -3336,6 +3345,7 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 
 	mutex_lock(&smmu->streams_mutex);
 	for (i = 0; i < fwspec->num_ids; i++) {
+		struct arm_smmu_stream *new_stream;
 		u32 sid = fwspec->ids[i];
 
 		new_stream = &master->streams[i];
@@ -3347,28 +3357,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 			break;
 
 		/* Insert into SID tree */
-		new_node = &(smmu->streams.rb_node);
-		while (*new_node) {
-			cur_stream = rb_entry(*new_node, struct arm_smmu_stream,
-					      node);
-			parent_node = *new_node;
-			if (cur_stream->id > new_stream->id) {
-				new_node = &((*new_node)->rb_left);
-			} else if (cur_stream->id < new_stream->id) {
-				new_node = &((*new_node)->rb_right);
-			} else {
-				dev_warn(master->dev,
-					 "stream %u already in tree\n",
-					 cur_stream->id);
-				ret = -EINVAL;
-				break;
-			}
-		}
-		if (ret)
+		if (rb_find_add(&new_stream->node, &smmu->streams,
+				arm_smmu_streams_cmp_node)) {
+			dev_warn(master->dev, "stream %u already in tree\n",
+				 sid);
+			ret = -EINVAL;
 			break;
-
-		rb_link_node(&new_stream->node, parent_node, new_node);
-		rb_insert_color(&new_stream->node, &smmu->streams);
+		}
 	}
 
 	if (ret) {

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
  2024-01-29  3:49 ` [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers Ethan Zhao
  2024-01-29  8:58   ` Tian, Kevin
@ 2024-02-08  7:15   ` Dan Carpenter
  2024-02-09  2:08     ` Ethan Zhao
  1 sibling, 1 reply; 39+ messages in thread
From: Dan Carpenter @ 2024-02-08  7:15 UTC (permalink / raw)
  To: oe-kbuild, Ethan Zhao, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: lkp, oe-kbuild-all, kevin.tian, dwmw2, will, lukas, yi.l.liu,
	iommu, linux-kernel, linux-pci, Ethan Zhao

Hi Ethan,

kernel test robot noticed the following build warnings:

url:    https://github.com/intel-lab-lkp/linux/commits/Ethan-Zhao/PCI-make-pci_dev_is_disconnected-helper-public-for-other-drivers/20240129-115259
base:   41bccc98fb7931d63d03f326a746ac4d429c1dd3
patch link:    https://lore.kernel.org/r/20240129034924.817005-5-haifeng.zhao%40linux.intel.com
patch subject: [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
config: x86_64-randconfig-161-20240207 (https://download.01.org/0day-ci/archive/20240208/202402080321.n77hu71Y-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202402080321.n77hu71Y-lkp@intel.com/

smatch warnings:
drivers/iommu/intel/dmar.c:1533 qi_flush_dev_iotlb() error: we previously assumed 'info->dev' could be null (see line 1533)

vim +1533 drivers/iommu/intel/dmar.c

20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1522  void qi_flush_dev_iotlb(struct intel_iommu *iommu,
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1523  			struct device_domain_info *info, u64 addr,
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1524  			unsigned int mask)
6ba6c3a4cacfd6 drivers/pci/dmar.c         Yu Zhao      2009-05-18  1525  {
c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1526  	struct pci_dev *pdev = NULL;
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1527  	u16 sid, qdep, pfsid;
6ba6c3a4cacfd6 drivers/pci/dmar.c         Yu Zhao      2009-05-18  1528  	struct qi_desc desc;
6ba6c3a4cacfd6 drivers/pci/dmar.c         Yu Zhao      2009-05-18  1529  
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1530  	if (!info || !info->ats_enabled)
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1531  		return;
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1532  
c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28 @1533  	if (info->dev || !dev_is_pci(info->dev))

Missing ! character.  if (!info->dev

c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1534  		return;
c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1535  
c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1536  	pdev = to_pci_dev(info->dev);
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1537  	sid = info->bus << 8 | info->devfn;
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1538  	qdep = info->ats_qdep;
20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1539  	pfsid = info->pfsid;

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
  2024-02-08  7:15   ` Dan Carpenter
@ 2024-02-09  2:08     ` Ethan Zhao
  0 siblings, 0 replies; 39+ messages in thread
From: Ethan Zhao @ 2024-02-09  2:08 UTC (permalink / raw)
  To: Dan Carpenter, oe-kbuild, baolu.lu, bhelgaas, robin.murphy, jgg
  Cc: lkp, oe-kbuild-all, kevin.tian, dwmw2, will, lukas, yi.l.liu,
	iommu, linux-kernel, linux-pci


On 2/8/2024 3:15 PM, Dan Carpenter wrote:
> Hi Ethan,
>
> kernel test robot noticed the following build warnings:
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Ethan-Zhao/PCI-make-pci_dev_is_disconnected-helper-public-for-other-drivers/20240129-115259
> base:   41bccc98fb7931d63d03f326a746ac4d429c1dd3
> patch link:    https://lore.kernel.org/r/20240129034924.817005-5-haifeng.zhao%40linux.intel.com
> patch subject: [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers
> config: x86_64-randconfig-161-20240207 (https://download.01.org/0day-ci/archive/20240208/202402080321.n77hu71Y-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> | Closes: https://lore.kernel.org/r/202402080321.n77hu71Y-lkp@intel.com/
>
> smatch warnings:
> drivers/iommu/intel/dmar.c:1533 qi_flush_dev_iotlb() error: we previously assumed 'info->dev' could be null (see line 1533)
>
> vim +1533 drivers/iommu/intel/dmar.c
>
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1522  void qi_flush_dev_iotlb(struct intel_iommu *iommu,
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1523  			struct device_domain_info *info, u64 addr,
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1524  			unsigned int mask)
> 6ba6c3a4cacfd6 drivers/pci/dmar.c         Yu Zhao      2009-05-18  1525  {
> c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1526  	struct pci_dev *pdev = NULL;
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1527  	u16 sid, qdep, pfsid;
> 6ba6c3a4cacfd6 drivers/pci/dmar.c         Yu Zhao      2009-05-18  1528  	struct qi_desc desc;
> 6ba6c3a4cacfd6 drivers/pci/dmar.c         Yu Zhao      2009-05-18  1529
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1530  	if (!info || !info->ats_enabled)
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1531  		return;
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1532
> c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28 @1533  	if (info->dev || !dev_is_pci(info->dev))
>
> Missing ! character.  if (!info->dev

Got it, thanks.

>
> c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1534  		return;
> c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1535
> c830e699e08a6c drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1536  	pdev = to_pci_dev(info->dev);
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1537  	sid = info->bus << 8 | info->devfn;
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1538  	qdep = info->ats_qdep;
> 20da7293134024 drivers/iommu/intel/dmar.c Ethan Zhao   2024-01-28  1539  	pfsid = info->pfsid;
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present
  2024-02-01 19:34                 ` Jason Gunthorpe
@ 2024-02-15  7:37                   ` Baolu Lu
  0 siblings, 0 replies; 39+ messages in thread
From: Baolu Lu @ 2024-02-15  7:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: baolu.lu, Ethan Zhao, Tian, Kevin, Liu, Yi L, bhelgaas,
	robin.murphy, dwmw2, will, lukas, iommu, linux-kernel, linux-pci

On 2/2/24 3:34 AM, Jason Gunthorpe wrote:
> On Wed, Jan 31, 2024 at 02:21:20PM +0800, Baolu Lu wrote:
>> An rbtree for IOMMU device data for the VT-d driver would be beneficial.
>> It also benefits other paths of fault handling, such as the I/O page
>> fault handling path, where it currently still relies on the PCI
>> subsystem to convert a RID value into a pci_device structure.
>>
>> Given that such an rbtree would be helpful for multiple individual
>> drivers that handle PCI devices, it seems valuable to implement it in
>> the core?
> rbtree is already supposed to be a re-usable library.
> 
> There is already good helper support in rbtree to make things easy to
> implement. I see arm hasn't used them yet, it should look something
> like this:

I have posted a similar implementation for the vt-d driver here:

https://lore.kernel.org/linux-iommu/20240215072249.4465-1-baolu.lu@linux.intel.com/

Based on this implementation, only patches 1 and 2 are required. The
last patch could be like below (code compiled but not tested, comments
not changed yet):

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index f9b63c2875f7..30a659a4d3ed 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1272,6 +1272,7 @@ static int qi_check_fault(struct intel_iommu 
*iommu, int index, int wait_index)
  {
         u32 fault;
         int head, tail;
+       u64 iqe_err, ite_sid;
         struct q_inval *qi = iommu->qi;
         int shift = qi_shift(iommu);

@@ -1316,6 +1317,13 @@ static int qi_check_fault(struct intel_iommu 
*iommu, int index, int wait_index)
                 tail = readl(iommu->reg + DMAR_IQT_REG);
                 tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;

+               /*
+                * SID field is valid only when the ITE field is Set in 
FSTS_REG
+                * see Intel VT-d spec r4.1, section 11.4.9.9
+                */
+               iqe_err = dmar_readq(iommu->reg + DMAR_IQER_REG);
+               ite_sid = DMAR_IQER_REG_ITESID(iqe_err);
+
                 writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
                 pr_info("Invalidation Time-out Error (ITE) cleared\n");

@@ -1325,6 +1333,19 @@ static int qi_check_fault(struct intel_iommu 
*iommu, int index, int wait_index)
                         head = (head - 2 + QI_LENGTH) % QI_LENGTH;
                 } while (head != tail);

+               /*
+                * If got ITE, we need to check if the sid of ITE is the 
same as
+                * current ATS invalidation target device, if yes, don't 
try this
+                * request anymore if the target device isn't present.
+                * 0 value of ite_sid means old VT-d device, no ite_sid 
value.
+                */
+               if (ite_sid) {
+                       struct device *dev = device_rbtree_find(iommu, 
ite_sid);
+
+                       if (!dev || !pci_device_is_present(to_pci_dev(dev)))
+                               return -ETIMEDOUT;
+               }
+
                 if (qi->desc_status[wait_index] == QI_ABORT)
                         return -EAGAIN;
         }

Best regards,
baolu

^ permalink raw reply related	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2024-02-15  7:43 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-29  3:49 [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2024-01-29  3:49 ` [PATCH v12 1/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2024-01-29  8:50   ` Tian, Kevin
2024-01-30  5:23     ` Ethan Zhao
2024-01-30  5:25     ` Ethan Zhao
2024-01-30  6:23       ` Tian, Kevin
2024-01-29  3:49 ` [PATCH v12 2/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
2024-01-29  8:53   ` Tian, Kevin
2024-01-29  9:32   ` Yi Liu
2024-01-30  5:37     ` Ethan Zhao
2024-01-31  4:25       ` Yi Liu
2024-01-31  5:25         ` Ethan Zhao
2024-01-29  3:49 ` [PATCH v12 3/5] iommu/vt-d: simplify parameters of qi_submit_sync() ATS invalidation callers Ethan Zhao
2024-01-29  9:37   ` Yi Liu
2024-01-30  5:43     ` Ethan Zhao
2024-01-29  3:49 ` [PATCH v12 4/5] iommu/vt-d: pass pdev parameter for qi_check_fault() and refactor callers Ethan Zhao
2024-01-29  8:58   ` Tian, Kevin
2024-01-30  7:30     ` Ethan Zhao
2024-02-08  7:15   ` Dan Carpenter
2024-02-09  2:08     ` Ethan Zhao
2024-01-29  3:49 ` [PATCH v12 5/5] iommu/vt-d: improve ITE fault handling if target device isn't present Ethan Zhao
2024-01-29  9:06   ` Tian, Kevin
2024-01-29  9:21     ` Yi Liu
2024-01-30  5:12       ` Ethan Zhao
2024-01-30  6:22         ` Tian, Kevin
2024-01-30  8:15           ` Ethan Zhao
2024-01-30  8:43             ` Tian, Kevin
2024-01-30  9:13               ` Ethan Zhao
2024-01-30  9:24                 ` Tian, Kevin
2024-01-31  5:42                   ` Ethan Zhao
2024-01-30 16:29             ` Jason Gunthorpe
2024-01-31  6:21               ` Baolu Lu
2024-02-01 19:34                 ` Jason Gunthorpe
2024-02-15  7:37                   ` Baolu Lu
2024-01-29 14:48     ` Baolu Lu
2024-01-30  3:28       ` Tian, Kevin
2024-01-30  8:43       ` Ethan Zhao
2024-01-29  9:33   ` Yi Liu
2024-01-29  5:16 ` [PATCH v12 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).