* [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
@ 2023-12-28 17:05 Ethan Zhao
  2023-12-28 17:05 ` [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ethan Zhao @ 2023-12-28 17:05 UTC (permalink / raw)
  To: kevin.tian, bhelgaas, baolu.lu, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel

Make pci_dev_is_disconnected() public so that it can be called from the
Intel VT-d driver to quickly fix/work around the surprise removal unplug
hang issue for ATS-capable devices on hotplug-capable PCIe switch
downstream ports.

Unlike pci_device_is_present(), this helper does not access config
space, so it is light enough to optimize the normal pure surprise
removal and safe removal flows.
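
A minimal userspace sketch of that difference follows. It is not kernel
code: struct pci_dev_model, its cfg_reads counter and both model_*()
functions are hypothetical stand-ins, and the presence check is assumed
to cost one config space read whenever the device is not already marked
disconnected.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's channel state values. */
enum pci_channel_state {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen = 2,
	pci_channel_io_perm_failure = 3,
};

/* Hypothetical model of a device; only the fields used below. */
struct pci_dev_model {
	enum pci_channel_state error_state;
	unsigned int cfg_reads;	/* counts simulated config space reads */
	bool responds;		/* whether a config read would succeed */
};

/* Same test as the helper in the patch: a cached state, no bus access. */
static inline bool model_dev_is_disconnected(const struct pci_dev_model *dev)
{
	return dev->error_state == pci_channel_io_perm_failure;
}

/* Assumed shape of a presence check that has to touch config space. */
static inline bool model_device_is_present(struct pci_dev_model *dev)
{
	if (model_dev_is_disconnected(dev))
		return false;
	dev->cfg_reads++;
	return dev->responds;
}
```

In this model only model_device_is_present() ever increments cfg_reads;
the disconnected check reads a field that is already in memory.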

Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/pci/pci.h   | 5 -----
 include/linux/pci.h | 5 +++++
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 5ecbcf041179..75fa2084492f 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -366,11 +366,6 @@ static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
 	return 0;
 }
 
-static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
-{
-	return dev->error_state == pci_channel_io_perm_failure;
-}
-
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DPC_RECOVERED 1
diff --git a/include/linux/pci.h b/include/linux/pci.h
index dea043bc1e38..4779eec8b267 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2506,6 +2506,11 @@ static inline struct pci_dev *pcie_find_root_port(struct pci_dev *dev)
 	return NULL;
 }
 
+static inline bool pci_dev_is_disconnected(const struct pci_dev *dev)
+{
+	return dev->error_state == pci_channel_io_perm_failure;
+}
+
 void pci_request_acs(void);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
-- 
2.31.1



* [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2023-12-28 17:05 [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
@ 2023-12-28 17:05 ` Ethan Zhao
  2024-01-10  5:24   ` Baolu Lu
  2023-12-28 17:05 ` [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
  2024-01-10  5:25 ` [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Baolu Lu
  2 siblings, 1 reply; 16+ messages in thread
From: Ethan Zhao @ 2023-12-28 17:05 UTC (permalink / raw)
  To: kevin.tian, bhelgaas, baolu.lu, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel

Except for the aggressive hotplug cases, where a hotplug device is
surprise removed while its safe removal has been requested and is still
being handled, by:

1. pulling it out directly,
2. turning off its power,
3. bringing the link down,
4. the device simply dying at that moment,

etc. (in a word, the device is 'gone' or 'disconnected'), most unplug
events are regular safe removals or surprise removals. The handling of
these hot unplug events can be optimized to fix the ATS Invalidation
hang issue by calling pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() to check the target device state, which
avoids sending a meaningless ATS Invalidation request to the IOMMU when
the device is gone.
(see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)

For safe removal, the device is not removed until the whole software
handling process is done, so it does not trigger the hard lockup caused
by an overly long ATS Invalidation timeout wait. In the safe removal
path, pciehp_unconfigure_device() checks the 'presence' parameter and
does not set the device state to pci_channel_io_perm_failure, so
pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() returns
false and the function proceeds as before.

For surprise removal, pciehp_unconfigure_device() sets the device state
to pci_channel_io_perm_failure, meaning the device is already gone
(disconnected). pci_dev_is_disconnected() in
devtlb_invalidation_with_pasid() then returns true and the function
returns early instead of blindly sending an ATS Invalidation request to
the disconnected device, which avoids the long wait that triggers the
hard lockup.

safe removal & surprise removal

pciehp_ist()
   pciehp_handle_presence_or_link_change()
     pciehp_disable_slot()
       remove_board()
         pciehp_unconfigure_device(presence)
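
The two paths can be modeled in userspace roughly as below. This is only
a sketch under stated assumptions: struct model_dev and the model_*()
functions are hypothetical stand-ins for the kernel code, and the rule
that the device is marked disconnected only when 'presence' is false is
taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state values standing in for the kernel's channel states. */
enum model_state { MODEL_IO_NORMAL = 1, MODEL_IO_PERM_FAILURE = 3 };

struct model_dev {
	enum model_state error_state;
	bool ats_enabled;
	unsigned int ats_inv_sent;	/* ATS Invalidation requests issued */
};

/* presence == true models safe removal, false models surprise removal. */
static inline void model_unconfigure_device(struct model_dev *dev, bool presence)
{
	if (!presence)
		dev->error_state = MODEL_IO_PERM_FAILURE;
}

static inline bool model_dev_is_disconnected(const struct model_dev *dev)
{
	return dev->error_state == MODEL_IO_PERM_FAILURE;
}

/* Models the early return added by the patch. */
static inline void model_devtlb_invalidation(struct model_dev *dev)
{
	if (!dev->ats_enabled)
		return;
	if (model_dev_is_disconnected(dev))
		return;			/* device is gone: send nothing */
	dev->ats_inv_sent++;
}
```

With presence set, the invalidation is still issued; with presence
clear, the model returns before anything is sent.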

Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/pasid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 715943531091..3d5ed27f39ef 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 	if (!info || !info->ats_enabled)
 		return;
 
+	if (pci_dev_is_disconnected(to_pci_dev(dev)))
+		return;
 	/*
 	 * When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
 	 * devTLB flush w/o PASID should be used. For non-zero PASID under
-- 
2.31.1



* [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2023-12-28 17:05 [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
  2023-12-28 17:05 ` [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
@ 2023-12-28 17:05 ` Ethan Zhao
  2023-12-28 17:10   ` Ethan Zhao
                     ` (2 more replies)
  2024-01-10  5:25 ` [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Baolu Lu
  2 siblings, 3 replies; 16+ messages in thread
From: Ethan Zhao @ 2023-12-28 17:05 UTC (permalink / raw)
  To: kevin.tian, bhelgaas, baolu.lu, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel

When an ATS Invalidation request times out, qi_submit_sync() restarts
and loops on the invalidation request forever until it completes. This
blocks other invalidation threads, such as the fq_timer, from issuing
invalidation requests and causes a system lockup like the following:

[exception RIP: native_queued_spin_lock_slowpath+92]
RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

(for the rest of the exception trace, see the hotplug case of the
ATS-capable device)

If an endpoint device simply does not respond to the ATS Invalidation
request but is not gone, it brings down the whole system. To avoid this,
do not retry a timed-out ATS Invalidation request forever.
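
The retry rule can be sketched in userspace as follows. Everything here
is a hypothetical model rather than the driver code: enum model_qi_type
uses placeholder values instead of the real descriptor type encodings,
and struct model_hw simulates a queue whose submissions time out a fixed
number of times (or forever when timeouts_left is negative).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder request types; not the real QI_*_TYPE encodings. */
enum model_qi_type { MODEL_QI_IOTLB, MODEL_QI_DIOTLB, MODEL_QI_DEIOTLB };

struct model_hw {
	int timeouts_left;	/* < 0: every submission times out */
	unsigned int submissions;
};

/* One submission attempt: -EAGAIN models a timed-out request. */
static inline int model_submit_once(struct model_hw *hw)
{
	hw->submissions++;
	if (hw->timeouts_left < 0)
		return -EAGAIN;
	if (hw->timeouts_left > 0) {
		hw->timeouts_left--;
		return -EAGAIN;
	}
	return 0;
}

static inline bool model_is_devtlb(enum model_qi_type type)
{
	return type == MODEL_QI_DIOTLB || type == MODEL_QI_DEIOTLB;
}

/* Retry on timeout unless the request targets a device TLB. */
static inline int model_submit_sync(struct model_hw *hw, enum model_qi_type type)
{
	int rc;

restart:
	rc = model_submit_once(hw);
	if (rc == -EAGAIN && !model_is_devtlb(type))
		goto restart;

	return rc;
}
```

A device TLB request against an unresponsive target gives up after one
attempt, while other request types keep the existing retry behaviour.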

Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/dmar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 0a8d628a42ee..9edb4b44afca 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 	reclaim_free_desc(qi);
 	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
 
-	if (rc == -EAGAIN)
+	if (rc == -EAGAIN && type != QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
 		goto restart;
 
 	if (iotlb_start_ktime)
-- 
2.31.1



* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2023-12-28 17:05 ` [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
@ 2023-12-28 17:10   ` Ethan Zhao
  2024-01-10  5:28   ` Baolu Lu
  2024-01-11  7:44   ` Ethan Zhao
  2 siblings, 0 replies; 16+ messages in thread
From: Ethan Zhao @ 2023-12-28 17:10 UTC (permalink / raw)
  To: kevin.tian, bhelgaas, baolu.lu, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 12/29/2023 1:05 AM, Ethan Zhao wrote:
> When the ATS Invalidation request timeout happens, the qi_submit_sync()
> will restart and loop for the invalidation request forever till it is
> done, it will block another Invalidation thread such as the fq_timer
> to issue invalidation request, cause the system lockup as following
>
> [exception RIP: native_queued_spin_lock_slowpath+92]
>
> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>
> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>
> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>
> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>
> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#12 [ffffb202f268cdc8] native_queued_spin_lock_slowpath at ffffffffa9d1025c
#13 [ffffb202f268cdc8] do_raw_spin_lock at ffffffffa9d121f1
#14 [ffffb202f268cdd8] _raw_spin_lock_irqsave at ffffffffaa51795b
#15 [ffffb202f268cdf8] iommu_flush_dev_iotlb at ffffffffaa20df48
#16 [ffffb202f268ce28] iommu_flush_iova at ffffffffaa20e182
#17 [ffffb202f268ce60] iova_domain_flush at ffffffffaa220e27
#18 [ffffb202f268ce70] fq_flush_timeout at ffffffffaa221c9d
#19 [ffffb202f268cea8] call_timer_fn at ffffffffa9d46661
#20 [ffffb202f268cf08] run_timer_softirq at ffffffffa9d47933
#21 [ffffb202f268cf98] __softirqentry_text_start at ffffffffaa8000e0
#22 [ffffb202f268cff0] asm_call_sysvec_on_stack at ffffffffaa60114f

This part got lost, perhaps because I appended "----" here.


Thanks,

Ethan

>
> (the left part of exception see the hotplug case of ATS capable device)
>
> If one endpoint device just no response to the ATS Invalidation request,
> but is not gone, it will bring down the whole system, to avoid such
> case, don't try the timeout ATS Invalidation request forever.
>
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/dmar.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 0a8d628a42ee..9edb4b44afca 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>   	reclaim_free_desc(qi);
>   	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>   
> -	if (rc == -EAGAIN)
> +	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>   		goto restart;
>   
>   	if (iotlb_start_ktime)


* Re: [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2023-12-28 17:05 ` [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
@ 2024-01-10  5:24   ` Baolu Lu
  2024-01-10  8:37     ` Ethan Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Baolu Lu @ 2024-01-10  5:24 UTC (permalink / raw)
  To: Ethan Zhao, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: baolu.lu, linux-pci, iommu, linux-kernel

On 12/29/23 1:05 AM, Ethan Zhao wrote:
> Except those aggressive hotplug cases - surprise remove a hotplug device
> while its safe removal is requested and handled in process by:
> 
> 1. pull it out directly.
> 2. turn off its power.
> 3. bring the link down.
> 4. just died there that moment.
> 
> etc, in a word, 'gone' or 'disconnected'.
> 
> Mostly are regular normal safe removal and surprise removal unplug.
> these hot unplug handling process could be optimized for fix the ATS
> Invalidation hang issue by calling pci_dev_is_disconnected() in function
> devtlb_invalidation_with_pasid() to check target device state to avoid
> sending meaningless ATS Invalidation request to iommu when device is gone.
> (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
> 
> For safe removal, device wouldn't be removed untill the whole software
> handling process is done, it wouldn't trigger the hard lock up issue
> caused by too long ATS Invalidation timeout wait. In safe removal path,
> device state isn't set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device() by checking 'presence' parameter, calling
> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will return
> false there, wouldn't break the function.
> 
> For surprise removal, device state is set to pci_channel_io_perm_failure in
> pciehp_unconfigure_device(), means device is already gone (disconnected)
> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
> return true to break the function not to send ATS Invalidation request to
> the disconnected device blindly, thus avoid the further long time waiting
> triggers the hard lockup.
> 
> safe removal & surprise removal
> 
> pciehp_ist()
>     pciehp_handle_presence_or_link_change()
>       pciehp_disable_slot()
>         remove_board()
>           pciehp_unconfigure_device(presence)
> 
> Tested-by: Haorong Ye <yehaorong@bytedance.com>
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/pasid.c | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 715943531091..3d5ed27f39ef 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   	if (!info || !info->ats_enabled)
>   		return;
>   
> +	if (pci_dev_is_disconnected(to_pci_dev(dev)))
> +		return;

Why do you need the above after changes in PATCH 2/5? It's unnecessary
and not complete. We have other places where device TLB invalidation is
issued, right?

>   	/*
>   	 * When PASID 0 is used, it indicates RID2PASID(DMA request w/o PASID),
>   	 * devTLB flush w/o PASID should be used. For non-zero PASID under

Best regards,
baolu


* Re: [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2023-12-28 17:05 [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
  2023-12-28 17:05 ` [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
  2023-12-28 17:05 ` [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
@ 2024-01-10  5:25 ` Baolu Lu
  2024-01-10  8:47   ` Ethan Zhao
  2 siblings, 1 reply; 16+ messages in thread
From: Baolu Lu @ 2024-01-10  5:25 UTC (permalink / raw)
  To: Ethan Zhao, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: baolu.lu, linux-pci, iommu, linux-kernel

On 12/29/23 1:05 AM, Ethan Zhao wrote:
> Make pci_dev_is_disconnected() public so that it can be called from
> Intel VT-d driver to quickly fix/workaround the surprise removal
> unplug hang issue for those ATS capable devices on PCIe switch downstream
> hotplug capable ports.
> 
> Beside pci_device_is_present() function, this one has no config space
> space access, so is light enough to optimize the normal pure surprise
> removal and safe removal flow.
> 
> Tested-by: Haorong Ye<yehaorong@bytedance.com>
> Signed-off-by: Ethan Zhao<haifeng.zhao@linux.intel.com>
> ---
>   drivers/pci/pci.h   | 5 -----
>   include/linux/pci.h | 5 +++++
>   2 files changed, 5 insertions(+), 5 deletions(-)

This should be moved before PATCH 2/5? Otherwise, PATCH 2/5 couldn't be
compiled.

Best regards,
baolu


* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2023-12-28 17:05 ` [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
  2023-12-28 17:10   ` Ethan Zhao
@ 2024-01-10  5:28   ` Baolu Lu
  2024-01-10  8:40     ` Ethan Zhao
  2024-01-11  7:44   ` Ethan Zhao
  2 siblings, 1 reply; 16+ messages in thread
From: Baolu Lu @ 2024-01-10  5:28 UTC (permalink / raw)
  To: Ethan Zhao, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: baolu.lu, linux-pci, iommu, linux-kernel

On 12/29/23 1:05 AM, Ethan Zhao wrote:
> When the ATS Invalidation request timeout happens, the qi_submit_sync()
> will restart and loop for the invalidation request forever till it is
> done, it will block another Invalidation thread such as the fq_timer
> to issue invalidation request, cause the system lockup as following
> 
> [exception RIP: native_queued_spin_lock_slowpath+92]
> 
> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
> 
> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
> 
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
> 
> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
> 
> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
> 
> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
> 
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> 
> (the left part of exception see the hotplug case of ATS capable device)
> 
> If one endpoint device just no response to the ATS Invalidation request,
> but is not gone, it will bring down the whole system, to avoid such
> case, don't try the timeout ATS Invalidation request forever.
> 
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/dmar.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 0a8d628a42ee..9edb4b44afca 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>   	reclaim_free_desc(qi);
>   	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>   
> -	if (rc == -EAGAIN)
> +	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>   		goto restart;
>   
>   	if (iotlb_start_ktime)

Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
instead of -EAGAIN. Or did I miss anything?

Best regards,
baolu


* Re: [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-10  5:24   ` Baolu Lu
@ 2024-01-10  8:37     ` Ethan Zhao
  2024-01-11  2:24       ` Baolu Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Ethan Zhao @ 2024-01-10  8:37 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 1/10/2024 1:24 PM, Baolu Lu wrote:
> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>> Except those aggressive hotplug cases - surprise remove a hotplug device
>> while its safe removal is requested and handled in process by:
>>
>> 1. pull it out directly.
>> 2. turn off its power.
>> 3. bring the link down.
>> 4. just died there that moment.
>>
>> etc, in a word, 'gone' or 'disconnected'.
>>
>> Mostly are regular normal safe removal and surprise removal unplug.
>> these hot unplug handling process could be optimized for fix the ATS
>> Invalidation hang issue by calling pci_dev_is_disconnected() in function
>> devtlb_invalidation_with_pasid() to check target device state to avoid
>> sending meaningless ATS Invalidation request to iommu when device is 
>> gone.
>> (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>
>> For safe removal, device wouldn't be removed untill the whole software
>> handling process is done, it wouldn't trigger the hard lock up issue
>> caused by too long ATS Invalidation timeout wait. In safe removal path,
>> device state isn't set to pci_channel_io_perm_failure in
>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will 
>> return
>> false there, wouldn't break the function.
>>
>> For surprise removal, device state is set to 
>> pci_channel_io_perm_failure in
>> pciehp_unconfigure_device(), means device is already gone (disconnected)
>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
>> return true to break the function not to send ATS Invalidation 
>> request to
>> the disconnected device blindly, thus avoid the further long time 
>> waiting
>> triggers the hard lockup.
>>
>> safe removal & surprise removal
>>
>> pciehp_ist()
>>     pciehp_handle_presence_or_link_change()
>>       pciehp_disable_slot()
>>         remove_board()
>>           pciehp_unconfigure_device(presence)
>>
>> Tested-by: Haorong Ye <yehaorong@bytedance.com>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/pasid.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>> index 715943531091..3d5ed27f39ef 100644
>> --- a/drivers/iommu/intel/pasid.c
>> +++ b/drivers/iommu/intel/pasid.c
>> @@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>> *iommu,
>>       if (!info || !info->ats_enabled)
>>           return;
>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>> +        return;
>
> Why do you need the above after changes in PATCH 2/5? It's unnecessary
> and not complete. We have other places where device TLB invalidation is
> issued, right?

This one could be regarded as an optimization: there is no need to go
down the rabbit hole if we can predict the result. The bad thing is that
we don't know what response we will get down there from a third-party
switch (bridges will report a timeout to the requester, as the PCIe spec
mentions, if the endpoint is gone).


Thanks,

Ethan

>
>>       /*
>>        * When PASID 0 is used, it indicates RID2PASID(DMA request w/o 
>> PASID),
>>        * devTLB flush w/o PASID should be used. For non-zero PASID under
>
> Best regards,
> baolu


* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2024-01-10  5:28   ` Baolu Lu
@ 2024-01-10  8:40     ` Ethan Zhao
  2024-01-11  2:31       ` Baolu Lu
  0 siblings, 1 reply; 16+ messages in thread
From: Ethan Zhao @ 2024-01-10  8:40 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 1/10/2024 1:28 PM, Baolu Lu wrote:
> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>> When the ATS Invalidation request timeout happens, the qi_submit_sync()
>> will restart and loop for the invalidation request forever till it is
>> done, it will block another Invalidation thread such as the fq_timer
>> to issue invalidation request, cause the system lockup as following
>>
>> [exception RIP: native_queued_spin_lock_slowpath+92]
>>
>> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>>
>> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>>
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>>
>> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>>
>> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>>
>> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>>
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>
>> (the left part of exception see the hotplug case of ATS capable device)
>>
>> If one endpoint device just no response to the ATS Invalidation request,
>> but is not gone, it will bring down the whole system, to avoid such
>> case, don't try the timeout ATS Invalidation request forever.
>>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/dmar.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>> index 0a8d628a42ee..9edb4b44afca 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, 
>> struct qi_desc *desc,
>>       reclaim_free_desc(qi);
>>       raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>   -    if (rc == -EAGAIN)
>> +    if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != 
>> QI_DEIOTLB_TYPE)
>>           goto restart;
>>         if (iotlb_start_ktime)
>
> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
> instead of -EAGAIN. Or did I miss anything?

Folding it into qi_check_fault() has its advantages; the downside is
that we would have to add more parameters to qi_check_fault(). Is there
no need to check for the QI_DIOTLB_TYPE and QI_DEIOTLB_TYPE invalidation
types in qi_check_fault()?


Thanks,

Ethan

>
> Best regards,
> baolu


* Re: [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers
  2024-01-10  5:25 ` [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Baolu Lu
@ 2024-01-10  8:47   ` Ethan Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Ethan Zhao @ 2024-01-10  8:47 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 1/10/2024 1:25 PM, Baolu Lu wrote:
> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>> Make pci_dev_is_disconnected() public so that it can be called from
>> Intel VT-d driver to quickly fix/workaround the surprise removal
>> unplug hang issue for those ATS capable devices on PCIe switch 
>> downstream
>> hotplug capable ports.
>>
>> Beside pci_device_is_present() function, this one has no config space
>> space access, so is light enough to optimize the normal pure surprise
>> removal and safe removal flow.
>>
>> Tested-by: Haorong Ye<yehaorong@bytedance.com>
>> Signed-off-by: Ethan Zhao<haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/pci/pci.h   | 5 -----
>>   include/linux/pci.h | 5 +++++
>>   2 files changed, 5 insertions(+), 5 deletions(-)
>
> This should be moved before PATCH 2/5? Otherwise, PATCH 2/5 couldn't be

It seems the order got mixed up when send-email was aborted by a network
connection problem and the series was sent again.

[3/5] and [4/5] ended up out of order, though the subject numbering is
right.

Anyway, I will resend in the next version.


Thanks,

Ethan

> compiled.
>
> Best regards,
> baolu


* Re: [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-10  8:37     ` Ethan Zhao
@ 2024-01-11  2:24       ` Baolu Lu
  2024-01-11  4:16         ` Ethan Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Baolu Lu @ 2024-01-11  2:24 UTC (permalink / raw)
  To: Ethan Zhao, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: baolu.lu, linux-pci, iommu, linux-kernel

On 1/10/24 4:37 PM, Ethan Zhao wrote:
> 
> On 1/10/2024 1:24 PM, Baolu Lu wrote:
>> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>>> Except those aggressive hotplug cases - surprise remove a hotplug device
>>> while its safe removal is requested and handled in process by:
>>>
>>> 1. pull it out directly.
>>> 2. turn off its power.
>>> 3. bring the link down.
>>> 4. just died there that moment.
>>>
>>> etc, in a word, 'gone' or 'disconnected'.
>>>
>>> Mostly are regular normal safe removal and surprise removal unplug.
>>> these hot unplug handling process could be optimized for fix the ATS
>>> Invalidation hang issue by calling pci_dev_is_disconnected() in function
>>> devtlb_invalidation_with_pasid() to check target device state to avoid
>>> sending meaningless ATS Invalidation request to iommu when device is 
>>> gone.
>>> (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>>
>>> For safe removal, device wouldn't be removed untill the whole software
>>> handling process is done, it wouldn't trigger the hard lock up issue
>>> caused by too long ATS Invalidation timeout wait. In safe removal path,
>>> device state isn't set to pci_channel_io_perm_failure in
>>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will 
>>> return
>>> false there, wouldn't break the function.
>>>
>>> For surprise removal, device state is set to 
>>> pci_channel_io_perm_failure in
>>> pciehp_unconfigure_device(), means device is already gone (disconnected)
>>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will
>>> return true to break the function not to send ATS Invalidation 
>>> request to
>>> the disconnected device blindly, thus avoid the further long time 
>>> waiting
>>> triggers the hard lockup.
>>>
>>> safe removal & surprise removal
>>>
>>> pciehp_ist()
>>>     pciehp_handle_presence_or_link_change()
>>>       pciehp_disable_slot()
>>>         remove_board()
>>>           pciehp_unconfigure_device(presence)
>>>
>>> Tested-by: Haorong Ye <yehaorong@bytedance.com>
>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>> ---
>>>   drivers/iommu/intel/pasid.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>>> index 715943531091..3d5ed27f39ef 100644
>>> --- a/drivers/iommu/intel/pasid.c
>>> +++ b/drivers/iommu/intel/pasid.c
>>> @@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct intel_iommu 
>>> *iommu,
>>>       if (!info || !info->ats_enabled)
>>>           return;
>>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>>> +        return;
>>
>> Why do you need the above after changes in PATCH 2/5? It's unnecessary
>> and not complete. We have other places where device TLB invalidation is
>> issued, right?
> 
> This one could be regarded as optimization, no need to trapped into rabbit
> 
> hole if we could predict the result. because the bad thing is we don't know
> 
> what response to us in the rabbit hole from third party switch (bridges 
> will
> 
> feedback timeout to requester as PCIe spec mentioned if the endpoint is
> 
> gone).

The IOMMU hardware has its own timeout mechanism. This timeout might
happen if:

1) The link to the endpoint is broken, so the invalidation completion
    message is lost on the way.
2) The device has a longer timeout value, so the device is still busy
    with handling the cache invalidation when IOMMU's timeout is
    triggered.

Here, we are doing the following:

For Case 1, we return -ETIMEDOUT directly. For Case 2, we attempt to
retry.
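
A minimal userspace sketch of that split, under stated assumptions:
struct model_target and both model_*() functions are hypothetical, the
broken link of Case 1 is modeled as a 'disconnected' flag, and the slow
device of Case 2 as a count of rounds that time out before it answers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_target {
	bool disconnected;	/* Case 1: the completion can never arrive */
	int slow_rounds;	/* Case 2: timeouts before the device answers */
	unsigned int attempts;
};

/* Map a timed-out round onto the two cases described above. */
static inline int model_check_fault(struct model_target *t)
{
	if (t->disconnected)
		return -ETIMEDOUT;	/* Case 1: do not retry */
	if (t->slow_rounds > 0) {
		t->slow_rounds--;
		return -EAGAIN;		/* Case 2: worth retrying */
	}
	return 0;
}

/* Retry only while the fault check asks for another attempt. */
static inline int model_invalidate(struct model_target *t)
{
	int rc;

	do {
		t->attempts++;
		rc = model_check_fault(t);
	} while (rc == -EAGAIN);

	return rc;
}
```

The caller sees -ETIMEDOUT after a single attempt in Case 1, and a
normal completion after a bounded number of retries in Case 2.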

Best regards,
baolu


* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2024-01-10  8:40     ` Ethan Zhao
@ 2024-01-11  2:31       ` Baolu Lu
  2024-01-11  3:44         ` Ethan Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Baolu Lu @ 2024-01-11  2:31 UTC (permalink / raw)
  To: Ethan Zhao, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: baolu.lu, linux-pci, iommu, linux-kernel

On 1/10/24 4:40 PM, Ethan Zhao wrote:
> 
> On 1/10/2024 1:28 PM, Baolu Lu wrote:
>> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>>> When the ATS Invalidation request timeout happens, the qi_submit_sync()
>>> will restart and loop for the invalidation request forever till it is
>>> done, it will block another Invalidation thread such as the fq_timer
>>> to issue invalidation request, cause the system lockup as following
>>>
>>> [exception RIP: native_queued_spin_lock_slowpath+92]
>>>
>>> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>>>
>>> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>>>
>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>>>
>>> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>>>
>>> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>>>
>>> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>>>
>>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>>
>>> (the left part of exception see the hotplug case of ATS capable device)
>>>
>>> If one endpoint device just no response to the ATS Invalidation request,
>>> but is not gone, it will bring down the whole system, to avoid such
>>> case, don't try the timeout ATS Invalidation request forever.
>>>
>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>> ---
>>>   drivers/iommu/intel/dmar.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>> index 0a8d628a42ee..9edb4b44afca 100644
>>> --- a/drivers/iommu/intel/dmar.c
>>> +++ b/drivers/iommu/intel/dmar.c
>>> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, 
>>> struct qi_desc *desc,
>>>       reclaim_free_desc(qi);
>>>       raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>>   -    if (rc == -EAGAIN)
>>> +    if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != 
>>> QI_DEIOTLB_TYPE)
>>>           goto restart;
>>>         if (iotlb_start_ktime)
>>
>> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
>> instead of -EAGAIN. Or did I miss anything?
> 
> It is pro if we fold it into qi_check_fault(), the con is we have to add
> 
> more parameter to qi_check_fault(), no need check invalidation type
> 
> of QI_DIOTLB_TYPE&QI_DEIOTLB_TYPE in qi_check_fault() ?

No need to check the request type as multiple requests might be batched
together in a single call. This is also the reason why I asked you to
add a flag bit to this helper and make the intention explicit, say,

"This includes requests to interact with a PCI endpoint. The device may
  become unavailable at any time, so do not attempt to retry if ITE is
  detected and the device has gone away."
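
To illustrate why a type check is fragile for batched requests, here is a minimal userspace sketch. The descriptor kinds, the OPT_CHECK_ENDPOINT bit and both helpers are assumptions made for this example; they are not the VT-d descriptor encodings or an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative descriptor kinds; values are arbitrary, not the VT-d encodings. */
enum desc_kind { DESC_IOTLB, DESC_DEV_TLB, DESC_WAIT };

/* Hypothetical option bit expressing "this batch talks to a PCI endpoint". */
#define OPT_CHECK_ENDPOINT (1u << 0)

/* Fragile approach: infer the intent from the first descriptor only. */
static bool first_desc_targets_endpoint(const enum desc_kind *batch, size_t n)
{
	assert(batch != NULL || n == 0);
	return n > 0 && batch[0] == DESC_DEV_TLB;
}

/* Explicit approach: the caller states its intent through an option bit. */
static bool caller_requests_endpoint_check(unsigned int options)
{
	return (options & OPT_CHECK_ENDPOINT) != 0;
}
```

For a batch that starts with an IOTLB invalidation and is followed by a device-TLB invalidation, the first helper reports false although an endpoint is involved, while the option bit carries the intent regardless of the batch layout.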

Best regards,
baolu


* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2024-01-11  2:31       ` Baolu Lu
@ 2024-01-11  3:44         ` Ethan Zhao
  2024-01-11  6:09           ` Ethan Zhao
  0 siblings, 1 reply; 16+ messages in thread
From: Ethan Zhao @ 2024-01-11  3:44 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 1/11/2024 10:31 AM, Baolu Lu wrote:
> On 1/10/24 4:40 PM, Ethan Zhao wrote:
>>
>> On 1/10/2024 1:28 PM, Baolu Lu wrote:
>>> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>>>> When the ATS Invalidation request timeout happens, the 
>>>> qi_submit_sync()
>>>> will restart and loop for the invalidation request forever till it is
>>>> done, it will block another Invalidation thread such as the fq_timer
>>>> to issue invalidation request, cause the system lockup as following
>>>>
>>>> [exception RIP: native_queued_spin_lock_slowpath+92]
>>>>
>>>> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>>>>
>>>> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>>>>
>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>>>>
>>>> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>>>>
>>>> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>>>>
>>>> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>>>>
>>>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>>>
>>>> (the left part of exception see the hotplug case of ATS capable 
>>>> device)
>>>>
>>>> If one endpoint device just no response to the ATS Invalidation 
>>>> request,
>>>> but is not gone, it will bring down the whole system, to avoid such
>>>> case, don't try the timeout ATS Invalidation request forever.
>>>>
>>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>> ---
>>>>   drivers/iommu/intel/dmar.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>> index 0a8d628a42ee..9edb4b44afca 100644
>>>> --- a/drivers/iommu/intel/dmar.c
>>>> +++ b/drivers/iommu/intel/dmar.c
>>>> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, 
>>>> struct qi_desc *desc,
>>>>       reclaim_free_desc(qi);
>>>>       raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>>>   -    if (rc == -EAGAIN)
>>>> +    if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != 
>>>> QI_DEIOTLB_TYPE)
>>>>           goto restart;
>>>>         if (iotlb_start_ktime)
>>>
>>> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
>>> instead of -EAGAIN. Or did I miss anything?
>>
>> It is pro if we fold it into qi_check_fault(), the con is we have to add
>>
>> more parameter to qi_check_fault(), no need check invalidation type
>>
>> of QI_DIOTLB_TYPE&QI_DEIOTLB_TYPE in qi_check_fault() ?
>
> No need to check the request type as multiple requests might be batched
> together in a single call. This is also the reason why I asked you to
> add a flag bit to this helper and make the intention explicit, say,
>
> "This includes requests to interact with a PCI endpoint. The device may
>  become unavailable at any time, so do not attempt to retry if ITE is
>  detected and the device has gone away."

So the usage of this function ends up like this: the user space
interface can submit a batch that mixes IOTLB and devTLB invalidations
in the queue, or submit IOTLB and devTLB invalidations separately. Do we
then depend on the caller to pass QI_OPT_CHECK_ENDPOINT as an option
bit, and bail out even if there are other IOTLB invalidations in the
same batch? In that case it is the caller's call whether to retry the
IOTLB/devTLB invalidation.

If the caller hits the dead-endpoint case, it gets -ETIMEDOUT or
-ENOTCONN as the return value, but there is no real ITE in its list of
faults of interest. To tell the userland user what happened, do we fake
a DMA_FSTS_ITE for the user, given that we would not read an ITE from
DMA_FSTS_REG at that moment?


1. Checking only the first request for devTLB invalidation misses the
   chance to check the endpoint state when IOTLB and devTLB
   invalidations are mixed. An explicit option bit would be better
   here, and passing a valid pdev does the same thing. So if a pdev is
   passed, there is no need to check for QI_DIOTLB_TYPE ||
   QI_DEIOTLB_TYPE in qi_submit_sync() and qi_check_fault().


2. It does not seem ideal to drop or retry the whole batch of requests
   when there is a devTLB invalidation within the batch. Letting the
   caller choose the follow-up action is simpler than making
   qi_submit_sync() too complex.


3. Faking a DMA_FSTS_ITE in the user's list of faults of interest on
   behalf of the hardware is better than giving the user no error/fault
   feedback, even though the fault is only predicted and has not
   happened yet.


My two cents.


Thanks,

Ethan



>
> Best regards,
> baolu


* Re: [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected
  2024-01-11  2:24       ` Baolu Lu
@ 2024-01-11  4:16         ` Ethan Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Ethan Zhao @ 2024-01-11  4:16 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 1/11/2024 10:24 AM, Baolu Lu wrote:
> On 1/10/24 4:37 PM, Ethan Zhao wrote:
>>
>> On 1/10/2024 1:24 PM, Baolu Lu wrote:
>>> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>>>> Except those aggressive hotplug cases - surprise remove a hotplug 
>>>> device
>>>> while its safe removal is requested and handled in process by:
>>>>
>>>> 1. pull it out directly.
>>>> 2. turn off its power.
>>>> 3. bring the link down.
>>>> 4. just died there that moment.
>>>>
>>>> etc, in a word, 'gone' or 'disconnected'.
>>>>
>>>> Mostly are regular normal safe removal and surprise removal unplug.
>>>> these hot unplug handling process could be optimized for fix the ATS
>>>> Invalidation hang issue by calling pci_dev_is_disconnected() in 
>>>> function
>>>> devtlb_invalidation_with_pasid() to check target device state to avoid
>>>> sending meaningless ATS Invalidation request to iommu when device 
>>>> is gone.
>>>> (see IMPLEMENTATION NOTE in PCIe spec r6.1 section 10.3.1)
>>>>
>>>> For safe removal, device wouldn't be removed untill the whole software
>>>> handling process is done, it wouldn't trigger the hard lock up issue
>>>> caused by too long ATS Invalidation timeout wait. In safe removal 
>>>> path,
>>>> device state isn't set to pci_channel_io_perm_failure in
>>>> pciehp_unconfigure_device() by checking 'presence' parameter, calling
>>>> pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() will 
>>>> return
>>>> false there, wouldn't break the function.
>>>>
>>>> For surprise removal, device state is set to 
>>>> pci_channel_io_perm_failure in
>>>> pciehp_unconfigure_device(), means device is already gone 
>>>> (disconnected)
>>>> call pci_dev_is_disconnected() in devtlb_invalidation_with_pasid() 
>>>> will
>>>> return true to break the function not to send ATS Invalidation 
>>>> request to
>>>> the disconnected device blindly, thus avoid the further long time 
>>>> waiting
>>>> triggers the hard lockup.
>>>>
>>>> safe removal & surprise removal
>>>>
>>>> pciehp_ist()
>>>>     pciehp_handle_presence_or_link_change()
>>>>       pciehp_disable_slot()
>>>>         remove_board()
>>>>           pciehp_unconfigure_device(presence)
>>>>
>>>> Tested-by: Haorong Ye <yehaorong@bytedance.com>
>>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>> ---
>>>>   drivers/iommu/intel/pasid.c | 2 ++
>>>>   1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>>>> index 715943531091..3d5ed27f39ef 100644
>>>> --- a/drivers/iommu/intel/pasid.c
>>>> +++ b/drivers/iommu/intel/pasid.c
>>>> @@ -480,6 +480,8 @@ devtlb_invalidation_with_pasid(struct 
>>>> intel_iommu *iommu,
>>>>       if (!info || !info->ats_enabled)
>>>>           return;
>>>>   +    if (pci_dev_is_disconnected(to_pci_dev(dev)))
>>>> +        return;
>>>
>>> Why do you need the above after changes in PATCH 2/5? It's unnecessary
>>> and not complete. We have other places where device TLB invalidation is
>>> issued, right?
>>
>> This one could be regarded as optimization, no need to trapped into 
>> rabbit
>>
>> hole if we could predict the result. because the bad thing is we 
>> don't know
>>
>> what response to us in the rabbit hole from third party switch 
>> (bridges will
>>
>> feedback timeout to requester as PCIe spec mentioned if the endpoint is
>>
>> gone).
>
> The IOMMU hardware has its own timeout mechanism. This timeout might
> happen if:
>
> 1) The link to the endpoint is broken, so the invalidation completion
>    message is lost on the way.
> 2) The device has a longer timeout value, so the device is still busy
>    with handling the cache invalidation when IOMMU's timeout is
>    triggered.
>
> Here, we are doing the following:
>
> For Case 1, we return -ETIMEDOUT directly. For Case 2, we attempt to
> retry.

Yes, Intel VT-d sets up a hardware timer when a devTLB invalidation is
issued and a wait descriptor is submitted. That hardware timer is a
limited resource, and it keeps ticking until it times out if the
endpoint is dead, broken, etc.

Even if we bail out in qi_submit_sync() for case #1, the hardware timer
still keeps ticking. If many such requests are issued, the IOMMU will
run out of hardware resources, so we should avoid that case as far as we
can. Although the Intel VT-d spec says the timeout value will be no more
than the "PCIe read timeout", in practice we waited more than 12 seconds
before getting the ITE.


For case #2, the retry has a precondition as far as I know: a fault was
raised and then cleared. That is why I call it a "rabbit hole". Running
into that rabbit hole is the last choice, not the best one.
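
The argument for checking the device state before issuing the request can be sketched in userspace C. The model below is an assumption-laden illustration: enum channel_state_model and struct dev_model only borrow the idea of the kernel's error state, dev_is_disconnected() mirrors the shape of pci_dev_is_disconnected() from patch 3/5, and the timers_armed counter is a made-up stand-in for the limited hardware timers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced model of a PCI channel error state; only the idea is borrowed. */
enum channel_state_model { IO_NORMAL, IO_FROZEN, IO_PERM_FAILURE };

struct dev_model {
	enum channel_state_model error_state;
};

/* Same shape as pci_dev_is_disconnected(): a plain field comparison. */
static bool dev_is_disconnected(const struct dev_model *dev)
{
	return dev->error_state == IO_PERM_FAILURE;
}

/*
 * Issue a device-TLB invalidation unless the device is known to be gone.
 * *timers_armed counts requests that would occupy a hardware timeout timer.
 * Returns true if a request was issued.
 */
static bool issue_devtlb_invalidation(const struct dev_model *dev,
				      unsigned int *timers_armed)
{
	assert(dev != NULL && timers_armed != NULL);

	if (dev_is_disconnected(dev))
		return false;	/* skip: no timer is consumed for a dead device */

	(*timers_armed)++;
	return true;
}
```

The check costs one memory read and no config space access, which is the property the 3/5 commit message relies on.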


Thanks,

Ethan

>
> Best regards,
> baolu


* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2024-01-11  3:44         ` Ethan Zhao
@ 2024-01-11  6:09           ` Ethan Zhao
  0 siblings, 0 replies; 16+ messages in thread
From: Ethan Zhao @ 2024-01-11  6:09 UTC (permalink / raw)
  To: Baolu Lu, kevin.tian, bhelgaas, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 1/11/2024 11:44 AM, Ethan Zhao wrote:
>
> On 1/11/2024 10:31 AM, Baolu Lu wrote:
>> On 1/10/24 4:40 PM, Ethan Zhao wrote:
>>>
>>> On 1/10/2024 1:28 PM, Baolu Lu wrote:
>>>> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>>>>> When the ATS Invalidation request timeout happens, the 
>>>>> qi_submit_sync()
>>>>> will restart and loop for the invalidation request forever till it is
>>>>> done, it will block another Invalidation thread such as the fq_timer
>>>>> to issue invalidation request, cause the system lockup as following
>>>>>
>>>>> [exception RIP: native_queued_spin_lock_slowpath+92]
>>>>>
>>>>> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>>>>>
>>>>> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>>>>>
>>>>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>>>>>
>>>>> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>>>>>
>>>>> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>>>>>
>>>>> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>>>>>
>>>>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>>>>
>>>>> (the left part of exception see the hotplug case of ATS capable 
>>>>> device)
>>>>>
>>>>> If one endpoint device just no response to the ATS Invalidation 
>>>>> request,
>>>>> but is not gone, it will bring down the whole system, to avoid such
>>>>> case, don't try the timeout ATS Invalidation request forever.
>>>>>
>>>>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>>>>> ---
>>>>>   drivers/iommu/intel/dmar.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>>>>> index 0a8d628a42ee..9edb4b44afca 100644
>>>>> --- a/drivers/iommu/intel/dmar.c
>>>>> +++ b/drivers/iommu/intel/dmar.c
>>>>> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu 
>>>>> *iommu, struct qi_desc *desc,
>>>>>       reclaim_free_desc(qi);
>>>>>       raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>>>>   -    if (rc == -EAGAIN)
>>>>> +    if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != 
>>>>> QI_DEIOTLB_TYPE)
>>>>>           goto restart;
>>>>>         if (iotlb_start_ktime)
>>>>
>>>> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
>>>> instead of -EAGAIN. Or did I miss anything?
>>>
>>> It is pro if we fold it into qi_check_fault(), the con is we have to 
>>> add
>>>
>>> more parameter to qi_check_fault(), no need check invalidation type
>>>
>>> of QI_DIOTLB_TYPE&QI_DEIOTLB_TYPE in qi_check_fault() ?
>>
>> No need to check the request type as multiple requests might be batched
>> together in a single call. This is also the reason why I asked you to
>> add a flag bit to this helper and make the intention explicit, say,
>>
>> "This includes requests to interact with a PCI endpoint. The device may
>>  become unavailable at any time, so do not attempt to retry if ITE is
>>  detected and the device has gone away."
>
> That is to say, the usage of this function finally becomes that way,
>
> the user space interface could submit request with mixed iotlb & devtlb
>
> invalidation together in the queue or seperated iotlb/devtlb 
> invalidation.
>
> we depend on caller to pass the QI_OPT_CHECK_ENDPOINT as option
>
> bit to bail out even there is other iotlb invalidation in the same 
> batch ?
>
> then is user's call to choose retry the iotbl /devtlb invalidation or 
> not.
>
> if the caller hits the case the endpoint dead, the caller will get 
> -ETIMEDOUT/
>
> -ENOTCONN as returned value, but no real ITE in its interested list, to
>
> tell userland user what happened, we fake a DMA_FSTS_ITE for user ?
>
> given we wouldn't read a ITE from DMA_FSTS_REG that moment.
>
>
> 1. checking the first request for devTLB invalidation will miss chance to
>
>    check endpoint state if the iotlb & devtlb invalidation were mixed.
>
>    here explict option bit would be better.  while valid pdev does the
>
>    same thing.  so if pdev passed, no need to check for QI_DIOTLB_TYPE
>
>    || QI_EIOTLB_TYPE in qi_submit_sync() & qi_check_fault().
>
>
> 2. seems not perfect to drop or retry whole batch of request if there is
>
>   devtlb invalidation within the batch, let caller to choose the later 
> action
>
>   is simpler than making the qi_submit_sync() too complex.
>
>
> 3. fake a DMA_FSTS_ITE for user's interested list on behalf of hardware
>
>   is better than no error/ fault feedback to user even it is predicted 
> not
>
>   happened yet.
>
>
See Intel VT-d spec r4.1, section 4.3 and section 6.5.2.10.

We should keep the original retry logic intact so as not to break the
fault handling flow, and only break the loop when the endpoint device is
gone, returning an error code that reflects reality. That code should
not be -ETIMEDOUT, because the timeout has not been triggered yet; the
ITE for the previous request will be hit later, and software should
handle it smoothly so that the subsequent requests can complete on the
next try.
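
A minimal sketch of that control flow, as a userspace model only: struct queue_model and both functions are hypothetical, the fault behaviour is simulated with a counter, and -ENOTCONN is used because it is the alternative error code floated earlier in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test double: the next faults_left submissions report a fault. */
struct queue_model {
	int faults_left;
	bool endpoint_gone;
	int attempts;
};

/* One submission attempt: -EAGAIN while faults remain, 0 afterwards. */
static int submit_once(struct queue_model *q)
{
	q->attempts++;
	if (q->faults_left > 0) {
		q->faults_left--;
		return -EAGAIN;
	}
	return 0;
}

/* Retry on -EAGAIN as before, but never spin on an endpoint that is gone. */
static int submit_sync_model(struct queue_model *q)
{
	int rc;

	assert(q != NULL);

	for (;;) {
		/* Checked before every attempt, including the first one. */
		if (q->endpoint_gone)
			return -ENOTCONN;

		rc = submit_once(q);
		if (rc != -EAGAIN)
			return rc;
	}
}
```

With a live endpoint the loop behaves like the existing restart path; with a gone endpoint it returns immediately instead of waiting for a timeout that has not fired yet.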


Thanks,

Ethan

> my cents.
>
>
> Thanks,
>
> Ethan
>
>
>
>>
>> Best regards,
>> baolu
>


* Re: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
  2023-12-28 17:05 ` [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
  2023-12-28 17:10   ` Ethan Zhao
  2024-01-10  5:28   ` Baolu Lu
@ 2024-01-11  7:44   ` Ethan Zhao
  2 siblings, 0 replies; 16+ messages in thread
From: Ethan Zhao @ 2024-01-11  7:44 UTC (permalink / raw)
  To: kevin.tian, bhelgaas, baolu.lu, dwmw2, will, robin.murphy, lukas
  Cc: linux-pci, iommu, linux-kernel


On 12/29/2023 1:05 AM, Ethan Zhao wrote:
> When the ATS Invalidation request timeout happens, the qi_submit_sync()
> will restart and loop for the invalidation request forever till it is
> done, it will block another Invalidation thread such as the fq_timer
> to issue invalidation request, cause the system lockup as following
>
> [exception RIP: native_queued_spin_lock_slowpath+92]
>
> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>
> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>
> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>
> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>
> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>
> (the left part of exception see the hotplug case of ATS capable device)
>
> If one endpoint device just no response to the ATS Invalidation request,
> but is not gone, it will bring down the whole system, to avoid such
> case, don't try the timeout ATS Invalidation request forever.
>
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/dmar.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 0a8d628a42ee..9edb4b44afca 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>   	reclaim_free_desc(qi);
>   	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>   
> -	if (rc == -EAGAIN)
> +	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>   		goto restart;
>   
>   	if (iotlb_start_ktime)

Note: only break the loop when the SID of the ITE is the same as that of
the current target pdev. Need to check whether the target device is a PF
or a VF.

The ITE may have been left by a previous devTLB invalidation request for
another device.
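
The source-ID comparison described here can be sketched as follows. This is a userspace illustration, not driver code: make_sid() and ite_is_for_target() are hypothetical helpers, and the PF/VF question raised above is deliberately not modelled. The only fact relied on is the PCI requester ID layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a PCI requester ID: bus in bits 15:8, device in 7:3, function in 2:0. */
static uint16_t make_sid(uint8_t bus, uint8_t dev, uint8_t fn)
{
	assert(dev < 32 && fn < 8);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/* Break out of the retry loop only if the reported ITE belongs to our target. */
static bool ite_is_for_target(uint16_t ite_sid, uint16_t target_sid)
{
	return ite_sid == target_sid;
}
```

An ITE whose source ID names a different device would then leave the retry path untouched for the current request.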


Thanks,

Ethan



end of thread, other threads:[~2024-01-11  7:45 UTC | newest]

Thread overview: 16+ messages
2023-12-28 17:05 [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-28 17:05 ` [RFC PATCH v10 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
2024-01-10  5:24   ` Baolu Lu
2024-01-10  8:37     ` Ethan Zhao
2024-01-11  2:24       ` Baolu Lu
2024-01-11  4:16         ` Ethan Zhao
2023-12-28 17:05 ` [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
2023-12-28 17:10   ` Ethan Zhao
2024-01-10  5:28   ` Baolu Lu
2024-01-10  8:40     ` Ethan Zhao
2024-01-11  2:31       ` Baolu Lu
2024-01-11  3:44         ` Ethan Zhao
2024-01-11  6:09           ` Ethan Zhao
2024-01-11  7:44   ` Ethan Zhao
2024-01-10  5:25 ` [RFC PATCH v10 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Baolu Lu
2024-01-10  8:47   ` Ethan Zhao
