From: Ethan Zhao <haifeng.zhao@linux.intel.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"dwmw2@infradead.org" <dwmw2@infradead.org>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"lukas@wunner.de" <lukas@wunner.de>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH v9 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
Date: Thu, 28 Dec 2023 21:10:08 +0800	[thread overview]
Message-ID: <bb3a8a4c-6dad-4347-9076-0f28d1e23de3@linux.intel.com> (raw)
In-Reply-To: <BN9PR11MB527651C1A108721CFF057BCF8C9EA@BN9PR11MB5276.namprd11.prod.outlook.com>


On 12/28/2023 4:38 PM, Tian, Kevin wrote:
>> From: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Sent: Thursday, December 28, 2023 8:17 AM
>>
>> When an ATS Invalidation request times out, qi_submit_sync() restarts
>> and loops on the invalidation request forever until it completes. This
>> blocks other invalidation threads, such as the fq_timer, from issuing
>> invalidation requests and causes a system lockup like the following:
>>
>> [exception RIP: native_queued_spin_lock_slowpath+92]
>> RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
>> RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
>> RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
>> R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
>> R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>>
>> #12 [ffffb202f268cdc8] native_queued_spin_lock_slowpath at ffffffffa9d1025c
>> #13 [ffffb202f268cdc8] do_raw_spin_lock at ffffffffa9d121f1
>> #14 [ffffb202f268cdd8] _raw_spin_lock_irqsave at ffffffffaa51795b
>> #15 [ffffb202f268cdf8] iommu_flush_dev_iotlb at ffffffffaa20df48
>> #16 [ffffb202f268ce28] iommu_flush_iova at ffffffffaa20e182
>> #17 [ffffb202f268ce60] iova_domain_flush at ffffffffaa220e27
>> #18 [ffffb202f268ce70] fq_flush_timeout at ffffffffaa221c9d
>> #19 [ffffb202f268cea8] call_timer_fn at ffffffffa9d46661
>> #20 [ffffb202f268cf08] run_timer_softirq at ffffffffa9d47933
>> #21 [ffffb202f268cf98] __softirqentry_text_start at ffffffffaa8000e0
>> #22 [ffffb202f268cff0] asm_call_sysvec_on_stack at ffffffffaa60114f
>> --- ---
>> (for the rest of the exception, see the hotplug case of the ATS-capable device)
>>
>> If an endpoint device simply does not respond to the ATS Invalidation
>> request but is not gone, it brings down the whole system. To avoid
>> this, don't retry a timed-out ATS Invalidation request forever.
>>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/dmar.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
>> index 76903a8bf963..206ab0b7294f 100644
>> --- a/drivers/iommu/intel/dmar.c
>> +++ b/drivers/iommu/intel/dmar.c
>> @@ -1457,7 +1457,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>>   	reclaim_free_desc(qi);
>>   	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>>
>> -	if (rc == -EAGAIN)
>> +	if (rc == -EAGAIN && type != QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>>   		goto restart;
>>
> this change is moot.
>
> -EAGAIN is set only when the hardware detects an ATS invalidation completion
> timeout in qi_check_fault(), so the above essentially kills the restart logic.

This change is intended to break out of the restart logic when a
device-TLB invalidation timeout happens: if the device simply does not
respond, we don't know how long the ITE will take.

>
> I'd wait for the maintainer of this driver to comment. This part doesn't
> look good, but there might be some historical reason, so care must be
> taken.

I would like to know why a hole was left here that can hang the driver
forever.

Thanks,
Ethan


Thread overview: 18+ messages
2023-12-28  0:16 [RFC PATCH v8 0/5] fix vt-d hard lockup when hotplug ATS capable device Ethan Zhao
2023-12-28  0:16 ` [RFC PATCH v9 1/5] iommu/vt-d: add flush_target_dev member to struct intel_iommu and pass device info to all ATS Invalidation functions Ethan Zhao
2023-12-28  8:10   ` Tian, Kevin
2023-12-28 13:20     ` Ethan Zhao
2023-12-28  0:16 ` [RFC PATCH v9 2/5] iommu/vt-d: break out ATS Invalidation if target device is gone Ethan Zhao
2023-12-28  8:30   ` Tian, Kevin
2023-12-28 13:03     ` Ethan Zhao
2023-12-29  8:06       ` Tian, Kevin
2023-12-29  9:07         ` Ethan Zhao
2023-12-29  9:19         ` Ethan Zhao
2023-12-28 13:35     ` Ethan Zhao
2023-12-28  0:16 ` [RFC PATCH v9 3/5] PCI: make pci_dev_is_disconnected() helper public for other drivers Ethan Zhao
2023-12-28  0:16 ` [RFC PATCH v9 4/5] iommu/vt-d: don't issue ATS Invalidation request when device is disconnected Ethan Zhao
2023-12-28  0:16 ` [RFC PATCH v9 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever Ethan Zhao
2023-12-28  8:38   ` Tian, Kevin
2023-12-28 13:10     ` Ethan Zhao [this message]
2023-12-29  8:17       ` Tian, Kevin
2023-12-29  9:24         ` Ethan Zhao
