From: Vasant Hegde <vasant.hegde@amd.com>
To: Bjorn Helgaas <helgaas@kernel.org>, Baolu Lu <baolu.lu@linux.intel.com>
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
"Joerg Roedel" <jroedel@suse.de>,
"Matt Fagnani" <matt.fagnani@bell.net>,
"Christian König" <christian.koenig@amd.com>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"Kevin Tian" <kevin.tian@intel.com>,
"Tony Zhu" <tony.zhu@intel.com>,
linux-pci@vger.kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/1] PCI: Add translated request only flag for pci_enable_pasid()
Date: Wed, 1 Feb 2023 10:48:01 +0530
Message-ID: <bf0f165c-7952-77f5-3759-8487498ed186@amd.com>
In-Reply-To: <20230201001419.GA1776086@bhelgaas>
Bjorn,
On 2/1/2023 5:44 AM, Bjorn Helgaas wrote:
> On Tue, Jan 31, 2023 at 08:56:13PM +0800, Baolu Lu wrote:
>> On 2023/1/31 2:38, Bjorn Helgaas wrote:
>>>> PCI: Add translated request only flag for pci_enable_pasid()
>>>>
>>>> The PCIe fabric routes Memory Requests based on the TLP address, ignoring
>>>> the PASID. In order to ensure system integrity, commit 201007ef707a ("PCI:
>>>> Enable PASID only when ACS RR & UF enabled on upstream path") requires
>>>> some ACS features being supported on device's upstream path when enabling
>>>> PCI/PASID.
>
> Looking up 201007ef707a to see what ensuring system integrity means,
> it prevents Memory Requests with PASID, which should always be routed
> to the RC, from being mistakenly routed as peer-to-peer requests.
>
>>>> However, above change causes the Linux kernel boots to black screen on a
>>>> system with below graphic device:
>>>
>>> We need a PCIe concept-level description of the issue first, i.e., in
>>> terms of DMA, PASID, ACS, etc. Then we can mention the AMD GPU issue
>>> as an instance.
>>
>> How about below description?
>
> Thanks, this is exactly the sort of thing I'm looking for. But my
> understanding of ATS/PRI/PASID is weak, so I'm still working through
> this. Tell me when I say something wrong below...
>
>> PCIe endpoints can use ATS to request DMA remapping hardware to
>> translate an IOVA to its mapped physical address. If the translation is
>> missing or the permissions are insufficient, the PRI is used to trigger
>> an I/O page fault. The IOMMU driver will fill the mapping with desired
>> permissions and return the translated address to the device.
>
> In PCIe spec language, I think you're saying that a PCIe Function may
> contain an ATC. If the ATC Capability Enable bit is set, the Function
> can issue Translation Requests.
>
> The TA (aka IOMMU) will respond with a Translation Completion. If the
> Completion is a CplD, it contains the translated address and the
> Function can store the entry in its ATC. I assume the I/O page fault
> case corresponds to a Cpl (with no data) meaning that the TA could not
> translate the address.
>
> If the TA doesn't have a mapping with the desired permissions, and the
> Function's Page Request Capability Enable bit is set, it may issue a
> Page Request Message. It's up to the TA/IOMMU to make this message
> visible to the OS, which can make the page resident, create an IOMMU
> mapping, and enable a PRG Response Message. After the Function
> receives the PRG Response Message, it would issue another Translation
> Request.
>
>> The translated address is specified by the IOMMU driver. The IOMMU
>> driver ensures that the address is a DMA buffer address instead of any
>> P2P address in the PCI fabric. Therefore, any translated memory request
>> will eventually be routed to IOMMU regardless of whether there is ACS
>> control in the up-streaming path.
>
> A Memory Request with an address that is not a P2P address, i.e., it
> is not contained in any bridge aperture, will *always* be routed
> toward the RC, won't it? Isn't that the case regardless of whether
> the address is translated or untranslated, and even regardless of ACS?
>
> IIUC, ACS basically causes peer-to-peer requests to be routed upstream
> instead of directly to the peer.
>
> OK, reading this again, I realize that I just restated exactly what
> you had already written, sorry about that.
>
>> AMD GPU is one of those devices.
>
> I guess you mean the AMD GPU has ATS, PRI, and PASID Capabilities?
> And furthermore, that the GPU *always* uses Translated addresses with
> PASID?
>
> So I guess what's going on here is that if:
>
> - A device only uses PASID with Translated addresses, and
> - those Translated addresses are never P2P addresses, then
> - those transactions will always be routed to the RC.
>
> And this applies even if there is no ACS or ACS doesn't support
> PCI_ACS_RR and PCI_ACS_UF.
>
> The black screen happens because ... ?
>
> What can we include in the commit log to help people find this fix? I
> see these in the bugzilla:
>
> WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:251 pci_disable_pri+0x75/0x80
> WARNING: CPU: 0 PID: 477 at drivers/pci/ats.c:419 pci_disable_pasid+0x45/0x50
>
> (These look like defects in pdev_pri_ats_enable(), so really just
> distractions)
Right. We have fixed the error handling path in this function, and Joerg has queued the fix.
>
> kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874
> kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
> BUG: kernel NULL pointer dereference, address: 0000000000000058
> RIP: 0010:report_iommu_fault+0x11/0x90
>
> I couldn't figure out the NULL pointer dereference. I expected it to
> be from a BUG() or similar in report_iommu_fault(), but I don't see
> that.
It's coming from the path below:
- During system boot, IOMMU allocates a default domain.
- The AMD IOMMU v2 module (iommu_v2) creates another domain and tries to attach
the device to the new domain.
- In the device attachment path (amd_iommu_attach_device()), it first detaches the
device from its current domain and then tries to attach it to the new domain. Here
the attachment fails because the PASID enable check fails.
- We don't recover from this failure (I have proposed a fix for this [1]).
- So the device-to-domain attachment is left in an inconsistent state.
- The device then tries to do DMA and hits an IO fault. The NULL pointer dereference
above comes from that path, as the device-to-domain setup is not proper.
[1]
https://lore.kernel.org/linux-iommu/20230113135956.5788-1-vasant.hegde@amd.com/T/#t
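To illustrate the sequence above, here is a minimal userspace sketch. It is not
the actual driver code: struct domain, struct device, attach_no_rollback(),
attach_with_rollback() and report_fault() are hypothetical stand-ins, and the
PASID enable check is reduced to a single flag.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real AMD IOMMU structures. */
struct domain { const char *name; };
struct device { struct domain *domain; int pasid_ok; };

/* Attach fails when the (simulated) PASID enable check fails. */
int do_attach(struct device *dev, struct domain *dom)
{
	if (!dev->pasid_ok)
		return -1;
	dev->domain = dom;
	return 0;
}

/* Detach first, then attach, with no recovery if the attach fails. */
int attach_no_rollback(struct device *dev, struct domain *new_dom)
{
	dev->domain = NULL;	/* detach from the current domain */
	return do_attach(dev, new_dom);
}

/* Same flow, but restore the previous domain when the attach fails. */
int attach_with_rollback(struct device *dev, struct domain *new_dom)
{
	struct domain *old = dev->domain;
	int ret;

	dev->domain = NULL;
	ret = do_attach(dev, new_dom);
	if (ret)
		dev->domain = old;	/* back to a consistent state */
	return ret;
}

/*
 * Fault reporting sketch: returns 0 if a domain is attached, -1 if not.
 * Without the NULL check, the access to dev->domain->name below would
 * be a NULL pointer dereference in this sketch.
 */
int report_fault(const struct device *dev)
{
	if (!dev->domain)
		return -1;
	printf("fault in domain %s\n", dev->domain->name);
	return 0;
}
```

With pasid_ok set to 0, attach_no_rollback() leaves dev->domain NULL, so a
later fault has no domain to report against. attach_with_rollback() keeps the
device on its previous domain instead.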
-Vasant