From: Bjorn Helgaas <helgaas@kernel.org>
To: Baolu Lu <baolu.lu@linux.intel.com>
Cc: "Felix Kuehling" <felix.kuehling@amd.com>,
"Deucher, Alexander" <Alexander.Deucher@amd.com>,
"Hegde, Vasant" <Vasant.Hegde@amd.com>,
"Matt Fagnani" <matt.fagnani@bell.net>,
"Thorsten Leemhuis" <regressions@leemhuis.info>,
"Joerg Roedel" <jroedel@suse.de>,
"Jason Gunthorpe" <jgg@nvidia.com>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
LKML <linux-kernel@vger.kernel.org>,
"regressions@lists.linux.dev" <regressions@lists.linux.dev>,
"Linux PCI" <linux-pci@vger.kernel.org>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Christian König" <christian.koenig@amd.com>,
"Pan, Xinhui" <Xinhui.Pan@amd.com>,
amd-gfx@lists.freedesktop.org
Subject: Re: [regression, bisected, pci/iommu] Bug 216865 - Black screen when amdgpu started during 6.2-rc1 boot with AMD IOMMU enabled
Date: Wed, 15 Feb 2023 09:39:13 -0600
Message-ID: <20230215153913.GA3189407@bhelgaas>
In-Reply-To: <7bbc0f65-e1c6-f388-29a8-390b8c9c92c8@linux.intel.com>
[+cc Christian, Xinhui, amd-gfx]
On Fri, Jan 06, 2023 at 01:48:11PM +0800, Baolu Lu wrote:
> On 1/5/23 11:27 PM, Felix Kuehling wrote:
> > Am 2023-01-05 um 09:46 schrieb Deucher, Alexander:
> > > > -----Original Message-----
> > > > From: Hegde, Vasant <Vasant.Hegde@amd.com>
> > > > On 1/5/2023 4:07 PM, Baolu Lu wrote:
> > > > > On 2023/1/5 18:27, Vasant Hegde wrote:
> > > > > > On 1/5/2023 6:39 AM, Matt Fagnani wrote:
> > > > > > > I built 6.2-rc2 with the patch applied. The same black
> > > > > > > screen problem happened with 6.2-rc2 with the patch. I
> > > > > > > tried to use early kdump with 6.2-rc2 with the patch
> > > > > > > twice by panicking the kernel with sysrq+alt+c after the
> > > > > > > black screen happened. The system rebooted after about
> > > > > > > 10-20 seconds both times, but no kdump and dmesg files
> > > > > > > were saved in /var/crash. I'm attaching the lspci -vvv
> > > > > > > output as requested. ...
> > > > > > > Looking at the lspci output, it doesn't list the ACS
> > > > > > > feature for the graphics card. So with your fix it didn't
> > > > > > > enable PASID and hence it failed to boot. ...
> > > > > > So could you explain why PASID needs to be enabled for the
> > > > > > graphics device? In other words, what does the graphics
> > > > > > driver use PASID for? ...
> > > The GPU driver uses the pasid for shared virtual memory between
> > > the CPU and GPU. I.e., so that the user apps can use the same
> > > virtual address space on the GPU and the CPU. It also uses
> > > pasid to take advantage of recoverable device page faults using
> > > PRS. ...
> > Agreed. This applies to GPU computing on some older AMD APUs that
> > take advantage of memory coherence and IOMMUv2 address translation
> > to create a shared virtual address space between the CPU and GPU.
> > In this case it seems to be a Carrizo APU. It is also true for
> > Raven APUs. ...
> Thanks for the explanation.
>
> This is actually the problem that commit 201007ef707a was trying to
> fix. The PCIe fabric routes Memory Requests based on the TLP
> address, ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with
> PASID that should go upstream to the IOMMU may instead be routed as
> a P2P Request if its address falls in a bridge window.
>
> In the SVA case, the IOMMU shares the address space of a user
> application. The user application has no knowledge of the PCI
> bridge window. It is entirely possible that the device is
> programmed with a P2P address, with disastrous results.
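To make the routing hazard described above concrete, here is a minimal
sketch in C. It is a toy model, not kernel code: struct bridge_window,
enum route, and route_mem_request() are hypothetical names invented for
illustration, and a single memory window stands in for the whole fabric.
It assumes a bridge with no ACS redirect in effect, so the only input to
the routing decision is the TLP address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bridge memory window; not a kernel structure. */
struct bridge_window {
	uint64_t base;   /* first address claimed by the window */
	uint64_t limit;  /* last address claimed by the window (inclusive) */
};

enum route { ROUTE_UPSTREAM, ROUTE_P2P };

/*
 * Route a Memory Request by address alone, as the quoted text describes.
 * has_pasid and pasid are accepted and deliberately ignored to show that
 * they play no part in the decision in this model.
 */
static enum route route_mem_request(const struct bridge_window *w,
				    uint64_t addr, bool has_pasid,
				    uint32_t pasid)
{
	(void)has_pasid;
	(void)pasid;

	assert(w->base <= w->limit);

	if (addr >= w->base && addr <= w->limit)
		return ROUTE_P2P;      /* address hits the bridge window */

	return ROUTE_UPSTREAM;         /* everything else goes toward the IOMMU */
}
```

In this model a request carrying a PASID is treated exactly like one
without it, so a user virtual address that happens to fall inside the
window is claimed as P2P traffic and never reaches the IOMMU.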
Is this stalled? We explored the idea of changing the PCI core so
that for devices that use ATS/PRI, we could enable PASID without
checking for ACS [1], but IIUC we ultimately concluded that it was
based on a misunderstanding of how ATS Translation Requests are routed
and that an AMD driver change would be required [2].
So it seems like we still have this regression, and we're running out
of time before v6.2.
[1] https://lore.kernel.org/all/20230114073420.759989-1-baolu.lu@linux.intel.com/
[2] https://lore.kernel.org/all/Y91X9MeCOsa67CC6@nvidia.com/