linux-pci.vger.kernel.org archive mirror
From: "Deucher, Alexander" <Alexander.Deucher@amd.com>
To: "Merger, Edgar [AUTOSOL/MAS/AUGS]" <Edgar.Merger@emerson.com>,
	"Huang, Ray" <Ray.Huang@amd.com>,
	"Kuehling, Felix" <Felix.Kuehling@amd.com>
Cc: Will Deacon <will@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Joerg Roedel <jroedel@suse.de>,
	"Zhu, Changfeng" <Changfeng.Zhu@amd.com>
Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
Date: Tue, 24 Nov 2020 15:05:52 +0000
Message-ID: <MN2PR12MB44884857E65E3599DA32D0B2F7FB0@MN2PR12MB4488.namprd12.prod.outlook.com>
In-Reply-To: <MWHPR10MB13108B04F4765EA6E278660B89FB0@MWHPR10MB1310.namprd10.prod.outlook.com>

[AMD Public Use]

> -----Original Message-----
> From: Merger, Edgar [AUTOSOL/MAS/AUGS]
> <Edgar.Merger@emerson.com>
> Sent: Tuesday, November 24, 2020 2:29 AM
> To: Huang, Ray <Ray.Huang@amd.com>; Kuehling, Felix
> <Felix.Kuehling@amd.com>
> Cc: Will Deacon <will@kernel.org>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> <bhelgaas@google.com>; Joerg Roedel <jroedel@suse.de>; Zhu, Changfeng
> <Changfeng.Zhu@amd.com>
> Subject: RE: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> 
> Module Version : PiccasoCpu 10
> AGESA Version   : PiccasoPI 100A
> 
> I did not try to access the system by any means other than the desktop
> (for example, via ssh).

You can get this information from the amdgpu driver, e.g.:

  sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info

Also, what is the PCI revision ID of your chip (from lspci)?  And are you seeing this only on specific versions of the sbios?

Thanks,

Alex
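
A minimal Python sketch of the two checks requested above, assuming the GPU is the device at 0000:0b:00.0 (the address in the quoted log) and is DRM card 0. The helper names (parse_pci_revision, firmware_lines, collect_report) are illustrative and not part of any amdgpu tooling. The layout of amdgpu_firmware_info can differ between kernel versions, so the sketch only filters lines by keyword instead of parsing individual fields.

```python
from pathlib import Path

# debugfs path as given in the message above; the PCI path is the standard
# Linux sysfs location for per-device attributes.
DEBUGFS_DRI = Path("/sys/kernel/debug/dri")
SYSFS_PCI = Path("/sys/bus/pci/devices")


def parse_pci_revision(text):
    """Convert the contents of a sysfs 'revision' file (a hex string) to an int."""
    return int(text.strip(), 16)


def firmware_lines(text, keyword):
    """Return the lines of a firmware dump that mention keyword (case-insensitive)."""
    needle = keyword.lower()
    return [line.strip() for line in text.splitlines() if needle in line.lower()]


def collect_report(bdf, dri_index=0, keyword="SMC"):
    """Read both files for one device. debugfs is normally readable by root only."""
    revision = parse_pci_revision((SYSFS_PCI / bdf / "revision").read_text())
    firmware = (DEBUGFS_DRI / str(dri_index) / "amdgpu_firmware_info").read_text()
    return {"revision": revision, "firmware": firmware_lines(firmware, keyword)}
```

Run as root, collect_report("0000:0b:00.0") would return the revision ID as an integer together with the SMC-related firmware lines. The same revision appears as "(rev xx)" in lspci output.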


> 
> -----Original Message-----
> From: Huang Rui <ray.huang@amd.com>
> Sent: Tuesday, 24 November 2020 07:43
> To: Kuehling, Felix <Felix.Kuehling@amd.com>
> Cc: Will Deacon <will@kernel.org>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; linux-kernel@vger.kernel.org;
> linux-pci@vger.kernel.org; iommu@lists.linux-foundation.org; Bjorn Helgaas
> <bhelgaas@google.com>; Merger, Edgar [AUTOSOL/MAS/AUGS]
> <Edgar.Merger@emerson.com>; Joerg Roedel <jroedel@suse.de>;
> Changfeng Zhu <changfeng.zhu@amd.com>
> Subject: [EXTERNAL] Re: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> 
> On Tue, Nov 24, 2020 at 06:51:11AM +0800, Kuehling, Felix wrote:
> > On 2020-11-23 5:33 p.m., Will Deacon wrote:
> > > On Mon, Nov 23, 2020 at 09:04:14PM +0000, Deucher, Alexander wrote:
> > >> [AMD Public Use]
> > >>
> > >>> -----Original Message-----
> > >>> From: Will Deacon <will@kernel.org>
> > >>> Sent: Monday, November 23, 2020 8:44 AM
> > >>> To: linux-kernel@vger.kernel.org
> > >>> Cc: linux-pci@vger.kernel.org; iommu@lists.linux-foundation.org;
> > >>> Will Deacon <will@kernel.org>; Bjorn Helgaas
> > >>> <bhelgaas@google.com>; Deucher, Alexander
> > >>> <Alexander.Deucher@amd.com>; Edgar Merger
> > >>> <Edgar.Merger@emerson.com>; Joerg Roedel <jroedel@suse.de>
> > >>> Subject: [PATCH] PCI: Mark AMD Raven iGPU ATS as broken
> > >>>
> > >>> Edgar Merger reports that the AMD Raven GPU does not work reliably
> > >>> on his system when the IOMMU is enabled:
> > >>>
> > > >>>    | [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1, emitted seq=3
> > >>>    | [...]
> > >>>    | amdgpu 0000:0b:00.0: GPU reset begin!
> > >>>    | AMD-Vi: Completion-Wait loop timed out
> > > >>>    | iommu ivhd0: AMD-Vi: Event logged [IOTLB_INV_TIMEOUT device=0b:00.0 address=0x38edc0970]
> > >>>
> > >>> This is indicative of a hardware/platform configuration issue so,
> > >>> since disabling ATS has been shown to resolve the problem, add a
> > > >>> quirk to match this particular device while Edgar follows-up with AMD
> > > >>> for more information.
> > >>>
> > >>> Cc: Bjorn Helgaas <bhelgaas@google.com>
> > >>> Cc: Alex Deucher <alexander.deucher@amd.com>
> > >>> Reported-by: Edgar Merger <Edgar.Merger@emerson.com>
> > >>> Suggested-by: Joerg Roedel <jroedel@suse.de>
> > > >>> Link: https://lore.kernel.org/linux-iommu/MWHPR10MB1310F042A30661D4158520B589FC0@MWHPR10MB1310.namprd10.prod.outlook.com
> > >>> Signed-off-by: Will Deacon <will@kernel.org>
> > >>> ---
> > >>>
> > >>> Hi all,
> > >>>
> > >>> Since Joerg is away at the moment, I'm posting this to try to make
> > >>> some progress with the thread in the Link: tag.
> > >> + Felix
> > >>
> > >> What system is this?  Can you provide more details?  Does a sbios
> > >> update fix this?  Disabling ATS for all Ravens will break GPU
> > >> compute for a lot of people.  I'd prefer to just black list this
> > >> particular system (e.g., just SSIDs or revision) if possible.
> >
> > +Ray
> >
> > There are already many systems where the IOMMU is disabled in the
> > BIOS, or the CRAT table reporting the APU compute capabilities is
> > broken. Ray has been working on a fallback to make APUs behave like
> > dGPUs on such systems. That should also cover this case where ATS is
> > blacklisted. That said, it affects the programming model, because we
> > don't support the unified and coherent memory model on dGPUs like we
> > do on APUs with IOMMUv2. So it would be good to make the conditions
> > for this workaround as narrow as possible.
> 
> Yes. Besides the questions from Alex and Felix, could we get your firmware
> version (the SMC firmware, which comes from the SBIOS) and device ID?
> 
> > > >>>    | [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1, emitted seq=3
> 
> It looks like only the gfx IB test passed and the desktop fails to launch, am I right?
> 
> We would like to know whether it is Raven, Raven kicker (new Raven), or
> Picasso. On our side, per internal test results, we have not seen a similar
> issue on the Raven kicker and Picasso platforms.
> 
> Thanks,
> Ray
> 
> >
> > These are the relevant changes in KFD and Thunk for reference:
> >
> > ### KFD ###
> >
> > commit 914913ab04dfbcd0226ecb6bc99d276832ea2908
> > Author: Huang Rui <ray.huang@amd.com>
> > Date:   Tue Aug 18 14:54:23 2020 +0800
> >
> >      drm/amdkfd: implement the dGPU fallback path for apu (v6)
> >
> >      We still have a few iommu issues which need to address, so force raven
> >      as "dgpu" path for the moment.
> >
> >      This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
> >      or ACPI CRAT table not correct.
> >
> >      v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
> >      v3: Align with existed thunk, don't change the way of raven, only renoir
> >          will use "dgpu" path by default.
> >      v4: don't update global ignore_crat in the driver, and revise fallback
> >          function if CRAT is broken.
> >      v5: refine acpi crat good but no iommu support case, and rename the
> >          title.
> >      v6: fix the issue of dGPU initialized firstly, just modify the report
> >          value in the node_show().
> >      Signed-off-by: Huang Rui <ray.huang@amd.com>
> >      Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> >      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >
> > ### Thunk ###
> >
> > commit e32482fa4b9ca398c8bdc303920abfd672592764
> > Author: Huang Rui <ray.huang@amd.com>
> > Date:   Tue Aug 18 18:54:05 2020 +0800
> >
> >      libhsakmt: remove is_dgpu flag in the hsa_gfxip_table
> >
> >      Whether use dgpu path will check the props which exposed from kernel.
> >      We won't need hard code in the ASIC table.
> >
> >      Signed-off-by: Huang Rui <ray.huang@amd.com>
> >      Change-Id: I0c018a26b219914a41197ff36dbec7a75945d452
> >
> > commit 7c60f6d912034aa67ed27b47a29221422423f5cc
> > Author: Huang Rui <ray.huang@amd.com>
> > Date:   Thu Jul 30 10:22:23 2020 +0800
> >
> >      libhsakmt: implement the method that using flag which exposed by kfd to configure is_dgpu
> >
> >      KFD already implemented the fallback path for APU. Thunk will use flag
> >      which exposed by kfd to configure is_dgpu instead of hardcode before.
> >
> >      Signed-off-by: Huang Rui <ray.huang@amd.com>
> >      Change-Id: I445f6cf668f9484dd06cd9ae1bb3cfe7428ec7eb
> >
> > Regards,
> >    Felix
> >
> >
> > > Cheers, Alex. I'll have to defer to Edgar for the details, as my
> > > understanding from the original thread over at:
> > >
> > >
> > > > https://lore.kernel.org/linux-iommu/MWHPR10MB1310CDB6829DDCF5EA84A14689150@MWHPR10MB1310.namprd10.prod.outlook.com/
> > >
> > > is that this is a board developed by his company.
> > >
> > > Edgar -- please can you answer Alex's questions?
> > >
> > > Will

Thread overview: 21+ messages
2020-11-23 13:44 [PATCH] PCI: Mark AMD Raven iGPU ATS as broken Will Deacon
2020-11-23 21:04 ` Deucher, Alexander
2020-11-23 22:33   ` Will Deacon
2020-11-23 22:51     ` Felix Kuehling
2020-11-24  6:43       ` Huang Rui
2020-11-24  7:28         ` [EXTERNAL] " Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-11-24 15:05           ` Deucher, Alexander [this message]
2020-11-25  6:05             ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-11-25  9:16               ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-11-25 10:03                 ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-11-25 16:07                   ` Deucher, Alexander
2020-11-26  9:24                     ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-11-30 18:36                       ` Deucher, Alexander
2020-12-07  4:53                         ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-12-08  8:23                           ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-12-09  7:59                             ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-12-09 14:23                               ` Deucher, Alexander
2020-12-10 10:48                                 ` Merger, Edgar [AUTOSOL/MAS/AUGS]
2020-12-10 15:36                                   ` Deucher, Alexander
2020-12-10 16:25                                     ` Bjorn Helgaas
2020-11-24  5:32     ` Merger, Edgar [AUTOSOL/MAS/AUGS]
