linux-pci.vger.kernel.org archive mirror
From: "Christian König" <christian.koenig@amd.com>
To: "Lazar, Lijo" <lijo.lazar@amd.com>, Stefan Roese <sr@denx.de>,
	Tom Seewald <tseewald@gmail.com>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	regressions@lists.linux.dev, David Airlie <airlied@linux.ie>,
	linux-pci@vger.kernel.org, Xinhui Pan <Xinhui.Pan@amd.com>,
	amd-gfx@lists.freedesktop.org,
	Kai-Heng Feng <kai.heng.feng@canonical.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Alex Deucher <alexander.deucher@amd.com>
Subject: Re: [Bug 216373] New: Uncorrected errors reported for AMD GPU
Date: Thu, 25 Aug 2022 10:18:28 +0200	[thread overview]
Message-ID: <0444020d-e7e6-2fe9-e94e-413c8d3bab38@amd.com> (raw)
In-Reply-To: <54874254-d21e-207c-9526-8b423bd97507@amd.com>

Am 25.08.22 um 09:54 schrieb Lazar, Lijo:
>
>
> On 8/25/2022 1:04 PM, Christian König wrote:
>> Am 25.08.22 um 08:40 schrieb Stefan Roese:
>>> On 24.08.22 16:45, Tom Seewald wrote:
>>>> On Wed, Aug 24, 2022 at 12:11 AM Lazar, Lijo <lijo.lazar@amd.com> 
>>>> wrote:
>>>>> Unfortunately, I don't have any NV platforms to test. Attached is an
>>>>> 'untested-patch' based on your trace logs.
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>
>>>> Thank you for the patch. It applied cleanly to v6.0-rc2 and after
>>>> booting that kernel I no longer see any messages about PCI errors. I
>>>> have uploaded a dmesg log to the bug report:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=301642
>>>>
>>>
>>> I did not follow this thread in depth, but FWICT the bug is solved now
>>> with this patch. So is it correct that the now fully enabled AER
>>> support in the PCI subsystem in v6.0 helped detecting a bug in the AMD
>>> GPU driver?
>>
>> It looks like it, but I'm not 100% sure about the rationale behind it.
>>
>> Lijo can you explain more on this?
>>
>
> From the trace, during gmc hw_init it takes this route -
>
> gart_enable -> amdgpu_gtt_mgr_recover -> amdgpu_gart_invalidate_tlb -> 
> amdgpu_device_flush_hdp -> amdgpu_asic_flush_hdp (non-ring based HDP 
> flush)
>
> The HDP flush is done using the remapped offset, which is MMIO_REG_HOLE_OFFSET 
> (0x80000 - PAGE_SIZE):
>
> WREG32_NO_KIQ((adev->rmmio_remap.reg_offset + 
> KFD_MMIO_REMAP_HDP_MEM_FLUSH_CNTL) >> 2, 0);
>
> However, the remapping is not yet done at this point; it happens later,
> during common block initialization. Accessing the unmapped offset
> '(0x80000 - PAGE_SIZE)' seems to complete as an Unsupported Request,
> which is then reported through AER.

That's interesting behavior. So far, AER errors have always indicated
some kind of transmission error.

If accesses to unmapped areas of the MMIO BAR are reported through AER
as well, then we need to keep that in mind.

Thanks,
Christian.

>
> In the patch, I just moved the remapping before gmc block initialization.
>
> Thanks,
> Lijo
>
>> Thanks,
>> Christian.
>>
>>>
>>> Thanks,
>>> Stefan
>>



Thread overview: 18+ messages
     [not found] <bug-216373-41252@https.bugzilla.kernel.org/>
2022-08-18 20:38 ` [Bug 216373] New: Uncorrected errors reported for AMD GPU Bjorn Helgaas
2022-08-19  7:05   ` Christian König
2022-08-19  8:33     ` Lazar, Lijo
2022-08-19 11:04       ` Bjorn Helgaas
2022-08-19 17:13   ` Bjorn Helgaas
2022-08-19 19:07     ` Bjorn Helgaas
2022-08-20  7:52       ` Lazar, Lijo
2022-08-23 17:04         ` Tom Seewald
2022-08-24  5:10           ` Lazar, Lijo
2022-08-24 14:45             ` Tom Seewald
2022-08-25  6:40               ` Stefan Roese
2022-08-25  7:34                 ` Christian König
2022-08-25  7:54                   ` Lazar, Lijo
2022-08-25  8:18                     ` Christian König [this message]
2022-08-25 17:48                       ` Bjorn Helgaas
2022-08-26  7:10                         ` Christian König
2022-08-25 15:05             ` Felix Kuehling
2022-08-23 16:01   ` [Bug 216373] New: Uncorrected errors reported for AMD GPU #forregzbot Thorsten Leemhuis
