linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Limit AMD Radeon rebar hack to a single revision
@ 2021-10-26 20:46 Robin McCorkell
  2021-10-26 21:28 ` Bjorn Helgaas
  0 siblings, 1 reply; 8+ messages in thread
From: Robin McCorkell @ 2021-10-26 20:46 UTC (permalink / raw)
  To: linux-pci

A particular RX 5600 device requires a hack in the rebar logic, but the
current branch is too general and catches other devices too, breaking
them. This patch changes the branch to be more selective on the
particular revision.

See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1707

This patch fixes intermittent freezes on other RX 5600 devices where the
hack is unnecessary. Credit to all contributors in the linked issue on
the AMD bug tracker.
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ce2ab62b64cf..1fe75243019e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
 
 	/* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
 	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
-	    bar == 0 && cap == 0x7000)
+	    pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)
 		cap = 0x3f000;
 
 	return cap >> 4;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Limit AMD Radeon rebar hack to a single revision
  2021-10-26 20:46 [PATCH] Limit AMD Radeon rebar hack to a single revision Robin McCorkell
@ 2021-10-26 21:28 ` Bjorn Helgaas
  2021-10-26 21:44   ` [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse Robin McCorkell
  2021-10-26 21:49   ` [PATCH] Limit AMD Radeon rebar hack to a single revision Robin McCorkell
  0 siblings, 2 replies; 8+ messages in thread
From: Bjorn Helgaas @ 2021-10-26 21:28 UTC (permalink / raw)
  To: Robin McCorkell; +Cc: linux-pci, Christian König, Nirmoy Das

[+cc Christian, Nirmoy, authors of 907830b0fc9e]

Subject line should look like previous ones for this file, e.g.,

  88769e64cf99 ("PCI: Add ACS quirk for Pericom PI7C9X2G switches")
  e3f4bd3462f6 ("PCI: Mark Atheros QCA6174 to avoid bus reset")
  60b78ed088eb ("PCI: Add AMD GPU multi-function power dependencies")
  8304a3a199ee ("PCI: Set dma-can-stall for HiSilicon chips")
  8c09e896cef8 ("PCI: Allow PASID on fake PCIe devices without TLP prefixes")
  32837d8a8f63 ("PCI: Add ACS quirks for Cavium multi-function devices")
  e0bff4322092 ("PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI")
  ...
  907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")

Would be good to mention "Sapphire RX 5600 XT Pulse" explicitly since
that's what 907830b0fc9e said.

On Tue, Oct 26, 2021 at 09:46:38PM +0100, Robin McCorkell wrote:
> A particular RX 5600 device requires a hack in the rebar logic, but the
> current branch is too general and catches other devices too, breaking
> them. This patch changes the branch to be more selective on the
> particular revision.
> 
> See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1707

There's a lot of legwork in the bug report to bisect this, but no
explanation of what the root cause turned out to be.

907830b0fc9e says RX 5600 XT Pulse advertises 256MB-1GB BAR 0 sizes but
actually supports up to 8GB.

Does this patch mean your RX 5600 XT Pulse supports fewer sizes and 
advertises them correctly?  And consequently we resize BAR 0 to
something that's too big, and something fails when we try to access
the part the isn't actually implemented by the device? 

It would be useful to attach your "lspci -vv" output to the bug
report.

> This patch fixes intermittent freezes on other RX 5600 devices where the
> hack is unnecessary. Credit to all contributors in the linked issue on
> the AMD bug tracker.

Thanks.  This would need a signed-off-by [1].

We should also include a "Fixes:" line for the commit the problem was
bisected to, 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire
RX 5600 XT Pulse"), if I understand correctly, e.g.,

  Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")

And probably a stable tag, since 907830b0fc9e appeared in v5.12, e.g.,

  Cc: stable@vger.kernel.org	# v5.12+

If stable maintainers backported 907830b0fc9e to earlier kernels, as
it appears they have, it's up to them to watch for fixes to
907830b0fc9e.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.14#n365

> ---
>  drivers/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce2ab62b64cf..1fe75243019e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
>  
>  	/* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
>  	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
> -	    bar == 0 && cap == 0x7000)
> +	    pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)

I'd like to get the AMD folks to chime in and confirm that revision
0xC1 is the only one that requires this quirk.

>  		cap = 0x3f000;
>  
>  	return cap >> 4;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse
  2021-10-26 21:28 ` Bjorn Helgaas
@ 2021-10-26 21:44   ` Robin McCorkell
  2021-10-27  9:47     ` Krzysztof Wilczyński
  2021-11-01 21:55     ` Bjorn Helgaas
  2021-10-26 21:49   ` [PATCH] Limit AMD Radeon rebar hack to a single revision Robin McCorkell
  1 sibling, 2 replies; 8+ messages in thread
From: Robin McCorkell @ 2021-10-26 21:44 UTC (permalink / raw)
  To: linux-pci; +Cc: stable

A particular RX 5600 device requires a hack in the rebar logic, but the
current branch is too general and catches other devices too, breaking
them. This patch changes the branch to be more selective on the
particular revision.

This patch fixes intermittent freezes on other RX 5600 devices where the
hack is unnecessary. Credit to all contributors in the linked issue on
the AMD bug tracker.

See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1707

Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
Cc: stable@vger.kernel.org    # v5.12+
Signed-off-by: Robin McCorkell <robin@mccorkell.me.uk>
Reported-by: Simon May <@Socob on gitlab.freedesktop.com>
Tested-by: Kain Centeno <@kaincenteno on gitlab.freedesktop.com>
Tested-by: Tobias Jakobi <@tobiasjakobi on gitlab.freedesktop.com>
Suggested-by: lijo lazar <@lijo on gitlab.freedesktop.com>
---
 drivers/pci/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ce2ab62b64cf..1fe75243019e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
 
 	/* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
 	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
-	    bar == 0 && cap == 0x7000)
+	    pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)
 		cap = 0x3f000;
 
 	return cap >> 4;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Limit AMD Radeon rebar hack to a single revision
  2021-10-26 21:28 ` Bjorn Helgaas
  2021-10-26 21:44   ` [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse Robin McCorkell
@ 2021-10-26 21:49   ` Robin McCorkell
  2021-10-27  6:38     ` Christian König
  1 sibling, 1 reply; 8+ messages in thread
From: Robin McCorkell @ 2021-10-26 21:49 UTC (permalink / raw)
  To: helgaas; +Cc: linux-pci, Christian König, Nirmoy Das

Thanks, v2 posted with the commit message fixes and correct tagging.

My device isn't a Sapphire RX 5600 XT Pulse, it's an RX 5600M and OEM
as far as I can tell. The condition in the original code was too broad
and was catching devices like mine (or the devices of the other bug
participants) where the hack was breaking things.

On Tue, 26 Oct 2021 at 22:28, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Christian, Nirmoy, authors of 907830b0fc9e]
>
> Subject line should look like previous ones for this file, e.g.,
>
>   88769e64cf99 ("PCI: Add ACS quirk for Pericom PI7C9X2G switches")
>   e3f4bd3462f6 ("PCI: Mark Atheros QCA6174 to avoid bus reset")
>   60b78ed088eb ("PCI: Add AMD GPU multi-function power dependencies")
>   8304a3a199ee ("PCI: Set dma-can-stall for HiSilicon chips")
>   8c09e896cef8 ("PCI: Allow PASID on fake PCIe devices without TLP prefixes")
>   32837d8a8f63 ("PCI: Add ACS quirks for Cavium multi-function devices")
>   e0bff4322092 ("PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI")
>   ...
>   907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
>
> Would be good to mention "Sapphire RX 5600 XT Pulse" explicitly since
> that's what 907830b0fc9e said.
>
> On Tue, Oct 26, 2021 at 09:46:38PM +0100, Robin McCorkell wrote:
> > A particular RX 5600 device requires a hack in the rebar logic, but the
> > current branch is too general and catches other devices too, breaking
> > them. This patch changes the branch to be more selective on the
> > particular revision.
> >
> > See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
>
> There's a lot of legwork in the bug report to bisect this, but no
> explanation of what the root cause turned out to be.
>
> 907830b0fc9e says RX 5600 XT Pulse advertises 256MB-1GB BAR 0 sizes but
> actually supports up to 8GB.
>
> Does this patch mean your RX 5600 XT Pulse supports fewer sizes and
> advertises them correctly?  And consequently we resize BAR 0 to
> something that's too big, and something fails when we try to access
> the part the isn't actually implemented by the device?
>
> It would be useful to attach your "lspci -vv" output to the bug
> report.
>
> > This patch fixes intermittent freezes on other RX 5600 devices where the
> > hack is unnecessary. Credit to all contributors in the linked issue on
> > the AMD bug tracker.
>
> Thanks.  This would need a signed-off-by [1].
>
> We should also include a "Fixes:" line for the commit the problem was
> bisected to, 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire
> RX 5600 XT Pulse"), if I understand correctly, e.g.,
>
>   Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
>
> And probably a stable tag, since 907830b0fc9e appeared in v5.12, e.g.,
>
>   Cc: stable@vger.kernel.org    # v5.12+
>
> If stable maintainers backported 907830b0fc9e to earlier kernels, as
> it appears they have, it's up to them to watch for fixes to
> 907830b0fc9e.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v5.14#n365
>
> > ---
> >  drivers/pci/pci.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index ce2ab62b64cf..1fe75243019e 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
> >
> >       /* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
> >       if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
> > -         bar == 0 && cap == 0x7000)
> > +         pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)
>
> I'd like to get the AMD folks to chime in and confirm that revision
> 0xC1 is the only one that requires this quirk.
>
> >               cap = 0x3f000;
> >
> >       return cap >> 4;
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Limit AMD Radeon rebar hack to a single revision
  2021-10-26 21:49   ` [PATCH] Limit AMD Radeon rebar hack to a single revision Robin McCorkell
@ 2021-10-27  6:38     ` Christian König
  0 siblings, 0 replies; 8+ messages in thread
From: Christian König @ 2021-10-27  6:38 UTC (permalink / raw)
  To: Robin McCorkell, helgaas; +Cc: linux-pci, Nirmoy Das

Hi Robin and Bjorn,

Am 26.10.21 um 23:49 schrieb Robin McCorkell:
> Thanks, v2 posted with the commit message fixes and correct tagging.
>
> My device isn't a Sapphire RX 5600 XT Pulse, it's an RX 5600M and OEM
> as far as I can tell. The condition in the original code was too broad
> and was catching devices like mine (or the devices of the other bug
> participants) where the hack was breaking things.

well that doesn't really make sense. As far as I know the problem is an 
invalid PCIe config space on the whole series of the ASIC, but I can 
double check with the hardware engineers once more.

What could be is that a later hardware revision doesn't have that bug 
any more and we don't need to apply the workaround. That is trivial to 
confirm with an verbose lspci output, but even then the workaround won't 
hurt in any way.

> On Tue, 26 Oct 2021 at 22:28, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> [+cc Christian, Nirmoy, authors of 907830b0fc9e]
>>
>> Subject line should look like previous ones for this file, e.g.,
>>
>>    88769e64cf99 ("PCI: Add ACS quirk for Pericom PI7C9X2G switches")
>>    e3f4bd3462f6 ("PCI: Mark Atheros QCA6174 to avoid bus reset")
>>    60b78ed088eb ("PCI: Add AMD GPU multi-function power dependencies")
>>    8304a3a199ee ("PCI: Set dma-can-stall for HiSilicon chips")
>>    8c09e896cef8 ("PCI: Allow PASID on fake PCIe devices without TLP prefixes")
>>    32837d8a8f63 ("PCI: Add ACS quirks for Cavium multi-function devices")
>>    e0bff4322092 ("PCI: Increase D3 delay for AMD Renoir/Cezanne XHCI")
>>    ...
>>    907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
>>
>> Would be good to mention "Sapphire RX 5600 XT Pulse" explicitly since
>> that's what 907830b0fc9e said.
>>
>> On Tue, Oct 26, 2021 at 09:46:38PM +0100, Robin McCorkell wrote:
>>> A particular RX 5600 device requires a hack in the rebar logic, but the
>>> current branch is too general and catches other devices too, breaking
>>> them. This patch changes the branch to be more selective on the
>>> particular revision.
>>>
>>> See also: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1707&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C23c4b6057dc641fd111f08d998ca8666%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637708818349122129%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=lhBo74fAemYtZKYSqkfRtvN1Jo48%2BnZtNIaAAFS2DfY%3D&amp;reserved=0
>> There's a lot of legwork in the bug report to bisect this, but no
>> explanation of what the root cause turned out to be.

Yes agree, that bisection is extremely unlikely to be correct.

See the driver allocates addresses from both ends of the BAR and works 
towards the middle.

So if something doesn't work correctly we see that immediately because 
the driver isn't even able to initialize the hardware.

What could be is that resizing the BAR reduces the overhead drastically. 
Resulting in some use cases to have a 10-20% performance benefit. When 
we have a timing bug somewhere else the patch could potentially make 
that much more likely to be seen.

Going to comment on the bug report as well.

>> 907830b0fc9e says RX 5600 XT Pulse advertises 256MB-1GB BAR 0 sizes but
>> actually supports up to 8GB.
>>
>> Does this patch mean your RX 5600 XT Pulse supports fewer sizes and
>> advertises them correctly?  And consequently we resize BAR 0 to
>> something that's too big, and something fails when we try to access
>> the part the isn't actually implemented by the device?

We do have devices which have a non power of two local memory, e.g. 3GiB 
or 6GiB. In those cases we resize the BAR to the next power of two and 
that still works.

>>
>> It would be useful to attach your "lspci -vv" output to the bug
>> report.
>>
>>> This patch fixes intermittent freezes on other RX 5600 devices where the
>>> hack is unnecessary. Credit to all contributors in the linked issue on
>>> the AMD bug tracker.
>> Thanks.  This would need a signed-off-by [1].
>>
>> We should also include a "Fixes:" line for the commit the problem was
>> bisected to, 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire
>> RX 5600 XT Pulse"), if I understand correctly, e.g.,
>>
>>    Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
>>
>> And probably a stable tag, since 907830b0fc9e appeared in v5.12, e.g.,
>>
>>    Cc: stable@vger.kernel.org    # v5.12+
>>
>> If stable maintainers backported 907830b0fc9e to earlier kernels, as
>> it appears they have, it's up to them to watch for fixes to
>> 907830b0fc9e.

What would be very helpful as well is to CC the relevant AMD engineers 
which wrote the original patch and signed it off.

Regards,
Christian.

>>
>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Ftree%2FDocumentation%2Fprocess%2Fsubmitting-patches.rst%3Fid%3Dv5.14%23n365&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C23c4b6057dc641fd111f08d998ca8666%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637708818349132123%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=cY7hOAFEuEqQuwUOY50hXmS0BvyP9D%2BQNBNsxn%2Bgmtg%3D&amp;reserved=0
>>
>>> ---
>>>   drivers/pci/pci.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index ce2ab62b64cf..1fe75243019e 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
>>>
>>>        /* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
>>>        if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
>>> -         bar == 0 && cap == 0x7000)
>>> +         pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)
>> I'd like to get the AMD folks to chime in and confirm that revision
>> 0xC1 is the only one that requires this quirk.
>>
>>>                cap = 0x3f000;
>>>
>>>        return cap >> 4;
>>> --
>>> 2.31.1
>>>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse
  2021-10-26 21:44   ` [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse Robin McCorkell
@ 2021-10-27  9:47     ` Krzysztof Wilczyński
  2021-11-01 21:55     ` Bjorn Helgaas
  1 sibling, 0 replies; 8+ messages in thread
From: Krzysztof Wilczyński @ 2021-10-27  9:47 UTC (permalink / raw)
  To: Robin McCorkell; +Cc: Bjorn Helgaas, linux-pci, stable

[+CC adding Bjorn as the PCI sub-system maintainer]

Hi Robin,

Thank you for sending the patch over!

> A particular RX 5600 device requires a hack in the rebar logic, but the
> current branch is too general and catches other devices too, breaking
> them. This patch changes the branch to be more selective on the
> particular revision.
> 
> This patch fixes intermittent freezes on other RX 5600 devices where the
> hack is unnecessary. Credit to all contributors in the linked issue on
> the AMD bug tracker.
> 
> See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
[...]

The commit message could be improved a little bit so that it's more in
preferred imperative tone describing what precisely is broken and how it
fixes the problem for Sapphire RX 5600 XT and other ATI cards.  Also,
consistent capitalisation of "REBAR" between the subject and the commit
message would be a plus.

There is also no need to add "this patch" - we also know that this is this
very patch, especially since this isn't a series that comprises of multiple
other patches.

Also, sine this is a v2, it would be nice to include a small changelog,
even if the change is trivial, with helps as people don't have to go and
read other e-mail threads to find out what was changed and why.

> Reported-by: Simon May <@Socob on gitlab.freedesktop.com>
> Tested-by: Kain Centeno <@kaincenteno on gitlab.freedesktop.com>
> Tested-by: Tobias Jakobi <@tobiasjakobi on gitlab.freedesktop.com>
> Suggested-by: lijo lazar <@lijo on gitlab.freedesktop.com>

The above would be "gitlab.freedesktop.org", I believe.  Having said that,
I am not sure if we can accept username handles to some remote Git hosting
platform in lieu of proper, so to speak, e-mail addresses.

[...]
>  	/* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
>  	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
> -	    bar == 0 && cap == 0x7000)
> +	    pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)

A small nitpick: lowercase hexadecimal values to match how it's been used
in other places.

	Krzysztof

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse
  2021-10-26 21:44   ` [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse Robin McCorkell
  2021-10-27  9:47     ` Krzysztof Wilczyński
@ 2021-11-01 21:55     ` Bjorn Helgaas
  2021-11-02 12:27       ` Christian König
  1 sibling, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2021-11-01 21:55 UTC (permalink / raw)
  To: Robin McCorkell; +Cc: linux-pci, stable, Christian König, Nirmoy Das

[+cc Christian, Nirmoy]

On Tue, Oct 26, 2021 at 10:44:59PM +0100, Robin McCorkell wrote:
> A particular RX 5600 device requires a hack in the rebar logic, but the
> current branch is too general and catches other devices too, breaking
> them. This patch changes the branch to be more selective on the
> particular revision.
> 
> This patch fixes intermittent freezes on other RX 5600 devices where the
> hack is unnecessary. Credit to all contributors in the linked issue on
> the AMD bug tracker.
> 
> See also: https://gitlab.freedesktop.org/drm/amd/-/issues/1707
> 
> Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
> Cc: stable@vger.kernel.org    # v5.12+
> Signed-off-by: Robin McCorkell <robin@mccorkell.me.uk>
> Reported-by: Simon May <@Socob on gitlab.freedesktop.com>
> Tested-by: Kain Centeno <@kaincenteno on gitlab.freedesktop.com>
> Tested-by: Tobias Jakobi <@tobiasjakobi on gitlab.freedesktop.com>
> Suggested-by: lijo lazar <@lijo on gitlab.freedesktop.com>

I'll wait for an ack from Christian on this one, since it doesn't seem
to make sense to him.

> ---
>  drivers/pci/pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce2ab62b64cf..1fe75243019e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
>  
>  	/* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
>  	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
> -	    bar == 0 && cap == 0x7000)
> +	    pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)
>  		cap = 0x3f000;
>  
>  	return cap >> 4;
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse
  2021-11-01 21:55     ` Bjorn Helgaas
@ 2021-11-02 12:27       ` Christian König
  0 siblings, 0 replies; 8+ messages in thread
From: Christian König @ 2021-11-02 12:27 UTC (permalink / raw)
  To: Bjorn Helgaas, Robin McCorkell; +Cc: linux-pci, stable, Nirmoy Das

Am 01.11.21 um 22:55 schrieb Bjorn Helgaas:
> [+cc Christian, Nirmoy]
>
> On Tue, Oct 26, 2021 at 10:44:59PM +0100, Robin McCorkell wrote:
>> A particular RX 5600 device requires a hack in the rebar logic, but the
>> current branch is too general and catches other devices too, breaking
>> them. This patch changes the branch to be more selective on the
>> particular revision.
>>
>> This patch fixes intermittent freezes on other RX 5600 devices where the
>> hack is unnecessary. Credit to all contributors in the linked issue on
>> the AMD bug tracker.
>>
>> See also: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fdrm%2Famd%2F-%2Fissues%2F1707&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cfa71c149d6084ca3254508d99d824b9b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637714005225516666%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Vu%2FVbyBEGrnTjqcmAkJDJHa1BbUICRkSB1Jfre%2BhDhM%3D&amp;reserved=0
>>
>> Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse")
>> Cc: stable@vger.kernel.org    # v5.12+
>> Signed-off-by: Robin McCorkell <robin@mccorkell.me.uk>
>> Reported-by: Simon May <@Socob on gitlab.freedesktop.com>
>> Tested-by: Kain Centeno <@kaincenteno on gitlab.freedesktop.com>
>> Tested-by: Tobias Jakobi <@tobiasjakobi on gitlab.freedesktop.com>
>> Suggested-by: lijo lazar <@lijo on gitlab.freedesktop.com>
> I'll wait for an ack from Christian on this one, since it doesn't seem
> to make sense to him.

Please just completely drop the patch for now.

It's really interesting that resizing the BAR makes the problem on that 
hardware more likely to appear, but we have already found people 
reporting issues even with the patch in question completely reverted.

So that is most likely not the root cause and we need to dig deeper.

Thanks,
Christian.

>
>> ---
>>   drivers/pci/pci.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index ce2ab62b64cf..1fe75243019e 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3647,7 +3647,7 @@ u32 pci_rebar_get_possible_sizes(struct pci_dev *pdev, int bar)
>>   
>>   	/* Sapphire RX 5600 XT Pulse has an invalid cap dword for BAR 0 */
>>   	if (pdev->vendor == PCI_VENDOR_ID_ATI && pdev->device == 0x731f &&
>> -	    bar == 0 && cap == 0x7000)
>> +	    pdev->revision == 0xC1 && bar == 0 && cap == 0x7000)
>>   		cap = 0x3f000;
>>   
>>   	return cap >> 4;
>> -- 
>> 2.31.1
>>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-11-02 12:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-26 20:46 [PATCH] Limit AMD Radeon rebar hack to a single revision Robin McCorkell
2021-10-26 21:28 ` Bjorn Helgaas
2021-10-26 21:44   ` [PATCH v2] PCI: Limit REBAR quirk to just Sapphire RX 5600 XT Pulse Robin McCorkell
2021-10-27  9:47     ` Krzysztof Wilczyński
2021-11-01 21:55     ` Bjorn Helgaas
2021-11-02 12:27       ` Christian König
2021-10-26 21:49   ` [PATCH] Limit AMD Radeon rebar hack to a single revision Robin McCorkell
2021-10-27  6:38     ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).