* ACPI: what should Linux do for "call-order-swap" quirk from firmware?
@ 2023-05-21 19:26 Ratchanan Srirattanamet
  2023-05-22  9:44 ` Rafael J. Wysocki
  0 siblings, 1 reply; 4+ messages in thread
From: Ratchanan Srirattanamet @ 2023-05-21 19:26 UTC (permalink / raw)
  To: linux-pm

Hello,

I'm trying to debug an issue where Nouveau is unable to runtime-resume
an Nvidia GTX 1650 Ti in an AMD-based laptop [1]. As part of this, I've
traced ACPI calls for the same device on Windows. It seems this device
has a weird quirk when it transitions from D3cold to D0, which I call
"call-order-swap" for lack of a better term.

So, a bit of context: the Lenovo Legion 5-15ARH05 [2] is a laptop
sporting an AMD Ryzen 7 4800H with Radeon Graphics plus an Nvidia GTX
1650 Ti. This device's PCIe topology to the GPU is:

00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
         +- 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] [10de:1f95] (rev a1)

From the ACPI perspective (according to my interpretation), the device
\_SB.PCI0.GPP0 seems to represent the PCI bridge, with
\_SB.PCI0.GPP0.PG00 as its power resource, and \_SB.PCI0.GPP0.PEGP seems
to represent the GPU itself, which doesn't seem to have its own power
resource. All ACPI table dumps and related information can be found in
the issue on Freedesktop GitLab [1].

Now, if I understand the specs correctly, when transitioning the GPU and
the bridge back from D3cold to D0, the kernel should power up the bridge
before the GPU itself. From the ACPI perspective, I should see a call to
.PG00._ON() (the power resource for the bridge) before .PEGP.PS0().

However, on Windows [3], it seems that .PEGP.PS0() is called before
.PG00._ON() for some reason. This is weird, because if .PG00._ON() has
not been called yet, .PEGP.PS0() should not even be valid to call. I
have no idea which part of Windows is supposed to call those ACPI
methods, but my feeling is that it must be either the Nvidia or the AMD
driver that implements this kind of quirk.

As for what Linux does: it seems that when Linux resumes the PCI bridge,
it calls only .PG00._ON() and skips .PEGP.PS0(), on the grounds that the
downstream devices must have been reset by that point. I'm not sure
that's the right thing to do either, but at least it makes more sense.
Nvidia's proprietary driver seems to disable runtime D3 support
completely on this device, so I think Nvidia must have a quirk for this
chipset: I briefly borrowed a friend's laptop with an AMD 6000 series
CPU, and the driver doesn't disable runtime D3 there.
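
The three call orders described above can be sketched as follows. This
is only an illustration: the function name, the result labels, and the
idea of a flat list of method names are assumptions made for the
example, not the format of the Windows trace in [3] and not anything
the kernel itself does.

```python
# Hypothetical illustration only: the trace format (a flat list of fully
# qualified method names) and the labels returned are assumptions.
BRIDGE_POWER_ON = r"\_SB.PCI0.GPP0.PG00._ON"  # power resource of the bridge
GPU_PS0 = r"\_SB.PCI0.GPP0.PEGP.PS0"          # D0 method of the GPU, as written above


def classify_resume_order(trace):
    """Classify the D3cold -> D0 call order seen in a list of method names."""
    on_idx = trace.index(BRIDGE_POWER_ON) if BRIDGE_POWER_ON in trace else None
    ps0_idx = trace.index(GPU_PS0) if GPU_PS0 in trace else None

    if on_idx is None and ps0_idx is None:
        return "neither-called"
    if ps0_idx is None:
        return "power-resource-only"  # what Linux appears to do
    if on_idx is None:
        return "ps0-only"
    # Both were called: report which one came first.
    return "bridge-first" if on_idx < ps0_idx else "gpu-first"


expected = [BRIDGE_POWER_ON, GPU_PS0]  # order I expected from the specs
windows = [GPU_PS0, BRIDGE_POWER_ON]   # order that seems to appear on Windows
linux = [BRIDGE_POWER_ON]              # order that Linux seems to use

for name, trace in (("expected", expected), ("windows", windows), ("linux", linux)):
    print(name, "->", classify_resume_order(trace))
```

Only the first occurrence of each method is considered, so a real trace
containing several suspend/resume cycles would need to be split per
cycle before it is passed in.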

So I'm not sure what the correct behavior is here. I'm a developer
myself, but I'm not familiar with the kernel. Please advise me on where
I should look next.

Ratchanan.

P.S. Please make sure to include me in replies, as I'm not subscribed
to the list.

[1] https://gitlab.freedesktop.org/drm/nouveau/-/issues/79
[2] https://pcsupport.lenovo.com/th/en/products/laptops-and-netbooks/legion-series/legion-5-15arh05/82b5/82b500fqta
[3] https://gitlab.freedesktop.org/drm/nouveau/uploads/2659e5cb41a52290ebf18d9906408d62/nvamli1-processed.txt


* Re: ACPI: what should Linux do for "call-order-swap" quirk from firmware?
  2023-05-21 19:26 ACPI: what should Linux do for "call-order-swap" quirk from firmware? Ratchanan Srirattanamet
@ 2023-05-22  9:44 ` Rafael J. Wysocki
  2023-05-22 13:13   ` Mario Limonciello
  0 siblings, 1 reply; 4+ messages in thread
From: Rafael J. Wysocki @ 2023-05-22  9:44 UTC (permalink / raw)
  To: Ratchanan Srirattanamet
  Cc: linux-pm, ACPI Devel Maling List, Mario Limonciello

+Mario and linux-acpi



* Re: ACPI: what should Linux do for "call-order-swap" quirk from firmware?
  2023-05-22  9:44 ` Rafael J. Wysocki
@ 2023-05-22 13:13   ` Mario Limonciello
  2023-05-23 21:32     ` Ratchanan Srirattanamet
  0 siblings, 1 reply; 4+ messages in thread
From: Mario Limonciello @ 2023-05-22 13:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ratchanan Srirattanamet
  Cc: linux-pm, ACPI Devel Maling List

On 5/22/23 04:44, Rafael J. Wysocki wrote:
> +Mario and linux-acpi
> 
> On Sun, May 21, 2023 at 9:26 PM Ratchanan Srirattanamet
> <peathot@hotmail.com> wrote:
>>
>> Hello,
>>
>> I'm trying to debug an issue where Nouveau is unable to runtime-resume
>> an Nvidia GTX 1650 Ti in an AMD-based laptop [1]. As part of this, I've
>> traced ACPI calls for the same device on Windows. And it seems like this
>> device has a weird quirk, which I call it "call-order-swap" for a lack
>> of better words, when it transitions from D3cold to D0.
>>
>> So, a bit of context: Lenovo Legion 5-15ARH05 [2] is a laptop sporting
>> AMD Ryzen 7 4800H with Radeon Graphics + Nvidia GTX 1650 Ti. This
>> device's PCI-E topology to the GPU is:
>>
>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir
>> PCIe GPP Bridge [1022:1633]
>>           +- 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation
>> TU117M [GeForce GTX 1650 Ti Mobile] [10de:1f95] (rev a1)
>>
>> And for ACPI perspective (according to my interpretation), a power
>> resource \_SB.PCI0.GPP0 seems to represent the PCI bridge, having
>> \_SB.PCI0.GPP0.PG00 as a power resource, and \_SB.PCI0.GPP0.PEGP seems
>> to represent the GPU itself, which doesn't seem to have its own power
>> resource. All ACPI table dumps and infos can be found in the issue on
>> Freedesktop GitLab [1].
>>
>> Now, if I understand the specs correctly, when transitioning the GPU &
>> the bridge back from D3cold to D0, the kernel should start up the bridge
>> before the GPU itself. From the ACPI perspective, I should see calls for
>> .PG00._ON() (power resource for the bridge) before .PEGP.PS0().
>>
>> However, on Windows [3], instead it seems like .PEGP.PS0() is called
>> before .PG00._ON(), for some reason. This is weird, because if
>> .PG00._ON() has not been called yet, .PEGP.PS0() should be even valid to
>> call. Now, I have no idea on what part of the Windows system is supposed
>> to call those ACPI functions, but my feeling is that it must be either
>> Nvidia or AMD driver that does this kind of quirks.

I don't think it could be an AMD driver in this case on Windows, as the
PCIe root port uses "inbox" drivers.

>>
>> As for what Linux does... well it seems like when Linux resumes the PCI
>> bridge, it calls only .PG00._ON(), skipping .PEGP.PS0() on the ground
>> that the downstream devices must have been reset when that happens. I'm
>> not sure that's the right thing to happen either, but at least it makes
>> more sense. Nvidia's proprietary driver seems to disable runtime D3
>> support inside it completely on this device, so I think Nvidia must have
>> a quirk for this chipset, as I briefly borrowed my friend's laptop
>> sporting AMD 6000 series CPU and it doesn't disable runtime D3.
>>
>> So... I'm not sure what the correct behavior is here. I'm a developer
>> myself, but kernel is not where I'm familiar with. Please advise me on
>> where I should look next.

Yeah, if it's working properly on newer hardware, that does seem to me
like a good argument for a quirk in the Nouveau driver when this older
combination is encountered.

>>
>> Ratchanan.
>>
>> P.S. please make sure to include me in the reply, as I'm not the list's
>> subscriber.
>>
>> [1] https://gitlab.freedesktop.org/drm/nouveau/-/issues/79
>> [2]
>> https://pcsupport.lenovo.com/th/en/products/laptops-and-netbooks/legion-series/legion-5-15arh05/82b5/82b500fqta
>> [3]
>> https://gitlab.freedesktop.org/drm/nouveau/uploads/2659e5cb41a52290ebf18d9906408d62/nvamli1-processed.txt



* Re: ACPI: what should Linux do for "call-order-swap" quirk from firmware?
  2023-05-22 13:13   ` Mario Limonciello
@ 2023-05-23 21:32     ` Ratchanan Srirattanamet
  0 siblings, 0 replies; 4+ messages in thread
From: Ratchanan Srirattanamet @ 2023-05-23 21:32 UTC (permalink / raw)
  To: Mario Limonciello, Rafael J. Wysocki; +Cc: linux-pm, ACPI Devel Maling List



On 22/5/2023 at 20:13, Mario Limonciello wrote:
> On 5/22/23 04:44, Rafael J. Wysocki wrote:
>> +Mario and linux-acpi
>>
>> On Sun, May 21, 2023 at 9:26 PM Ratchanan Srirattanamet
>> <peathot@hotmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to debug an issue where Nouveau is unable to runtime-resume
>>> an Nvidia GTX 1650 Ti in an AMD-based laptop [1]. As part of this, I've
>>> traced ACPI calls for the same device on Windows. And it seems like this
>>> device has a weird quirk, which I call it "call-order-swap" for a lack
>>> of better words, when it transitions from D3cold to D0.

Hello,

It turns out the problem is actually elsewhere. The current method call
ordering in Linux, while it seemingly differs from the one on Windows,
doesn't actually seem to be a problem.

For reference, the actual problem comes from Nouveau incorrectly
re-initializing the GPU after it returns from D3cold. That problem is
in turn masked by Nouveau mis-detecting the presence of a power
resource, which causes it to use a custom DSM and confuses the ACPI
code.

Sorry for the earlier email.

Ratchanan

