* ACPI: what should Linux do for "call-order-swap" quirk from firmware?
From: Ratchanan Srirattanamet @ 2023-05-21 19:26 UTC
To: linux-pm
Hello,
I'm trying to debug an issue where Nouveau is unable to runtime-resume
an Nvidia GTX 1650 Ti in an AMD-based laptop [1]. As part of this, I've
traced ACPI calls for the same device on Windows. It seems this device
has a weird quirk, which I call "call-order-swap" for lack of a better
term, when it transitions from D3cold to D0.
So, a bit of context: the Lenovo Legion 5-15ARH05 [2] is a laptop with
an AMD Ryzen 7 4800H with Radeon Graphics plus an Nvidia GTX 1650 Ti.
This device's PCIe topology down to the GPU is:
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
 +- 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117M [GeForce GTX 1650 Ti Mobile] [10de:1f95] (rev a1)
From the ACPI perspective (according to my interpretation), the device
object \_SB.PCI0.GPP0 seems to represent the PCI bridge, with
\_SB.PCI0.GPP0.PG00 as its power resource, and \_SB.PCI0.GPP0.PEGP seems
to represent the GPU itself, which doesn't seem to have its own power
resource. All ACPI table dumps and related information can be found in
the issue on Freedesktop GitLab [1].
Now, if I understand the specs correctly, when transitioning the GPU &
the bridge back from D3cold to D0, the kernel should start up the bridge
before the GPU itself. From the ACPI perspective, I should see calls for
.PG00._ON() (power resource for the bridge) before .PEGP.PS0().
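As a rough illustration of the ordering expected above, here is a toy model in Python. It is not kernel code: the `Node` class and `resume_sequence` function are hypothetical names invented for this sketch, and the method names are spelled as they appear in this message. It walks from the GPU up to the root, then turns on power resources and calls the per-device method from the top down.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for an ACPI device object (hypothetical, not a kernel type)."""
    name: str
    parent: Optional["Node"] = None
    power_resources: List[str] = field(default_factory=list)
    has_ps0: bool = False


def resume_sequence(dev: Node) -> List[str]:
    """List the calls expected when bringing `dev` to D0, ancestors first."""
    # Collect the chain from the device up to the root.
    chain = []
    node = dev
    while node is not None:
        chain.append(node)
        node = node.parent

    calls = []
    # Power up from the root down: resources first, then the device method.
    for node in reversed(chain):
        for res in node.power_resources:
            calls.append(f"{res}._ON")
        if node.has_ps0:
            calls.append(f"{node.name}.PS0")
    return calls


# Topology from this message: bridge GPP0 (power resource PG00) above GPU PEGP.
gpp0 = Node("GPP0", power_resources=["PG00"])
pegp = Node("PEGP", parent=gpp0, has_ps0=True)
print(resume_sequence(pegp))  # ['PG00._ON', 'PEGP.PS0']
```

Under this model the bridge's power resource is always switched on before the GPU's own method runs, which is the order I expected to see in the trace.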
However, on Windows [3], it seems that .PEGP.PS0() is called before
.PG00._ON(), for some reason. This is weird, because if .PG00._ON() has
not been called yet, .PEGP.PS0() should not even be valid to call. I
have no idea which part of the Windows system is supposed to call those
ACPI methods, but my feeling is that it must be either the Nvidia or
the AMD driver that applies this kind of quirk.
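A comparison like this can be automated with a small script. The sketch below assumes the trace has already been reduced to an ordered list of method paths, one per call. That input format is an assumption for illustration and not the actual layout of the trace in [3], and `first_index` and `call_order` are hypothetical helper names.

```python
from typing import Optional, Sequence


def first_index(calls: Sequence[str], suffix: str) -> Optional[int]:
    """Return the position of the first call whose path ends with `suffix`."""
    for i, call in enumerate(calls):
        if call.endswith(suffix):
            return i
    return None


def call_order(calls: Sequence[str],
               bridge_on: str = ".PG00._ON",
               gpu_ps0: str = ".PEGP.PS0") -> str:
    """Classify a trace by which of the two methods appears first."""
    on_idx = first_index(calls, bridge_on)
    ps0_idx = first_index(calls, gpu_ps0)
    if on_idx is None or ps0_idx is None:
        return "incomplete"
    return "bridge-first" if on_idx < ps0_idx else "gpu-first"


# Hypothetical reduced traces, written only to exercise the function.
expected = ["\\_SB.PCI0.GPP0.PG00._ON", "\\_SB.PCI0.GPP0.PEGP.PS0"]
swapped = ["\\_SB.PCI0.GPP0.PEGP.PS0", "\\_SB.PCI0.GPP0.PG00._ON"]
print(call_order(expected))  # bridge-first
print(call_order(swapped))   # gpu-first
```

Only the first occurrence of each method is considered, so a trace spanning several suspend/resume cycles would need to be split per cycle first.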
As for what Linux does: it seems that when Linux resumes the PCI
bridge, it calls only .PG00._ON() and skips .PEGP.PS0(), on the grounds
that the downstream devices must have been reset when that happens. I'm
not sure that's the right behavior either, but at least it makes more
sense. Nvidia's proprietary driver seems to disable runtime D3 support
completely on this device, so I think Nvidia must have a quirk for this
chipset: I briefly borrowed a friend's laptop with an AMD 6000 series
CPU, and the driver doesn't disable runtime D3 there.
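For anyone wanting to check the runtime PM state on their own machine, the sketch below reads the standard `power/control` and `power/runtime_status` sysfs attributes of a PCI device. The `0000:` domain prefix on the addresses is an assumption (the lspci output above does not show the domain), and `read_runtime_pm` is a hypothetical helper name. Missing attributes are reported as `None` rather than raising.

```python
from pathlib import Path
from typing import Dict, Optional


def read_runtime_pm(bdf: str, sysfs_root: str = "/sys") -> Dict[str, Optional[str]]:
    """Read runtime PM attributes of the PCI device at address `bdf`."""
    power_dir = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf / "power"
    result: Dict[str, Optional[str]] = {}
    for attr in ("control", "runtime_status"):
        try:
            result[attr] = (power_dir / attr).read_text().strip()
        except OSError:
            # Device or attribute not present on this machine.
            result[attr] = None
    return result


if __name__ == "__main__":
    # Bridge and GPU addresses from the topology above; domain 0000 is assumed.
    for bdf in ("0000:00:01.1", "0000:01:00.0"):
        print(bdf, read_runtime_pm(bdf))
```

A `control` value of `on` means runtime PM is disabled for that device, while `auto` allows it to runtime-suspend.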
So, I'm not sure what the correct behavior is here. I'm a developer
myself, but the kernel is not an area I'm familiar with. Please advise
me on where I should look next.
Ratchanan.
P.S. Please make sure to include me in the reply, as I'm not subscribed
to the list.
[1] https://gitlab.freedesktop.org/drm/nouveau/-/issues/79
[2] https://pcsupport.lenovo.com/th/en/products/laptops-and-netbooks/legion-series/legion-5-15arh05/82b5/82b500fqta
[3] https://gitlab.freedesktop.org/drm/nouveau/uploads/2659e5cb41a52290ebf18d9906408d62/nvamli1-processed.txt
* Re: ACPI: what should Linux do for "call-order-swap" quirk from firmware?
From: Rafael J. Wysocki @ 2023-05-22 9:44 UTC
To: Ratchanan Srirattanamet
Cc: linux-pm, ACPI Devel Maling List, Mario Limonciello
+Mario and linux-acpi
* Re: ACPI: what should Linux do for "call-order-swap" quirk from firmware?
From: Mario Limonciello @ 2023-05-22 13:13 UTC
To: Rafael J. Wysocki, Ratchanan Srirattanamet
Cc: linux-pm, ACPI Devel Maling List
On 5/22/23 04:44, Rafael J. Wysocki wrote:
> +Mario and linux-acpi
>
> On Sun, May 21, 2023 at 9:26 PM Ratchanan Srirattanamet
> <peathot@hotmail.com> wrote:
>>
>> However, on Windows [3], it seems that .PEGP.PS0() is called before
>> .PG00._ON(), for some reason. This is weird, because if .PG00._ON() has
>> not been called yet, .PEGP.PS0() should not even be valid to call. I
>> have no idea which part of the Windows system is supposed to call those
>> ACPI methods, but my feeling is that it must be either the Nvidia or
>> the AMD driver that applies this kind of quirk.
I don't think it could be an AMD driver in this case for Windows as the
PCIe root port uses "inbox" drivers.
>>
>> As for what Linux does: it seems that when Linux resumes the PCI
>> bridge, it calls only .PG00._ON() and skips .PEGP.PS0(), on the grounds
>> that the downstream devices must have been reset when that happens. I'm
>> not sure that's the right behavior either, but at least it makes more
>> sense. Nvidia's proprietary driver seems to disable runtime D3 support
>> completely on this device, so I think Nvidia must have a quirk for this
>> chipset: I briefly borrowed a friend's laptop with an AMD 6000 series
>> CPU, and the driver doesn't disable runtime D3 there.
>>
>> So, I'm not sure what the correct behavior is here. I'm a developer
>> myself, but the kernel is not an area I'm familiar with. Please advise
>> me on where I should look next.
Yeah, if it's working properly on newer hardware, that does seem to me
like a good argument for a quirk in the Nouveau driver when this older
combination is encountered.
* Re: ACPI: what should Linux do for "call-order-swap" quirk from firmware?
From: Ratchanan Srirattanamet @ 2023-05-23 21:32 UTC
To: Mario Limonciello, Rafael J. Wysocki; +Cc: linux-pm, ACPI Devel Maling List
On 22/5/2023 at 20:13, Mario Limonciello wrote:
> On 5/22/23 04:44, Rafael J. Wysocki wrote:
>> +Mario and linux-acpi
>>
>> On Sun, May 21, 2023 at 9:26 PM Ratchanan Srirattanamet
>> <peathot@hotmail.com> wrote:
>>>
>>> Hello,
>>>
>>> I'm trying to debug an issue where Nouveau is unable to runtime-resume
>>> an Nvidia GTX 1650 Ti in an AMD-based laptop [1]. As part of this, I've
>>> traced ACPI calls for the same device on Windows. It seems this device
>>> has a weird quirk, which I call "call-order-swap" for lack of a better
>>> term, when it transitions from D3cold to D0.
Hello,
It turns out the problem is actually elsewhere, and the current method
call ordering in Linux, while it seemingly differs from Windows, doesn't
seem to actually be a problem.
For reference, the actual problem comes from Nouveau incorrectly
re-initializing the GPU after it returns from D3cold, which is
subsequently masked by Nouveau mis-detecting the presence of a power
resource, causing it to use a custom DSM and confusing the ACPI code.
Sorry for the earlier email.
Ratchanan