* Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-24 3:31 ` Daniel Drake
0 siblings, 0 replies; 24+ messages in thread
From: Daniel Drake @ 2018-08-24 3:31 UTC (permalink / raw)
To: linux-pci, nouveau, Linux PM; +Cc: Endless Linux Upstreaming Team
Hi,
We are facing a suspend/resume problem with many different Asus laptop
models (30+ products) with Intel chipsets (multiple generations) and
nvidia GPUs (several different ones). Reproducers include:
1. Boot
2. Suspend/resume
3. Load nouveau driver
4. Start X
5. Observe slow X startup and many errors in the logs (primarily
nouveau fifo faults)
or
1. Boot
2. Load nouveau driver
3. Start X
4. Run glxgears - observe spinning gears
5. Suspend/resume
6. Run glxgears - observe that output is all black
or
1. Boot
2. Load proprietary nvidia driver
3. Start X
4. Suspend/resume
5. Observe screen all black, Xorg using 100% CPU
So, suspend/resume basically kills the nvidia card in some way.
After a lot of experimentation I found a workaround: during resume,
set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
As an example of an affected product, take the Asus X542UQ (Intel
KabyLake i7-7500U with Nvidia GeForce 940MX). The PCI bridge is:
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 120
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 0000e000-0000efff
	Memory behind bridge: ee000000-ef0fffff
	Prefetchable memory behind bridge: 00000000d0000000-00000000e1ffffff
	Capabilities: [40] Express Root Port (Slot+), MSI 00
	Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
	Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Sunrise Point-LP PCI Express Root Port [1043:1a00]
	Capabilities: [a0] Power Management version 3
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Access Control Services
	Capabilities: [200] L1 PM Substates
	Capabilities: [220] #19
	Kernel driver in use: pcieport
The really weird thing here is that the workaround register
PCI_PREF_BASE_UPPER32 already appears to have value 0, as shown above
and also verified during resume. But simply writing value 0 again
definitely results in all the problems going away.
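For reference, the prefetchable window that lspci prints is assembled from four bridge registers. The following is a minimal user-space sketch of that decoding, not kernel code. The offsets are the standard type 1 header ones; the names pref_window and decode_pref_window are made up for illustration, and the raw 16-bit values mentioned afterwards are inferred from the lspci output rather than read from the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets in a type 1 (PCI-to-PCI bridge) configuration header. */
#define PCI_PREF_MEMORY_BASE   0x24 /* 16 bits: base address bits 31:20 */
#define PCI_PREF_MEMORY_LIMIT  0x26 /* 16 bits: limit address bits 31:20 */
#define PCI_PREF_BASE_UPPER32  0x28 /* 32 bits: base address bits 63:32 */
#define PCI_PREF_LIMIT_UPPER32 0x2c /* 32 bits: limit address bits 63:32 */

#define PREF_RANGE_TYPE_MASK 0x000f /* low nibble: addressing capability */
#define PREF_RANGE_TYPE_64   0x0001 /* bridge decodes 64-bit addresses */
#define PREF_RANGE_ADDR_MASK 0xfff0 /* upper 12 bits: address bits 31:20 */

/* Hypothetical helper type: an inclusive [base, limit] address window. */
struct pref_window {
	uint64_t base;
	uint64_t limit;
};

/* Combine the four register values into the window the bridge forwards. */
static struct pref_window decode_pref_window(uint16_t base_lo, uint16_t limit_lo,
					     uint32_t base_hi, uint32_t limit_hi)
{
	struct pref_window w;

	/* Base and limit must advertise the same addressing capability. */
	assert((base_lo & PREF_RANGE_TYPE_MASK) == (limit_lo & PREF_RANGE_TYPE_MASK));

	w.base = (uint64_t)(base_lo & PREF_RANGE_ADDR_MASK) << 16;
	/* The window is 1 MB granular, so the low 20 limit bits are all ones. */
	w.limit = ((uint64_t)(limit_lo & PREF_RANGE_ADDR_MASK) << 16) | 0xfffff;

	/* The upper registers only take part when the 64-bit type is set. */
	if ((base_lo & PREF_RANGE_TYPE_MASK) == PREF_RANGE_TYPE_64) {
		w.base |= (uint64_t)base_hi << 32;
		w.limit |= (uint64_t)limit_hi << 32;
	}
	return w;
}
```

Assuming the low registers read 0xd001 and 0xe1f1, calling decode_pref_window(0xd001, 0xe1f1, 0, 0) gives d0000000-e1ffffff, which is the window reported for this bridge with PCI_PREF_BASE_UPPER32 equal to 0.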
1. Is the Intel PCI bridge misbehaving here? Why does writing the same
value of PCI_PREF_BASE_UPPER32 make any difference at all?
2. Who is responsible for saving and restoring PCI bridge
configuration during suspend and resume? Linux? ACPI? BIOS?
I could not see any Linux code to save and restore these registers.
Likewise I didn't find anything in the ACPI DSDT/SSDT - neither on the
affected products, nor on a similar product that does not suffer this
nvidia issue. Linux does put the PCI bridge into D3 power state during
suspend, and upon resume the lower 32 bits of the prefetch address are
still set to the same value, so through some means this info is not
being lost.
3. Any other suggestions, hints or experiments I could do to help move
forward on this issue?
My goal is to add a workaround to Linux (perhaps as a pci quirk) for
existing devices, but also we are in conversation with Asus engineers
and if we can come up with a concrete diagnosis, we should be able to
have them fix this at the BIOS level in future products.
Thanks
Daniel
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-24 15:42 ` Peter Wu
0 siblings, 0 replies; 24+ messages in thread
From: Peter Wu @ 2018-08-24 15:42 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
Hi Daniel,
On Fri, Aug 24, 2018 at 11:31:54AM +0800, Daniel Drake wrote:
> Hi,
>
> We are facing a suspend/resume problem with many different Asus laptop
> models (30+ products) with Intel chipsets (multiple generations) and
> nvidia GPUs (several different ones). Reproducers include:
Are these systems also affected through runtime power management? For
example:
    modprobe nouveau    # should enable runtime PM
    sleep 6             # wait for runtime suspend to kick in
    lspci -s1:          # runtime resume by reading PCI config space
On various laptops from about 2015-2016 with a GTX 9xxM, this sequence
results in hangs
(https://bugzilla.kernel.org/show_bug.cgi?id=156341).
I wonder if you are experiencing the same issue. Do you have a list of
affected models, an acpidump, the output of "lspci -nnvvvxxxx" and the
corresponding BIOS version (e.g. from /sys/class/dmi/id/)?
> After a lot of experimentation I found a workaround: during resume,
> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
I am curious, how did you discover this? While this could work, perhaps
there are alternative workarounds/fixes?
When you say "parent PCI bridge", is that actually the device you see in
"lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
-[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
           +-01.0-[01]----00.0  NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
Under 00:1c.0, there is a wireless adapter.
> As an example of an affected product, take the Asus X542UQ (Intel
> KabyLake i7-7500U with Nvidia GeForce 940MX). The PCI bridge is:
>
> 00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI
> Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
> Flags: bus master, fast devsel, latency 0, IRQ 120
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: 0000e000-0000efff
> Memory behind bridge: ee000000-ef0fffff
> Prefetchable memory behind bridge: 00000000d0000000-00000000e1ffffff
> Capabilities: [40] Express Root Port (Slot+), MSI 00
> Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
> Capabilities: [90] Subsystem: ASUSTeK Computer Inc. Sunrise
> Point-LP PCI Express Root Port [1043:1a00]
> Capabilities: [a0] Power Management version 3
> Capabilities: [100] Advanced Error Reporting
> Capabilities: [140] Access Control Services
> Capabilities: [200] L1 PM Substates
> Capabilities: [220] #19
> Kernel driver in use: pcieport
>
> The really weird thing here is that the workaround register
> PCI_PREF_BASE_UPPER32 already appears to have value 0, as shown above
> and also verified during resume. But simply writing value 0 again
> definitely results in all the problems going away.
>
> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
> value of PCI_PREF_BASE_UPPER32 make any difference at all?
At what point in the suspend code path did you insert this write? Is it
possible that the write somehow acted as a fence/memory barrier?
> 2. Who is responsible for saving and restoring PCI bridge
> configuration during suspend and resume? Linux? ACPI? BIOS?
Not sure about PCI bridges, but at least the PCI Express Capability
registers are under the control of the OS once control is granted via
the ACPI _OSC method.
> I could not see any Linux code to save and restore these registers.
> Likewise I didn't find anything in the ACPI DSDT/SSDT - neither on the
> affected products, nor on a similar product that does not suffer this
> nvidia issue. Linux does put the PCI bridge into D3 power state during
> suspend, and upon resume the lower 32 bits of the prefetch address are
> still set to the same value, so through some means this info is not
> being lost.
>
>
> 3. Any other suggestions, hints or experiments I could do to help move
> forward on this issue?
>
> My goal is to add a workaround to Linux (perhaps as a pci quirk) for
> existing devices, but also we are in conversation with Asus engineers
> and if we can come up with a concrete diagnosis, we should be able to
> have them fix this at the BIOS level in future products.
As Windows is probably not affected by this issue, it must be possible
to change Linux to behave more like Windows, though I am not sure what
change is needed.
I recently compared PCI configuration space access and ACPI method
invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
(1803). There were differences like disabling MSI/interrupts before
suspend, setting the Enable Clock Power Management bit in PCI Express
Link Control and more, but applying these changes has so far not really
been successful.
Some supporting files for that investigation are here:
https://github.com/Lekensteyn/acpi-stuff/tree/master/d3test
Karol noticed that if the State in PMCSR is not set to D3 for the
Nvidia GPU during runtime suspend, the device resumes successfully.
However, based on traces using VFIO-PCI, this does not seem to be a good
solution, as Windows does not behave like that.
--
Kind regards,
Peter Wu
https://lekensteyn.nl
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
2018-08-24 15:42 ` Peter Wu
@ 2018-08-28 2:23 ` Daniel Drake
-1 siblings, 0 replies; 24+ messages in thread
From: Daniel Drake @ 2018-08-28 2:23 UTC (permalink / raw)
To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Are these systems also affected through runtime power management? For
> example:
>
> modprobe nouveau # should enable runtime PM
> sleep 6 # wait for runtime suspend to kick in
> lspci -s1: # runtime resume by reading PCI config space
>
> On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
> hangs on various laptops
> (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
This works fine here. I'm facing a different issue.
>> After a lot of experimentation I found a workaround: during resume,
>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>
> I am curious, how did you discover this? While this could work, perhaps
> there are alternative workarounds/fixes?
Based on the observation that the following procedure works fine (note
the addition of step 3):
1. Boot
2. Suspend/resume
3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
4. Load nouveau driver
5. Start X
I worked through the rescan codepath until I had isolated the specific
code which magically makes things work (in pci_bridge_check_ranges).
Having found that, step 3 in the above test procedure can be replaced
with a simple:
setpci -s 00:1c.0 0x28.l=0
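The same write can also be issued from a small program through the sysfs config file of the bridge. This is only a sketch under assumptions: write_config_dword is a hypothetical helper name, the device path in the note below is the one from this particular system, and writing to a real config file needs root.

```c
#define _POSIX_C_SOURCE 200809L /* for pwrite() */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PCI_PREF_BASE_UPPER32 0x28

/*
 * Write one 32-bit value at a dword-aligned offset of a PCI config
 * space file. Returns 0 on success and -1 on any failure.
 */
static int write_config_dword(const char *config_path, off_t offset, uint32_t value)
{
	uint8_t raw[4];
	int fd, ret = -1;

	assert(config_path != NULL && offset % 4 == 0);

	/* PCI configuration space is little-endian. */
	raw[0] = value & 0xff;
	raw[1] = (value >> 8) & 0xff;
	raw[2] = (value >> 16) & 0xff;
	raw[3] = (value >> 24) & 0xff;

	fd = open(config_path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* Issue the dword as a single 4-byte write at the given offset. */
	if (pwrite(fd, raw, sizeof(raw), offset) == (ssize_t)sizeof(raw))
		ret = 0;

	if (close(fd) != 0)
		ret = -1;
	return ret;
}
```

On this system the call would be write_config_dword("/sys/bus/pci/devices/0000:00:1c.0/config", PCI_PREF_BASE_UPPER32, 0), which is meant to mirror the setpci line above.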
> When you say "parent PCI" bridge, is that actually the device you see in
> "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
>
> -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
> +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>
> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
Yes, it's the parent bridge shown by lspci. The address of this varies
from system to system.
>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>
> At what point in the suspend code path did you insert this write? It is
> possible that the write somehow acted as a fence/memory barrier?
static void quirk_pref_base_upper32(struct pci_dev *dev)
{
	u32 pref_base_upper32;

	/* Read the register and write the same value straight back. */
	pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
	pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
}
/* Run on resume for the Sunrise Point-LP root port [8086:9d10]. */
DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
I don't think it's acting as a barrier. I tried changing this code to
rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
the bug come back.
>> 2. Who is responsible for saving and restoring PCI bridge
>> configuration during suspend and resume? Linux? ACPI? BIOS?
>
> Not sure about PCI bridges, but at least for the PCI Express Capability
> registers, it is in control of the OS when control is granted via the
> ACPI _OSC method.
I guess you are referring to pci_save_pcie_state(). I can't see
anything equivalent for the bridge registers.
> As Windows is probably not affected by this issue, a change must be
> possible to make Linux more compatible with Windows. Though I am not
> sure what change is needed.
I agree. There's a definite difference with Windows here and it would
be great to find a fix along those lines.
> I recently compared PCI configuration space access and ACPI method
> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
> (1803). There were differences like disabling MSI/interrupts before
> suspend, setting the Enable Clock Power Management bit in PCI Express
> Link Control and more, but applying these changes were so far not really
> successful.
Interesting. Do you know any way that I could spy on Windows' accesses
to the PCI bridge registers?
Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
I suspect VFIO would not help me here.
It says:
Note: If they are grouped with other devices in this manner, pci
root ports and bridges should neither be bound to vfio at boot, nor be
added to the VM.
Thanks
Daniel
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-28 9:57 ` Peter Wu
0 siblings, 0 replies; 24+ messages in thread
From: Peter Wu @ 2018-08-28 9:57 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Tue, Aug 28, 2018 at 10:23:24AM +0800, Daniel Drake wrote:
> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Are these systems also affected through runtime power management? For
> > example:
> >
> > modprobe nouveau # should enable runtime PM
> > sleep 6 # wait for runtime suspend to kick in
> > lspci -s1: # runtime resume by reading PCI config space
> >
> > On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
> > hangs on various laptops
> > (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>
> This works fine here. I'm facing a different issue.
Just to be sure, after "sleep", do both devices report "suspended" in
/sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
/sys/bus/pci/devices/0000:01:00.0/power/runtime_status
and was this reproduced with a recent mainline kernel with no special
cmdline options? The endlessm kernel on GitHub seems to carry quite a
few patches, one of which explicitly disables runtime PM:
https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2
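The runtime_status check asked for above could be scripted roughly as follows. This is a sketch only: runtime_status_is_suspended is a hypothetical helper, and it takes the full path of a runtime_status attribute such as the two listed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read a runtime_status attribute and report whether it says "suspended".
 * Returns 1 if suspended, 0 for any other state, -1 if the file cannot
 * be read.
 */
static int runtime_status_is_suspended(const char *status_path)
{
	char status[32];
	FILE *f;

	assert(status_path != NULL);

	f = fopen(status_path, "r");
	if (f == NULL)
		return -1;
	if (fgets(status, sizeof(status), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* sysfs attributes end in a newline; strip it before comparing. */
	status[strcspn(status, "\n")] = '\0';
	return strcmp(status, "suspended") == 0;
}
```

Calling it once for the bridge (0000:00:1c.0) and once for the GPU (0000:01:00.0) after the sleep answers the question for both devices.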
> >> After a lot of experimentation I found a workaround: during resume,
> >> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
> >> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
> >
> > I am curious, how did you discover this? While this could work, perhaps
> > there are alternative workarounds/fixes?
>
> Based on the observation that the following procedure works fine (note
> the addition of step 3):
>
> 1. Boot
> 2. Suspend/resume
> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
> 4. Load nouveau driver
> 5. Start X
>
> I worked through the rescan codepath until I had isolated the specific
> code which magically makes things work (in pci_bridge_check_ranges).
>
> Having found that, step 3 in the above test procedure can be replaced
> with a simple:
> setpci -s 00:1c.0 0x28.l=0
>
> > When you say "parent PCI" bridge, is that actually the device you see in
> > "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
> >
> > -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
> > +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
> >
> > 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>
> Yes, it's the parent bridge shown by lspci. The address of this varies
> from system to system.
Could you share some details:
- acpidump
- lspci -nnxxxxvvv
- BIOS version (from /sys/class/dmi/id/)
- kernel version (mainline?)
Perhaps there is some magic in the ACPI suspend or resume path that
causes this.
> >> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
> >> value of PCI_PREF_BASE_UPPER32 make any difference at all?
> >
> > At what point in the suspend code path did you insert this write? It is
> > possible that the write somehow acted as a fence/memory barrier?
>
> static void quirk_pref_base_upper32(struct pci_dev *dev)
> {
> u32 pref_base_upper32;
> pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
> pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
> }
> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>
> I don't think it's acting as a barrier. I tried changing this code to
> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
> the bug come back.
>
> >> 2. Who is responsible for saving and restoring PCI bridge
> >> configuration during suspend and resume? Linux? ACPI? BIOS?
> >
> > Not sure about PCI bridges, but at least for the PCI Express Capability
> > registers, it is in control of the OS when control is granted via the
> > ACPI _OSC method.
>
> I guess you are referring to pci_save_pcie_state(). I can't see
> anything equivalent for the bridge registers.
Yes that would be the function, called via pci_save_state.
> > I recently compared PCI configuration space access and ACPI method
> > invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
> > (1803). There were differences like disabling MSI/interrupts before
> > suspend, setting the Enable Clock Power Management bit in PCI Express
> > Link Control and more, but applying these changes has so far not really
> > been successful.
>
> Interesting. Do you know any way that I could spy on Windows' accesses
> to the PCI bridge registers?
> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
> I suspect VFIO would not help me here.
> It says:
> Note: If they are grouped with other devices in this manner, pci
> root ports and bridges should neither be bound to vfio at boot, nor be
> added to the VM.
Only non-bridge devices can be passed to a guest, but perhaps logging
access to the emulated bridge is already sufficient. The Prefetchable
Base Upper 32 Bits register is at offset 0x28.
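For reference, the bridge's prefetchable window is split across several
registers of the type 1 header: the 16-bit base and limit at 0x24 and 0x26,
and the upper 32 bits at 0x28 and 0x2c. The following is a minimal userspace
sketch in C of how those fields combine. The helper names are made up for
illustration, and the sample values 0xd001 and 0xe1f1 mentioned below are an
assumption, back-computed from the window d0000000-e1ffffff in the lspci
output at the start of this thread.

```c
#include <stdint.h>

/* Type 1 (bridge) header offsets and masks, as named in pci_regs.h. */
#define PCI_PREF_MEMORY_BASE     0x24
#define PCI_PREF_MEMORY_LIMIT    0x26
#define PCI_PREF_RANGE_TYPE_MASK 0x0fU
#define PCI_PREF_RANGE_TYPE_64   0x01U
#define PCI_PREF_BASE_UPPER32    0x28
#define PCI_PREF_LIMIT_UPPER32   0x2c

/* Hypothetical helper: first address of the prefetchable window. */
static uint64_t pref_window_base(uint16_t base16, uint32_t upper32)
{
	/* Bits 15:4 of the 16-bit register hold address bits 31:20. */
	uint64_t base = (uint64_t)(base16 & 0xfff0U) << 16;

	/* The upper half only counts if the window is marked 64-bit. */
	if ((base16 & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64)
		base |= (uint64_t)upper32 << 32;
	return base;
}

/* Hypothetical helper: last address of the prefetchable window. */
static uint64_t pref_window_limit(uint16_t limit16, uint32_t upper32)
{
	/* The low 20 bits of the limit are implied to be all ones. */
	uint64_t limit = ((uint64_t)(limit16 & 0xfff0U) << 16) | 0xfffffU;

	if ((limit16 & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64)
		limit |= (uint64_t)upper32 << 32;
	return limit;
}
```

With a base register of 0xd001, a limit register of 0xe1f1 and both upper
halves at 0, these helpers give d0000000-e1ffffff, so writing 0 to offset
0x28 on such a system would not change the decoded window.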
In a trace where the Nvidia device is disabled/enabled via Device
Manager, I see writes on the enable path:
2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
For Linux, I only see one write at startup, none on runtime resume.
I did not test system sleep/resume. (disable/enable is arguably a bit
different from system s/r, you may want to do additional testing here.)
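To pick such writes out of a long trace, a small filter is enough. The
sketch below parses one line in the format shown above. The struct and
function names are hypothetical, and only the rp_write_config layout seen in
this trace is assumed.

```c
#include <stdio.h>

/* Hypothetical record for one rp_write_config trace line. */
struct rp_write {
	unsigned int offset;	/* config space offset, e.g. 0x28 */
	unsigned int value;	/* value written */
	unsigned int len;	/* access width in bytes */
};

/*
 * Parse a line such as
 *   2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
 * Returns 1 and fills *out on success, 0 if the line does not match.
 */
static int parse_rp_write(const char *line, struct rp_write *out)
{
	char dev[32];
	unsigned int offset, value, len;

	/* %*u and %*f skip the pid and timestamp without storing them. */
	if (sscanf(line, "%*u@%*f:rp_write_config (%31[^,], @%x, %x, len=%x",
		   dev, &offset, &value, &len) != 4)
		return 0;

	out->offset = offset;
	out->value = value;
	out->len = len;
	return 1;
}

/* True for a dword write to the Prefetchable Base Upper 32 Bits register. */
static int is_pref_base_upper32_write(const struct rp_write *w)
{
	return w->offset == 0x28 && w->len == 4;
}
```

Feeding each line of the log through parse_rp_write() and keeping those for
which is_pref_base_upper32_write() is true leaves only the accesses of
interest here.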
Full log for Windows 10 and Linux:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/slogs/win10-rp-enable-disable.txt#L3418
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/slogs/linux-rp.txt
lspci for the emulated bridge:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/lspci-vm-vfio.txt#L359
The rp_*_config trace points are non-standard and require patches:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/patches/qemu-trace.diff
--
Kind regards,
Peter Wu
https://lekensteyn.nl
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
2018-08-28 9:57 ` Peter Wu
@ 2018-08-29 0:19 ` Karol Herbst
-1 siblings, 0 replies; 24+ messages in thread
From: Karol Herbst @ 2018-08-29 0:19 UTC (permalink / raw)
To: Peter Wu
Cc: Daniel Drake, linux-pci, Linux PM,
Endless Linux Upstreaming Team, nouveau
Hi everybody,
I came up with another workaround for the runtime suspend/resume
issues we have as well:
https://github.com/karolherbst/linux/commit/3cab4c50f77cf97c6c19a9b1e7884366f78f35a5.patch
I don't think this is really a bug inside the kernel, or at least not
directly. If you, for example, do not use Nouveau but simply enable the
runpm features without a driver or with a very dumb stub driver, the GPU
should be able to suspend and resume correctly. At least this is the case
on my laptop.
I was able to disable enough of Nouveau's code to tell that running some
signed firmware, embedded in the vbios, on the GPU's embedded PMU is what
makes the runpm issues start to appear on my laptop. This firmware is
also used by the nvidia driver, which makes the argument "it happens with
Nouveau and nvidia" a useless one.
I have no idea what this is all about, but it might be the
hardware/firmware just being overprotective and bailing out on an
untrusted state. Maybe it is a bug inside the kernel, or maybe a bug
inside nvidia's firmware, which would be super hard to fix as it's
embedded in the vbios.
On Tue, Aug 28, 2018 at 11:57 AM, Peter Wu <peter@lekensteyn.nl> wrote:
> On Tue, Aug 28, 2018 at 10:23:24AM +0800, Daniel Drake wrote:
>> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
>> > Are these systems also affected through runtime power management? For
>> > example:
>> >
>> > modprobe nouveau # should enable runtime PM
>> > sleep 6 # wait for runtime suspend to kick in
>> > lspci -s1: # runtime resume by reading PCI config space
>> >
>> > On various laptops from about 2015-2016 with a GTX 9xxM, this sequence
>> > results in hangs
>> > (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>>
>> This works fine here. I'm facing a different issue.
>
> Just to be sure, after "sleep", do both devices report "suspended" in
> /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
>
> and was this reproduced with a recent mainline kernel with no special
> cmdline options? The endlessm kernel on GitHub seems to have quite a few
> patches, one of which explicitly disables runtime PM:
> https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2
>
>> >> After a lot of experimentation I found a workaround: during resume,
>> >> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>> >> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>> >
>> > I am curious, how did you discover this? While this could work, perhaps
>> > there are alternative workarounds/fixes?
>>
>> Based on the observation that the following procedure works fine (note
>> the addition of step 3):
>>
>> 1. Boot
>> 2. Suspend/resume
>> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
>> 4. Load nouveau driver
>> 5. Start X
>>
>> I worked through the rescan codepath until I had isolated the specific
>> code which magically makes things work (in pci_bridge_check_ranges).
>>
>> Having found that, step 3 in the above test procedure can be replaced
>> with a simple:
>> setpci -s 00:1c.0 0x28.l=0
>>
>> > When you say "parent PCI" bridge, is that actually the device you see in
>> > "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
>> >
>> > -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
>> > +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>> >
>> > 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>>
>> Yes, it's the parent bridge shown by lspci. The address of this varies
>> from system to system.
>
> Could you share some details:
> - acpidump
> - lspci -nnxxxxvvv
> - BIOS version (from /sys/class/dmi/id/)
> - kernel version (mainline?)
>
> Perhaps there is some magic in the ACPI suspend or resume path that
> causes this.
>
>> >> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>> >> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>> >
>> > At what point in the suspend code path did you insert this write? It is
>> > possible that the write somehow acted as a fence/memory barrier?
>>
>> static void quirk_pref_base_upper32(struct pci_dev *dev)
>> {
>> 	u32 pref_base_upper32;
>>
>> 	/* Read the register and write the same value straight back. */
>> 	pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
>> 	pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
>> }
>> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>>
>> I don't think it's acting as a barrier. I tried changing this code to
>> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
>> the bug come back.
>>
>> >> 2. Who is responsible for saving and restoring PCI bridge
>> >> configuration during suspend and resume? Linux? ACPI? BIOS?
>> >
>> > Not sure about PCI bridges, but at least for the PCI Express Capability
>> > registers, it is in control of the OS when control is granted via the
>> > ACPI _OSC method.
>>
>> I guess you are referring to pci_save_pcie_state(). I can't see
>> anything equivalent for the bridge registers.
>
> Yes that would be the function, called via pci_save_state.
>
>> > I recently compared PCI configuration space access and ACPI method
>> > invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
>> > (1803). There were differences like disabling MSI/interrupts before
>> > suspend, setting the Enable Clock Power Management bit in PCI Express
>> > Link Control and more, but applying these changes has so far not really
>> > been successful.
>>
>> Interesting. Do you know any way that I could spy on Windows' accesses
>> to the PCI bridge registers?
>> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
>> I suspect VFIO would not help me here.
>> It says:
>> Note: If they are grouped with other devices in this manner, pci
>> root ports and bridges should neither be bound to vfio at boot, nor be
>> added to the VM.
>
> Only non-bridge devices can be passed to a guest, but perhaps logging
> access to the emulated bridge is already sufficient. The Prefetchable
> Base Upper 32 Bits register is at offset 0x28.
>
> In a trace where the Nvidia device is disabled/enabled via Device
> Manager, I see writes on the enable path:
>
> 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
>
> For Linux, I only see one write at startup, none on runtime resume.
> I did not test system sleep/resume. (disable/enable is arguably a bit
> different from system s/r, you may want to do additional testing here.)
>
> Full log for Windows 10 and Linux:
> https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/slogs/win10-rp-enable-disable.txt#L3418
> https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/slogs/linux-rp.txt
> lspci for the emulated bridge:
> https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/lspci-vm-vfio.txt#L359
> The rp_*_config trace points are non-standard and require patches:
> https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/patches/qemu-trace.diff
> --
> Kind regards,
> Peter Wu
> https://lekensteyn.nl
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-29 12:40 ` Karol Herbst
0 siblings, 0 replies; 24+ messages in thread
From: Karol Herbst @ 2018-08-29 12:40 UTC (permalink / raw)
To: Daniel Drake
Cc: Peter Wu, linux-pci, Linux PM, Endless Linux Upstreaming Team, nouveau
On Tue, Aug 28, 2018 at 4:23 AM, Daniel Drake <drake@endlessm.com> wrote:
> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
>> Are these systems also affected through runtime power management? For
>> example:
>>
>> modprobe nouveau # should enable runtime PM
>> sleep 6 # wait for runtime suspend to kick in
>> lspci -s1: # runtime resume by reading PCI config space
>>
>> On various laptops from about 2015-2016 with a GTX 9xxM, this sequence
>> results in hangs
>> (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>
> This works fine here. I'm facing a different issue.
>
>>> After a lot of experimentation I found a workaround: during resume,
>>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>>
>> I am curious, how did you discover this? While this could work, perhaps
>> there are alternative workarounds/fixes?
>
> Based on the observation that the following procedure works fine (note
> the addition of step 3):
>
> 1. Boot
> 2. Suspend/resume
> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
> 4. Load nouveau driver
> 5. Start X
>
> I worked through the rescan codepath until I had isolated the specific
> code which magically makes things work (in pci_bridge_check_ranges).
>
> Having found that, step 3 in the above test procedure can be replaced
> with a simple:
> setpci -s 00:1c.0 0x28.l=0
>
>> When you say "parent PCI" bridge, is that actually the device you see in
>> "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
>>
>> -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
>> +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>>
>> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>
> Yes, it's the parent bridge shown by lspci. The address of this varies
> from system to system.
>
>>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>>> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>>
>> At what point in the suspend code path did you insert this write? It is
>> possible that the write somehow acted as a fence/memory barrier?
>
> static void quirk_pref_base_upper32(struct pci_dev *dev)
> {
> 	u32 pref_base_upper32;
>
> 	/* Read the register and write the same value straight back. */
> 	pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
> 	pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
> }
> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>
This workaround fixes runtime suspend/resume on my laptop as well,
but what baffles me most is that unloading nouveau does too. I will
see exactly which bits in the nouveau unloading path are "fixing" it,
and maybe we can get around this issue inside nouveau. It would
still be nice to get to the root cause of all of this, as there are three
known workarounds (at least on my system):
1. unload nouveau
2. skip setting the D3 power state via PCI config space (and still do
the ACPI bits)
3. write value of PCI_PREF_BASE_UPPER32
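For experiments from userspace, the third workaround can be approximated
without a kernel rebuild by reading the dword at offset 0x28 through the
bridge's sysfs config file and writing it straight back. This is only a
sketch: the function name is made up, it assumes root access, and unlike the
resume quirk it runs whenever it is called rather than early in the resume
path.

```c
/* pread()/pwrite() need POSIX.1-2001 or later feature macros. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PCI_PREF_BASE_UPPER32 0x28

/*
 * Hypothetical helper: rewrite PCI_PREF_BASE_UPPER32 with its current
 * value. config_path is expected to be a PCI config space file such as
 * the sysfs "config" attribute of the bridge.
 * Returns 0 on success, -1 on any failure.
 */
static int rewrite_pref_base_upper32(const char *config_path)
{
	uint8_t raw[4];
	int ret = -1;
	int fd = open(config_path, O_RDWR);

	if (fd < 0)
		return -1;

	/* Read the four bytes and write back exactly what was read. */
	if (pread(fd, raw, sizeof(raw), PCI_PREF_BASE_UPPER32) ==
	    (ssize_t)sizeof(raw) &&
	    pwrite(fd, raw, sizeof(raw), PCI_PREF_BASE_UPPER32) ==
	    (ssize_t)sizeof(raw))
		ret = 0;

	close(fd);
	return ret;
}
```

On Daniel's system this would be called with
"/sys/bus/pci/devices/0000:00:1c.0/config"; the bridge address varies from
system to system. Note that the setpci command in the thread writes 0,
whereas this sketch, like the quirk, writes back whatever was read.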
> I don't think it's acting as a barrier. I tried changing this code to
> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
> the bug come back.
>
>>> 2. Who is responsible for saving and restoring PCI bridge
>>> configuration during suspend and resume? Linux? ACPI? BIOS?
>>
>> Not sure about PCI bridges, but at least for the PCI Express Capability
>> registers, it is in control of the OS when control is granted via the
>> ACPI _OSC method.
>
> I guess you are referring to pci_save_pcie_state(). I can't see
> anything equivalent for the bridge registers.
>
>> As Windows is probably not affected by this issue, a change must be
>> possible to make Linux more compatible with Windows. Though I am not
>> sure what change is needed.
>
> I agree. There's a definite difference with Windows here and it would
> be great to find a fix along those lines.
>
>> I recently compared PCI configuration space access and ACPI method
>> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
>> (1803). There were differences like disabling MSI/interrupts before
>> suspend, setting the Enable Clock Power Management bit in PCI Express
>> Link Control and more, but applying these changes has so far not really
>> been successful.
>
> Interesting. Do you know any way that I could spy on Windows' accesses
> to the PCI bridge registers?
> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
> I suspect VFIO would not help me here.
> It says:
> Note: If they are grouped with other devices in this manner, pci
> root ports and bridges should neither be bound to vfio at boot, nor be
> added to the VM.
>
> Thanks
> Daniel
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-29 12:40 ` Karol Herbst
0 siblings, 0 replies; 24+ messages in thread
From: Karol Herbst @ 2018-08-29 12:40 UTC (permalink / raw)
To: Daniel Drake
Cc: nouveau, linux-pci-u79uwXL29TY76Z2rM5mHXA,
Endless Linux Upstreaming Team, Linux PM
On Tue, Aug 28, 2018 at 4:23 AM, Daniel Drake <drake@endlessm.com> wrote:
> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
>> Are these systems also affected through runtime power management? For
>> example:
>>
>> modprobe nouveau # should enable runtime PM
>> sleep 6 # wait for runtime suspend to kick in
>> lspci -s1: # runtime resume by reading PCI config space
>>
>> On various laptops from about 2015-2016 with a GTX 9xxM, this sequence
>> results in hangs
>> (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>
> This works fine here. I'm facing a different issue.
>
>>> After a lot of experimentation I found a workaround: during resume,
>>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>>
>> I am curious, how did you discover this? While this could work, perhaps
>> there are alternative workarounds/fixes?
>
> Based on the observation that the following procedure works fine (note
> the addition of step 3):
>
> 1. Boot
> 2. Suspend/resume
> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
> 4. Load nouveau driver
> 5. Start X
>
> I worked through the rescan codepath until I had isolated the specific
> code which magically makes things work (in pci_bridge_check_ranges).
>
> Having found that, step 3 in the above test procedure can be replaced
> with a simple:
> setpci -s 00:1c.0 0x28.l=0
>
>> When you say "parent PCI" bridge, is that actually the device you see in
>> "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
>>
>> -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
>> +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>>
>> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>
> Yes, it's the parent bridge shown by lspci. The address of this varies
> from system to system.
>
>>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>>> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>>
>> At what point in the suspend code path did you insert this write? It is
>> possible that the write somehow acted as a fence/memory barrier?
>
> /* Rewrite PCI_PREF_BASE_UPPER32 with its current value on resume. */
> static void quirk_pref_base_upper32(struct pci_dev *dev)
> {
>         u32 pref_base_upper32;
>
>         pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
>         pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
> }
> /* 0x9d10 is the Sunrise Point-LP root port on the X542UQ. */
> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>
This workaround fixes runtime suspend/resume on my laptop as well, but
what baffles me most is that unloading nouveau does too. I will check
exactly which bits in the nouveau unload path are "fixing" it; maybe we
can work around this issue inside nouveau. It would still be nice to
find the root cause of all this, as there are three known workarounds
(at least on my system):
1. unload nouveau
2. skip setting the D3 power state via PCI config space (and still do
   the ACPI bits)
3. rewrite the value of PCI_PREF_BASE_UPPER32
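To see what the register holds before and after a suspend cycle, the
value at offset 0x28 can be read from the bridge's sysfs config file.
This is only a rough sketch: the function name is made up for
illustration, and it assumes od is available and that the file passed
in is a PCI config-space image such as
/sys/bus/pci/devices/0000:00:1c.0/config.

```shell
# Sketch: print the 32-bit value stored at offset 0x28
# (PCI_PREF_BASE_UPPER32) of a PCI config-space file.
# read_pref_base_upper32 is a hypothetical helper name.
read_pref_base_upper32() {
    # od prints the four bytes at decimal offset 40 (0x28) as hex,
    # lowest address first.
    set -- $(od -An -tx1 -j 40 -N 4 "$1")
    # Fewer than four bytes means the file is too short to hold the register.
    [ "$#" -eq 4 ] || return 1
    # Config space is little-endian, so the last byte read is the most
    # significant one.
    printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1"
}
```

Something like "read_pref_base_upper32 /sys/bus/pci/devices/0000:00:1c.0/config"
should then give the same number as "setpci -s 00:1c.0 0x28.l".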
> I don't think it's acting as a barrier. I tried changing this code to
> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
> the bug come back.
>
>>> 2. Who is responsible for saving and restoring PCI bridge
>>> configuration during suspend and resume? Linux? ACPI? BIOS?
>>
>> Not sure about PCI bridges, but at least for the PCI Express Capability
>> registers, it is in control of the OS when control is granted via the
>> ACPI _OSC method.
>
> I guess you are referring to pci_save_pcie_state(). I can't see
> anything equivalent for the bridge registers.
>
>> As Windows is probably not affected by this issue, a change must be
>> possible to make Linux more compatible with Windows. Though I am not
>> sure what change is needed.
>
> I agree. There's a definite difference with Windows here and it would
> be great to find a fix along those lines.
>
>> I recently compared PCI configuration space access and ACPI method
>> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
>> (1803). There were differences like disabling MSI/interrupts before
>> suspend, setting the Enable Clock Power Management bit in PCI Express
>> Link Control and more, but applying these changes were so far not really
>> successful.
>
> Interesting. Do you know any way that I could spy on Windows' accesses
> to the PCI bridge registers?
> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
> I suspect VFIO would not help me here.
> It says:
> Note: If they are grouped with other devices in this manner, pci
> root ports and bridges should neither be bound to vfio at boot, nor be
> added to the VM.
>
> Thanks
> Daniel
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-30 0:13 ` Karol Herbst
0 siblings, 0 replies; 24+ messages in thread
From: Karol Herbst @ 2018-08-30 0:13 UTC (permalink / raw)
To: Daniel Drake
Cc: Peter Wu, linux-pci, Linux PM, Endless Linux Upstreaming Team, nouveau
Oh, actually I was testing with a kernel without this workaround
applied, so I need to retest it later.
On Wed, Aug 29, 2018 at 2:40 PM, Karol Herbst <kherbst@redhat.com> wrote:
> On Tue, Aug 28, 2018 at 4:23 AM, Daniel Drake <drake@endlessm.com> wrote:
>> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@lekensteyn.nl> wrote:
>>> Are these systems also affected through runtime power management? For
>>> example:
>>>
>>> modprobe nouveau # should enable runtime PM
>>> sleep 6 # wait for runtime suspend to kick in
>>> lspci -s1: # runtime resume by reading PCI config space
>>>
>>> On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
>>> hangs on various laptops
>>> (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>>
>> This works fine here. I'm facing a different issue.
>>
>>>> After a lot of experimentation I found a workaround: during resume,
>>>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>>>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>>>
>>> I am curious, how did you discover this? While this could work, perhaps
>>> there are alternative workarounds/fixes?
>>
>> Based on the observation that the following procedure works fine (note
>> the addition of step 3):
>>
>> 1. Boot
>> 2. Suspend/resume
>> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
>> 4. Load nouveau driver
>> 5. Start X
>>
>> I worked through the rescan codepath until I had isolated the specific
>> code which magically makes things work (in pci_bridge_check_ranges).
>>
>> Having found that, step 3 in the above test procedure can be replaced
>> with a simple:
>> setpci -s 00:1c.0 0x28.l=0
>>
>>> When you say "parent PCI" bridge, is that actually the device you see in
>>> "lspci -tv"? On a Dell XPS 9560, the GPU is under a different device:
>>>
>>> -[0000:00]-+-00.0  Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
>>>            +-01.0-[01]----00.0  NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>>>
>>> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>>
>> Yes, it's the parent bridge shown by lspci. The address of this varies
>> from system to system.
>>
>>>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>>>> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>>>
>>> At what point in the suspend code path did you insert this write? It is
>>> possible that the write somehow acted as a fence/memory barrier?
>>
>> /* Rewrite PCI_PREF_BASE_UPPER32 with its current value on resume. */
>> static void quirk_pref_base_upper32(struct pci_dev *dev)
>> {
>>         u32 pref_base_upper32;
>>
>>         pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
>>         pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
>> }
>> /* 0x9d10 is the Sunrise Point-LP root port on the X542UQ. */
>> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>>
>
> This workaround fixes runtime suspend/resume on my laptop as well, but
> what baffles me most is that unloading nouveau does too. I will check
> exactly which bits in the nouveau unload path are "fixing" it; maybe we
> can work around this issue inside nouveau. It would still be nice to
> find the root cause of all this, as there are three known workarounds
> (at least on my system):
> 1. unload nouveau
> 2. skip setting the D3 power state via PCI config space (and still do
>    the ACPI bits)
> 3. rewrite the value of PCI_PREF_BASE_UPPER32
>
>> I don't think it's acting as a barrier. I tried changing this code to
>> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
>> the bug come back.
>>
>>>> 2. Who is responsible for saving and restoring PCI bridge
>>>> configuration during suspend and resume? Linux? ACPI? BIOS?
>>>
>>> Not sure about PCI bridges, but at least for the PCI Express Capability
>>> registers, it is in control of the OS when control is granted via the
>>> ACPI _OSC method.
>>
>> I guess you are referring to pci_save_pcie_state(). I can't see
>> anything equivalent for the bridge registers.
>>
>>> As Windows is probably not affected by this issue, a change must be
>>> possible to make Linux more compatible with Windows. Though I am not
>>> sure what change is needed.
>>
>> I agree. There's a definite difference with Windows here and it would
>> be great to find a fix along those lines.
>>
>>> I recently compared PCI configuration space access and ACPI method
>>> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
>>> (1803). There were differences like disabling MSI/interrupts before
>>> suspend, setting the Enable Clock Power Management bit in PCI Express
>>> Link Control and more, but applying these changes were so far not really
>>> successful.
>>
>> Interesting. Do you know any way that I could spy on Windows' accesses
>> to the PCI bridge registers?
>> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
>> I suspect VFIO would not help me here.
>> It says:
>> Note: If they are grouped with other devices in this manner, pci
>> root ports and bridges should neither be bound to vfio at boot, nor be
>> added to the VM.
>>
>> Thanks
>> Daniel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
2018-08-28 9:57 ` Peter Wu
@ 2018-08-30 7:41 ` Daniel Drake
-1 siblings, 0 replies; 24+ messages in thread
From: Daniel Drake @ 2018-08-30 7:41 UTC (permalink / raw)
To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Just to be sure, after "sleep", do both devices report "suspended" in
> /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
> /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
>
> and was this reproduced with a recent mainline kernel with no special
> cmdline options? The endlessm kernel on Github seems to have quite some
> patches, one of them explicitly disable runtime PM:
> https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2
Yes, I checked for this issue in the past and I'm certain that nouveau
runtime pm works fine.
I also checked again now on X542UQ and the results are the same.
nouveau can do runtime suspend/resume (confirmed by reading
runtime_status) and then render 3D graphics OK. lspci is fine too. It
is just S3 suspend that is affected. This was tested on unmodified
Linux 4.18. I had to set the nouveau runpm parameter to 1 for it to use
runtime pm.
I also checked with Karol's patch; the S3 issue is still there. These
seem to be 2 different issues.
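For anyone repeating the runtime PM check, the two runtime_status files
can be read in one go with something like the following. It is only a
sketch: the function name and the SYSFS_ROOT override are made up for
illustration (the override just makes it possible to try the function
against a fake directory tree); the sysfs paths are the ones Peter
listed.

```shell
# Sketch: print the runtime PM status of each PCI device given as argument.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
runtime_status() {
    root=${SYSFS_ROOT:-/sys}
    for bdf in "$@"; do
        f="$root/bus/pci/devices/$bdf/power/runtime_status"
        if [ -r "$f" ]; then
            # Typical values are "active" and "suspended".
            printf '%s %s\n' "$bdf" "$(cat "$f")"
        else
            # The device does not exist or exposes no runtime PM state.
            printf '%s unknown\n' "$bdf"
        fi
    done
}
```

Run as "runtime_status 0000:00:1c.0 0000:01:00.0" after the sleep; both
lines should say "suspended" if runtime PM kicked in.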
> Could you share some details:
> - acpidump
> - lspci -nnxxxxvvv
> - BIOS version (from /sys/class/dmi/id/)
> - kernel version (mainline?)
Linux 4.18 mainline
BIOS version: X542UQ.202
acpidump: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/gistfile1.txt
pci: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/pci
> Only non-bridge devices can be passed to a guest, but perhaps logging
> access to the emulated bridge is already sufficient. The Prefetchable
> Base Upper 32 Bits register is at offset 0x28.
>
> In a trace where the Nvidia device is disabled/enabled via Device
> Manager, I see writes on the enable path:
>
> 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
>
> For Linux, I only see one write at startup, none on runtime resume.
> I did not test system sleep/resume. (disable/enable is arguably a bit
> different from system s/r, you may want to do additional testing here.)
I managed to install Win10 Home under virt-manager with the nvidia
device passed through.
However, the nvidia Windows driver installer refuses to install, saying:
  The NVIDIA graphics driver is not compatible with this version of Windows.
  This graphics driver could not find compatible graphics hardware.
One trick for similar-sounding problems is to change the hypervisor
vendor ID, but no luck here.
I was going to check if I can monitor PCI bridge config space access
even without the nvidia driver installed, but I can't find a way to
make the windows VM suspend and resume - the option is not available
in the VM.
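If a trace like the one Peter quoted does become available for a
suspend/resume cycle, picking out the writes to offset 0x28 could look
roughly like this. The line layout is assumed from the single example
line quoted above, and the function name is made up for illustration.

```shell
# Sketch: keep only trace lines that record a config write to offset 0x28.
# Reads the files given as arguments, or stdin if there are none.
pref_upper32_writes() {
    # Match "rp_write_config (<device>, @0x28, ..."; the trailing ", "
    # keeps longer offsets such as @0x280 from matching.
    grep -E 'rp_write_config \([^,]+, @0x28, ' "$@"
}
```

For example "pref_upper32_writes trace.log", where trace.log is whatever
file the trace was captured to.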
Daniel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-08-30 9:40 ` Peter Wu
0 siblings, 0 replies; 24+ messages in thread
From: Peter Wu @ 2018-08-30 9:40 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Thu, Aug 30, 2018 at 03:41:43PM +0800, Daniel Drake wrote:
> On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Just to be sure, after "sleep", do both devices report "suspended" in
> > /sys/bus/pci/devices/0000:00:1c.0/power/runtime_status
> > /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
> >
> > and was this reproduced with a recent mainline kernel with no special
> > cmdline options? The endlessm kernel on Github seems to have quite some
> > patches, one of them explicitly disable runtime PM:
> > https://github.com/endlessm/linux/commit/8b128b50cd6725eee2ae9025a1510a221d9b42f2
>
> Yes, I checked for this issue in the past and I'm certain that nouveau
> runtime pm works fine.
>
> I also checked again now on X542UQ and the results are the same.
> nouveau can do runtime suspend/resume (confirmed by reading
> runtime_status) and then render 3D graphics OK. lspci is fine too. It
> is just S3 suspend that is affected. This was testing on Linux 4.18
> unmodified. I had to set nouveau runpm parameter to 1 for it to use
> runtime pm.
>
> Also checked with Karol's patch, the S3 issue is still there. Seems
> like 2 different issues.
>
> > Could you share some details:
> > - acpidump
> > - lspci -nnxxxxvvv
> > - BIOS version (from /sys/class/dmi/id/)
> > - kernel version (mainline?)
>
> Linux 4.18 mainline
> BIOS version: X542UQ.202
> acpidump: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/gistfile1.txt
> pci: https://gist.githubusercontent.com/dsd/79352284d4adce14f30d70e94fad89f2/raw/ed9480e924be413fff567da2edd5a2a7a86619d0/pci
Thanks, based on the \_SB.PCI0.HGOF implementation, it looks like this
model will not be affected by the runtime suspend issue (it sets the
"Link Disable" register which is known to work for other models).
As the BIOS date is not visible, can you also confirm that this message
is visible in dmesg?
    nouveau: detected PR support, will not use DSM
FWIW, the latest BIOS version is 305, released on 2018/08/07:
https://www.asus.com/Laptops/ASUS-VivoBook-15-X542UQ/HelpDesk_BIOS/
> > Only non-bridge devices can be passed to a guest, but perhaps logging
> > access to the emulated bridge is already sufficient. The Prefetchable
> > Base Upper 32 Bits register is at offset 0x28.
> >
> > In a trace where the Nvidia device is disabled/enabled via Device
> > Manager, I see writes on the enable path:
> >
> > 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
> >
> > For Linux, I only see one write at startup, none on runtime resume.
> > I did not test system sleep/resume. (disable/enable is arguably a bit
> > different from system s/r, you may want to do additional testing here.)
>
> I managed to install Win10 Home under virt-manager with the nvidia
> device passed through.
> However the nvidia windows driver installer refuses to install, says:
> The NVIDIA graphics driver is not compatible with this version of Windows.
> This graphics driver could not find compatible graphics hardware.
>
> One trick for similar sounding problems is to change hypervisor vendor
> ID but no luck here.
For laptops, it appears that you have to do at least two things:
- Ensure that the Subsystem Vendor/Product ID are set.
- Expose a _ROM ACPI method that provides VBIOS.
Perhaps you also need to provide a "_DSM" method that emulates at least
the "Optimus" interface for GUID a486d8f8-0bda-471b-a72b-6042a6b5bee0.
You probably lost interest here, but if you want to continue anyway this
is what allowed me to install the driver on the XPS 9560:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/fakedev.asl
If you adapt it for your environment, note:
- I have only tested this with the q35 machine type with an additional
  ioh3420 root port. See the XPS956/boot-vm script.
- The \_SB.PCI0.SE0 device should match the root port:
    cat /sys/bus/pci/devices/0000:00:1c.0/firmware_node/path
  (the SE0 name is chosen by QEMU.)
- The "NET" (\_SB.PCI0.SE0.NET) device name is arbitrarily chosen by me,
  it currently assumes PCI address 01:00.0:
    Name (_ADR, 0x00000000) // _ADR: Address (dev+fn only, 01:00.0)
- The _DSM method is copied from the XPS 9560 SSDT with external method
  references removed (focus on the code with "OPCI" true, the other two
  with NBCI and SGCI are irrelevant). One obvious difference with your
  SSDT is function 0x10: your OPVK ("Optimus Validation Key Object") is
  different and there is another "OPDR" check afterwards.
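Since the root port address differs between systems, the bridge above
the GPU and its ACPI path can be looked up from sysfs rather than
hard-coded. A rough sketch follows; the function names and the
SYSFS_ROOT override are made up for illustration, and it assumes a
readlink that supports -f (GNU coreutils or busybox).

```shell
# Sketch: print the address of the device directly above a PCI device.
# SYSFS_ROOT is a hypothetical override; it defaults to /sys.
parent_bridge() {
    root=${SYSFS_ROOT:-/sys}
    # The entry under bus/pci/devices is a symlink into the device tree;
    # the parent directory of its target is the upstream bridge. For a
    # device on the root bus this prints the host bridge name instead
    # (e.g. pci0000:00).
    dev=$(readlink -f "$root/bus/pci/devices/$1") || return 1
    basename "$(dirname "$dev")"
}

# Print the ACPI path of a PCI device, as the cat command above does.
acpi_path() {
    root=${SYSFS_ROOT:-/sys}
    cat "$root/bus/pci/devices/$1/firmware_node/path"
}
```

With the GPU at 0000:01:00.0, 'acpi_path "$(parent_bridge 0000:01:00.0)"'
would print the ACPI name of the root port on that machine.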
> I was going to check if I can monitor PCI bridge config space access
> even without the nvidia driver installed, but I can't find a way to
> make the windows VM suspend and resume - the option is not available
> in the VM.
The system cannot be suspended if the GPU device has no driver.
--
Kind regards,
Peter Wu
https://lekensteyn.nl
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
2018-08-30 9:40 ` Peter Wu
@ 2018-08-31 7:17 ` Daniel Drake
-1 siblings, 0 replies; 24+ messages in thread
From: Daniel Drake @ 2018-08-31 7:17 UTC (permalink / raw)
To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Thu, Aug 30, 2018 at 5:40 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> As the BIOS date is not visible, can you also confirm that this message
> is visible in dmesg?
>
> nouveau: detected PR support, will not use DSM
Yes, that gets logged.
> For laptops, it appears that you have to do at least two things:
> - Ensure that the Subsystem Vendor/Product ID are set.
> - Expose a _ROM ACPI method that provides VBIOS.
>
> Perhaps you also need to provide a "_DSM" method that emulates at least
> the "Optimus" interface for GUID a486d8f8-0bda-471b-a72b-6042a6b5bee0.
>
> You probably lost interest here, but if you want to continue anyway this
> is what allowed me to install the driver on the XPS 9560:
> https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/fakedev.asl
Indeed. I'm going to submit the workaround and I'll look to come back
to this qemu/vfio analysis later.
Thanks
Daniel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
2018-08-28 9:57 ` Peter Wu
@ 2018-09-05 6:26 ` Daniel Drake
-1 siblings, 0 replies; 24+ messages in thread
From: Daniel Drake @ 2018-09-05 6:26 UTC (permalink / raw)
To: Peter Wu; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Only non-bridge devices can be passed to a guest, but perhaps logging
> access to the emulated bridge is already sufficient. The Prefetchable
> Base Upper 32 Bits register is at offset 0x28.
>
> In a trace where the Nvidia device is disabled/enabled via Device
> Manager, I see writes on the enable path:
>
> 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
Did you do anything special to get an emulated bridge included in this setup?
Following the instructions at
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF I can
successfully pass through devices to Windows running under
virt-manager. In the nvidia GPU case I haven't got past the driver
installation failure, but I can pass through other devices OK and
install their drivers.
However I do not end up with any PCI-to-PCI bridges in this setup. The
passed through device sits at address 00:08.0, parent is the PCI host
bridge 00:00.0.
(I'm trying to spy if Windows appears to restore or reset the PCI
bridge prefetch registers upon resume)
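For reference, the prefetch registers in question can also be inspected from the Linux host through the bridge's sysfs config file. The following is a minimal sketch, not something used in this thread: the function names are hypothetical, the offsets are those of the type 1 (bridge) header, and the demo values are assumed register contents chosen to reproduce the X542UQ window quoted in the first message (d0000000-e1ffffff).

```python
import struct

# Type 1 (bridge) configuration header offsets, named as in the Linux
# kernel's include/uapi/linux/pci_regs.h.
PCI_PREF_MEMORY_BASE = 0x24    # 16 bits, followed by the 16-bit limit at 0x26
PCI_PREF_BASE_UPPER32 = 0x28   # 32 bits, followed by the limit upper 32 at 0x2c


def prefetch_window(config: bytes):
    """Return (base, limit) of the bridge's prefetchable memory window."""
    base_lo, limit_lo = struct.unpack_from("<HH", config, PCI_PREF_MEMORY_BASE)
    base_hi, limit_hi = struct.unpack_from("<II", config, PCI_PREF_BASE_UPPER32)
    # Bits 15:4 of the 16-bit registers hold address bits 31:20; the low
    # nibble only encodes whether the window is 32-bit or 64-bit.
    base = (base_hi << 32) | ((base_lo & 0xFFF0) << 16)
    limit = (limit_hi << 32) | ((limit_lo & 0xFFF0) << 16) | 0xFFFFF
    return base, limit


def read_bridge_config(bdf: str = "0000:00:1c.0") -> bytes:
    """Read the 64-byte standard header of a PCI device from sysfs."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(64)


# Synthetic header (assumed raw values) reproducing the X542UQ window,
# with both upper-32 registers at zero.
demo = bytearray(64)
struct.pack_into("<HH", demo, PCI_PREF_MEMORY_BASE, 0xD001, 0xE1F1)
struct.pack_into("<II", demo, PCI_PREF_BASE_UPPER32, 0, 0)
base, limit = prefetch_window(bytes(demo))
print(f"{base:016x}-{limit:016x}")
```

On real hardware, prefetch_window(read_bridge_config()) could be run before and after a suspend/resume cycle to compare the two results.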
Thanks
Daniel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Nouveau] Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-09-05 16:02 ` Peter Wu
0 siblings, 0 replies; 24+ messages in thread
From: Peter Wu @ 2018-09-05 16:02 UTC (permalink / raw)
To: Daniel Drake; +Cc: linux-pci, nouveau, Linux PM, Endless Linux Upstreaming Team
On Wed, Sep 05, 2018 at 02:26:51PM +0800, Daniel Drake wrote:
> On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Only non-bridge devices can be passed to a guest, but perhaps logging
> > access to the emulated bridge is already sufficient. The Prefetchable
> > Base Upper 32 Bits register is at offset 0x28.
> >
> > In a trace where the Nvidia device is disabled/enabled via Device
> > Manager, I see writes on the enable path:
> >
> > 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
>
> Did you do anything special to get an emulated bridge included in this setup?
Yes, I followed instructions in QEMU's docs/pcie.txt and ended up with:
-device ioh3420,id=rp1,bus=pcie.0,addr=1c.0,port=1
-device vfio-pci,bus=rp1,host=01:00.0,rombar=0,x-pci-sub-vendor-id=0x1028,x-pci-sub-device-id=0x07be
(Subvendor/device IDs are from lspci -nnv).
> Following the instructions at
> https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF I can
> successfully pass through devices to Windows running under
> virt-manager. In the nvidia GPU case I haven't got past the driver
> installation failure, but I can pass through other devices OK and
> install their drivers.
After installing drivers, it would still not start. For that to work I
had to pass the VBIOS via an ACPI _ROM method:
-acpitable file=fakedev.aml
-fw_cfg name=opt/nl.lekensteyn/vfio-vbios,file=vbios.rom
These options were taken from:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/boot-vm
fakedev.asl source file and instructions to extract the VBIOS:
https://github.com/Lekensteyn/acpi-stuff/tree/master/d3test
> However I do not end up with any PCI-to-PCI bridges in this setup. The
> passed through device sits at address 00:08.0, parent is the PCI host
> bridge 00:00.0.
>
> (I'm trying to spy if Windows appears to restore or reset the PCI
> bridge prefetch registers upon resume)
If you want to suspend the guest, note that Windows refuses suspend
with the default VGA adapter (see "devicequery /a"). Try the QXL adapter
with https://gitlab.freedesktop.org/spice/win32/qxl-wddm-dod
-vga qxl -device qemu-xhci -device usb-tablet
Not sure how well tested this is; I had to patch Linux to avoid an oops.
If I try this on Windows, it successfully suspends ("info status" in
QEMU monitor says "paused (suspended)"), but resume ends up with a black
screen...
Luckily, the important information is already logged. Windows 10 indeed
seems to write to "Prefetchable Base Upper 32 Bits" on resume[1].
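To pick the relevant writes out of a long trace like the one in [1], a small filter along these lines could be used. It is only a sketch: the regular expression is derived from the rp_write_config lines as they appear in the footnote, and the function and table names are hypothetical.

```python
import re

# Matches the rp_write_config lines in the format shown in footnote [1].
WRITE_RE = re.compile(
    r"(?P<pid>\d+)@(?P<ts>\d+\.\d+):rp_write_config "
    r"\((?P<dev>\w+), @(?P<off>0x[0-9a-f]+), "
    r"(?P<val>0x[0-9a-f]+), len=(?P<len>0x[0-9a-f]+)\)"
)

# Bridge window registers of the type 1 configuration header.
WINDOW_REGS = {
    0x1C: "I/O Base/Limit",
    0x20: "Memory Base/Limit",
    0x24: "Prefetchable Memory Base/Limit",
    0x28: "Prefetchable Base Upper 32 Bits",
    0x2C: "Prefetchable Limit Upper 32 Bits",
    0x30: "I/O Base/Limit Upper 16 Bits",
}


def window_writes(lines):
    """Yield (offset, value, length) for writes to bridge window registers."""
    for line in lines:
        m = WRITE_RE.match(line.strip())
        if m is None:
            continue  # reads, ACPI method markers, unrelated lines
        offset = int(m["off"], 16)
        if offset in WINDOW_REGS:
            yield offset, int(m["val"], 16), int(m["len"], 16)


# A few lines copied from the resume part of the trace.
SAMPLE = """\
32481@1536163120.068547:rp_read_config (ioh3420, @0xe4, len=0x2) 0x8
32481@1536163120.070928:rp_write_config (ioh3420, @0x24, 0xfeb0fea0, len=0x4) Prefetchable Memory Base
32481@1536163120.071968:rp_write_config (ioh3420, @0x28, 0x0, len=0x4) Prefetchable Base Upper 32 Bits
32481@1536163120.075006:rp_write_config (ioh3420, @0x3c, 0x0, len=0x1) Interrupt Line
""".splitlines()

for offset, value, length in window_writes(SAMPLE):
    print(f"@{offset:#04x} <- {value:#x} (len {length}): {WINDOW_REGS[offset]}")
```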
--
Kind regards,
Peter Wu
https://lekensteyn.nl
[1]: QEMU output (annotated with register names) for
./run-vm.sh -device usb-tablet -vga qxl /tmp/w10.qcow2 -trace rp_read_config,file=/dev/stdout -trace rp_write_config,file=/dev/stdout
<suspend>
NET._PS3
32481@1536163097.415976:rp_write_config (ioh3420, @0x12c, 0x0, len=0x4) AER: Root Error Command
32481@1536163097.415999:rp_write_config (ioh3420, @0xac, 0x0, len=0x2) PCIE: Root Control
32481@1536163097.416008:rp_read_config (ioh3420, @0xac, len=0x2) 0x0 PCIE: Root Control
32481@1536163097.416017:rp_read_config (ioh3420, @0xa0, len=0x2) 0x0 PCIE: Link Control
32481@1536163097.416024:rp_write_config (ioh3420, @0xb0, 0x10000, len=0x4) PCIE: Root Status
32481@1536163097.416057:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163097.416066:rp_read_config (ioh3420, @0xc, len=0x1) 0x0 Cacheline Size
32481@1536163097.416073:rp_read_config (ioh3420, @0xd, len=0x1) 0x0 Latency Timer
32481@1536163097.416081:rp_read_config (ioh3420, @0x3c, len=0x1) 0x0 Interrupt Line
32481@1536163097.416088:rp_read_config (ioh3420, @0x19, len=0x1) 0x1 Secondary Bus Number
32481@1536163097.416095:rp_read_config (ioh3420, @0x1a, len=0x1) 0x1 Subordinate Bus Number
32481@1536163097.416103:rp_read_config (ioh3420, @0x3e, len=0x2) 0x2 Bridge Control
32481@1536163097.416129:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Vendor ID
32481@1536163097.416136:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Device ID
32481@1536163097.416143:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163097.416150:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163097.416156:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163097.416164:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163097.416172:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163097.416180:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163097.416187:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163097.416195:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163097.416203:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163097.416210:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163097.416218:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163097.416226:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163097.416234:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163097.416241:rp_read_config (ioh3420, @0x98, len=0x2) 0x7 PCIE: Device Control
32481@1536163097.416249:rp_read_config (ioh3420, @0xb8, len=0x2) 0x0 PCIE: Device Control 2
32481@1536163097.416257:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Vendor ID
32481@1536163097.416265:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Device ID
32481@1536163097.416272:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163097.416280:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163097.416287:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163097.416295:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163097.416303:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163097.416310:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163097.416318:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163097.416325:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163097.416333:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163097.416341:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163097.416349:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163097.416356:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163097.416364:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163097.416372:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163097.416380:rp_write_config (ioh3420, @0x4, 0x506, len=0x2) Command
32481@1536163097.416742:rp_read_config (ioh3420, @0x62, len=0x2) 0x103 MSI: Message Control
32481@1536163097.416753:rp_write_config (ioh3420, @0x62, 0x102, len=0x2) MSI: Message Control
32481@1536163097.416762:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163097.416770:rp_write_config (ioh3420, @0x4, 0x500, len=0x2) Command
32481@1536163097.417356:rp_read_config (ioh3420, @0x9a, len=0x2) 0x0 PCIE: Device Status
32481@1536163097.417367:rp_read_config (ioh3420, @0xe0, len=0x4) 0xc8039001
32481@1536163097.417375:rp_read_config (ioh3420, @0xe4, len=0x4) 0x8
32481@1536163097.417383:rp_write_config (ioh3420, @0xe4, 0xb, len=0x2)
32481@1536163097.456781:rp_read_config (ioh3420, @0xe4, len=0x2) 0xb
_PS3
PG00._ON
PG00._OFF
<resume>
PG00._ON
PG00._ON
_PS0
32481@1536163120.049599:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Vendor ID
32481@1536163120.049655:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Device ID
32481@1536163120.049680:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163120.049708:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163120.049734:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163120.049760:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163120.049785:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163120.049811:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163120.049837:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163120.049862:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163120.049887:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163120.049909:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163120.049932:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163120.049958:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163120.049985:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163120.050015:rp_read_config (ioh3420, @0xe0, len=0x4) 0xc8039001
32481@1536163120.050040:rp_read_config (ioh3420, @0xe4, len=0x4) 0xb
32481@1536163120.050072:rp_write_config (ioh3420, @0xe4, 0x8, len=0x2)
32481@1536163120.068096:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Vendor ID
32481@1536163120.068157:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Device ID
32481@1536163120.068194:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163120.068222:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163120.068250:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163120.068284:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163120.068309:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163120.068333:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163120.068361:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163120.068395:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163120.068421:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163120.068446:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163120.068471:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163120.068495:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163120.068519:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163120.068547:rp_read_config (ioh3420, @0xe4, len=0x2) 0x8
32481@1536163120.068575:rp_write_config (ioh3420, @0x10, 0x0, len=0x4) BAR0
32481@1536163120.068607:rp_write_config (ioh3420, @0x14, 0x0, len=0x4) BAR1
32481@1536163120.068636:rp_write_config (ioh3420, @0x1c, 0xff, len=0x2) I/O Base
32481@1536163120.069825:rp_write_config (ioh3420, @0x20, 0xfc10fc00, len=0x4) Memory Base
32481@1536163120.070928:rp_write_config (ioh3420, @0x24, 0xfeb0fea0, len=0x4) Prefetchable Memory Base
32481@1536163120.071968:rp_write_config (ioh3420, @0x28, 0x0, len=0x4) Prefetchable Base Upper 32 Bits
32481@1536163120.072946:rp_write_config (ioh3420, @0x2c, 0x0, len=0x4) Prefetchable Limit Upper 32 Bits
32481@1536163120.073901:rp_write_config (ioh3420, @0x30, 0x0, len=0x4) I/O Base Upper 16 Bits
32481@1536163120.074969:rp_write_config (ioh3420, @0x38, 0x0, len=0x4)
32481@1536163120.075006:rp_write_config (ioh3420, @0x3c, 0x0, len=0x1) Interrupt Line
32481@1536163120.075028:rp_write_config (ioh3420, @0x3e, 0x2, len=0x2) Bridge Control
32481@1536163120.075996:rp_read_config (ioh3420, @0x3e, len=0x2) 0x2 Bridge Control
32481@1536163120.076028:rp_write_config (ioh3420, @0x18, 0x0, len=0x1) Primary Bus Number
32481@1536163120.076051:rp_write_config (ioh3420, @0x19, 0x1, len=0x1) Secondary Bus Number
32481@1536163120.076074:rp_write_config (ioh3420, @0x1a, 0x1, len=0x1) Subordinate Bus Number
32481@1536163120.076097:rp_write_config (ioh3420, @0xc, 0x0, len=0x1) Cacheline Size
32481@1536163120.076118:rp_write_config (ioh3420, @0xd, 0x0, len=0x1) Latency Timer
32481@1536163120.076137:rp_write_config (ioh3420, @0x4, 0x500, len=0x2) Command
32481@1536163120.077194:rp_write_config (ioh3420, @0x98, 0x7, len=0x2) PCIE: Device Control
32481@1536163120.077225:rp_write_config (ioh3420, @0xb8, 0x0, len=0x2) PCIE: Device Control 2
32481@1536163120.077246:rp_read_config (ioh3420, @0x4, len=0x2) 0x500 Command
32481@1536163120.077270:rp_write_config (ioh3420, @0x4, 0x506, len=0x2) Command
32481@1536163120.078918:rp_write_config (ioh3420, @0x6, 0xf900, len=0x2) Status
32481@1536163120.078950:rp_write_config (ioh3420, @0x1e, 0xf900, len=0x2) Secondary Status
32481@1536163120.078972:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163120.078995:rp_write_config (ioh3420, @0x4, 0x506, len=0x2) Command
32481@1536163120.079701:rp_read_config (ioh3420, @0x62, len=0x2) 0x102 MSI: Message Control
32481@1536163120.079722:rp_write_config (ioh3420, @0x62, 0x102, len=0x2) MSI: Message Control
32481@1536163120.079739:rp_read_config (ioh3420, @0x62, len=0x2) 0x102 MSI: Message Control
32481@1536163120.079753:rp_write_config (ioh3420, @0x64, 0xfee0100c, len=0x4) MSI: Message Address
32481@1536163120.079770:rp_write_config (ioh3420, @0x68, 0x4950, len=0x2) MSI: Message Data
32481@1536163120.079786:rp_write_config (ioh3420, @0x6c, 0xfffffffe, len=0x4) MSI: Mask Bits
32481@1536163120.079801:rp_write_config (ioh3420, @0x62, 0x103, len=0x2) MSI: Message Control
32481@1536163120.079855:rp_write_config (ioh3420, @0xac, 0x0, len=0x2) PCIE: Root Control
32481@1536163120.079872:rp_write_config (ioh3420, @0xa0, 0x0, len=0x2) PCIE: Link Control
32481@1536163120.079887:rp_read_config (ioh3420, @0x98, len=0x2) 0x7 PCIE: Device Control
32481@1536163120.079903:rp_write_config (ioh3420, @0x98, 0x7, len=0x2) PCIE: Device Control
32481@1536163120.079918:rp_read_config (ioh3420, @0xb0, len=0x4) 0x0 PCIE: Root Status
32481@1536163120.079934:rp_write_config (ioh3420, @0x12c, 0x7, len=0x4) AER: Root Error Command
32481@1536163120.079950:rp_write_config (ioh3420, @0xac, 0x8, len=0x2) PCIE: Root Control
NET._PS0
32481@1536163120.175514:rp_write_config (ioh3420, @0xb8, 0x0, len=0x2) PCIE: Device Control 2
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues
@ 2018-09-05 16:02 ` Peter Wu
0 siblings, 0 replies; 24+ messages in thread
From: Peter Wu @ 2018-09-05 16:02 UTC (permalink / raw)
To: Daniel Drake
Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA, Linux PM,
Endless Linux Upstreaming Team,
nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
On Wed, Sep 05, 2018 at 02:26:51PM +0800, Daniel Drake wrote:
> On Tue, Aug 28, 2018 at 5:57 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Only non-bridge devices can be passed to a guest, but perhaps logging
> > access to the emulated bridge is already sufficient. The Prefetchable
> > Base Upper 32 Bits register is at offset 0x28.
> >
> > In a trace where the Nvidia device is disabled/enabled via Device
> > Manager, I see writes on the enable path:
> >
> > 2571@1535108904.593107:rp_write_config (ioh3420, @0x28, 0x0, len=0x4)
>
> Did you do anything special to get an emulated bridge included in this setup?
Yes, I followed instructions in QEMU's docs/pcie.txt and ended up with:
-device ioh3420,id=rp1,bus=pcie.0,addr=1c.0,port=1
-device vfio-pci,bus=rp1,host=01:00.0,rombar=0,x-pci-sub-vendor-id=0x1028,x-pci-sub-device-id=0x07be
(Subvendor/device IDs are from lspci -nnv).
> Folllowing the instructions at
> https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF I can
> successfully pass through devices to windows running under
> virt-manager. In the nvidia GPU case I haven't got passed the driver
> installation failure, but I can pass through other devices OK and
> install their drivers.
After installing drivers, it would still not start. For that to work I
had to pass the VBIOS via an ACPI _ROM method:
-acpitable file=fakedev.aml
-fw_cfg name=opt/nl.lekensteyn/vfio-vbios,file=vbios.rom
These options were taken from:
https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/XPS9560/boot-vm
fakedev.asl source file and instructions to extract the VBIOS:
https://github.com/Lekensteyn/acpi-stuff/tree/master/d3test
> However I do not end up with any PCI-to-PCI bridges in this setup. The
> passed through device sits at address 00:08.0, parent is the PCI host
> bridge 00:00.0.
>
> (I'm trying to spy if Windows appears to restore or reset the PCI
> bridge prefetch registers upon resume)
If you want to suspend the guest, note that Windows refuses suspend
with the default VGA adapter (see "devicequery /a"). Try the QXL adapter
with https://gitlab.freedesktop.org/spice/win32/qxl-wddm-dod
-vga qxl -device qemu-xhci -device usb-tablet
Not sure how well tested this is, I had to patch Linux to avoid an oops.
If I try this on Windows, it successfully suspends ("info status" in
QEMU monitor says "paused (suspended)"), but resume ends up with a black
screen...
Luckily, the important information is already logged. Windows 10 indeed
seems to write to "Prefetchable Base Upper 32 Bits" on resume[1].
--
Kind regards,
Peter Wu
https://lekensteyn.nl
[1]: QEMU output (annotated with register names) for
./run-vm.sh -device usb-tablet -vga qxl /tmp/w10.qcow2 -trace rp_read_config,file=/dev/stdout -trace rp_write_config,file=/dev/stdout
<suspend>
NET._PS3
32481@1536163097.415976:rp_write_config (ioh3420, @0x12c, 0x0, len=0x4) AER: Root Error Command
32481@1536163097.415999:rp_write_config (ioh3420, @0xac, 0x0, len=0x2) PCIE: Root Control
32481@1536163097.416008:rp_read_config (ioh3420, @0xac, len=0x2) 0x0 PCIE: Root Control
32481@1536163097.416017:rp_read_config (ioh3420, @0xa0, len=0x2) 0x0 PCIE: Link Control
32481@1536163097.416024:rp_write_config (ioh3420, @0xb0, 0x10000, len=0x4) PCIE: Root Status
32481@1536163097.416057:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163097.416066:rp_read_config (ioh3420, @0xc, len=0x1) 0x0 Cacheline Size
32481@1536163097.416073:rp_read_config (ioh3420, @0xd, len=0x1) 0x0 Latency Timer
32481@1536163097.416081:rp_read_config (ioh3420, @0x3c, len=0x1) 0x0 Interrupt Line
32481@1536163097.416088:rp_read_config (ioh3420, @0x19, len=0x1) 0x1 Secondary Bus Number
32481@1536163097.416095:rp_read_config (ioh3420, @0x1a, len=0x1) 0x1 Subordiante Bus Number
32481@1536163097.416103:rp_read_config (ioh3420, @0x3e, len=0x2) 0x2 Bridge Control
32481@1536163097.416129:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Device ID
32481@1536163097.416136:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Vendor ID
32481@1536163097.416143:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163097.416150:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163097.416156:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163097.416164:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163097.416172:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163097.416180:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163097.416187:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163097.416195:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163097.416203:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163097.416210:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163097.416218:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163097.416226:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163097.416234:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163097.416241:rp_read_config (ioh3420, @0x98, len=0x2) 0x7 PCIE: Device Control
32481@1536163097.416249:rp_read_config (ioh3420, @0xb8, len=0x2) 0x0 PCIE: Device Control 2
32481@1536163097.416257:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Device ID
32481@1536163097.416265:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Vendor ID
32481@1536163097.416272:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163097.416280:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163097.416287:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163097.416295:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163097.416303:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163097.416310:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163097.416318:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163097.416325:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163097.416333:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163097.416341:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163097.416349:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163097.416356:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163097.416364:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163097.416372:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163097.416380:rp_write_config (ioh3420, @0x4, 0x506, len=0x2) Command
32481@1536163097.416742:rp_read_config (ioh3420, @0x62, len=0x2) 0x103 MSI: Message Control
32481@1536163097.416753:rp_write_config (ioh3420, @0x62, 0x102, len=0x2) MSI: Message Control
32481@1536163097.416762:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163097.416770:rp_write_config (ioh3420, @0x4, 0x500, len=0x2) Command
32481@1536163097.417356:rp_read_config (ioh3420, @0x9a, len=0x2) 0x0 PCIE: Device Status
32481@1536163097.417367:rp_read_config (ioh3420, @0xe0, len=0x4) 0xc8039001
32481@1536163097.417375:rp_read_config (ioh3420, @0xe4, len=0x4) 0x8
32481@1536163097.417383:rp_write_config (ioh3420, @0xe4, 0xb, len=0x2)
32481@1536163097.456781:rp_read_config (ioh3420, @0xe4, len=0x2) 0xb
_PS3
PG00._ON
PG00._OFF
<resume>
PG00._ON
PG00._ON
_PS0
32481@1536163120.049599:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Device ID
32481@1536163120.049655:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Vendor ID
32481@1536163120.049680:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163120.049708:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163120.049734:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163120.049760:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163120.049785:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163120.049811:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163120.049837:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163120.049862:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163120.049887:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163120.049909:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163120.049932:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163120.049958:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163120.049985:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163120.050015:rp_read_config (ioh3420, @0xe0, len=0x4) 0xc8039001
32481@1536163120.050040:rp_read_config (ioh3420, @0xe4, len=0x4) 0xb
32481@1536163120.050072:rp_write_config (ioh3420, @0xe4, 0x8, len=0x2)
32481@1536163120.068096:rp_read_config (ioh3420, @0x0, len=0x2) 0x8086 Device ID
32481@1536163120.068157:rp_read_config (ioh3420, @0x2, len=0x2) 0x3420 Vendor ID
32481@1536163120.068194:rp_read_config (ioh3420, @0x8, len=0x1) 0x2 Revision
32481@1536163120.068222:rp_read_config (ioh3420, @0x9, len=0x1) 0x0 Class Code
32481@1536163120.068250:rp_read_config (ioh3420, @0xa, len=0x1) 0x4 +1 Class Code
32481@1536163120.068284:rp_read_config (ioh3420, @0xb, len=0x1) 0x6 +2 Class Code
32481@1536163120.068309:rp_read_config (ioh3420, @0xe, len=0x1) 0x1 Header Type
32481@1536163120.068333:rp_read_config (ioh3420, @0x6, len=0x2) 0x10 Status
32481@1536163120.068361:rp_read_config (ioh3420, @0x34, len=0x1) 0xe0 Capabilities Pointer
32481@1536163120.068395:rp_read_config (ioh3420, @0xe0, len=0x2) 0x9001
32481@1536163120.068421:rp_read_config (ioh3420, @0x90, len=0x2) 0x6010 PCI Express
32481@1536163120.068446:rp_read_config (ioh3420, @0x60, len=0x2) 0x4005 Message Signaled Interrupts
32481@1536163120.068471:rp_read_config (ioh3420, @0x40, len=0x2) 0xd Bridge subsystem vendor/device ID
32481@1536163120.068495:rp_read_config (ioh3420, @0x44, len=0x2) 0x8086
32481@1536163120.068519:rp_read_config (ioh3420, @0x46, len=0x2) 0x0
32481@1536163120.068547:rp_read_config (ioh3420, @0xe4, len=0x2) 0x8
32481@1536163120.068575:rp_write_config (ioh3420, @0x10, 0x0, len=0x4) BAR0
32481@1536163120.068607:rp_write_config (ioh3420, @0x14, 0x0, len=0x4) BAR1
32481@1536163120.068636:rp_write_config (ioh3420, @0x1c, 0xff, len=0x2) I/O Base
32481@1536163120.069825:rp_write_config (ioh3420, @0x20, 0xfc10fc00, len=0x4) Memory Base
32481@1536163120.070928:rp_write_config (ioh3420, @0x24, 0xfeb0fea0, len=0x4) Prefetchable Memory Base
32481@1536163120.071968:rp_write_config (ioh3420, @0x28, 0x0, len=0x4) Prefetchable Base Upper 32 Bits
32481@1536163120.072946:rp_write_config (ioh3420, @0x2c, 0x0, len=0x4) Prefetchable Limit Upper 32 Bits
32481@1536163120.073901:rp_write_config (ioh3420, @0x30, 0x0, len=0x4) I/O Base Upper 16 Bits
32481@1536163120.074969:rp_write_config (ioh3420, @0x38, 0x0, len=0x4)
32481@1536163120.075006:rp_write_config (ioh3420, @0x3c, 0x0, len=0x1) Interrupt Line
32481@1536163120.075028:rp_write_config (ioh3420, @0x3e, 0x2, len=0x2) Bridge Control
32481@1536163120.075996:rp_read_config (ioh3420, @0x3e, len=0x2) 0x2 Bridge Control
32481@1536163120.076028:rp_write_config (ioh3420, @0x18, 0x0, len=0x1) Primary Bus Number
32481@1536163120.076051:rp_write_config (ioh3420, @0x19, 0x1, len=0x1) Secondary Bus Number
32481@1536163120.076074:rp_write_config (ioh3420, @0x1a, 0x1, len=0x1) Subordiante Bus Number
32481@1536163120.076097:rp_write_config (ioh3420, @0xc, 0x0, len=0x1) Cacheline Size
32481@1536163120.076118:rp_write_config (ioh3420, @0xd, 0x0, len=0x1) Latency Timer
32481@1536163120.076137:rp_write_config (ioh3420, @0x4, 0x500, len=0x2) Command
32481@1536163120.077194:rp_write_config (ioh3420, @0x98, 0x7, len=0x2) PCIE: Device Control
32481@1536163120.077225:rp_write_config (ioh3420, @0xb8, 0x0, len=0x2) PCIE: Device Control 2
32481@1536163120.077246:rp_read_config (ioh3420, @0x4, len=0x2) 0x500 Command
32481@1536163120.077270:rp_write_config (ioh3420, @0x4, 0x506, len=0x2) Command
32481@1536163120.078918:rp_write_config (ioh3420, @0x6, 0xf900, len=0x2) Status
32481@1536163120.078950:rp_write_config (ioh3420, @0x1e, 0xf900, len=0x2) Secondary Status
32481@1536163120.078972:rp_read_config (ioh3420, @0x4, len=0x2) 0x506 Command
32481@1536163120.078995:rp_write_config (ioh3420, @0x4, 0x506, len=0x2) Command
32481@1536163120.079701:rp_read_config (ioh3420, @0x62, len=0x2) 0x102 MSI: Message Control
32481@1536163120.079722:rp_write_config (ioh3420, @0x62, 0x102, len=0x2) MSI: Message Control
32481@1536163120.079739:rp_read_config (ioh3420, @0x62, len=0x2) 0x102 MSI: Message Control
32481@1536163120.079753:rp_write_config (ioh3420, @0x64, 0xfee0100c, len=0x4) MSI: Message Address
32481@1536163120.079770:rp_write_config (ioh3420, @0x68, 0x4950, len=0x2) MSI: Message Upper Address
32481@1536163120.079786:rp_write_config (ioh3420, @0x6c, 0xfffffffe, len=0x4) MSI: Message Data
32481@1536163120.079801:rp_write_config (ioh3420, @0x62, 0x103, len=0x2) MSI: Message Control
32481@1536163120.079855:rp_write_config (ioh3420, @0xac, 0x0, len=0x2) PCIE: Root Control
32481@1536163120.079872:rp_write_config (ioh3420, @0xa0, 0x0, len=0x2) PCIE: Link Control
32481@1536163120.079887:rp_read_config (ioh3420, @0x98, len=0x2) 0x7 PCIE: Device Control
32481@1536163120.079903:rp_write_config (ioh3420, @0x98, 0x7, len=0x2) PCIE: Device Control
32481@1536163120.079918:rp_read_config (ioh3420, @0xb0, len=0x4) 0x0 PCIE: Root Status
32481@1536163120.079934:rp_write_config (ioh3420, @0x12c, 0x7, len=0x4) AER: Root Error Command
32481@1536163120.079950:rp_write_config (ioh3420, @0xac, 0x8, len=0x2) PCIE: Root Control
NET._PS0
32481@1536163120.175514:rp_write_config (ioh3420, @0xb8, 0x0, len=0x2) PCIE: Device Control 2
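
For anyone reading the window writes in the trace above (@0x20, @0x24, @0x28, @0x2c), here is a minimal sketch of how the raw values map to address ranges. It assumes the standard PCI-to-PCI bridge (type 1) header layout, where each 16-bit base/limit register holds address bits 31:20 in its upper 12 bits. The function names are hypothetical and are not taken from the kernel or QEMU.

```python
def decode_mem_window(reg32):
    """Decode a 32-bit write covering a 16-bit base register (low half)
    and the 16-bit limit register that follows it (high half)."""
    base_reg = reg32 & 0xFFFF
    limit_reg = (reg32 >> 16) & 0xFFFF
    # Bits 15:4 of each register are address bits 31:20.
    base = (base_reg & 0xFFF0) << 16
    # The limit is inclusive, so its low 20 bits are all ones.
    limit = ((limit_reg & 0xFFF0) << 16) | 0xFFFFF
    return base, limit


def decode_pref_window(reg32, base_upper32=0, limit_upper32=0):
    """Same as above for the prefetchable window, combined with the
    Prefetchable Base/Limit Upper 32 Bits registers (0x28 and 0x2c)."""
    base, limit = decode_mem_window(reg32)
    return base | (base_upper32 << 32), limit | (limit_upper32 << 32)


# Values taken from the trace above.
mem = decode_mem_window(0xFC10FC00)
pref = decode_pref_window(0xFEB0FEA0, 0x0, 0x0)
print("Memory behind bridge:       %08x-%08x" % mem)
print("Prefetchable behind bridge: %016x-%016x" % pref)
```

With the traced values this gives fc000000-fc1fffff for the memory window and 00000000fea00000-00000000febfffff for the prefetchable window. A nonzero value at 0x28 would move the prefetchable base above 4GB.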
Thread overview: 24+ messages
2018-08-24 3:31 Rewriting Intel PCI bridge prefetch base address bits solves nvidia graphics issues Daniel Drake
2018-08-24 3:31 ` Daniel Drake
2018-08-24 15:42 ` [Nouveau] " Peter Wu
2018-08-24 15:42 ` Peter Wu
2018-08-28 2:23 ` [Nouveau] " Daniel Drake
2018-08-28 2:23 ` Daniel Drake
2018-08-28 9:57 ` [Nouveau] " Peter Wu
2018-08-28 9:57 ` Peter Wu
2018-08-29 0:19 ` [Nouveau] " Karol Herbst
2018-08-29 0:19 ` Karol Herbst
2018-08-30 7:41 ` [Nouveau] " Daniel Drake
2018-08-30 7:41 ` Daniel Drake
2018-08-30 9:40 ` [Nouveau] " Peter Wu
2018-08-30 9:40 ` Peter Wu
2018-08-31 7:17 ` [Nouveau] " Daniel Drake
2018-08-31 7:17 ` Daniel Drake
2018-09-05 6:26 ` [Nouveau] " Daniel Drake
2018-09-05 6:26 ` Daniel Drake
2018-09-05 16:02 ` [Nouveau] " Peter Wu
2018-09-05 16:02 ` Peter Wu
2018-08-29 12:40 ` [Nouveau] " Karol Herbst
2018-08-29 12:40 ` Karol Herbst
2018-08-30 0:13 ` [Nouveau] " Karol Herbst
2018-08-30 0:13 ` Karol Herbst