All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-07  5:36 ` Daniel Drake
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Drake @ 2018-09-07  5:36 UTC (permalink / raw)
  To: bhelgaas-hpIqsD4AKlfQT0dZR+AlfA
  Cc: mika.westerberg-VuQAYsv1563Yd54FQh9/CA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w,
	nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	keith.busch-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
	linux-6IF/jdPJHihWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
	hkallweit1-Re5JQEeQqe8AvxtiuMwx3w

On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

    fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
    DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
the cases that I checked, this register has value 0 and we just have to
rewrite that value.

It's very strange that rewriting the exact same register value
makes a difference, but it definitely makes the issue go away.
It's not just acting as some kind of memory barrier, because rewriting
other bridge registers does not work around the issue. There's something
magic in this particular register. We have confirmed this on all
the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).

Additionally, this workaround solves an issue where r8169 MSI-X
interrupts were broken after S3 suspend/resume on Asus X441UAR. This
issue was recently worked around in commit 7bb05b85bc2d ("r8169:
don't use MSI-X on RTL8106e"). It also fixes the same issue on
RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet
patched. I suspect it will also fix the issue that was worked around in
commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").

Thomas Martitz reports that this workaround also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.

From our testing, the affected Intel PCI bridges are:
Intel Skylake PCIe Controller (x16) [8086:1901] (rev 05)
Intel Skylake PCIe Controller (x16) [8086:1901] (rev 07)
Intel Device [8086:31d8] (rev f3)
Intel Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 [8086:5ad6] (rev fb)
Intel Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 [8086:5ad8] (rev fb)
Intel Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1)
Intel Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1)
Intel Device [8086:9dbc] (rev f0)

On resume, reprogram the PCI bridge prefetch registers, including the
magic register mentioned above.

This matches Win10 behaviour, which also rewrites these registers
during S3 resume (checked with qemu tracing).

Link: https://marc.info/?i=CAD8Lp46Y2eOR7WE28xToUL8s-aYiqPa0nS=1GSD0AxkddXq6+A@mail.gmail.com
Link: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Signed-off-by: Daniel Drake <drake@endlessm.com>
---

Notes:
    Replaces patch:
       PCI: add prefetch quirk to work around Asus/Nvidia suspend issues
    
    Below is the list of Asus products with Intel/Nvidia that we
    believe are affected by the GPU resume issue.
    
    I revised my counting method from my last patch to eliminate duplicate
    platforms that had multiple SKUs with the same DMI/GPU/bridge, that's why
    the product count reduced from 43 to 38.
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: FX502VD
    product_name: FX502VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
    	!!! Unknown header type 7f
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: FX570UD
    product_name: ASUS Gaming FX570UD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1f40]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL553VD
    product_name: GL553VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL753VD
    product_name: GL753VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: K401UQK
    product_name: K401UQK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:14b0]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P1440UF
    product_name: ASUSPRO P1440UF
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1f10]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2440UQ
    product_name: P2440UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:13ce]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540NV
    product_name: P2540NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540UV
    product_name: P2540UV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:132e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P4540UQ
    product_name: P4540UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1650]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX331UN
    product_name: UX331UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d12] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15de]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX410UQK
    product_name: UX410UQK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:138e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX430UQ
    product_name: UX430UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:139e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX533FD
    product_name: ZenBook UX533FD_UX533FD
    02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:14a1]
    00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: V221ID
    product_name: V221ID
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: V272UN
    product_name: Vivo AIO 27 V272UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17be]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X430UN
    product_name: VivoBook S14 X430UN
    01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:199e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X441MB
    product_name: X441MB
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:171e]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:31d8] (rev f3) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X456UF
    product_name: X456UF
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1346] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:245a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X510UQ
    product_name: X510UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:145e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X530UN
    product_name: VivoBook S15 X530UN
    01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:18ce]
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X541UV
    product_name: X541UV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:11ee]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X542UN
    product_name: X542UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1b10]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X542UQ
    product_name: X542UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:142e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X555UB
    product_name: X555UB
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1347] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:246a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X555UQ
    product_name: X555UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:246a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X556URK
    product_name: X556URK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1490]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X570ZD
    product_name: VivoBook_ASUS Laptop X570ZD
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:11d1]
    00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3] (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580GD
    product_name: VivoBook_ASUSLaptop X580GD_X580GD
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1fc0]
    00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580VD
    product_name: X580VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1a10]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705FD
    product_name: VivoBook Pro 17 X705FD_X705FD
    02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1431]
    00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705UD
    product_name: X705UD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1b30]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705UQ
    product_name: X705UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:148e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X751NV
    product_name: X751NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:13be]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: Z240IE
    product_name: Z240IE
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1c8d] (rev a1) (prog-if 00 [VGA controller])
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1750]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN220IC-K
    product_name: ZN220IC-K
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:117e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN241IC
    product_name: ZN241IC
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1900]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN270IE
    product_name: ZN270IE
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1720]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])

 drivers/pci/pci-driver.c | 14 ++++++++++++++
 drivers/pci/setup-bus.c  |  2 +-
 include/linux/pci.h      |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..034f816570ad 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -524,6 +524,20 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 	pci_power_up(pci_dev);
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
+
+	/*
+	 * Redo the PCI bridge prefetch register setup.
+	 *
+	 * This works around an Intel PCI bridge issue seen on Asus and HP
+	 * laptops, where the GPU is not usable after S3 resume.
+	 * Even though PCI bridge register contents appear to be intact
+	 * at resume time, rewriting the value of PREF_BASE_UPPER32 is
+	 * required to make the GPU work.
+	 * Windows 10 also reprograms these registers during S3 resume.
+	 */
+	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
+		pci_setup_bridge_mmio_pref(pci_dev);
+
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824e83b4..cb88288d2a69 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -630,7 +630,7 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
 	pci_write_config_dword(bridge, PCI_MEMORY_BASE, l);
 }
 
-static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
+void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 {
 	struct resource *res;
 	struct pci_bus_region region;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..b15828fc26a4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
 unsigned int pci_scan_child_bus(struct pci_bus *bus);
 void pci_bus_add_device(struct pci_dev *dev);
+void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
 void pci_read_bridge_bases(struct pci_bus *child);
 struct resource *pci_find_parent_resource(const struct pci_dev *dev,
 					  struct resource *res);
-- 
2.17.1

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-07  5:36 ` Daniel Drake
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Drake @ 2018-09-07  5:36 UTC (permalink / raw)
  To: bhelgaas
  Cc: linux-pci, linux, nouveau, linux-pm, peter, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, kugel, davem, hkallweit1,
	netdev, nic_swsd

On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

    fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
    DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
the cases that I checked, this register has value 0 and we just have to
rewrite that value.

It's very strange that rewriting the exact same register value
makes a difference, but it definitely makes the issue go away.
It's not just acting as some kind of memory barrier, because rewriting
other bridge registers does not work around the issue. There's something
magic in this particular register. We have confirmed this on all
the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).

Additionally, this workaround solves an issue where r8169 MSI-X
interrupts were broken after S3 suspend/resume on Asus X441UAR. This
issue was recently worked around in commit 7bb05b85bc2d ("r8169:
don't use MSI-X on RTL8106e"). It also fixes the same issue on
RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet
patched. I suspect it will also fix the issue that was worked around in
commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").

Thomas Martitz reports that this workaround also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.

>From our testing, the affected Intel PCI bridges are:
Intel Skylake PCIe Controller (x16) [8086:1901] (rev 05)
Intel Skylake PCIe Controller (x16) [8086:1901] (rev 07)
Intel Device [8086:31d8] (rev f3)
Intel Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 [8086:5ad6] (rev fb)
Intel Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 [8086:5ad8] (rev fb)
Intel Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1)
Intel Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1)
Intel Device [8086:9dbc] (rev f0)

On resume, reprogram the PCI bridge prefetch registers, including the
magic register mentioned above.

This matches Win10 behaviour, which also rewrites these registers
during S3 resume (checked with qemu tracing).

Link: https://marc.info/?i=CAD8Lp46Y2eOR7WE28xToUL8s-aYiqPa0nS=1GSD0AxkddXq6+A@mail.gmail.com
Link: https://bugs.freedesktop.org/show_bug.cgi?id=105760
Signed-off-by: Daniel Drake <drake@endlessm.com>
---

Notes:
    Replaces patch:
       PCI: add prefetch quirk to work around Asus/Nvidia suspend issues
    
    Below is the list of Asus products with Intel/Nvidia that we
    believe are affected by the GPU resume issue.
    
    I revised my counting method from my last patch to eliminate duplicate
    platforms that had multiple SKUs with the same DMI/GPU/bridge, that's why
    the product count reduced from 43 to 38.
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: FX502VD
    product_name: FX502VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
    	!!! Unknown header type 7f
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: FX570UD
    product_name: ASUS Gaming FX570UD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1f40]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL553VD
    product_name: GL553VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: GL753VD
    product_name: GL753VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: K401UQK
    product_name: K401UQK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:14b0]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P1440UF
    product_name: ASUSPRO P1440UF
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1f10]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2440UQ
    product_name: P2440UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:13ce]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540NV
    product_name: P2540NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P2540UV
    product_name: P2540UV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:132e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: P4540UQ
    product_name: P4540UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1650]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX331UN
    product_name: UX331UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d12] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15de]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX410UQK
    product_name: UX410UQK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:138e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX430UQ
    product_name: UX430UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:139e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: UX533FD
    product_name: ZenBook UX533FD_UX533FD
    02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:14a1]
    00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: V221ID
    product_name: V221ID
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:15f0]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: V272UN
    product_name: Vivo AIO 27 V272UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:17be]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X430UN
    product_name: VivoBook S14 X430UN
    01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:199e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X441MB
    product_name: X441MB
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:174e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:171e]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:31d8] (rev f3) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X456UF
    product_name: X456UF
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1346] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:245a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X510UQ
    product_name: X510UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:145e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X530UN
    product_name: VivoBook S15 X530UN
    01:00.0 3D controller [0302]: NVIDIA Corporation GP108M [GeForce MX150] [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP108M [GeForce MX150] [1043:18ce]
    00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X541UV
    product_name: X541UV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:11ee]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X542UN
    product_name: X542UN
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1d10] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1b10]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X542UQ
    product_name: X542UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:142e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X555UB
    product_name: X555UB
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1347] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:246a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X555UQ
    product_name: X555UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation GM108M [GeForce 940MX] [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX] [1043:246a]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X556URK
    product_name: X556URK
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1490]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X570ZD
    product_name: VivoBook_ASUS Laptop X570ZD
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:11d1]
    00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:15d3] (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580GD
    product_name: VivoBook_ASUSLaptop X580GD_X580GD
    01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1fc0]
    00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X580VD
    product_name: X580VD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1a10]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705FD
    product_name: VivoBook Pro 17 X705FD_X705FD
    02:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile] [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. GP107M [GeForce GTX 1050 Mobile] [1043:1431]
    00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:9dbc] (rev f0) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705UD
    product_name: X705UD
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev a1)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1b30]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X705UQ
    product_name: X705UQ
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:148e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: X751NV
    product_name: X751NV
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134f] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:13be]
    00:13.0 PCI bridge [0604]: Intel Corporation Device [8086:5ad8] (rev fb) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: Z240IE
    product_name: Z240IE
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1c8d] (rev a1) (prog-if 00 [VGA controller])
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1750]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN220IC-K
    product_name: ZN220IC-K
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134e] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:117e]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN241IC
    product_name: ZN241IC
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1900]
    00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) (prog-if 00 [Normal decode])
    
    sys_vendor: ASUSTeK COMPUTER INC.
    board_name: ZN270IE
    product_name: ZN270IE
    01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:134d] (rev a2)
    	Subsystem: ASUSTeK Computer Inc. Device [1043:1720]
    00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])

 drivers/pci/pci-driver.c | 14 ++++++++++++++
 drivers/pci/setup-bus.c  |  2 +-
 include/linux/pci.h      |  1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bef17c3fca67..034f816570ad 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -524,6 +524,20 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 	pci_power_up(pci_dev);
 	pci_restore_state(pci_dev);
 	pci_pme_restore(pci_dev);
+
+	/*
+	 * Redo the PCI bridge prefetch register setup.
+	 *
+	 * This works around an Intel PCI bridge issue seen on Asus and HP
+	 * laptops, where the GPU is not usable after S3 resume.
+	 * Even though PCI bridge register contents appear to be intact
+	 * at resume time, rewriting the value of PREF_BASE_UPPER32 is
+	 * required to make the GPU work.
+	 * Windows 10 also reprograms these registers during S3 resume.
+	 */
+	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
+		pci_setup_bridge_mmio_pref(pci_dev);
+
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }
 
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1824e83b4..cb88288d2a69 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -630,7 +630,7 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
 	pci_write_config_dword(bridge, PCI_MEMORY_BASE, l);
 }
 
-static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
+void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
 {
 	struct resource *res;
 	struct pci_bus_region region;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e72ca8dd6241..b15828fc26a4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
 unsigned int pci_scan_child_bus(struct pci_bus *bus);
 void pci_bus_add_device(struct pci_dev *dev);
+void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
 void pci_read_bridge_bases(struct pci_bus *child);
 struct resource *pci_find_parent_resource(const struct pci_dev *dev,
 					  struct resource *res);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Nouveau] [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07  5:36 ` Daniel Drake
  (?)
@ 2018-09-07  5:49 ` Lukas Wunner
  -1 siblings, 0 replies; 18+ messages in thread
From: Lukas Wunner @ 2018-09-07  5:49 UTC (permalink / raw)
  To: Daniel Drake
  Cc: bhelgaas, mika.westerberg, linux-pm, linux-pci, rafael.j.wysocki,
	nic_swsd, keith.busch, netdev, nouveau, andriy.shevchenko, linux,
	davem, jonathan.derrick, hkallweit1

On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
>  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
>  unsigned int pci_scan_child_bus(struct pci_bus *bus);
>  void pci_bus_add_device(struct pci_dev *dev);
> +void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
>  void pci_read_bridge_bases(struct pci_bus *child);
>  struct resource *pci_find_parent_resource(const struct pci_dev *dev,
>  					  struct resource *res);

Since this is only used internally in the PCI core, the declaration
can live in drivers/pci/pci.h.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07  5:36 ` Daniel Drake
  (?)
  (?)
@ 2018-09-07  6:40 ` Sinan Kaya
       [not found]   ` <3b37e4fd-c793-bd52-7d70-950f846a7d5a-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  -1 siblings, 1 reply; 18+ messages in thread
From: Sinan Kaya @ 2018-09-07  6:40 UTC (permalink / raw)
  To: Daniel Drake, bhelgaas
  Cc: linux-pci, linux, nouveau, linux-pm, peter, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, kugel, davem, hkallweit1,
	netdev, nic_swsd

On 9/6/2018 10:36 PM, Daniel Drake wrote:
> +	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
> +		pci_setup_bridge_mmio_pref(pci_dev);

This should probably some kind of a quirk rather than default
for the listed card as it sounds like you are dealing with
broken hardware.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07  6:40 ` Sinan Kaya
@ 2018-09-07  8:06       ` Daniel Drake
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Drake @ 2018-09-07  8:06 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Mika Westerberg, Linux PM, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Wysocki, Rafael J, nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ, Keith Busch,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Bjorn Helgaas,
	Andy Shevchenko, Linux Upstreaming Team,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Jon Derrick,
	hkallweit1-Re5JQEeQqe8AvxtiuMwx3w

On Fri, Sep 7, 2018 at 2:40 PM, Sinan Kaya <okaya@kernel.org> wrote:
> On 9/6/2018 10:36 PM, Daniel Drake wrote:
>>
>> +       if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
>> +               pci_setup_bridge_mmio_pref(pci_dev);
>
>
> This should probably some kind of a quirk rather than default
> for the listed card as it sounds like you are dealing with
> broken hardware.

With that approach there's a sizeable list that your quirk list is
incomplete or out of date.

And when the bug bites, it's extremely cryptic. We've spent months
working on this issue and only found this magic register write mostly
through a stroke of good luck. Separately there's been a flurry of
mails around the r8169 MSI-X problem but as far as I can see nobody
suggested even looking at the values of the parent bridge prefetch
registers. And even if they did, they'd probably have said "values are
fine, nothing to see here" (exactly as we did 4 months ago when Nvidia
mentioned these registers as a possible cause - oops!).

So here I'm instead following a suggestion from Bjorn, after also
having confirmed the windows behaviour:

https://marc.info/?l=linux-pci&m=153574276126484&w=2
> Can we tell whether Windows rewrites this register unconditionally at
> resume-time?  If so, it may be more robust for Linux to do the same.
> The whole thing is black magic, which I hate, but if it's our only
> choice, it may be better to have this applied everywhere so we don't
> keep stubbing our toes on new systems that require the quirk.

Also, we just spoke to Asus BIOS engineers who told us that the BIOS
does already restore the PCI bridge prefetch registers on resume. I
guess this is why the other registers like the (non-zero) prefetch
base address lower 32 bits do have the right value on resume even
before my patch. It sounds like a more subtle bug related to register
write timing or sequence, in that case it will be harder to define who
is responsible for the breakage and hence under which conditions the
quirk should apply.

Daniel
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-07  8:06       ` Daniel Drake
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Drake @ 2018-09-07  8:06 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Bjorn Helgaas, linux-pci, Linux Upstreaming Team, nouveau,
	Linux PM, Peter Wu, kherbst, Andy Shevchenko, Wysocki, Rafael J,
	Keith Busch, Mika Westerberg, Jon Derrick, Thomas Martitz, davem,
	hkallweit1, netdev, nic_swsd

On Fri, Sep 7, 2018 at 2:40 PM, Sinan Kaya <okaya@kernel.org> wrote:
> On 9/6/2018 10:36 PM, Daniel Drake wrote:
>>
>> +       if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
>> +               pci_setup_bridge_mmio_pref(pci_dev);
>
>
> This should probably some kind of a quirk rather than default
> for the listed card as it sounds like you are dealing with
> broken hardware.

With that approach there's a sizeable list that your quirk list is
incomplete or out of date.

And when the bug bites, it's extremely cryptic. We've spent months
working on this issue and only found this magic register write mostly
through a stroke of good luck. Separately there's been a flurry of
mails around the r8169 MSI-X problem but as far as I can see nobody
suggested even looking at the values of the parent bridge prefetch
registers. And even if they did, they'd probably have said "values are
fine, nothing to see here" (exactly as we did 4 months ago when Nvidia
mentioned these registers as a possible cause - oops!).

So here I'm instead following a suggestion from Bjorn, after also
having confirmed the windows behaviour:

https://marc.info/?l=linux-pci&m=153574276126484&w=2
> Can we tell whether Windows rewrites this register unconditionally at
> resume-time?  If so, it may be more robust for Linux to do the same.
> The whole thing is black magic, which I hate, but if it's our only
> choice, it may be better to have this applied everywhere so we don't
> keep stubbing our toes on new systems that require the quirk.

Also, we just spoke to Asus BIOS engineers who told us that the BIOS
does already restore the PCI bridge prefetch registers on resume. I
guess this is why the other registers like the (non-zero) prefetch
base address lower 32 bits do have the right value on resume even
before my patch. It sounds like a more subtle bug related to register
write timing or sequence, in that case it will be harder to define who
is responsible for the breakage and hence under which conditions the
quirk should apply.

Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07  5:36 ` Daniel Drake
@ 2018-09-07 15:05     ` Peter Wu
  -1 siblings, 0 replies; 18+ messages in thread
From: Peter Wu @ 2018-09-07 15:05 UTC (permalink / raw)
  To: Daniel Drake
  Cc: mika.westerberg-VuQAYsv1563Yd54FQh9/CA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w,
	nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	keith.busch-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
	linux-6IF/jdPJHihWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
	hkallweit1-Re5JQEeQqe8AvxtiuMwx3w

On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
<..>
> Thomas Martitz reports that this workaround also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.

Where was this claimed? It is not stated in the linked bug:
(https://bugs.freedesktop.org/show_bug.cgi?id=105760

> On resume, reprogram the PCI bridge prefetch registers, including the
> magic register mentioned above.
> 
> This matches Win10 behaviour, which also rewrites these registers
> during S3 resume (checked with qemu tracing).

Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
https://www.spinics.net/lists/linux-pci/msg75856.html

Linux has a generic "restore" operation that works backwards from the
end of the PCI config space to the beginning, see
pci_restore_config_space. Do you have a dmesg where you see the
"restoring config space at offset" messages?

Would it be reasonable to unconditionally write these registers in
pci_restore_config_dword, like Windows does?

Kind regards,
Peter
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-07 15:05     ` Peter Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Wu @ 2018-09-07 15:05 UTC (permalink / raw)
  To: Daniel Drake
  Cc: bhelgaas, linux-pci, linux, nouveau, linux-pm, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, kugel, davem, hkallweit1,
	netdev, nic_swsd

On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
<..>
> Thomas Martitz reports that this workaround also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.

Where was this claimed? It is not stated in the linked bug:
(https://bugs.freedesktop.org/show_bug.cgi?id=105760

> On resume, reprogram the PCI bridge prefetch registers, including the
> magic register mentioned above.
> 
> This matches Win10 behaviour, which also rewrites these registers
> during S3 resume (checked with qemu tracing).

Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
https://www.spinics.net/lists/linux-pci/msg75856.html

Linux has a generic "restore" operation that works backwards from the
end of the PCI config space to the beginning, see
pci_restore_config_space. Do you have a dmesg where you see the
"restoring config space at offset" messages?

Would it be reasonable to unconditionally write these registers in
pci_restore_config_dword, like Windows does?

Kind regards,
Peter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07  5:36 ` Daniel Drake
                   ` (3 preceding siblings ...)
  (?)
@ 2018-09-07 21:48 ` Bjorn Helgaas
  -1 siblings, 0 replies; 18+ messages in thread
From: Bjorn Helgaas @ 2018-09-07 21:48 UTC (permalink / raw)
  To: Daniel Drake
  Cc: bhelgaas, linux-pci, linux, nouveau, linux-pm, peter, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, kugel, davem, hkallweit1,
	netdev, nic_swsd, linux-kernel

[+cc LKML]

On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
> 
>     fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
>     DRM: failed to idle channel 0 [DRM]
> 
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> 
> Runtime suspend/resume works fine, only S3 suspend is affected.
> 
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> the cases that I checked, this register has value 0 and we just have to
> rewrite that value.
> 
> It's very strange that rewriting the exact same register value
> makes a difference, but it definitely makes the issue go away.
> It's not just acting as some kind of memory barrier, because rewriting
> other bridge registers does not work around the issue. There's something
> magic in this particular register. We have confirmed this on all
> the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).
> 
> Additionally, this workaround solves an issue where r8169 MSI-X
> interrupts were broken after S3 suspend/resume on Asus X441UAR. This
> issue was recently worked around in commit 7bb05b85bc2d ("r8169:
> don't use MSI-X on RTL8106e"). It also fixes the same issue on
> RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet
> patched. I suspect it will also fix the issue that was worked around in
> commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").

This is crazy.  I would think *lots* of devices, like anything that
uses prefetchable memory, would be affected by this.

> Thomas Martitz reports that this workaround also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.
> 
> From our testing, the affected Intel PCI bridges are:
> Intel Skylake PCIe Controller (x16) [8086:1901] (rev 05)
> Intel Skylake PCIe Controller (x16) [8086:1901] (rev 07)
> Intel Device [8086:31d8] (rev f3)
> Intel Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 [8086:5ad6] (rev fb)
> Intel Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port A #1 [8086:5ad8] (rev fb)
> Intel Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1)
> Intel Sunrise Point-LP PCI Express Root Port #5 [8086:9d14] (rev f1)
> Intel Device [8086:9dbc] (rev f0)
> 
> On resume, reprogram the PCI bridge prefetch registers, including the
> magic register mentioned above.
> 
> This matches Win10 behaviour, which also rewrites these registers
> during S3 resume (checked with qemu tracing).
> 
> Link: https://marc.info/?i=CAD8Lp46Y2eOR7WE28xToUL8s-aYiqPa0nS=1GSD0AxkddXq6+A@mail.gmail.com

Can you use

  https://lkml.kernel.org/r/CAD8Lp46Y2eOR7WE28xToUL8s-aYiqPa0nS=1GSD0AxkddXq6+A@mail.gmail.com

instead, so we don't depend on marc.info?  lkml.kernel.org doesn't have
linux-pci archives, but it might someday, and it does redirect to other
archives already.

> Link: https://bugs.freedesktop.org/show_bug.cgi?id=105760

I would really like to have a bugzilla.kernel.org issue with the
excellent debugging you and Peter did attached.  Then we don't also
have to depend on github.com, etc., sticking around.

The list of platforms below could also be attached there, since you
went to a lot of trouble to collect it, but it's probably more than is
necessary for the changelog.

> Signed-off-by: Daniel Drake <drake@endlessm.com>
> ---
> 
> Notes:
>     Replaces patch:
>        PCI: add prefetch quirk to work around Asus/Nvidia suspend issues
>     
>     Below is the list of Asus products with Intel/Nvidia that we
>     believe are affected by the GPU resume issue.
>     
>     I revised my counting method from my last patch to eliminate duplicate
>     platforms that had multiple SKUs with the same DMI/GPU/bridge, that's why
>     the product count reduced from 43 to 38.
>     
>     sys_vendor: ASUSTeK COMPUTER INC.
>     board_name: FX502VD
>     product_name: FX502VD
>     01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev ff) (prog-if ff)
>     	!!! Unknown header type 7f
>     00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) (prog-if 00 [Normal decode])
> ...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07 15:05     ` Peter Wu
  (?)
@ 2018-09-07 22:26     ` Bjorn Helgaas
  2018-09-08 12:14       ` Peter Wu
  -1 siblings, 1 reply; 18+ messages in thread
From: Bjorn Helgaas @ 2018-09-07 22:26 UTC (permalink / raw)
  To: Peter Wu
  Cc: Daniel Drake, bhelgaas, linux-pci, linux, nouveau, linux-pm,
	kherbst, andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, kugel, davem, hkallweit1,
	netdev, nic_swsd, linux-kernel, Dave Jones, Luming Yu

[+cc LKML, Dave, Luming]

On Fri, Sep 07, 2018 at 05:05:15PM +0200, Peter Wu wrote:
> On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> <..>
> > Thomas Martitz reports that this workaround also solves an issue where
> > the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> > after S3 suspend/resume.
> 
> Where was this claimed? It is not stated in the linked bug:
> (https://bugs.freedesktop.org/show_bug.cgi?id=105760
> 
> > On resume, reprogram the PCI bridge prefetch registers, including the
> > magic register mentioned above.
> > 
> > This matches Win10 behaviour, which also rewrites these registers
> > during S3 resume (checked with qemu tracing).
> 
> Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
> Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
> https://www.spinics.net/lists/linux-pci/msg75856.html
> 
> Linux has a generic "restore" operation that works backwards from the
> end of the PCI config space to the beginning, see
> pci_restore_config_space. Do you have a dmesg where you see the
> "restoring config space at offset" messages?
> 
> Would it be reasonable to unconditionally write these registers in
> pci_restore_config_dword, like Windows does?

That sounds reasonable to me.

We did write them unconditionally, prior to 04d9c1a1100b ("[PATCH]
PCI: Improve PCI config space writeback") [1].  That commit apparently
fixed suspend on some laptop.

But at that time, we restored the config space in order of dword 0, 1,
2, ... 15, which means we restored the command register before the
BARs and windows, and it's conceivable that the problem was the
ordering, not the rewriting of the same value.

Only a week later, we reversed the order with 8b8c8d280ab2 ("[PATCH]
PCI: reverse pci config space restore order") [2], which seems like a
good idea and even includes a spec reference.  I found similar
language in the Intel ICH 10 datasheet, sec 14.1.3 [3].

So it seems reasonable to me to try writing them unconditionally in
reverse order (the same order we use today).

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=04d9c1a1100b
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8b8c8d280ab2
[3] Intel ICH 10 IBL 373635

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07 15:05     ` Peter Wu
@ 2018-09-08 11:05       ` Thomas Martitz
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Martitz @ 2018-09-08 11:05 UTC (permalink / raw)
  To: Peter Wu, Daniel Drake
  Cc: nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	mika.westerberg-VuQAYsv1563Yd54FQh9/CA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w,
	keith.busch-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
	linux-6IF/jdPJHihWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
	hkallweit1-Re5JQEeQqe8AvxtiuMwx3w

Am 07.09.18 um 17:05 schrieb Peter Wu:
> On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> <..>
>> Thomas Martitz reports that this workaround also solves an issue where
>> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
>> after S3 suspend/resume.
> 
> Where was this claimed? It is not stated in the linked bug:
> (https://bugs.freedesktop.org/show_bug.cgi?id=105760
> 


Actually, I reported that https://patchwork.kernel.org/patch/10583229/ 
works. I updated the bug now, I didn't do so yet because it's closed.

However, I did not actually test the exact patch *this* thread is about. 
Do you want me to give it a spin?

Best regards.
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-08 11:05       ` Thomas Martitz
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Martitz @ 2018-09-08 11:05 UTC (permalink / raw)
  To: Peter Wu, Daniel Drake
  Cc: bhelgaas, linux-pci, linux, nouveau, linux-pm, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, davem, hkallweit1, netdev,
	nic_swsd

Am 07.09.18 um 17:05 schrieb Peter Wu:
> On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> <..>
>> Thomas Martitz reports that this workaround also solves an issue where
>> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
>> after S3 suspend/resume.
> 
> Where was this claimed? It is not stated in the linked bug:
> (https://bugs.freedesktop.org/show_bug.cgi?id=105760
> 


Actually, I reported that https://patchwork.kernel.org/patch/10583229/ 
works. I updated the bug now, I didn't do so yet because it's closed.

However, I did not actually test the exact patch *this* thread is about. 
Do you want me to give it a spin?

Best regards.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07 22:26     ` Bjorn Helgaas
@ 2018-09-08 12:14       ` Peter Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Wu @ 2018-09-08 12:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Daniel Drake, bhelgaas, linux-pci, linux, nouveau, linux-pm,
	kherbst, andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, kugel, davem, hkallweit1,
	netdev, nic_swsd, linux-kernel, Dave Jones, Luming Yu

On Fri, Sep 07, 2018 at 05:26:47PM -0500, Bjorn Helgaas wrote:
> [+cc LKML, Dave, Luming]
> 
> On Fri, Sep 07, 2018 at 05:05:15PM +0200, Peter Wu wrote:
> > On Fri, Sep 07, 2018 at 01:36:14PM +0800, Daniel Drake wrote:
> > <..>
> > > Thomas Martitz reports that this workaround also solves an issue where
> > > the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> > > after S3 suspend/resume.
> > 
> > Where was this claimed? It is not stated in the linked bug:
> > (https://bugs.freedesktop.org/show_bug.cgi?id=105760
> > 
> > > On resume, reprogram the PCI bridge prefetch registers, including the
> > > magic register mentioned above.
> > > 
> > > This matches Win10 behaviour, which also rewrites these registers
> > > during S3 resume (checked with qemu tracing).
> > 
> > Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
> > Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
> > https://www.spinics.net/lists/linux-pci/msg75856.html
> > 
> > Linux has a generic "restore" operation that works backwards from the
> > end of the PCI config space to the beginning, see
> > pci_restore_config_space. Do you have a dmesg where you see the
> > "restoring config space at offset" messages?
> > 
> > Would it be reasonable to unconditionally write these registers in
> > pci_restore_config_dword, like Windows does?
> 
> That sounds reasonable to me.
> 
> We did write them unconditionally, prior to 04d9c1a1100b ("[PATCH]
> PCI: Improve PCI config space writeback") [1].  That commit apparently
> fixed suspend on some laptop.
> 
> But at that time, we restored the config space in order of dword 0, 1,
> 2, ... 15, which means we restored the command register before the
> BARs and windows, and it's conceivable that the problem was the
> ordering, not the rewriting of the same value.
> 
> Only a week later, we reversed the order with 8b8c8d280ab2 ("[PATCH]
> PCI: reverse pci config space restore order") [2], which seems like a
> good idea and even includes a spec reference.  I found similar
> language in the Intel ICH 10 datasheet, sec 14.1.3 [3].
> 
> So it seems reasonable to me to try writing them unconditionally in
> reverse order (the same order we use today).

Some fields are readonly (device/vendor ID), others have side-effects
(write-to-clear semantics like Master Data Parity Error in the Status
register).

So in addition to always write the registers, what about writing only a
subset? Following the logic applied by Windows, for Type-1:

- Restore memory-related stuff:
  - Restore BAR0 (offset 0x10, dword).
  - Restore BAR1 (offset 0x14, dword).
  - Restore I/O Base + Limit (offset 0x1c, WORD, not dword).
  - Restore Memory Base + Limit (offset 0x20, dword).
  - Restore Prefetchable Memory Base + Limit (offset 0x24, dword).
  - Restore Prefetchable Base Upper 32 bits (offset 0x28, dword).
    (+wait for 500us, why?)
  - Restore Prefetchable Limit Upper 32 bits (offset 0x2c, dword).
    (+wait for 500us, why?)
  - Restore I/O Base + Limit Upper 32 bits (offset 0x30, dword).
    (+wait for 500us, why?)
  - Restore Expansion ROM Address (offset 0x38, dword).
- Restore at offset 0x3c:
  - Restore Interrupt Line (offset 0x3c, byte).
  - Restore Bridge Control (offset 0x3e, word).
    (+read value again, why? Some kind of flush?)
- Restore at offset 0x18:
  - Restore Primary Bus Number (offset 0x18, byte).
  - Restore Secondary Bus Number (offset 0x19, byte).
  - Restore Subordinate Bus Number (offset 0x1a, byte).
  - (Restore Secondary Status (offset 0x1b, byte)? Not seen in Windows.)
- Restore at offset 0x0c:
  - Cacheline Size (offset 0x0c, byte).
  - Primary Latency Timer (offset 0x0d, byte).
- Restore at offset 0x04:
  - Restore command register.
    (+wait for 500us)
  - (Actually Windows clears BM+Mem+I/O bits, then sets
    PCIE.DeviceControl=7, PCIE.DeviceControl2=0 and then enables
    BM+Mem+I/O again).
  - Restore (or write to clear?) Status (offset 0x06, word).
- Note that the following fields are omitted (mostly readonly):
  - Device ID + Vendor ID (offset 0x00, dword).
  - Revision ID + Class Code  (offset 0x08, dword).
  - Header Type + BIST (offset 0x0e, byte + byte).
  - Capabilities Pointer + Reserved (offset 0x34, dword).
  - Interrupt Pin (offset 0x3d, byte).

This is more complicated than the current loop, but the logic of writing
addresses first and then configuring others sound reasonable to me.

Kind regards,
Peter

> 
> [1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=04d9c1a1100b
> [2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8b8c8d280ab2
> [3] Intel ICH 10 IBL 373635

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07  5:36 ` Daniel Drake
@ 2018-09-10 19:57     ` Thomas Martitz
  -1 siblings, 0 replies; 18+ messages in thread
From: Thomas Martitz @ 2018-09-10 19:57 UTC (permalink / raw)
  To: Daniel Drake, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA
  Cc: nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	mika.westerberg-VuQAYsv1563Yd54FQh9/CA,
	linux-pm-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	rafael.j.wysocki-ral2JQCrhuEAvxtiuMwx3w,
	keith.busch-ral2JQCrhuEAvxtiuMwx3w,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
	linux-6IF/jdPJHihWk0Htik3J/w, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	jonathan.derrick-ral2JQCrhuEAvxtiuMwx3w,
	hkallweit1-Re5JQEeQqe8AvxtiuMwx3w

Hello Daniel,

Am 07.09.18 um 07:36 schrieb Daniel Drake:
> On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
> 
>      fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
>      DRM: failed to idle channel 0 [DRM]
> 
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> 
> Runtime suspend/resume works fine, only S3 suspend is affected.
> 
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> the cases that I checked, this register has value 0 and we just have to
> rewrite that value.
> 
> It's very strange that rewriting the exact same register value
> makes a difference, but it definitely makes the issue go away.
> It's not just acting as some kind of memory barrier, because rewriting
> other bridge registers does not work around the issue. There's something
> magic in this particular register. We have confirmed this on all
> the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).
> 
> Additionally, this workaround solves an issue where r8169 MSI-X
> interrupts were broken after S3 suspend/resume on Asus X441UAR. This
> issue was recently worked around in commit 7bb05b85bc2d ("r8169:
> don't use MSI-X on RTL8106e"). It also fixes the same issue on
> RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet
> patched. I suspect it will also fix the issue that was worked around in
> commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").
> 
> Thomas Martitz reports that this workaround also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.


I can confirm that this exact patch also helps on my HP Zbook. Thanks 
for your work on this, resume has been a real pain until now.



> 
>   drivers/pci/pci-driver.c | 14 ++++++++++++++
>   drivers/pci/setup-bus.c  |  2 +-
>   include/linux/pci.h      |  1 +
>   3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index bef17c3fca67..034f816570ad 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -524,6 +524,20 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>   	pci_power_up(pci_dev);
>   	pci_restore_state(pci_dev);
>   	pci_pme_restore(pci_dev);
> +
> +	/*
> +	 * Redo the PCI bridge prefetch register setup.
> +	 *
> +	 * This works around an Intel PCI bridge issue seen on Asus and HP
> +	 * laptops, where the GPU is not usable after S3 resume.
> +	 * Even though PCI bridge register contents appear to be intact
> +	 * at resume time, rewriting the value of PREF_BASE_UPPER32 is
> +	 * required to make the GPU work.
> +	 * Windows 10 also reprograms these registers during S3 resume.
> +	 */
> +	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
> +		pci_setup_bridge_mmio_pref(pci_dev);
> +
>   	pci_fixup_device(pci_fixup_resume_early, pci_dev);
>   }
>   
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 79b1824e83b4..cb88288d2a69 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -630,7 +630,7 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
>   	pci_write_config_dword(bridge, PCI_MEMORY_BASE, l);
>   }
>   
> -static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
> +void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
>   {
>   	struct resource *res;
>   	struct pci_bus_region region;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..b15828fc26a4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
>   void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
>   unsigned int pci_scan_child_bus(struct pci_bus *bus);
>   void pci_bus_add_device(struct pci_dev *dev);
> +void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
>   void pci_read_bridge_bases(struct pci_bus *child);
>   struct resource *pci_find_parent_resource(const struct pci_dev *dev,
>   					  struct resource *res);
> 

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-10 19:57     ` Thomas Martitz
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Martitz @ 2018-09-10 19:57 UTC (permalink / raw)
  To: Daniel Drake, bhelgaas
  Cc: linux-pci, linux, nouveau, linux-pm, peter, kherbst,
	andriy.shevchenko, rafael.j.wysocki, keith.busch,
	mika.westerberg, jonathan.derrick, davem, hkallweit1, netdev,
	nic_swsd

Hello Daniel,

Am 07.09.18 um 07:36 schrieb Daniel Drake:
> On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
> 
>      fifo: fault 00 [READ] at 0000005555555000 engine 00 [GR] client 04 [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
>      DRM: failed to idle channel 0 [DRM]
> 
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> 
> Runtime suspend/resume works fine, only S3 suspend is affected.
> 
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> the cases that I checked, this register has value 0 and we just have to
> rewrite that value.
> 
> It's very strange that rewriting the exact same register value
> makes a difference, but it definitely makes the issue go away.
> It's not just acting as some kind of memory barrier, because rewriting
> other bridge registers does not work around the issue. There's something
> magic in this particular register. We have confirmed this on all
> the affected models we have in-hands (X542UQ, UX533FD, X530UN, V272UN).
> 
> Additionally, this workaround solves an issue where r8169 MSI-X
> interrupts were broken after S3 suspend/resume on Asus X441UAR. This
> issue was recently worked around in commit 7bb05b85bc2d ("r8169:
> don't use MSI-X on RTL8106e"). It also fixes the same issue on
> RTL6186evl/8111evl on an Aimfor-tech laptop that we had not yet
> patched. I suspect it will also fix the issue that was worked around in
> commit 7c53a722459c ("r8169: don't use MSI-X on RTL8168g").
> 
> Thomas Martitz reports that this workaround also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.


I can confirm that this exact patch also helps on my HP Zbook. Thanks 
for your work on this, resume has been a real pain until now.



> 
>   drivers/pci/pci-driver.c | 14 ++++++++++++++
>   drivers/pci/setup-bus.c  |  2 +-
>   include/linux/pci.h      |  1 +
>   3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index bef17c3fca67..034f816570ad 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -524,6 +524,20 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>   	pci_power_up(pci_dev);
>   	pci_restore_state(pci_dev);
>   	pci_pme_restore(pci_dev);
> +
> +	/*
> +	 * Redo the PCI bridge prefetch register setup.
> +	 *
> +	 * This works around an Intel PCI bridge issue seen on Asus and HP
> +	 * laptops, where the GPU is not usable after S3 resume.
> +	 * Even though PCI bridge register contents appear to be intact
> +	 * at resume time, rewriting the value of PREF_BASE_UPPER32 is
> +	 * required to make the GPU work.
> +	 * Windows 10 also reprograms these registers during S3 resume.
> +	 */
> +	if (pci_dev->class == PCI_CLASS_BRIDGE_PCI << 8)
> +		pci_setup_bridge_mmio_pref(pci_dev);
> +
>   	pci_fixup_device(pci_fixup_resume_early, pci_dev);
>   }
>   
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index 79b1824e83b4..cb88288d2a69 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -630,7 +630,7 @@ static void pci_setup_bridge_mmio(struct pci_dev *bridge)
>   	pci_write_config_dword(bridge, PCI_MEMORY_BASE, l);
>   }
>   
> -static void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
> +void pci_setup_bridge_mmio_pref(struct pci_dev *bridge)
>   {
>   	struct resource *res;
>   	struct pci_bus_region region;
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e72ca8dd6241..b15828fc26a4 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -934,6 +934,7 @@ struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn);
>   void pci_device_add(struct pci_dev *dev, struct pci_bus *bus);
>   unsigned int pci_scan_child_bus(struct pci_bus *bus);
>   void pci_bus_add_device(struct pci_dev *dev);
> +void pci_setup_bridge_mmio_pref(struct pci_dev *bridge);
>   void pci_read_bridge_bases(struct pci_bus *child);
>   struct resource *pci_find_parent_resource(const struct pci_dev *dev,
>   					  struct resource *res);
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-07 15:05     ` Peter Wu
@ 2018-09-11  3:35       ` Daniel Drake
  -1 siblings, 0 replies; 18+ messages in thread
From: Daniel Drake @ 2018-09-11  3:35 UTC (permalink / raw)
  To: Peter Wu
  Cc: Bjorn Helgaas, linux-pci, Linux Upstreaming Team, nouveau,
	Linux PM, kherbst, Andy Shevchenko, Wysocki, Rafael J,
	Keith Busch, Mika Westerberg, Jon Derrick, Thomas Martitz, davem,
	hkallweit1, netdev, nic_swsd, Linux Kernel

I have created https://bugzilla.kernel.org/show_bug.cgi?id=201069 to
archive the research done so far.

On Fri, Sep 7, 2018 at 11:05 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
> Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
> https://www.spinics.net/lists/linux-pci/msg75856.html
>
> Linux has a generic "restore" operation that works backwards from the
> end of the PCI config space to the beginning, see
> pci_restore_config_space. Do you have a dmesg where you see the
> "restoring config space at offset" messages?

Interesting, I had not spotted this code. The logs for the affected
bridge on Asus X542UQ:

 0000:00:1c.0: restoring config space at offset 0x3c (was 0x100,
writing 0x1001ff)
 0000:00:1c.0: restoring config space at offset 0x24 (was 0x10001,
writing 0xe1f1d001)
 0000:00:1c.0: restoring config space at offset 0x20 (was 0x0, writing
0xef00ee00)
 0000:00:1c.0: restoring config space at offset 0xc (was 0x810000,
writing 0x810010)
 0000:00:1c.0: restoring config space at offset 0x4 (was 0x100007,
writing 0x100407)

So it didn't restore 0x28 (the magic register) and also register 0x2c
(prefetch limit upper 32 bits) because their values were already 0 on
resume.

https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23 has an
assertion that Intel recognises this issue and calls for all those
registers to be restored on resume.

I propose for Linux 4.19 we apply a minimally intrusive change that
"forcibly" restores dword registers 0x24, 0x28 and 0x2c for PCI
bridges (regardless of their current value), while retaining the
existing restore order (high to low) over all registers and doing the
read-then-maybe-write thing for all of the other regs (per current
behaviour).


Then we also consider swiftly applying a followup patch implementing a
more thorough approach and we give it some time in linux-next before
releasing in Linux 4.20 which does something more thorough. I think
perhaps more discussion is needed there, or at least some more input
from Bjorn. Seems like we have 3 approaches to consider:

1. Peter Wu suggests following what windows does. Windows appears to
start with low registers and works its way upwards, which means it
writes BAR addresses, I/O base, etc, before writing prefetch
registers. It skips over read-only and write-to-clear registers and
also only writes some of the registers at the very end - like the
command register.

To be thorough here we would probably also have to study and copy what
Windows does for non-bridge devices (not sure how many device classes
we would want to study here?). Also it is a bit risky in that Bjorn
has pointed out that Linux's earlier approach with a high level of
similarity here (restore registers in ascending order, regardless of
their current value) caused a laptop to hang on resume.


2. Bjorn suggested tweaking the existing code to always write the
saved value even when the hardware has that same value. The
read-maybe-write logic was introduced to avoid a laptop hang, but at
that time we were restoring registers in ascending order, now we are
descending it might be worth giving this another try.


3. Do nothing else beyond the minimal change that I propose for v4.19?
Looking a bit more into git history this seems to be a sensitive and
problematic area, more changes might mean more trouble. For example
right now pci_restore_config_space() does:
        pci_restore_config_space_range(pdev, 10, 15, 0, 0);
        /* Restore BARs before the command register. */
        pci_restore_config_space_range(pdev, 4, 9, 10, 0);
        pci_restore_config_space_range(pdev, 0, 3, 0, 0);
but pci_restore_config_space_range() already goes in descending order,
so the above is already equivalent to the code in the "else" path that
follows:
        pci_restore_config_space_range(pdev, 0, 15, 0, 0);

and if you look at the history behind this "mixup" there's stuff that
goes back to 2012 like commit a6cb9ee7cabe ("PCI: Retry BARs
restoration for Type 0 headers only") which was fixing up another
problematic commit in this area, etc.

Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
@ 2018-09-11  3:35       ` Daniel Drake
  0 siblings, 0 replies; 18+ messages in thread
From: Daniel Drake @ 2018-09-11  3:35 UTC (permalink / raw)
  To: Peter Wu
  Cc: Mika Westerberg, Linux PM, linux-pci-u79uwXL29TY76Z2rM5mHXA,
	Wysocki, Rafael J, Linux Kernel, nic_swsd-Rasf1IRRPZFBDgjK7y7TUQ,
	Keith Busch, netdev-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Bjorn Helgaas,
	Andy Shevchenko, Linux Upstreaming Team,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Jon Derrick,
	hkallweit1-Re5JQEeQqe8AvxtiuMwx3w

I have created https://bugzilla.kernel.org/show_bug.cgi?id=201069 to
archive the research done so far.

On Fri, Sep 7, 2018 at 11:05 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
> Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
> https://www.spinics.net/lists/linux-pci/msg75856.html
>
> Linux has a generic "restore" operation that works backwards from the
> end of the PCI config space to the beginning, see
> pci_restore_config_space. Do you have a dmesg where you see the
> "restoring config space at offset" messages?

Interesting, I had not spotted this code. The logs for the affected
bridge on Asus X542UQ:

 0000:00:1c.0: restoring config space at offset 0x3c (was 0x100,
writing 0x1001ff)
 0000:00:1c.0: restoring config space at offset 0x24 (was 0x10001,
writing 0xe1f1d001)
 0000:00:1c.0: restoring config space at offset 0x20 (was 0x0, writing
0xef00ee00)
 0000:00:1c.0: restoring config space at offset 0xc (was 0x810000,
writing 0x810010)
 0000:00:1c.0: restoring config space at offset 0x4 (was 0x100007,
writing 0x100407)

So it didn't restore 0x28 (the magic register) and also register 0x2c
(prefetch limit upper 32 bits) because their values were already 0 on
resume.

https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23 has an
assertion that Intel recognises this issue and calls for all those
registers to be restored on resume.

I propose for Linux 4.19 we apply a minimally intrusive change that
"forcibly" restores dword registers 0x24, 0x28 and 0x2c for PCI
bridges (regardless of their current value), while retaining the
existing restore order (high to low) over all registers and doing the
read-then-maybe-write thing for all of the other regs (per current
behaviour).


Then we also consider swiftly applying a followup patch implementing a
more thorough approach and we give it some time in linux-next before
releasing in Linux 4.20 which does something more thorough. I think
perhaps more discussion is needed there, or at least some more input
from Bjorn. Seems like we have 3 approaches to consider:

1. Peter Wu suggests following what windows does. Windows appears to
start with low registers and works its way upwards, which means it
writes BAR addresses, I/O base, etc, before writing prefetch
registers. It skips over read-only and write-to-clear registers and
also only writes some of the registers at the very end - like the
command register.

To be thorough here we would probably also have to study and copy what
Windows does for non-bridge devices (not sure how many device classes
we would want to study here?). Also it is a bit risky in that Bjorn
has pointed out that Linux's earlier approach with a high level of
similarity here (restore registers in ascending order, regardless of
their current value) caused a laptop to hang on resume.


2. Bjorn suggested tweaking the existing code to always write the
saved value even when the hardware has that same value. The
read-maybe-write logic was introduced to avoid a laptop hang, but at
that time we were restoring registers in ascending order, now we are
descending it might be worth giving this another try.


3. Do nothing else beyond the minimal change that I propose for v4.19?
Looking a bit more into git history this seems to be a sensitive and
problematic area, more changes might mean more trouble. For example
right now pci_restore_config_space() does:
        pci_restore_config_space_range(pdev, 10, 15, 0, 0);
        /* Restore BARs before the command register. */
        pci_restore_config_space_range(pdev, 4, 9, 10, 0);
        pci_restore_config_space_range(pdev, 0, 3, 0, 0);
but pci_restore_config_space_range() already goes in descending order,
so the above is already equivalent to the code in the "else" path that
follows:
        pci_restore_config_space_range(pdev, 0, 15, 0, 0);

and if you look at the history behind this "mixup" there's stuff that
goes back to 2012 like commit a6cb9ee7cabe ("PCI: Retry BARs
restoration for Type 0 headers only") which was fixing up another
problematic commit in this area, etc.

Daniel
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] PCI: Reprogram bridge prefetch registers on resume
  2018-09-11  3:35       ` Daniel Drake
  (?)
@ 2018-09-11  9:08       ` Rafael J. Wysocki
  -1 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2018-09-11  9:08 UTC (permalink / raw)
  To: Daniel Drake, Bjorn Helgaas
  Cc: Peter Wu, linux-pci, Linux Upstreaming Team, nouveau, Linux PM,
	kherbst, Andy Shevchenko, Wysocki, Rafael J, Keith Busch,
	Mika Westerberg, Jon Derrick, Thomas Martitz, davem, hkallweit1,
	netdev, nic_swsd, Linux Kernel

On Tuesday, September 11, 2018 5:35:13 AM CEST Daniel Drake wrote:
> I have created https://bugzilla.kernel.org/show_bug.cgi?id=201069 to
> archive the research done so far.
> 
> On Fri, Sep 7, 2018 at 11:05 PM, Peter Wu <peter@lekensteyn.nl> wrote:
> > Windows 10 unconditionally rewrites these registers (BAR, I/O Base +
> > Limit, Memory Base + Limit, etc. from top to bottom), see annotations:
> > https://www.spinics.net/lists/linux-pci/msg75856.html
> >
> > Linux has a generic "restore" operation that works backwards from the
> > end of the PCI config space to the beginning, see
> > pci_restore_config_space. Do you have a dmesg where you see the
> > "restoring config space at offset" messages?
> 
> Interesting, I had not spotted this code. The logs for the affected
> bridge on Asus X542UQ:
> 
>  0000:00:1c.0: restoring config space at offset 0x3c (was 0x100,
> writing 0x1001ff)
>  0000:00:1c.0: restoring config space at offset 0x24 (was 0x10001,
> writing 0xe1f1d001)
>  0000:00:1c.0: restoring config space at offset 0x20 (was 0x0, writing
> 0xef00ee00)
>  0000:00:1c.0: restoring config space at offset 0xc (was 0x810000,
> writing 0x810010)
>  0000:00:1c.0: restoring config space at offset 0x4 (was 0x100007,
> writing 0x100407)
> 
> So it didn't restore 0x28 (the magic register) and also register 0x2c
> (prefetch limit upper 32 bits) because their values were already 0 on
> resume.
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23 has an
> assertion that Intel recognises this issue and calls for all those
> registers to be restored on resume.
> 
> I propose for Linux 4.19 we apply a minimally intrusive change that
> "forcibly" restores dword registers 0x24, 0x28 and 0x2c for PCI
> bridges (regardless of their current value), while retaining the
> existing restore order (high to low) over all registers and doing the
> read-then-maybe-write thing for all of the other regs (per current
> behaviour).

I agree here.  It would be good to cover this gap and possibly backport
the fix to "stable".

> Then we also consider swiftly applying a followup patch implementing a
> more thorough approach and we give it some time in linux-next before
> releasing in Linux 4.20 which does something more thorough. I think
> perhaps more discussion is needed there, or at least some more input
> from Bjorn. Seems like we have 3 approaches to consider:
> 
> 1. Peter Wu suggests following what windows does. Windows appears to
> start with low registers and works its way upwards, which means it
> writes BAR addresses, I/O base, etc, before writing prefetch
> registers. It skips over read-only and write-to-clear registers and
> also only writes some of the registers at the very end - like the
> command register.
> 
> To be thorough here we would probably also have to study and copy what
> Windows does for non-bridge devices (not sure how many device classes
> we would want to study here?). Also it is a bit risky in that Bjorn
> has pointed out that Linux's earlier approach with a high level of
> similarity here (restore registers in ascending order, regardless of
> their current value) caused a laptop to hang on resume.
> 
> 
> 2. Bjorn suggested tweaking the existing code to always write the
> saved value even when the hardware has that same value. The
> read-maybe-write logic was introduced to avoid a laptop hang, but at
> that time we were restoring registers in ascending order, now we are
> descending it might be worth giving this another try.

I agree with Bjorn here.

> 3. Do nothing else beyond the minimal change that I propose for v4.19?
> Looking a bit more into git history this seems to be a sensitive and
> problematic area, more changes might mean more trouble. For example
> right now pci_restore_config_space() does:
>         pci_restore_config_space_range(pdev, 10, 15, 0, 0);
>         /* Restore BARs before the command register. */
>         pci_restore_config_space_range(pdev, 4, 9, 10, 0);
>         pci_restore_config_space_range(pdev, 0, 3, 0, 0);
> but pci_restore_config_space_range() already goes in descending order,
> so the above is already equivalent to the code in the "else" path that
> follows:
>         pci_restore_config_space_range(pdev, 0, 15, 0, 0);
> 
> and if you look at the history behind this "mixup" there's stuff that
> goes back to 2012 like commit a6cb9ee7cabe ("PCI: Retry BARs
> restoration for Type 0 headers only") which was fixing up another
> problematic commit in this area, etc.

So let's first fix what's known broken and see how this goes IMO.

On top of that and later, try to make the changes suggested by Bjorn and see
how far we can get with that.

Having done that, revisit and see if there still are any problems to address
in this area.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-09-11  9:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-07  5:36 [PATCH] PCI: Reprogram bridge prefetch registers on resume Daniel Drake
2018-09-07  5:36 ` Daniel Drake
2018-09-07  5:49 ` [Nouveau] " Lukas Wunner
2018-09-07  6:40 ` Sinan Kaya
     [not found]   ` <3b37e4fd-c793-bd52-7d70-950f846a7d5a-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2018-09-07  8:06     ` Daniel Drake
2018-09-07  8:06       ` Daniel Drake
     [not found] ` <20180907053614.6540-1-drake-6IF/jdPJHihWk0Htik3J/w@public.gmane.org>
2018-09-07 15:05   ` Peter Wu
2018-09-07 15:05     ` Peter Wu
2018-09-07 22:26     ` Bjorn Helgaas
2018-09-08 12:14       ` Peter Wu
2018-09-08 11:05     ` Thomas Martitz
2018-09-08 11:05       ` Thomas Martitz
2018-09-11  3:35     ` Daniel Drake
2018-09-11  3:35       ` Daniel Drake
2018-09-11  9:08       ` Rafael J. Wysocki
2018-09-10 19:57   ` Thomas Martitz
2018-09-10 19:57     ` Thomas Martitz
2018-09-07 21:48 ` Bjorn Helgaas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.