* [PATCH v2 0/2] PCI: Add missing link delays
@ 2019-10-04 12:39 Mika Westerberg
  2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
                   ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-10-04 12:39 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Kai-Heng Feng, Matthias Andree, Paul Menzel,
	Nicholas Johnson, Mika Westerberg, linux-pci, linux-kernel

Hi,

This is the second version of the reworked PCIe link delay patch posted
earlier here:

  https://patchwork.kernel.org/patch/11106611/

Changes from v1:

  * Introduce pcie_wait_for_link_delay() in a separate patch
  * Tidy up changelog, remove some debug output
  * Rename pcie_wait_downstream_accessible() to
    pci_bridge_wait_for_secondary_bus() and make it generic to all PCI
    bridges.
  * Handle Tpvrh + Trhfa for conventional PCI even though we don't do PM
    for them right now.
  * Use pci_dbg() instead of dev_dbg().
  * Dropped check for pm_suspend_no_platform() and only check for D3cold.
  * Drop pcie_get_downstream_delay(), same delay applies equally to all
    devices (it is not entirely clear from the spec).

I'm still checking for a downstream device because I think we can skip the
delays if there is nothing connected. The reason is that if a device is
added while the downstream/root port is in D3, the delay is handled by
pciehp in its board_added(). In the case of ACPI hotplug the firmware is
supposed to configure the device (and handle the delay).

I also checked that we do resume sibling devices in parallel (I think due
to async_suspend).

@Matthias, @Paul and @Nicholas, I would appreciate it if you could check
that this does not cause any issues for your systems.

Mika Westerberg (2):
  PCI: Introduce pcie_wait_for_link_delay()
  PCI: Add missing link delays required by the PCIe spec

 drivers/pci/pci-driver.c |  18 +++++++
 drivers/pci/pci.c        | 113 +++++++++++++++++++++++++++++++++++----
 drivers/pci/pci.h        |   1 +
 3 files changed, 122 insertions(+), 10 deletions(-)

-- 
2.23.0



* [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
  2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
@ 2019-10-04 12:39 ` Mika Westerberg
  2020-08-08 20:22   ` Marc MERLIN
  2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
  2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
  2 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-04 12:39 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Kai-Heng Feng, Matthias Andree, Paul Menzel,
	Nicholas Johnson, Mika Westerberg, linux-pci, linux-kernel

This is otherwise similar to pcie_wait_for_link() but allows passing a
custom activation delay in milliseconds.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e7982af9a5d8..bfd92e018925 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4607,14 +4607,17 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
 
 	return pci_dev_wait(dev, "PM D3->D0", PCIE_RESET_READY_POLL_MS);
 }
+
 /**
- * pcie_wait_for_link - Wait until link is active or inactive
+ * pcie_wait_for_link_delay - Wait until link is active or inactive
  * @pdev: Bridge device
  * @active: waiting for active or inactive?
+ * @delay: Delay to wait after link has become active (in ms)
  *
  * Use this to wait till link becomes active or inactive.
  */
-bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
+static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
+				     int delay)
 {
 	int timeout = 1000;
 	bool ret;
@@ -4651,13 +4654,25 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
 		timeout -= 10;
 	}
 	if (active && ret)
-		msleep(100);
+		msleep(delay);
 	else if (ret != active)
 		pci_info(pdev, "Data Link Layer Link Active not %s in 1000 msec\n",
 			active ? "set" : "cleared");
 	return ret == active;
 }
 
+/**
+ * pcie_wait_for_link - Wait until link is active or inactive
+ * @pdev: Bridge device
+ * @active: waiting for active or inactive?
+ *
+ * Use this to wait till link becomes active or inactive.
+ */
+bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
+{
+	return pcie_wait_for_link_delay(pdev, active, 100);
+}
+
 void pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	u16 ctrl;
-- 
2.23.0



* [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
  2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
@ 2019-10-04 12:39 ` Mika Westerberg
  2019-10-26 14:19   ` Bjorn Helgaas
  2019-10-29 20:54   ` Bjorn Helgaas
  2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
  2 siblings, 2 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-10-04 12:39 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Kai-Heng Feng, Matthias Andree, Paul Menzel,
	Nicholas Johnson, Mika Westerberg, linux-pci, linux-kernel

Currently Linux does not follow the PCIe spec regarding the required
delays after reset. A concrete example is a Thunderbolt add-in card that
consists of a PCIe switch and two PCIe endpoints:

  +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
                                  +-01.0-[04-36]-- DS hotplug port
                                  +-02.0-[37]----00.0 xHCI controller
                                  \-04.0-[38-6b]-- DS hotplug port

The root port (1b.0) and the PCIe switch downstream ports are all PCIe
gen3 so they support 8GT/s link speeds.

We wait for the PCIe hierarchy to enter D3cold (runtime):

  pcieport 0000:00:1b.0: power state changed by ACPI to D3cold

When it wakes up from D3cold, according to PCIe 4.0 section 5.8 the
PCIe switch is put into reset and its power is re-applied. This means
that we must follow the rules in PCIe 4.0 section 6.6.1.

For the PCIe gen3 ports we are dealing with here, the following applies:

  With a Downstream Port that supports Link speeds greater than 5.0
  GT/s, software must wait a minimum of 100 ms after Link training
  completes before sending a Configuration Request to the device
  immediately below that Port. Software can determine when Link training
  completes by polling the Data Link Layer Link Active bit or by setting
  up an associated interrupt (see Section 6.7.3.3).

Translating this into the above topology we would need to do this (DLLLA
stands for Data Link Layer Link Active):

  0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
  0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
  0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
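
As a rough illustration of the rule above, here is a small user-space
model in C. It is only a sketch under stated assumptions: struct
port_model, enum link_speed and wait_before_config_access_ms() are
hypothetical names and not kernel APIs, the DLLLA poll is replaced by a
caller-supplied training time, and the 1000 ms polling budget is taken
from pcie_wait_for_link_delay() in patch 1/2.

```c
#include <assert.h>

/* Hypothetical link speed classes, ordered from slowest to fastest. */
enum link_speed { SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT, SPEED_16_0GT };

/* Hypothetical model of a root/downstream port; not a kernel structure. */
struct port_model {
	enum link_speed max_speed;	/* highest supported link speed */
	int training_ms;		/* time until DLLLA is set, -1 if never */
};

#define LINK_POLL_BUDGET_MS	1000	/* assumed DLLLA polling budget */
#define POST_TRAIN_DELAY_MS	100	/* minimum wait before config access */

/*
 * Return how long software waits (in ms) before the first Configuration
 * Request to the device below @port, or -1 if the link never trained
 * within the polling budget.
 */
int wait_before_config_access_ms(const struct port_model *port)
{
	assert(port);

	/* Ports limited to 5 GT/s or less: fixed 100 ms, no DLLLA poll. */
	if (port->max_speed <= SPEED_5_0GT)
		return POST_TRAIN_DELAY_MS;

	/* Faster ports: the 100 ms only starts once DLLLA is set. */
	if (port->training_ms < 0 || port->training_ms > LINK_POLL_BUDGET_MS)
		return -1;

	return port->training_ms + POST_TRAIN_DELAY_MS;
}
```

For a gen3 port whose link trains in 30 ms the model returns 130 ms,
whereas a fixed 100 ms timer started at power-on would expire only 70 ms
after training completed.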

I've instrumented the kernel with some additional logging so we can see
the actual delays performed:

  pcieport 0000:00:1b.0: power state changed by ACPI to D0
  pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
  pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
  pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms

For the switch upstream port (01:00.0 reachable through 00:1b.0 root
port) we wait for 100 ms but do not take the DLLLA requirement into
account. We then wait 10 ms for the D3hot -> D0 transition of the root
port and the two downstream hotplug ports. This means that we deviate
from what the spec requires.

Performing the same check for system sleep (s2idle) transitions turns
out to be even worse: none of the mandatory delays are performed. If
this were S3 instead of s2idle then according to PCI FW spec 3.2
section 4.6.8 there is a specific _DSM that allows the OS to skip the
delays, but this platform does not provide the _DSM and does not go to
S3 anyway, so no firmware is involved that could already handle these
delays.

On this particular platform these delays are not actually needed because
there is an additional delay as part of the ACPI power resource that is
used to turn on power to the hierarchy. However, since that additional
delay is not required by any of the standards (PCIe, ACPI), it is not
present on Intel Ice Lake, for example, where missing the mandatory
delays causes pciehp to start tearing down the stack too early (links
are not yet trained). Below is an example of what it looks like when
this happens:

  pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
  pcieport 0000:87:04.0: PME# disabled
  pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
  pcieport 0000:86:00.0: Refused to change power state, currently in D3
  pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
  pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
  ...

There is also one reported case (see the bugzilla link below) where the
missing delay causes xHCI on a Titan Ridge controller to fail to runtime
resume when a USB-C dock is plugged in. This does not involve pciehp;
instead the PCI core fails to runtime resume the xHCI device:

  pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
  pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
  xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
  xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
  xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
  ...

For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
that is called on PCI core resume and runtime resume paths accordingly
if the bridge entered D3cold (and thus went through reset).

This is the second attempt to add the missing delays. The previous
solution in commit c2bf1fc212f7 ("PCI: Add missing link delays required
by the PCIe spec") was reverted because of two issues it caused:

  1. One system became unresponsive after S3 resume due to PME service
     spinning in pcie_pme_work_fn(). The root port in question reports
     that the xHCI sent PME but the xHCI device itself does not have PME
     status set. The PME status bit is never cleared in the root port,
     resulting in an indefinite loop in pcie_pme_work_fn().

  2. Slows down resume if the root/downstream port does not support
     Data Link Layer Active Reporting because pcie_wait_for_link_delay()
     waits 1100 ms in that case.
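
The 1100 ms figure in item 2 is consistent with the 1000 ms polling
budget plus the 100 ms activation delay visible in patch 1/2. The
following user-space C sketch models that arithmetic. It is an
approximation, not the kernel function: simulate_wait_for_link() and
link_active_at() are hypothetical names, msleep() is replaced by a
simulated clock, and how the kernel treats ports without Data Link Layer
Active Reporting is not shown in this thread, so it is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define POLL_STEP_MS	10	/* from "timeout -= 10" in patch 1/2 */
#define POLL_BUDGET_MS	1000	/* from "int timeout = 1000" in patch 1/2 */

/* Hypothetical stand-in for reading DLLLA; never active if negative. */
static bool link_active_at(int now_ms, int ready_at_ms)
{
	return ready_at_ms >= 0 && now_ms >= ready_at_ms;
}

/*
 * Simulate polling for link-active followed by the activation delay.
 * Returns the simulated elapsed time in ms and stores in @trained
 * whether the link became active within the polling budget.
 */
int simulate_wait_for_link(int ready_at_ms, int delay_ms, bool *trained)
{
	int elapsed = 0;
	int timeout = POLL_BUDGET_MS;
	bool active;

	assert(trained);

	active = link_active_at(elapsed, ready_at_ms);
	while (!active && timeout > 0) {
		elapsed += POLL_STEP_MS;	/* stands in for msleep(10) */
		timeout -= POLL_STEP_MS;
		active = link_active_at(elapsed, ready_at_ms);
	}

	*trained = active;
	if (active)
		elapsed += delay_ms;	/* delay applies only after activation */

	return elapsed;
}
```

In this model a link that becomes active only at the very end of the
budget costs 1000 + 100 = 1100 ms, while a link that never becomes
active costs the full 1000 ms and reports failure.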

This version should avoid the above issues because we restrict the delay
to happen only if the port went into D3cold.
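
The restriction can be pictured as a simple predicate. This is a
user-space sketch only: struct bridge_model, enum pm_state and
needs_secondary_bus_wait() are hypothetical names, and the conditions
are a simplified reading of the checks in the patch below (state before
resume was D3cold, device is a bridge, something is present on the
secondary bus).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical power states; only D3cold matters for the decision. */
enum pm_state { STATE_D0, STATE_D3HOT, STATE_D3COLD };

/* Hypothetical snapshot of a device taken before it is resumed. */
struct bridge_model {
	enum pm_state state_before_resume;
	bool is_bridge;
	bool has_children;	/* devices present on the secondary bus */
};

/* Return true if the resume path should wait for the secondary bus. */
bool needs_secondary_bus_wait(const struct bridge_model *b)
{
	assert(b);

	/* Only a D3cold exit implies the hierarchy went through reset. */
	if (b->state_before_resume != STATE_D3COLD)
		return false;

	/* Endpoints have no secondary bus to wait for. */
	if (!b->is_bridge)
		return false;

	/* Nothing connected: hot-added devices are handled elsewhere. */
	return b->has_children;
}
```

Note that the state has to be sampled before the device is brought back
to D0, which is why the patch records d3cold at the top of the resume
functions.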

Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pci-driver.c | 18 ++++++++
 drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
 drivers/pci/pci.h        |  1 +
 3 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a8124e47bf6e..74a144c9cf4e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
 static int pci_pm_resume_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
+	bool d3cold = pci_dev->current_state == PCI_D3cold;
 	struct device_driver *drv = dev->driver;
 	int error = 0;
 
@@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
 
 	pcie_pme_root_status_cleanup(pci_dev);
 
+	/*
+	 * If the hierarchy went into D3cold wait for the secondary bus to
+	 * become accessible. This is important for PCIe to prevent pciehp
+	 * from tearing down the downstream devices too soon.
+	 */
+	if (d3cold)
+		pci_bridge_wait_for_secondary_bus(pci_dev);
+
 	if (drv && drv->pm && drv->pm->resume_noirq)
 		error = drv->pm->resume_noirq(dev);
 
@@ -1329,6 +1338,7 @@ static int pci_pm_runtime_resume(struct device *dev)
 	int rc = 0;
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	bool d3cold = pci_dev->current_state == PCI_D3cold;
 
 	/*
 	 * Restoring config space is necessary even if the device is not bound
@@ -1344,6 +1354,14 @@ static int pci_pm_runtime_resume(struct device *dev)
 	pci_enable_wake(pci_dev, PCI_D0, false);
 	pci_fixup_device(pci_fixup_resume, pci_dev);
 
+	/*
+	 * If the hierarchy went into D3cold wait for the secondary bus to
+	 * become accessible. This is important for PCIe to prevent pciehp
+	 * from tearing down the downstream devices too soon.
+	 */
+	if (d3cold)
+		pci_bridge_wait_for_secondary_bus(pci_dev);
+
 	if (pm && pm->runtime_resume)
 		rc = pm->runtime_resume(dev);
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bfd92e018925..749c4625dea4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1025,15 +1025,11 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
 	if (state == PCI_D0) {
 		pci_platform_power_transition(dev, PCI_D0);
 		/*
-		 * Mandatory power management transition delays, see
-		 * PCI Express Base Specification Revision 2.0 Section
-		 * 6.6.1: Conventional Reset.  Do not delay for
-		 * devices powered on/off by corresponding bridge,
-		 * because have already delayed for the bridge.
+		 * Mandatory power management transition delays are handled
+		 * in pci_pm_runtime_resume() of the corresponding
+		 * downstream/root port.
 		 */
 		if (dev->runtime_d3cold) {
-			if (dev->d3cold_delay && !dev->imm_ready)
-				msleep(dev->d3cold_delay);
 			/*
 			 * When powering on a bridge from D3cold, the
 			 * whole hierarchy may be powered on into
@@ -4673,6 +4669,88 @@ bool pcie_wait_for_link(struct pci_dev *pdev, bool active)
 	return pcie_wait_for_link_delay(pdev, active, 100);
 }
 
+/**
+ * pci_bridge_wait_for_secondary_bus - Wait secondary bus to be accessible
+ * @dev: PCI bridge
+ *
+ * Handle necessary delays before access to the devices on the secondary
+ * side of the bridge are permitted after D3cold to D0 transition.
+ *
+ * For PCIe this means the delays in PCIe 4.0 chapter 6.6.1. For
+ * conventional PCI it means Tpvrh + Trhfa specified in PCI 3.0 chapter
+ * 4.3.2.
+ */
+void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
+{
+	struct pci_dev *child;
+
+	if (pci_dev_is_disconnected(dev))
+		return;
+
+	if (!pci_is_bridge(dev) || !dev->bridge_d3)
+		return;
+
+	/*
+	 * We only deal with devices that are present currently on the bus.
+	 * For any hot-added devices the access delay is handled in pciehp
+	 * board_added(). In case of ACPI hotplug the firmware is expected
+	 * to configure the devices before OS is notified.
+	 */
+	if (!dev->subordinate || list_empty(&dev->subordinate->devices))
+		return;
+
+	/*
+	 * For conventional PCI and PCI-X we need to wait Tpvrh + Trhfa before
+	 * accessing the device after reset (that is 100 ms + 1000 ms). In
+	 * practice this should not be needed because we don't do power
+	 * management for them (see pci_bridge_d3_possible()).
+	 */
+	if (!pci_is_pcie(dev)) {
+		pci_dbg(dev, "waiting 1100 ms for secondary bus\n");
+		msleep(1100);
+		return;
+	}
+
+	/*
+	 * For PCIe downstream and root ports that do not support speeds
+	 * greater than 5 GT/s we need to wait a minimum of 100 ms. For
+	 * higher speeds (gen3) we need to wait first for the data link
+	 * layer to become active.
+	 *
+	 * However, 100 ms is the minimum and the PCIe spec says the
+	 * software must allow at least 1s before it can determine that the
+	 * device that did not respond is a broken device. There is
+	 * evidence that 100 ms is not always enough, for example certain
+	 * Titan Ridge xHCI controller does not always respond to
+	 * configuration requests if we only wait for 100 ms (see
+	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
+	 *
+	 * Therefore we wait for 100 ms and check for the device presence.
+	 * If it is still not present give it an additional 100 ms.
+	 */
+	if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
+	    pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
+		return;
+
+	if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
+		pci_dbg(dev, "waiting 100 ms for downstream link\n");
+		msleep(100);
+	} else {
+		pci_dbg(dev, "waiting 100 ms for downstream link, after activation\n");
+		if (!pcie_wait_for_link_delay(dev, true, 100)) {
+			/* Did not train, no need to wait any further */
+			return;
+		}
+	}
+
+	child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
+				 bus_list);
+	if (!pci_device_is_present(child)) {
+		pci_dbg(child, "waiting additional 100 ms to become accessible\n");
+		msleep(100);
+	}
+}
+
 void pci_reset_secondary_bus(struct pci_dev *dev)
 {
 	u16 ctrl;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3f6947ee3324..7ade8f077f6e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -104,6 +104,7 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev);
 void pci_free_cap_save_buffers(struct pci_dev *dev);
 bool pci_bridge_d3_possible(struct pci_dev *dev);
 void pci_bridge_d3_update(struct pci_dev *dev);
+void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev);
 
 static inline void pci_wakeup_event(struct pci_dev *dev)
 {
-- 
2.23.0



* Re: [PATCH v2 0/2] PCI: Add missing link delays
  2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
  2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
  2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
@ 2019-10-04 12:57 ` Matthias Andree
  2019-10-04 13:06   ` Mika Westerberg
  2 siblings, 1 reply; 77+ messages in thread
From: Matthias Andree @ 2019-10-04 12:57 UTC (permalink / raw)
  To: Mika Westerberg, Bjorn Helgaas, Rafael J. Wysocki
  Cc: Len Brown, Lukas Wunner, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel, Nicholas Johnson,
	linux-pci, linux-kernel

On 04.10.19 at 14:39, Mika Westerberg wrote:
> @Matthias, @Paul and @Nicholas, I appreciate if you could check that this
> does not cause any issues for your systems.

Just to be sure: is this intended to be applied against the 5.4-rc*
master branch?



* Re: [PATCH v2 0/2] PCI: Add missing link delays
  2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
@ 2019-10-04 13:06   ` Mika Westerberg
  2019-10-05  7:34     ` Matthias Andree
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-04 13:06 UTC (permalink / raw)
  To: Matthias Andree
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Paul Menzel, Nicholas Johnson, linux-pci, linux-kernel

On Fri, Oct 04, 2019 at 02:57:21PM +0200, Matthias Andree wrote:
> On 04.10.19 at 14:39, Mika Westerberg wrote:
> > @Matthias, @Paul and @Nicholas, I appreciate if you could check that this
> > does not cause any issues for your systems.
> 
> Just to be sure: is this intended to be applied against the 5.4-rc*
> master branch?

Yes, it applies on top of v5.4-rc1.


* Re: [PATCH v2 0/2] PCI: Add missing link delays
  2019-10-04 13:06   ` Mika Westerberg
@ 2019-10-05  7:34     ` Matthias Andree
  2019-10-07  9:32       ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Matthias Andree @ 2019-10-05  7:34 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Paul Menzel, Nicholas Johnson, linux-pci, linux-kernel

On 04.10.19 at 15:06, Mika Westerberg wrote:
> On Fri, Oct 04, 2019 at 02:57:21PM +0200, Matthias Andree wrote:
>> On 04.10.19 at 14:39, Mika Westerberg wrote:
>>> @Matthias, @Paul and @Nicholas, I appreciate if you could check that this
>>> does not cause any issues for your systems.
>> Just to be sure: is this intended to be applied against the 5.4-rc*
>> master branch?
> Yes, it applies on top of v5.4-rc1.

I am sorry to say that I cannot currently test - my computer has a
GeForce 1060-6GB and no onboard/on-chip graphics.
The nvidia module 435.21 does not compile against 5.4-rc* for me (5.3.1
was fine).

For some reason I don't understand, it first complains about a missing or
empty Module.symvers (which I do have and which has 12967 lines),
and if I bypass that check, it complains about an undeclared DRIVER_PRIME
"here (not in a function)" (compiler output translated from German):

/var/lib/dkms/nvidia/435.21/build/nvidia-drm/nvidia-drm-drv.c:662:44:
error: 'DRIVER_PRIME' undeclared here (not in a function); did you
mean 'DRIVER_PCI_DMA'?
  662 |     .driver_features        = DRIVER_GEM | DRIVER_PRIME |
DRIVER_RENDER,
      |                                            ^~~~~~~~~~~~
      |                                            DRIVER_PCI_DMA

There is no point in trying this hardware without the proprietary nvidia
driver: nouveau has always been underfeatured and I never got
suspend/resume working with it, so I won't bother, as it would skew the
findings.

(Someone let me know if switching to AMD 5x00 (XT) is worthwhile or
premature. Vega and earlier consume way too much power to bother. I'm
not buying a new PSU.)




* Re: [PATCH v2 0/2] PCI: Add missing link delays
  2019-10-05  7:34     ` Matthias Andree
@ 2019-10-07  9:32       ` Mika Westerberg
  2019-10-07 15:15         ` Matthias Andree
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-07  9:32 UTC (permalink / raw)
  To: Matthias Andree
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Paul Menzel, Nicholas Johnson, linux-pci, linux-kernel

On Sat, Oct 05, 2019 at 09:34:41AM +0200, Matthias Andree wrote:
> On 04.10.19 at 15:06, Mika Westerberg wrote:
> > On Fri, Oct 04, 2019 at 02:57:21PM +0200, Matthias Andree wrote:
> >> On 04.10.19 at 14:39, Mika Westerberg wrote:
> >>> @Matthias, @Paul and @Nicholas, I appreciate if you could check that this
> >>> does not cause any issues for your systems.
> >> Just to be sure: is this intended to be applied against the 5.4-rc*
> >> master branch?
> > Yes, it applies on top of v5.4-rc1.
> 
> I am sorry to say that I cannot currently test - my computer has a
> GeForce 1060-6GB an no onboard/on-chip graphics.
> The nvidia module 435.21 does not compile against 5.4-rc* for me (5.3.1
> was fine).

I think the two patches should apply cleanly on 5.3.x as well.

> For some reasons I don't understand, it first complains about missing or
> empty  Module.symvers, (which I do have and which has 12967 lines)
> and if I bypass that check, it complains about undeclared DRIVER_PRIME
> "here (outside a function)" - sorry for the German locale:

Possibly v5.4-rcX moved/renamed some symbol(s), which then makes the
out-of-tree driver fail to build.


* Re: [PATCH v2 0/2] PCI: Add missing link delays
  2019-10-07  9:32       ` Mika Westerberg
@ 2019-10-07 15:15         ` Matthias Andree
  2019-10-08  9:05           ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Matthias Andree @ 2019-10-07 15:15 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Paul Menzel, Nicholas Johnson, linux-pci, linux-kernel

On 07.10.19 at 11:32, Mika Westerberg wrote:
> On Sat, Oct 05, 2019 at 09:34:41AM +0200, Matthias Andree wrote:
>> On 04.10.19 at 15:06, Mika Westerberg wrote:
>>> On Fri, Oct 04, 2019 at 02:57:21PM +0200, Matthias Andree wrote:
>>>> On 04.10.19 at 14:39, Mika Westerberg wrote:
>>>>> @Matthias, @Paul and @Nicholas, I appreciate if you could check that this
>>>>> does not cause any issues for your systems.
>>>> Just to be sure: is this intended to be applied against the 5.4-rc*
>>>> master branch?
>>> Yes, it applies on top of v5.4-rc1.
>> I am sorry to say that I cannot currently test - my computer has a
>> GeForce 1060-6GB an no onboard/on-chip graphics.
>> The nvidia module 435.21 does not compile against 5.4-rc* for me (5.3.1
>> was fine).
> I think the two patches should apply cleanly on 5.3.x as well.

Mika, that worked.

With your two patches on top of Linux 5.3.4, two Suspend-to-RAM cycles
(ACPI S3), one Suspend-to-disk cycle (ACPI S4),
no regressions observed => success?

Let me know off-list if you need any of my "usual logs" from my test script.

One blank line added to delineate Greg's release from your patches:

> * 9c2dfb396722 2019-10-04 | PCI: Add missing link delays required by
> the PCIe spec (HEAD -> linux-5.3.y) [Mika Westerberg]
> * 00103c8c3fa8 2019-10-04 | PCI: Introduce pcie_wait_for_link_delay()
> [Mika Westerberg]
>
> * ed56826f1779 2019-10-05 | Linux 5.3.4 (tag: v5.3.4,
> stable/linux-5.3.y) [Greg Kroah-Hartman]
> * d0b85a37c06b 2019-09-04 | platform/chrome: cros_ec_rpmsg: Fix race
> with host command when probe failed [Pi-Hsun Shih]
> * bec8c6dec605 2019-09-22 | mt76: mt7615: fix mt7615 firmware path
> definitions [Lorenzo Bianconi]
> * 5dab55b417ca 2019-07-02 | mt76: mt7615: always release sem in
> mt7615_load_patch [Lorenzo Bianconi]
> * 88688a6cd741 2019-09-09 | md/raid0: avoid RAID0 data corruption due
> to layout confusion. [NeilBrown]
Regards,
Matthias



* Re: [PATCH v2 0/2] PCI: Add missing link delays
  2019-10-07 15:15         ` Matthias Andree
@ 2019-10-08  9:05           ` Mika Westerberg
  0 siblings, 0 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-10-08  9:05 UTC (permalink / raw)
  To: Matthias Andree
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Paul Menzel, Nicholas Johnson, linux-pci, linux-kernel

On Mon, Oct 07, 2019 at 05:15:24PM +0200, Matthias Andree wrote:
> On 07.10.19 at 11:32, Mika Westerberg wrote:
> > On Sat, Oct 05, 2019 at 09:34:41AM +0200, Matthias Andree wrote:
> >> On 04.10.19 at 15:06, Mika Westerberg wrote:
> >>> On Fri, Oct 04, 2019 at 02:57:21PM +0200, Matthias Andree wrote:
> >>>> On 04.10.19 at 14:39, Mika Westerberg wrote:
> >>>>> @Matthias, @Paul and @Nicholas, I appreciate if you could check that this
> >>>>> does not cause any issues for your systems.
> >>>> Just to be sure: is this intended to be applied against the 5.4-rc*
> >>>> master branch?
> >>> Yes, it applies on top of v5.4-rc1.
> >> I am sorry to say that I cannot currently test - my computer has a
> >> GeForce 1060-6GB an no onboard/on-chip graphics.
> >> The nvidia module 435.21 does not compile against 5.4-rc* for me (5.3.1
> >> was fine).
> > I think the two patches should apply cleanly on 5.3.x as well.
> 
> Mika, that worked.
> 
> With your two patches on top of Linux 5.3.4, two Suspend-to-RAM cycles
> (ACPI S3), one Suspend-to-disk cycle (ACPI S4),
> no regressions observed => success?

Yes, if it did not hang during resume (because of the PME loop) I think
it should be declared as success :)

Thanks a lot for testing!


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
@ 2019-10-26 14:19   ` Bjorn Helgaas
  2019-10-28 11:28     ` Mika Westerberg
  2019-10-29 20:54   ` Bjorn Helgaas
  1 sibling, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-10-26 14:19 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> Currently Linux does not follow PCIe spec regarding the required delays
> after reset. A concrete example is a Thunderbolt add-in-card that
> consists of a PCIe switch and two PCIe endpoints:
> 
>   +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
>                                   +-01.0-[04-36]-- DS hotplug port
>                                   +-02.0-[37]----00.0 xHCI controller
>                                   \-04.0-[38-6b]-- DS hotplug port
> 
> The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> gen3 so they support 8GT/s link speeds.
> 
> We wait for the PCIe hierarchy to enter D3cold (runtime):
> 
>   pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
> 
> When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> PCIe switch is put to reset and its power is re-applied. This means that
> we must follow the rules in PCIe 4.0 section 6.6.1.

If you have the PCIe 5.0 spec, can you update these references to
that?  If not, I'm happy to do it for you.

> For the PCIe gen3 ports we are dealing with here, the following applies:
> 
>   With a Downstream Port that supports Link speeds greater than 5.0
>   GT/s, software must wait a minimum of 100 ms after Link training
>   completes before sending a Configuration Request to the device
>   immediately below that Port. Software can determine when Link training
>   completes by polling the Data Link Layer Link Active bit or by setting
>   up an associated interrupt (see Section 6.7.3.3).
> 
> Translating this into the above topology we would need to do this (DLLLA
> stands for Data Link Layer Link Active):
> 
>   0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
>   0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
>   0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
> 
> I've instrumented the kernel with some additional logging so we can see
> the actual delays performed:
> 
>   pcieport 0000:00:1b.0: power state changed by ACPI to D0
>   pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
>   pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
>   pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
>   pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
> 
> For the switch upstream port (01:00.0 reachable through 00:1b.0 root
> port) we wait for 100 ms but not taking into account the DLLLA
> requirement. We then wait 10 ms for D3hot -> D0 transition of the root
> port and the two downstream hotplug ports. This means that we deviate
> from what the spec requires.
> 
> Performing the same check for system sleep (s2idle) transitions it turns
> out to be even worse. None of the mandatory delays are performed. If
> this would be S3 instead of s2idle then according to PCI FW spec 3.2
> section 4.6.8. there is a specific _DSM that allows the OS to skip the
> delays but this platform does not provide the _DSM and does not go to S3
> anyway so no firmware is involved that could already handle these
> delays.
> 
> On this particular platform these delays are not actually needed because
> there is an additional delay as part of the ACPI power resource that is
> used to turn on power to the hierarchy but since that additional delay
> is not required by any of standards (PCIe, ACPI) it is not present in
> the Intel Ice Lake, for example where missing the mandatory delays
> causes pciehp to start tearing down the stack too early (links are not
> yet trained). Below is an example how it looks like when this happens:
> 
>   pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
>   pcieport 0000:87:04.0: PME# disabled
>   pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
>   pcieport 0000:86:00.0: Refused to change power state, currently in D3
>   pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
>   pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
>   ...
> 
> There is also one reported case (see the bugzilla link below) where the
> missing delay causes xHCI on a Titan Ridge controller fail to runtime
> resume when USB-C dock is plugged. This does not involve pciehp but
> instead it PCI core fails to runtime resume the xHCI device:
> 
>   pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
>   pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
>   xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
>   xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
>   xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
>   ...
> 
> For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
> that is called on PCI core resume and runtime resume paths accordingly
> if the bridge entered D3cold (and thus went through reset).
> 
> This is second attempt to add the missing delays. The previous solution
> in commit c2bf1fc212f7 ("PCI: Add missing link delays required by the
> PCIe spec") was reverted because of two issues it caused:
> 
>   1. One system become unresponsive after S3 resume due to PME service
>      spinning in pcie_pme_work_fn(). The root port in question reports
>      that the xHCI sent PME but the xHCI device itself does not have PME
>      status set. The PME status bit is never cleared in the root port
>      resulting the indefinite loop in pcie_pme_work_fn().
> 
>   2. Slows down resume if the root/downstream port does not support
>      Data Link Layer Active Reporting because pcie_wait_for_link_delay()
>      waits 1100 ms in that case.
> 
> This version should avoid the above issues because we restrict the delay
> to happen only if the port went into D3cold.
> 
> Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
> Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci-driver.c | 18 ++++++++
>  drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
>  drivers/pci/pci.h        |  1 +
>  3 files changed, 104 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..74a144c9cf4e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
>  static int pci_pm_resume_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> +	bool d3cold = pci_dev->current_state == PCI_D3cold;
>  	struct device_driver *drv = dev->driver;
>  	int error = 0;
>  
> @@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
>  
>  	pcie_pme_root_status_cleanup(pci_dev);
>  
> +	/*
> +	 * If the hierarchy went into D3cold wait for the secondary bus to
> +	 * become accessible. This is important for PCIe to prevent pciehp
> +	 * from tearing down the downstream devices too soon.

The pciehp connection isn't obvious to me.

> +	 */
> +	if (d3cold)
> +		pci_bridge_wait_for_secondary_bus(pci_dev);

This will need to be rebased on top of my pci/pm branch, but I think
that's minor.

Can we move this closer to where we initiate the reset?  It's pretty
hard to tell from looking at pci_pm_resume_noirq() that there's a
reset happening here.

For D3cold->D0, I guess that would be somewhere down in
platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
What about the mid_pci_set_power_state() path?  Does that need this
too?

In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
I assume is not for PCI devices themselves) *does* say firmware is
responsible for sequencing delays, so I would tend to assume it's
really firmware's job and we shouldn't need to do this in the kernel
at all.

What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
Port) is in D3hot, I'm not really clear on the state of its link.  If
the link is down, I assume putting the bridge in D0 will bring it up
and we'd have to wait for that?  If so, we'd need to do something in
the reset path, e.g., pci_pm_reset()?

Sketch of control flow for this patch:

  pci_pm_resume_noirq
    if (...)
      pci_pm_default_resume_early
	pci_power_up
	  if (...)
	    platform_pci_set_power_state
	      acpi_pci_set_power_state
	  pci_raw_set_power_state(D0)
+   if (d3cold)
+     pci_bridge_wait_for_secondary_bus
    if (legacy)
      pci_legacy_resume_early
    else
      drv->pm->resume_noirq
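
As a toy user-space model of that flow (not kernel code; the toy_*
names are made up here and the "if (...)" guards are dropped), the only
thing tying the wait to the power-on is the d3cold flag, which has to be
sampled before the state is overwritten:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's power states; not kernel code. */
enum toy_power_state { TOY_D0, TOY_D3HOT, TOY_D3COLD };

struct toy_dev {
	enum toy_power_state current_state;
	int secondary_bus_waits;	/* times we waited for the secondary bus */
};

/* Models pci_power_up(): afterwards the device is tracked as D0. */
static void toy_power_up(struct toy_dev *dev)
{
	dev->current_state = TOY_D0;
}

/* Models pci_bridge_wait_for_secondary_bus(); it only counts the call. */
static void toy_wait_for_secondary_bus(struct toy_dev *dev)
{
	dev->secondary_bus_waits++;
}

/* Models the ordering in pci_pm_resume_noirq() from the sketch above. */
static void toy_resume_noirq(struct toy_dev *dev)
{
	assert(dev);

	/* Sample the state first; toy_power_up() overwrites it. */
	bool d3cold = dev->current_state == TOY_D3COLD;

	toy_power_up(dev);

	if (d3cold)
		toy_wait_for_secondary_bus(dev);
}
```

A device that starts in TOY_D3COLD gets exactly one wait and a device
that starts in TOY_D3HOT gets none, but nothing in toy_resume_noirq()
itself says that a power-on (and thus a reset) happened in between.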

>  	if (drv && drv->pm && drv->pm->resume_noirq)
>  		error = drv->pm->resume_noirq(dev);

* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-26 14:19   ` Bjorn Helgaas
@ 2019-10-28 11:28     ` Mika Westerberg
  2019-10-28 13:42       ` Bjorn Helgaas
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-28 11:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Sat, Oct 26, 2019 at 09:19:38AM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > Currently Linux does not follow PCIe spec regarding the required delays
> > after reset. A concrete example is a Thunderbolt add-in-card that
> > consists of a PCIe switch and two PCIe endpoints:
> > 
> >   +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
> >                                   +-01.0-[04-36]-- DS hotplug port
> >                                   +-02.0-[37]----00.0 xHCI controller
> >                                   \-04.0-[38-6b]-- DS hotplug port
> > 
> > The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> > gen3 so they support 8GT/s link speeds.
> > 
> > We wait for the PCIe hierarchy to enter D3cold (runtime):
> > 
> >   pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
> > 
> > When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> > PCIe switch is put to reset and its power is re-applied. This means that
> > we must follow the rules in PCIe 4.0 section 6.6.1.
> 
> If you have the PCIe 5.0 spec, can you update these references to
> that?  If not, I'm happy to do it for you.

I do have it and sure, I'll update them.

> > For the PCIe gen3 ports we are dealing with here, the following applies:
> > 
> >   With a Downstream Port that supports Link speeds greater than 5.0
> >   GT/s, software must wait a minimum of 100 ms after Link training
> >   completes before sending a Configuration Request to the device
> >   immediately below that Port. Software can determine when Link training
> >   completes by polling the Data Link Layer Link Active bit or by setting
> >   up an associated interrupt (see Section 6.7.3.3).
> > 
> > Translating this into the above topology we would need to do this (DLLLA
> > stands for Data Link Layer Link Active):
> > 
> >   0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
> >   0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
> >   0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
> > 
> > I've instrumented the kernel with some additional logging so we can see
> > the actual delays performed:
> > 
> >   pcieport 0000:00:1b.0: power state changed by ACPI to D0
> >   pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
> >   pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
> >   pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
> >   pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
> > 
> > For the switch upstream port (01:00.0 reachable through 00:1b.0 root
> > port) we wait for 100 ms but not taking into account the DLLLA
> > requirement. We then wait 10 ms for D3hot -> D0 transition of the root
> > port and the two downstream hotplug ports. This means that we deviate
> > from what the spec requires.
> > 
> > Performing the same check for system sleep (s2idle) transitions it turns
> > out to be even worse. None of the mandatory delays are performed. If
> > this would be S3 instead of s2idle then according to PCI FW spec 3.2
> > section 4.6.8. there is a specific _DSM that allows the OS to skip the
> > delays but this platform does not provide the _DSM and does not go to S3
> > anyway so no firmware is involved that could already handle these
> > delays.
> > 
> > On this particular platform these delays are not actually needed because
> > there is an additional delay as part of the ACPI power resource that is
> > used to turn on power to the hierarchy but since that additional delay
> > is not required by any of standards (PCIe, ACPI) it is not present in
> > the Intel Ice Lake, for example where missing the mandatory delays
> > causes pciehp to start tearing down the stack too early (links are not
> > yet trained). Below is an example how it looks like when this happens:
> > 
> >   pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
> >   pcieport 0000:87:04.0: PME# disabled
> >   pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
> >   pcieport 0000:86:00.0: Refused to change power state, currently in D3
> >   pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
> >   pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> >   ...
> > 
> > There is also one reported case (see the bugzilla link below) where the
> > missing delay causes xHCI on a Titan Ridge controller fail to runtime
> > resume when USB-C dock is plugged. This does not involve pciehp but
> > instead it PCI core fails to runtime resume the xHCI device:
> > 
> >   pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
> >   pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
> >   xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
> >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> >   ...
> > 
> > For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
> > that is called on PCI core resume and runtime resume paths accordingly
> > if the bridge entered D3cold (and thus went through reset).
> > 
> > This is second attempt to add the missing delays. The previous solution
> > in commit c2bf1fc212f7 ("PCI: Add missing link delays required by the
> > PCIe spec") was reverted because of two issues it caused:
> > 
> >   1. One system become unresponsive after S3 resume due to PME service
> >      spinning in pcie_pme_work_fn(). The root port in question reports
> >      that the xHCI sent PME but the xHCI device itself does not have PME
> >      status set. The PME status bit is never cleared in the root port
> >      resulting the indefinite loop in pcie_pme_work_fn().
> > 
> >   2. Slows down resume if the root/downstream port does not support
> >      Data Link Layer Active Reporting because pcie_wait_for_link_delay()
> >      waits 1100 ms in that case.
> > 
> > This version should avoid the above issues because we restrict the delay
> > to happen only if the port went into D3cold.
> > 
> > Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
> > Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >  drivers/pci/pci-driver.c | 18 ++++++++
> >  drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
> >  drivers/pci/pci.h        |  1 +
> >  3 files changed, 104 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > index a8124e47bf6e..74a144c9cf4e 100644
> > --- a/drivers/pci/pci-driver.c
> > +++ b/drivers/pci/pci-driver.c
> > @@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
> >  static int pci_pm_resume_noirq(struct device *dev)
> >  {
> >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > +	bool d3cold = pci_dev->current_state == PCI_D3cold;
> >  	struct device_driver *drv = dev->driver;
> >  	int error = 0;
> >  
> > @@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
> >  
> >  	pcie_pme_root_status_cleanup(pci_dev);
> >  
> > +	/*
> > +	 * If the hierarchy went into D3cold wait for the secondary bus to
> > +	 * become accessible. This is important for PCIe to prevent pciehp
> > +	 * from tearing down the downstream devices too soon.
> 
> The pciehp connection isn't obvious to me.

I tried to explain it in the changelog, but maybe I'll just not mention
it here at all to avoid confusion.

> > +	 */
> > +	if (d3cold)
> > +		pci_bridge_wait_for_secondary_bus(pci_dev);
> 
> This will need to be rebased on top of my pci/pm branch, but I think
> that's minor.

OK.

> Can we move this closer to where we initiate the reset?  It's pretty
> hard to tell from looking at pci_pm_resume_noirq() that there's a
> reset happening here.

Well, we actually don't do an explicit reset; instead we power the thing
on from D3cold.

> For D3cold->D0, I guess that would be somewhere down in
> platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
> What about the mid_pci_set_power_state() path?  Does that need this
> too?

I can take a look at whether it can be placed there. Yes,
mid_pci_set_power_state() may at least in theory need it too, although I
don't remember any MID platforms with real PCIe devices.

> In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
> I assume is not for PCI devices themselves) *does* say firmware is
> responsible for sequencing delays, so I would tend to assume it's
> really firmware's job and we shouldn't need to do this in the kernel
> at all.

_ON is also for the PCI device itself, but those methods are not just
for PCI, so they don't really talk about any PCI-specific delays. You
need to look at other specs. For example, PCI FW spec v3.2 section 4.6.9
says this about the _DSM that can be used to decrease the delays:

  This function is optional. If the platform does not provide it, the
  operating system must adhere to all timing requirements as described
  in the PCI Express Base specification and/or applicable form factor
  specification, including values contained in Readiness Time Reporting
  capability structure.
  
Relevant PCIe spec section is 6.6.1 (also referenced in the changelog).
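
As a rough illustration of how I read that rule (a user-space sketch,
not the patch; the toy_* names and the link_train_ms input are made up,
and the case for ports that do not go above 5.0 GT/s is my reading of
6.6.1 rather than something quoted in the changelog):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a downstream port; not a kernel structure. */
struct toy_port {
	bool faster_than_5gts;	/* supports Link speeds greater than 5.0 GT/s */
	int link_train_ms;	/* time until DLLLA is set (made-up input) */
};

#define TOY_CONFIG_WAIT_MS 100	/* the 100 ms minimum from section 6.6.1 */

/*
 * Earliest time, in ms after the end of the reset, at which software may
 * send the first Configuration Request to the device below the port.
 */
static int toy_first_config_request_ms(const struct toy_port *port)
{
	assert(port);

	if (port->faster_than_5gts)
		/* 100 ms counted from the end of Link training (DLLLA set). */
		return port->link_train_ms + TOY_CONFIG_WAIT_MS;

	/* Otherwise the 100 ms is counted from the end of the reset. */
	return TOY_CONFIG_WAIT_MS;
}
```

So for a gen3 port whose link takes, say, 30 ms to train, the first
config access should not happen before 130 ms, whereas a fixed 100 ms
sleep that ignores DLLLA would allow it 30 ms too early.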

[If you have access to the ECN titled "Async Hot-Plug Updates" (you can
find it on the PCI-SIG site), that document has a nice table about the
delays on page 32. It compares surprise hotplug with downstream port
containment for async hotplug.]

> What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
> Port) is in D3hot, I'm not really clear on the state of its link.  If
> the link is down, I assume putting the bridge in D0 will bring it up
> and we'd have to wait for that?  If so, we'd need to do something in
> the reset path, e.g., pci_pm_reset()?

AFAIK the link goes into L1 when the function is programmed to any D
state other than D0. If we don't put the device into D3cold, then I
think it stays in L1, where it can be brought back by writing D0 to the
PM register, which does not need any delay other than the D3hot -> D0
one (10 ms).

[There is a table in PCIe spec 5.0 section 5.3.2 that shows the link
state and power management D-state relationship.]

* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-28 11:28     ` Mika Westerberg
@ 2019-10-28 13:42       ` Bjorn Helgaas
  2019-10-28 18:06         ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-10-28 13:42 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Mon, Oct 28, 2019 at 01:28:52PM +0200, Mika Westerberg wrote:
> On Sat, Oct 26, 2019 at 09:19:38AM -0500, Bjorn Helgaas wrote:
> > On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > > Currently Linux does not follow PCIe spec regarding the required delays
> > > after reset. A concrete example is a Thunderbolt add-in-card that
> > > consists of a PCIe switch and two PCIe endpoints:
> > > 
> > >   +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
> > >                                   +-01.0-[04-36]-- DS hotplug port
> > >                                   +-02.0-[37]----00.0 xHCI controller
> > >                                   \-04.0-[38-6b]-- DS hotplug port
> > > 
> > > The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> > > gen3 so they support 8GT/s link speeds.
> > > 
> > > We wait for the PCIe hierarchy to enter D3cold (runtime):
> > > 
> > >   pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
> > > 
> > > When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> > > PCIe switch is put to reset and its power is re-applied. This means that
> > > we must follow the rules in PCIe 4.0 section 6.6.1.
> > 
> > If you have the PCIe 5.0 spec, can you update these references to
> > that?  If not, I'm happy to do it for you.
> 
> I do have it and sure, I'll update them.
> 
> > > For the PCIe gen3 ports we are dealing with here, the following applies:
> > > 
> > >   With a Downstream Port that supports Link speeds greater than 5.0
> > >   GT/s, software must wait a minimum of 100 ms after Link training
> > >   completes before sending a Configuration Request to the device
> > >   immediately below that Port. Software can determine when Link training
> > >   completes by polling the Data Link Layer Link Active bit or by setting
> > >   up an associated interrupt (see Section 6.7.3.3).
> > > 
> > > Translating this into the above topology we would need to do this (DLLLA
> > > stands for Data Link Layer Link Active):
> > > 
> > >   0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
> > >   0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
> > >   0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
> > > 
> > > I've instrumented the kernel with some additional logging so we can see
> > > the actual delays performed:
> > > 
> > >   pcieport 0000:00:1b.0: power state changed by ACPI to D0
> > >   pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
> > >   pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
> > >   pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
> > >   pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
> > > 
> > > For the switch upstream port (01:00.0 reachable through 00:1b.0 root
> > > port) we wait for 100 ms but not taking into account the DLLLA
> > > requirement. We then wait 10 ms for D3hot -> D0 transition of the root
> > > port and the two downstream hotplug ports. This means that we deviate
> > > from what the spec requires.
> > > 
> > > Performing the same check for system sleep (s2idle) transitions it turns
> > > out to be even worse. None of the mandatory delays are performed. If
> > > this would be S3 instead of s2idle then according to PCI FW spec 3.2
> > > section 4.6.8. there is a specific _DSM that allows the OS to skip the
> > > delays but this platform does not provide the _DSM and does not go to S3
> > > anyway so no firmware is involved that could already handle these
> > > delays.
> > > 
> > > On this particular platform these delays are not actually needed because
> > > there is an additional delay as part of the ACPI power resource that is
> > > used to turn on power to the hierarchy but since that additional delay
> > > is not required by any of standards (PCIe, ACPI) it is not present in
> > > the Intel Ice Lake, for example where missing the mandatory delays
> > > causes pciehp to start tearing down the stack too early (links are not
> > > yet trained). Below is an example how it looks like when this happens:
> > > 
> > >   pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
> > >   pcieport 0000:87:04.0: PME# disabled
> > >   pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
> > >   pcieport 0000:86:00.0: Refused to change power state, currently in D3
> > >   pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
> > >   pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > >   ...
> > > 
> > > There is also one reported case (see the bugzilla link below) where the
> > > missing delay causes xHCI on a Titan Ridge controller fail to runtime
> > > resume when USB-C dock is plugged. This does not involve pciehp but
> > > instead it PCI core fails to runtime resume the xHCI device:
> > > 
> > >   pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
> > >   pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
> > >   xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
> > >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> > >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > >   ...
> > > 
> > > For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
> > > that is called on PCI core resume and runtime resume paths accordingly
> > > if the bridge entered D3cold (and thus went through reset).
> > > 
> > > This is second attempt to add the missing delays. The previous solution
> > > in commit c2bf1fc212f7 ("PCI: Add missing link delays required by the
> > > PCIe spec") was reverted because of two issues it caused:
> > > 
> > >   1. One system become unresponsive after S3 resume due to PME service
> > >      spinning in pcie_pme_work_fn(). The root port in question reports
> > >      that the xHCI sent PME but the xHCI device itself does not have PME
> > >      status set. The PME status bit is never cleared in the root port
> > >      resulting the indefinite loop in pcie_pme_work_fn().
> > > 
> > >   2. Slows down resume if the root/downstream port does not support
> > >      Data Link Layer Active Reporting because pcie_wait_for_link_delay()
> > >      waits 1100 ms in that case.
> > > 
> > > This version should avoid the above issues because we restrict the delay
> > > to happen only if the port went into D3cold.
> > > 
> > > Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
> > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
> > > Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > ---
> > >  drivers/pci/pci-driver.c | 18 ++++++++
> > >  drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
> > >  drivers/pci/pci.h        |  1 +
> > >  3 files changed, 104 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > index a8124e47bf6e..74a144c9cf4e 100644
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
> > >  static int pci_pm_resume_noirq(struct device *dev)
> > >  {
> > >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > > +	bool d3cold = pci_dev->current_state == PCI_D3cold;
> > >  	struct device_driver *drv = dev->driver;
> > >  	int error = 0;
> > >  
> > > @@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
> > >  
> > >  	pcie_pme_root_status_cleanup(pci_dev);
> > >  
> > > +	/*
> > > +	 * If the hierarchy went into D3cold wait for the secondary bus to
> > > +	 * become accessible. This is important for PCIe to prevent pciehp
> > > +	 * from tearing down the downstream devices too soon.

> > Can we move this closer to where we initiate the reset?  It's pretty
> > hard to tell from looking at pci_pm_resume_noirq() that there's a
> > reset happening here.
> 
> Well we actually don't do explicit reset but instead we power the thing
> on from D3cold.

The point is that it's too hard to maintain unless we can connect the
delay with the related hardware event.

> > For D3cold->D0, I guess that would be somewhere down in
> > platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
> > What about the mid_pci_set_power_state() path?  Does that need this
> > too?
> 
> I can take a look if it can be placed there. Yes,
> mid_pci_set_power_state() may at least in theory need it too although I
> don't remember any MID platforms with real PCIe devices.

I don't know how the OS is supposed to know if these are real PCIe
devices or not.  If we don't know, we have to assume they work per
spec and may require the delays per spec.

> > In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
> > I assume is not for PCI devices themselves) *does* say firmware is
> > responsible for sequencing delays, so I would tend to assume it's
> > really firmware's job and we shouldn't need to do this in the kernel
> > at all.
> 
> _ON is also for PCI device itself but all those methods are not just for
> PCI so they don't really talk about any PCI specific delays. You need to
> look at other specs. For example PCI FW spec v3.2 section 4.6.9 says
> this about the _DSM that can be used to decrease the delays:
> 
>   This function is optional. If the platform does not provide it, the
>   operating system must adhere to all timing requirements as described
>   in the PCI Express Base specification and/or applicable form factor
>   specification, including values contained in Readiness Time Reporting
>   capability structure.

I don't think this _DSM tells us anything about delays after _ON,
_PS0, etc.  All the delays it mentions are for transitions the OS can
do natively without the _ON, _PS0, etc methods.  It makes no mention
of those methods, or of the D3cold->D0 transition (which would require
them).

> Relevant PCIe spec section is 6.6.1 (also referenced in the changelog).
> 
> [If you have access to ECN titled "Async Hot-Plug Updates" (you can find
> it in PCI-SIG site) that document has a nice table about the delays in
> page 32. It compares surprise hotplug with downstream port containment
> for async hotplug]

Thanks for the pointer, that ECN looks very useful.  It does talk
about delays in general, but I don't see anything that clarifies
whether ACPI methods or the OS is responsible for them.

> > What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
> > Port) is in D3hot, I'm not really clear on the state of its link.  If
> > the link is down, I assume putting the bridge in D0 will bring it up
> > and we'd have to wait for that?  If so, we'd need to do something in
> > the reset path, e.g., pci_pm_reset()?
> 
> AFAIK the link goes into L1 when the function is programmed to any other
> D state than D0. 

Yes, and the "function" here is the one on the *downstream* end, e.g.,
the Endpoint or Switch Upstream Port.  When the upstream bridge (Root
Port or Switch Downstream Port) is in a non-D0 state, the downstream
component is unreachable (memory, I/O, and type 1 config requests are
terminated by the bridge as unsupported requests).

> If we don't put the device into D3cold then I think it
> stays in L1 where it can be brought back by writing D0 to PM register
> which does not need any other delay than the D3hot -> D0 (10ms).

In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
specifically to do a reset, so No_Soft_Reset is false.  Doesn't 6.6.1
say we need at least 100ms here?

> [There is a table in PCIe spec 5.0 section 5.3.2 that shows the link
> state and power management D-state relationship.]



* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-28 13:42       ` Bjorn Helgaas
@ 2019-10-28 18:06         ` Mika Westerberg
  2019-10-28 20:16           ` Bjorn Helgaas
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-28 18:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Mon, Oct 28, 2019 at 08:42:47AM -0500, Bjorn Helgaas wrote:
> On Mon, Oct 28, 2019 at 01:28:52PM +0200, Mika Westerberg wrote:
> > On Sat, Oct 26, 2019 at 09:19:38AM -0500, Bjorn Helgaas wrote:
> > > On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > > > Currently Linux does not follow PCIe spec regarding the required delays
> > > > after reset. A concrete example is a Thunderbolt add-in-card that
> > > > consists of a PCIe switch and two PCIe endpoints:
> > > > 
> > > >   +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
> > > >                                   +-01.0-[04-36]-- DS hotplug port
> > > >                                   +-02.0-[37]----00.0 xHCI controller
> > > >                                   \-04.0-[38-6b]-- DS hotplug port
> > > > 
> > > > The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> > > > gen3 so they support 8GT/s link speeds.
> > > > 
> > > > We wait for the PCIe hierarchy to enter D3cold (runtime):
> > > > 
> > > >   pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
> > > > 
> > > > When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> > > > PCIe switch is put to reset and its power is re-applied. This means that
> > > > we must follow the rules in PCIe 4.0 section 6.6.1.
> > > 
> > > If you have the PCIe 5.0 spec, can you update these references to
> > > that?  If not, I'm happy to do it for you.
> > 
> > I do have it and sure, I'll update them.
> > 
> > > > For the PCIe gen3 ports we are dealing with here, the following applies:
> > > > 
> > > >   With a Downstream Port that supports Link speeds greater than 5.0
> > > >   GT/s, software must wait a minimum of 100 ms after Link training
> > > >   completes before sending a Configuration Request to the device
> > > >   immediately below that Port. Software can determine when Link training
> > > >   completes by polling the Data Link Layer Link Active bit or by setting
> > > >   up an associated interrupt (see Section 6.7.3.3).
> > > > 
> > > > Translating this into the above topology we would need to do this (DLLLA
> > > > stands for Data Link Layer Link Active):
> > > > 
> > > >   0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
> > > >   0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
> > > >   0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
> > > > 
> > > > I've instrumented the kernel with some additional logging so we can see
> > > > the actual delays performed:
> > > > 
> > > >   pcieport 0000:00:1b.0: power state changed by ACPI to D0
> > > >   pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
> > > >   pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
> > > >   pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
> > > >   pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
> > > > 
> > > > For the switch upstream port (01:00.0 reachable through 00:1b.0 root
> > > > port) we wait for 100 ms but not taking into account the DLLLA
> > > > requirement. We then wait 10 ms for D3hot -> D0 transition of the root
> > > > port and the two downstream hotplug ports. This means that we deviate
> > > > from what the spec requires.
> > > > 
> > > > Performing the same check for system sleep (s2idle) transitions it turns
> > > > out to be even worse. None of the mandatory delays are performed. If
> > > > this would be S3 instead of s2idle then according to PCI FW spec 3.2
> > > > section 4.6.8. there is a specific _DSM that allows the OS to skip the
> > > > delays but this platform does not provide the _DSM and does not go to S3
> > > > anyway so no firmware is involved that could already handle these
> > > > delays.
> > > > 
> > > > On this particular platform these delays are not actually needed because
> > > > there is an additional delay as part of the ACPI power resource that is
> > > > used to turn on power to the hierarchy but since that additional delay
> > > > is not required by any of standards (PCIe, ACPI) it is not present in
> > > > the Intel Ice Lake, for example where missing the mandatory delays
> > > > causes pciehp to start tearing down the stack too early (links are not
> > > > yet trained). Below is an example how it looks like when this happens:
> > > > 
> > > >   pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
> > > >   pcieport 0000:87:04.0: PME# disabled
> > > >   pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
> > > >   pcieport 0000:86:00.0: Refused to change power state, currently in D3
> > > >   pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
> > > >   pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > > >   ...
> > > > 
> > > > There is also one reported case (see the bugzilla link below) where the
> > > > missing delay causes xHCI on a Titan Ridge controller to fail to runtime
> > > > resume when a USB-C dock is plugged in. This does not involve pciehp;
> > > > instead the PCI core fails to runtime resume the xHCI device:
> > > > 
> > > >   pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
> > > >   pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
> > > >   xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
> > > >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> > > >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > > >   ...
> > > > 
> > > > For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
> > > > that is called on PCI core resume and runtime resume paths accordingly
> > > > if the bridge entered D3cold (and thus went through reset).
> > > > 
> > > > This is second attempt to add the missing delays. The previous solution
> > > > in commit c2bf1fc212f7 ("PCI: Add missing link delays required by the
> > > > PCIe spec") was reverted because of two issues it caused:
> > > > 
> > > >   1. One system became unresponsive after S3 resume due to PME service
> > > >      spinning in pcie_pme_work_fn(). The root port in question reports
> > > >      that the xHCI sent PME but the xHCI device itself does not have PME
> > > >      status set. The PME status bit is never cleared in the root port,
> > > >      resulting in an indefinite loop in pcie_pme_work_fn().
> > > > 
> > > >   2. Slows down resume if the root/downstream port does not support
> > > >      Data Link Layer Active Reporting because pcie_wait_for_link_delay()
> > > >      waits 1100 ms in that case.
> > > > 
> > > > This version should avoid the above issues because we restrict the delay
> > > > to happen only if the port went into D3cold.
> > > > 
> > > > Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
> > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
> > > > Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > > Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > ---
> > > >  drivers/pci/pci-driver.c | 18 ++++++++
> > > >  drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
> > > >  drivers/pci/pci.h        |  1 +
> > > >  3 files changed, 104 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > > index a8124e47bf6e..74a144c9cf4e 100644
> > > > --- a/drivers/pci/pci-driver.c
> > > > +++ b/drivers/pci/pci-driver.c
> > > > @@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
> > > >  static int pci_pm_resume_noirq(struct device *dev)
> > > >  {
> > > >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > > > +	bool d3cold = pci_dev->current_state == PCI_D3cold;
> > > >  	struct device_driver *drv = dev->driver;
> > > >  	int error = 0;
> > > >  
> > > > @@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
> > > >  
> > > >  	pcie_pme_root_status_cleanup(pci_dev);
> > > >  
> > > > +	/*
> > > > +	 * If the hierarchy went into D3cold wait for the secondary bus to
> > > > +	 * become accessible. This is important for PCIe to prevent pciehp
> > > > +	 * from tearing down the downstream devices too soon.
> 
> > > Can we move this closer to where we initiate the reset?  It's pretty
> > > hard to tell from looking at pci_pm_resume_noirq() that there's a
> > > reset happening here.
> > 
> > Well we actually don't do explicit reset but instead we power the thing
> > on from D3cold.
> 
> The point is that it's too hard to maintain unless we can connect the
> delay with the related hardware event.

The related hardware event is resume in this case. Can you point me to
the actual point where you want me to put this?

> > > For D3cold->D0, I guess that would be somewhere down in
> > > platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
> > > What about the mid_pci_set_power_state() path?  Does that need this
> > > too?
> > 
> > I can take a look if it can be placed there. Yes,
> > mid_pci_set_power_state() may at least in theory need it too although I
> > don't remember any MID platforms with real PCIe devices.
> 
> I don't know how the OS is supposed to know if these are real PCIe
> devices or not.  If we don't know, we have to assume they work per
> spec and may require the delays per spec.

Well, MID devices are pretty much "hard-coded": the OS knows everything
that is connected.

> > > In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
> > > I assume is not for PCI devices themselves) *does* say firmware is
> > > responsible for sequencing delays, so I would tend to assume it's
> > > really firmware's job and we shouldn't need to do this in the kernel
> > > at all.
> > 
> > _ON is also for PCI device itself but all those methods are not just for
> > PCI so they don't really talk about any PCI specific delays. You need to
> > look at other specs. For example PCI FW spec v3.2 section 4.6.9 says
> > this about the _DSM that can be used to decrease the delays:
> > 
> >   This function is optional. If the platform does not provide it, the
> >   operating system must adhere to all timing requirements as described
> >   in the PCI Express Base specification and/or applicable form factor
> >   specification, including values contained in Readiness Time Reporting
> >   capability structure.
> 
> I don't think this _DSM tells us anything about delays after _ON,
> _PS0, etc.  All the delays it mentions are for transitions the OS can
> do natively without the _ON, _PS0, etc methods.  It makes no mention
> of those methods, or of the D3cold->D0 transition (which would require
> them).

The D3cold->D0 transition is explained in the PCIe 5.0 spec on page 492
(there is a picture). You can see that D3cold -> D0 involves a
fundamental reset. Section 6.6.1 (page 551) then says that fundamental
reset is one category of conventional reset. Now, that _DSM allows
lowering the init time after conventional reset. So to me it talks
exactly about those delays (also PCIe cannot go into D3cold without help
from the platform, ACPI in this case).

> > Relevant PCIe spec section is 6.6.1 (also referenced in the changelog).
> > 
> > [If you have access to ECN titled "Async Hot-Plug Updates" (you can find
> > it in PCI-SIG site) that document has a nice table about the delays in
> > page 32. It compares surprise hotplug with downstream port containment
> > for async hotplug]
> 
> Thanks for the pointer, that ECN looks very useful.  It does talk
> about delays in general, but I don't see anything that clarifies
> whether ACPI methods or the OS is responsible for them.

No but the _DSM description above is pretty clear about that. At least
for me it is clear.

> > > What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
> > > Port) is in D3hot, I'm not really clear on the state of its link.  If
> > > the link is down, I assume putting the bridge in D0 will bring it up
> > > and we'd have to wait for that?  If so, we'd need to do something in
> > > the reset path, e.g., pci_pm_reset()?
> > 
> > AFAIK the link goes into L1 when the function is programmed to any other
> > D state than D0. 
> 
> Yes, and the "function" here is the one on the *downstream* end, e.g.,
> the Endpoint or Switch Upstream Port.  When the upstream bridge (Root
> Port or Switch Downstream Port) is in a non-D0 state, the downstream
> component is unreachable (memory, I/O, and type 1 config requests are
> terminated by the bridge as unsupported requests).

Yes, the link is in L1 (its PM state is determined by the D-state of the
downstream component). From there you can get it back to a functional
state by programming the downstream port to D0 (the link is still in L1)
followed by programming the function itself to D0, which brings the link
back to L0. It does not involve a conventional reset (see the picture on
page 492 of the PCIe 5.0 spec). The recovery delays needed are listed on
the same page.

> > If we don't put the device into D3cold then I think it
> > stays in L1 where it can be brought back by writing D0 to PM register
> > which does not need any other delay than the D3hot -> D0 (10ms).
> 
> In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
> specifically to do a reset, so No_Soft_Reset is false.  Doesn't 6.6.1
> say we need at least 100ms here?

No, since it does not go into D3cold. It just "resets" the thing if it
happens to do an internal reset after D3hot -> D0.

Actually, looking at section 5.3.1.4 of the spec, it seems that
pci_pm_reset() may depend on something not guaranteed:

  If the No_Soft_Reset bit is Clear, functional context is not required
  to be maintained by the Function in the D3hot state, however it is not
  guaranteed that functional context will be cleared and software must
  not depend on such behavior.


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-28 18:06         ` Mika Westerberg
@ 2019-10-28 20:16           ` Bjorn Helgaas
  2019-10-29 11:15             ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-10-28 20:16 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Mon, Oct 28, 2019 at 08:06:01PM +0200, Mika Westerberg wrote:
> On Mon, Oct 28, 2019 at 08:42:47AM -0500, Bjorn Helgaas wrote:
> > On Mon, Oct 28, 2019 at 01:28:52PM +0200, Mika Westerberg wrote:
> > > On Sat, Oct 26, 2019 at 09:19:38AM -0500, Bjorn Helgaas wrote:
> > > > On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > > > > Currently Linux does not follow PCIe spec regarding the required delays
> > > > > after reset. A concrete example is a Thunderbolt add-in-card that
> > > > > consists of a PCIe switch and two PCIe endpoints:
> > > > > 
> > > > >   +-1b.0-[01-6b]----00.0-[02-6b]--+-00.0-[03]----00.0 TBT controller
> > > > >                                   +-01.0-[04-36]-- DS hotplug port
> > > > >                                   +-02.0-[37]----00.0 xHCI controller
> > > > >                                   \-04.0-[38-6b]-- DS hotplug port
> > > > > 
> > > > > The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> > > > > gen3 so they support 8GT/s link speeds.
> > > > > 
> > > > > We wait for the PCIe hierarchy to enter D3cold (runtime):
> > > > > 
> > > > >   pcieport 0000:00:1b.0: power state changed by ACPI to D3cold
> > > > > 
> > > > > When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> > > > > PCIe switch is put to reset and its power is re-applied. This means that
> > > > > we must follow the rules in PCIe 4.0 section 6.6.1.
> > > > 
> > > > If you have the PCIe 5.0 spec, can you update these references to
> > > > that?  If not, I'm happy to do it for you.
> > > 
> > > I do have it and sure, I'll update them.
> > > 
> > > > > For the PCIe gen3 ports we are dealing with here, the following applies:
> > > > > 
> > > > >   With a Downstream Port that supports Link speeds greater than 5.0
> > > > >   GT/s, software must wait a minimum of 100 ms after Link training
> > > > >   completes before sending a Configuration Request to the device
> > > > >   immediately below that Port. Software can determine when Link training
> > > > >   completes by polling the Data Link Layer Link Active bit or by setting
> > > > >   up an associated interrupt (see Section 6.7.3.3).
> > > > > 
> > > > > Translating this into the above topology we would need to do this (DLLLA
> > > > > stands for Data Link Layer Link Active):
> > > > > 
> > > > >   0000:00:1b.0: wait for 100 ms after DLLLA is set before access to 0000:01:00.0
> > > > >   0000:02:00.0: wait for 100 ms after DLLLA is set before access to 0000:03:00.0
> > > > >   0000:02:02.0: wait for 100 ms after DLLLA is set before access to 0000:37:00.0
> > > > > 
> > > > > I've instrumented the kernel with some additional logging so we can see
> > > > > the actual delays performed:
> > > > > 
> > > > >   pcieport 0000:00:1b.0: power state changed by ACPI to D0
> > > > >   pcieport 0000:00:1b.0: waiting for D3cold delay of 100 ms
> > > > >   pcieport 0000:00:1b.0: waiting for D3hot delay of 10 ms
> > > > >   pcieport 0000:02:01.0: waiting for D3hot delay of 10 ms
> > > > >   pcieport 0000:02:04.0: waiting for D3hot delay of 10 ms
> > > > > 
> > > > > For the switch upstream port (01:00.0 reachable through 00:1b.0 root
> > > > > port) we wait for 100 ms but not taking into account the DLLLA
> > > > > requirement. We then wait 10 ms for D3hot -> D0 transition of the root
> > > > > port and the two downstream hotplug ports. This means that we deviate
> > > > > from what the spec requires.
> > > > > 
> > > > > Performing the same check for system sleep (s2idle) transitions it turns
> > > > > out to be even worse. None of the mandatory delays are performed. If
> > > > > this would be S3 instead of s2idle then according to PCI FW spec 3.2
> > > > > section 4.6.8. there is a specific _DSM that allows the OS to skip the
> > > > > delays but this platform does not provide the _DSM and does not go to S3
> > > > > anyway so no firmware is involved that could already handle these
> > > > > delays.
> > > > > 
> > > > > On this particular platform these delays are not actually needed because
> > > > > there is an additional delay as part of the ACPI power resource that is
> > > > > used to turn on power to the hierarchy but since that additional delay
> > > > > is not required by any of standards (PCIe, ACPI) it is not present in
> > > > > the Intel Ice Lake, for example where missing the mandatory delays
> > > > > causes pciehp to start tearing down the stack too early (links are not
> > > > > yet trained). Below is an example how it looks like when this happens:
> > > > > 
> > > > >   pcieport 0000:83:04.0: pciehp: Slot(4): Card not present
> > > > >   pcieport 0000:87:04.0: PME# disabled
> > > > >   pcieport 0000:83:04.0: pciehp: pciehp_unconfigure_device: domain:bus:dev = 0000:86:00
> > > > >   pcieport 0000:86:00.0: Refused to change power state, currently in D3
> > > > >   pcieport 0000:86:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x201ff)
> > > > >   pcieport 0000:86:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > > > >   ...
> > > > > 
> > > > > There is also one reported case (see the bugzilla link below) where the
> > > > > missing delay causes xHCI on a Titan Ridge controller to fail to runtime
> > > > > resume when a USB-C dock is plugged in. This does not involve pciehp;
> > > > > instead the PCI core fails to runtime resume the xHCI device:
> > > > > 
> > > > >   pcieport 0000:04:02.0: restoring config space at offset 0xc (was 0x10000, writing 0x10020)
> > > > >   pcieport 0000:04:02.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100406)
> > > > >   xhci_hcd 0000:39:00.0: Refused to change power state, currently in D3
> > > > >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x1ff)
> > > > >   xhci_hcd 0000:39:00.0: restoring config space at offset 0x38 (was 0xffffffff, writing 0x0)
> > > > >   ...
> > > > > 
> > > > > For this reason, introduce a new function pci_bridge_wait_for_secondary_bus()
> > > > > that is called on PCI core resume and runtime resume paths accordingly
> > > > > if the bridge entered D3cold (and thus went through reset).
> > > > > 
> > > > > This is second attempt to add the missing delays. The previous solution
> > > > > in commit c2bf1fc212f7 ("PCI: Add missing link delays required by the
> > > > > PCIe spec") was reverted because of two issues it caused:
> > > > > 
> > > > >   1. One system became unresponsive after S3 resume due to PME service
> > > > >      spinning in pcie_pme_work_fn(). The root port in question reports
> > > > >      that the xHCI sent PME but the xHCI device itself does not have PME
> > > > >      status set. The PME status bit is never cleared in the root port,
> > > > >      resulting in an indefinite loop in pcie_pme_work_fn().
> > > > > 
> > > > >   2. Slows down resume if the root/downstream port does not support
> > > > >      Data Link Layer Active Reporting because pcie_wait_for_link_delay()
> > > > >      waits 1100 ms in that case.
> > > > > 
> > > > > This version should avoid the above issues because we restrict the delay
> > > > > to happen only if the port went into D3cold.
> > > > > 
> > > > > Link: https://lore.kernel.org/linux-pci/SL2P216MB01878BBCD75F21D882AEEA2880C60@SL2P216MB0187.KORP216.PROD.OUTLOOK.COM/
> > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=203885
> > > > > Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > > > Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > > ---
> > > > >  drivers/pci/pci-driver.c | 18 ++++++++
> > > > >  drivers/pci/pci.c        | 92 +++++++++++++++++++++++++++++++++++++---
> > > > >  drivers/pci/pci.h        |  1 +
> > > > >  3 files changed, 104 insertions(+), 7 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> > > > > index a8124e47bf6e..74a144c9cf4e 100644
> > > > > --- a/drivers/pci/pci-driver.c
> > > > > +++ b/drivers/pci/pci-driver.c
> > > > > @@ -917,6 +917,7 @@ static int pci_pm_suspend_noirq(struct device *dev)
> > > > >  static int pci_pm_resume_noirq(struct device *dev)
> > > > >  {
> > > > >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > > > > +	bool d3cold = pci_dev->current_state == PCI_D3cold;
> > > > >  	struct device_driver *drv = dev->driver;
> > > > >  	int error = 0;
> > > > >  
> > > > > @@ -947,6 +948,14 @@ static int pci_pm_resume_noirq(struct device *dev)
> > > > >  
> > > > >  	pcie_pme_root_status_cleanup(pci_dev);
> > > > >  
> > > > > +	/*
> > > > > +	 * If the hierarchy went into D3cold wait for the secondary bus to
> > > > > +	 * become accessible. This is important for PCIe to prevent pciehp
> > > > > +	 * from tearing down the downstream devices too soon.
> > 
> > > > Can we move this closer to where we initiate the reset?  It's pretty
> > > > hard to tell from looking at pci_pm_resume_noirq() that there's a
> > > > reset happening here.
> > > 
> > > Well we actually don't do explicit reset but instead we power the thing
> > > on from D3cold.
> > 
> > The point is that it's too hard to maintain unless we can connect the
> > delay with the related hardware event.
> 
> The related hardware event is resume in this case. Can you point me to
> the actual point where you want me to put this?

"Resume" is a Linux software concept, so of course the PCIe spec
doesn't say anything about it.  The spec talks about delays related to
resets and device power and link state transitions, so somehow we have
to connect the Linux delay with those hardware events.

Since we're talking about a transition from D3cold, this has to be
done via something external to the device such as power regulators.
For ACPI systems that's probably hidden inside _PS0 or something
similar.  That's opaque, but at least it's a hook that says "here's
where we put the device into D0".  I suggested
acpi_pci_set_power_state() as a possibility since I think that's the
lowest-level point where we have the pci_dev so we know the current
state and the new state.

> > > > For D3cold->D0, I guess that would be somewhere down in
> > > > platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
> > > > What about the mid_pci_set_power_state() path?  Does that need this
> > > > too?
> > > 
> > > I can take a look if it can be placed there. Yes,
> > > mid_pci_set_power_state() may at least in theory need it too although I
> > > don't remember any MID platforms with real PCIe devices.
> > 
> > I don't know how the OS is supposed to know if these are real PCIe
> > devices or not.  If we don't know, we have to assume they work per
> > spec and may require the delays per spec.
> 
> Well MID devices are pretty much "hard-coded" the OS knows everything
> there is connected.

MID seems to be magic in that it wants to use the normal PCI core
without having to abide by all the assumptions in the spec.  That's
OK, but MID needs to be explicit about when it is OK to violate those
assumptions.  In this case, I think it means that if we add the delay
to acpi_pci_set_power_state(), we should at least add a comment to
mid_pci_set_power_state() about why the delay is or is not required
for MID.

> > > > In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
> > > > I assume is not for PCI devices themselves) *does* say firmware is
> > > > responsible for sequencing delays, so I would tend to assume it's
> > > > really firmware's job and we shouldn't need to do this in the kernel
> > > > at all.
> > > 
> > > _ON is also for PCI device itself but all those methods are not just for
> > > PCI so they don't really talk about any PCI specific delays. You need to
> > > look at other specs. For example PCI FW spec v3.2 section 4.6.9 says
> > > this about the _DSM that can be used to decrease the delays:
> > > 
> > >   This function is optional. If the platform does not provide it, the
> > >   operating system must adhere to all timing requirements as described
> > >   in the PCI Express Base specification and/or applicable form factor
> > >   specification, including values contained in Readiness Time Reporting
> > >   capability structure.
> > 
> > I don't think this _DSM tells us anything about delays after _ON,
> > _PS0, etc.  All the delays it mentions are for transitions the OS can
> > do natively without the _ON, _PS0, etc methods.  It makes no mention
> > of those methods, or of the D3cold->D0 transition (which would require
> > them).
> 
> D3cold->D0 transition is explained in PCI spec 5.0 page 492 (there is
> picture). You can see that D3cold -> D0 involves fundamental reset.
> Section 6.6.1 (page 551) then says that fundamental reset is one
> category of conventional reset. Now, that _DSM allows lowering the init
> time after conventional reset. So to me it talks exactly about those
> delays (also PCIe cannot go into D3cold without help from the platform,
> ACPI in this case).

Everything on the _DSM list is something the OS can do natively (even
conventional reset can be done via Secondary Bus Reset), and it says
nothing about a connection with ACPI power management methods (_PS0,
etc), so I think it's ambiguous at best.  A simple "OS is responsible
for any bus-specific delays after a transition" in the ACPI _PS0
documentation would have trivially resolved this.

But it seems that at least some ACPI firmware doesn't do those delays,
so I guess our only alternatives are to always do it in the OS or have
some sort of blacklist.  And it doesn't really seem practical to
maintain a blacklist.

> > > Relevant PCIe spec section is 6.6.1 (also referenced in the changelog).
> > > 
> > > [If you have access to ECN titled "Async Hot-Plug Updates" (you can find
> > > it in PCI-SIG site) that document has a nice table about the delays in
> > > page 32. It compares surprise hotplug with downstream port containment
> > > for async hotplug]
> > 
> > Thanks for the pointer, that ECN looks very useful.  It does talk
> > about delays in general, but I don't see anything that clarifies
> > whether ACPI methods or the OS is responsible for them.
> 
> No but the _DSM description above is pretty clear about that. At least
> for me it is clear.
> 
> > > > What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
> > > > Port) is in D3hot, I'm not really clear on the state of its link.  If
> > > > the link is down, I assume putting the bridge in D0 will bring it up
> > > > and we'd have to wait for that?  If so, we'd need to do something in
> > > > the reset path, e.g., pci_pm_reset()?
> > > 
> > > AFAIK the link goes into L1 when the function is programmed to any other
> > > D state than D0. 
> > 
> > Yes, and the "function" here is the one on the *downstream* end, e.g.,
> > the Endpoint or Switch Upstream Port.  When the upstream bridge (Root
> > Port or Switch Downstream Port) is in a non-D0 state, the downstream
> > component is unreachable (memory, I/O, and type 1 config requests are
> > terminated by the bridge as unsupported requests).
> 
> Yes, the link is in L1 (its PM state is determined by the D-state of the
> downstream component. From there you can get it back to functional state
> by programming the downstream port to D0 (the link is still in L1)
> followed by programming the function itself to D0 which brings the link
> back to L0. It does not involve conventional reset (see picture in page
> 492 of PCIe 5.0 spec). The recovery delays needed are listed in the same
> page.
> 
> > > If we don't put the device into D3cold then I think it
> > > stays in L1 where it can be brought back by writing D0 to PM register
> > > which does not need any other delay than the D3hot -> D0 (10ms).
> > 
> > In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
> > specifically to do a reset, so No_Soft_Reset is false.  Doesn't 6.6.1
> > say we need at least 100ms here?
> 
> No since it does not go into D3cold. It just "reset" the thing if it
> happens to do internal reset after D3hot -> D0.

Sec 5.8, Figure 5-18 says D3hot->D0uninitialized is a "Soft Reset", which
unfortunately is not defined.

My guess is that in sec 5.9, Table 5-13, the 10ms delay is for the
D3hot->D0active (i.e., No_Soft_Reset=1) transition, and the
D3hot->D0uninitialized (i.e., No_Soft_Reset=0) that does a "soft
reset" (whatever that is) probably requires more and we should handle
it like a conventional reset to be safe.
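
A minimal sketch of the conservative rule guessed at above, not kernel
code. It assumes only the delay values mentioned in this thread (10 ms
for D3hot->D0, 100 ms after a conventional reset). The enum, the helper
name delay_before_config_access_ms() and its parameters are hypothetical
and exist purely for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Delay values taken from the discussion in this thread. */
#define D3HOT_DELAY_MS               10
#define CONVENTIONAL_RESET_DELAY_MS 100

/* Hypothetical model of the state a function is leaving. */
enum pm_state { PM_D0, PM_D3HOT, PM_D3COLD };

/*
 * Hypothetical helper: how long to wait before the first config access
 * once a function has been moved from @from to D0.
 */
static unsigned int delay_before_config_access_ms(enum pm_state from,
						  bool no_soft_reset)
{
	/* Only transitions into D0 from a low-power state are modelled. */
	assert(from != PM_D0);

	/* D3cold->D0 involves a fundamental (conventional) reset. */
	if (from == PM_D3COLD)
		return CONVENTIONAL_RESET_DELAY_MS;

	/*
	 * D3hot->D0 with No_Soft_Reset=1: context is kept and the
	 * function lands in D0active, so 10 ms is assumed to suffice.
	 */
	if (no_soft_reset)
		return D3HOT_DELAY_MS;

	/*
	 * D3hot->D0 with No_Soft_Reset=0: the function may perform an
	 * undefined "soft reset", so be conservative and treat it like
	 * a conventional reset.
	 */
	return CONVENTIONAL_RESET_DELAY_MS;
}
```

The helper returns only the duration. For a Downstream Port that
supports more than 5.0 GT/s, the 100 ms would still have to be counted
from the point where link training completes.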

> Actually looking at the spec 5.3.1.4 it seems that pci_pm_reset() may
> depend on something not guaranteed:
> 
>   If the No_Soft_Reset bit is Clear, functional context is not required
>   to be maintained by the Function in the D3hot state, however it is not
>   guaranteed that functional context will be cleared and software must
>   not depend on such behavior.

Good point.  Sounds like that reset method may not be reliable in
general, although it might work for SR-IOV: 9.6.2 says that PFs with
No_Soft_Reset clear must perform an internal reset on D3hot->D0.

Bjorn


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-28 20:16           ` Bjorn Helgaas
@ 2019-10-29 11:15             ` Mika Westerberg
  2019-10-29 20:27               ` Bjorn Helgaas
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-29 11:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > The related hardware event is resume in this case. Can you point me to
> > the actual point where you want me to put this?
> 
> "Resume" is a Linux software concept, so of course the PCIe spec
> doesn't say anything about it.  The spec talks about delays related to
> resets and device power and link state transitions, so somehow we have
> to connect the Linux delay with those hardware events.
> 
> Since we're talking about a transition from D3cold, this has to be
> done via something external to the device such as power regulators.
> For ACPI systems that's probably hidden inside _PS0 or something
> similar.  That's opaque, but at least it's a hook that says "here's
> where we put the device into D0".  I suggested
> acpi_pci_set_power_state() as a possibility since I think that's the
> lowest-level point where we have the pci_dev so we know the current
> state and the new state.

I looked at how we could use acpi_pci_set_power_state() but I don't
think it is possible because it is likely that only the root port has
the power resource that is used to bring the link to L2 or L3. However,
we would need to repeat the delay for each downstream/root port if there
are multiple PCIe switches in the topology.

Also the delay needs to be issued after the downstream link is trained
so the downstream/root port needs to be in D0 first.
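
To illustrate why a single delay at the root port is not enough, here is
a small self-contained model, not kernel code. struct port, its fields
and wait_for_secondary_bus() are hypothetical; training times are
simulated numbers rather than real DLLLA polling. It assumes sibling
ports resume in parallel (so only the slowest sibling counts) and that a
port with nothing connected below it needs no delay:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* 100 ms after DLLLA is set, per the sec 6.6.1 text quoted earlier. */
#define LINK_ACTIVE_DELAY_MS 100

/* Hypothetical model of a root or downstream port (not struct pci_dev). */
struct port {
	const char *name;
	bool in_d0;		/* the port itself is already back in D0 */
	bool has_device;	/* something is connected below the port */
	unsigned int train_ms;	/* simulated time until DLLLA is set */
	const struct port *child_ports;	/* downstream ports of the switch below */
	size_t nr_child_ports;
};

/*
 * Returns the simulated time in ms until every device below @p may be
 * sent a Configuration Request.
 */
static unsigned int wait_for_secondary_bus(const struct port *p)
{
	unsigned int worst = 0;
	size_t i;

	/* The link below a port cannot train until the port is in D0. */
	assert(p->in_d0);

	/* Nothing connected: no delay needed for this port. */
	if (!p->has_device)
		return 0;

	/* Each level repeats the wait; siblings are assumed parallel. */
	for (i = 0; i < p->nr_child_ports; i++) {
		unsigned int t = wait_for_secondary_bus(&p->child_ports[i]);

		if (t > worst)
			worst = t;
	}

	/* Wait for link training, then 100 ms, then descend. */
	return p->train_ms + LINK_ACTIVE_DELAY_MS + worst;
}
```

With the topology from the changelog (root port 00:1b.0, switch
downstream ports 02:00.0, 02:01.0, 02:02.0 and 02:04.0, the two hotplug
ports empty), the total is the root port wait plus the slower of the
02:00.0 and 02:02.0 waits, which is more than the single 100 ms the
current code performs.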

> > > > > For D3cold->D0, I guess that would be somewhere down in
> > > > > platform_pci_set_power_state()?  Maybe acpi_pci_set_power_state()?
> > > > > What about the mid_pci_set_power_state() path?  Does that need this
> > > > > too?
> > > > 
> > > > I can take a look if it can be placed there. Yes,
> > > > mid_pci_set_power_state() may at least in theory need it too although I
> > > > don't remember any MID platforms with real PCIe devices.
> > > 
> > > I don't know how the OS is supposed to know if these are real PCIe
> > > devices or not.  If we don't know, we have to assume they work per
> > > spec and may require the delays per spec.
> > 
> > Well MID devices are pretty much "hard-coded" the OS knows everything
> > there is connected.
> 
> MID seems to be magic in that it wants to use the normal PCI core
> without having to abide by all the assumptions in the spec.  That's
> OK, but MID needs to be explicit about when it is OK to violate those
> assumptions.  In this case, I think it means that if we add the delay
> to acpi_pci_set_power_state(), we should at least add a comment to
> mid_pci_set_power_state() about why the delay is or is not required
> for MID.
> 
> > > > > In the ACPI spec, _PS0 doesn't say anything about delays.  _ON (which
> > > > > I assume is not for PCI devices themselves) *does* say firmware is
> > > > > responsible for sequencing delays, so I would tend to assume it's
> > > > > really firmware's job and we shouldn't need to do this in the kernel
> > > > > at all.
> > > > 
> > > > _ON is also for PCI device itself but all those methods are not just for
> > > > PCI so they don't really talk about any PCI specific delays. You need to
> > > > look at other specs. For example PCI FW spec v3.2 section 4.6.9 says
> > > > this about the _DSM that can be used to decrease the delays:
> > > > 
> > > >   This function is optional. If the platform does not provide it, the
> > > >   operating system must adhere to all timing requirements as described
> > > >   in the PCI Express Base specification and/or applicable form factor
> > > >   specification, including values contained in Readiness Time Reporting
> > > >   capability structure.
> > > 
> > > I don't think this _DSM tells us anything about delays after _ON,
> > > _PS0, etc.  All the delays it mentions are for transitions the OS can
> > > do natively without the _ON, _PS0, etc methods.  It makes no mention
> > > of those methods, or of the D3cold->D0 transition (which would require
> > > them).
> > 
> > D3cold->D0 transition is explained in PCI spec 5.0 page 492 (there is
> > picture). You can see that D3cold -> D0 involves fundamental reset.
> > Section 6.6.1 (page 551) then says that fundamental reset is one
> > category of conventional reset. Now, that _DSM allows lowering the init
> > time after conventional reset. So to me it talks exactly about those
> > delays (also PCIe cannot go into D3cold without help from the platform,
> > ACPI in this case).
> 
> Everything on the _DSM list is something the OS can do natively (even
> conventional reset can be done via Secondary Bus Reset), and it says
> nothing about a connection with ACPI power management methods (_PS0,
> etc), so I think it's ambiguous at best.  A simple "OS is responsible
> for any bus-specific delays after a transition" in the ACPI _PS0
> documentation would have trivially resolved this.

But I would imagine that is not always the case; that's the reason we
have documents such as the PCI FW spec.

> But it seems that at least some ACPI firmware doesn't do those delays,
> so I guess our only alternatives are to always do it in the OS or have
> some sort of blacklist.  And it doesn't really seem practical to
> maintain a blacklist.

I really think this is crystal clear:

The OS is always responsible for the delays described in the PCIe spec.
However, if the platform implements some of them, say in _ON or _PS0
methods, then it can notify the OS about this by using the _DSM so the
OS does not need to duplicate all of them.

> > > > Relevant PCIe spec section is 6.6.1 (also referenced in the changelog).
> > > > 
> > > > [If you have access to ECN titled "Async Hot-Plug Updates" (you can find
> > > > it in PCI-SIG site) that document has a nice table about the delays in
> > > > page 32. It compares surprise hotplug with downstream port containment
> > > > for async hotplug]
> > > 
> > > Thanks for the pointer, that ECN looks very useful.  It does talk
> > > about delays in general, but I don't see anything that clarifies
> > > whether ACPI methods or the OS is responsible for them.
> > 
> > No but the _DSM description above is pretty clear about that. At least
> > for me it is clear.
> > 
> > > > > What about D3hot->D0?  When a bridge (Root Port or Switch Downstream
> > > > > Port) is in D3hot, I'm not really clear on the state of its link.  If
> > > > > the link is down, I assume putting the bridge in D0 will bring it up
> > > > > and we'd have to wait for that?  If so, we'd need to do something in
> > > > > the reset path, e.g., pci_pm_reset()?
> > > > 
> > > > AFAIK the link goes into L1 when the function is programmed to any other
> > > > D state than D0. 
> > > 
> > > Yes, and the "function" here is the one on the *downstream* end, e.g.,
> > > the Endpoint or Switch Upstream Port.  When the upstream bridge (Root
> > > Port or Switch Downstream Port) is in a non-D0 state, the downstream
> > > component is unreachable (memory, I/O, and type 1 config requests are
> > > terminated by the bridge as unsupported requests).
> > 
> > Yes, the link is in L1 (its PM state is determined by the D-state of the
> > downstream component. From there you can get it back to functional state
> > by programming the downstream port to D0 (the link is still in L1)
> > followed by programming the function itself to D0 which brings the link
> > back to L0. It does not involve conventional reset (see picture in page
> > 492 of PCIe 5.0 spec). The recovery delays needed are listed in the same
> > page.
> > 
> > > > If we don't put the device into D3cold then I think it
> > > > stays in L1 where it can be brought back by writing D0 to PM register
> > > > which does not need any other delay than the D3hot -> D0 (10ms).
> > > 
> > > In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
> > > specifically to do a reset, so No_Soft_Reset is false.  Doesn't 6.6.1
> > > say we need at least 100ms here?
> > 
> > No since it does not go into D3cold. It just "reset" the thing if it
> > happens to do internal reset after D3hot -> D0.
> 
> Sec 5.8, Figure 5-18 says D3hot->D0uninitialized is a "Soft Reset", which
> unfortunately is not defined.
> 
> My guess is that in sec 5.9, Table 5-13, the 10ms delay is for the
> D3hot->D0active (i.e., No_Soft_Reset=1) transition, and the
> D3hot->D0uninitialized (i.e., No_Soft_Reset=0) that does a "soft
> reset" (whatever that is) probably requires more and we should handle
> it like a conventional reset to be safe.

I think it simply means the device functional context is lost (there is
more in section 5.3.1.4). Linux handles this properly already (well at
least according to the minimum timings required by the spec) and restores
the context accordingly after it has waited for the 10ms.

It is the D3cold (where links go to L2 or L3) where we really need the
delays so that the link gets properly trained before we start poking the
downstream device.


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-29 11:15             ` Mika Westerberg
@ 2019-10-29 20:27               ` Bjorn Helgaas
  2019-10-30 11:15                 ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-10-29 20:27 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > The related hardware event is resume in this case. Can you point
> > > me to the actual point where you want me to put this?
> > 
> > "Resume" is a Linux software concept, so of course the PCIe spec
> > doesn't say anything about it.  The spec talks about delays
> > related to resets and device power and link state transitions, so
> > somehow we have to connect the Linux delay with those hardware
> > events.
> > 
> > Since we're talking about a transition from D3cold, this has to be
> > done via something external to the device such as power
> > regulators.  For ACPI systems that's probably hidden inside _PS0
> > or something similar.  That's opaque, but at least it's a hook
> > that says "here's where we put the device into D0".  I suggested
> > acpi_pci_set_power_state() as a possibility since I think that's
> > the lowest-level point where we have the pci_dev so we know the
> > current state and the new state.
> 
> I looked at how we could use acpi_pci_set_power_state() but I don't
> think it is possible because it is likely that only the root port
> has the power resource that is used to bring the link to L2 or L3.
> However, we would need to repeat the delay for each downstream/root
> port if there are multiple PCIe switches in the topology.

OK, I think I understand why that's a problem (correct me if I'm
wrong):

  We call pci_pm_resume_noirq() for every device, but it only calls
  acpi_pci_set_power_state() for devices that have _PS0 or _PR0
  methods.  So if the delay is in acpi_pci_set_power_state() and we
  have A -> B -> C where only A has _PS0, we would delay for the link
  to B to come up, but not for the link to C.

I do see that we do need both delays.  In acpi_pci_set_power_state()
when we transition A from D3cold->D0, I assume that single _PS0
evaluation on A causes B to transition from D3cold->D3hot, which in
turn causes C to transition from D3cold->D3hot.  Is that your
understanding, too?

We do know that topology in acpi_pci_set_power_state(), since we have
the pci_dev for A, so it seems conceivable that we could descend the
hierarchy and delay for each level.
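
The A -> B -> C case can be made concrete with a small user-space
model.  This is only a sketch under stated assumptions: struct toy_port
and both helper functions are hypothetical names invented for
illustration, not kernel APIs, and "waiting" is reduced to counting
which links would get a delay.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a bridge/port; not the kernel's struct pci_dev. */
struct toy_port {
	const char *name;
	bool has_platform_pm;   /* assumption: models "_PS0 or _PR0 exists" */
	struct toy_port *child; /* device on the secondary side, NULL if none */
};

/*
 * Delay placed only in the platform hook: a link is waited for only when
 * the port above it has platform PM methods.
 */
static int links_waited_platform_hook_only(const struct toy_port *p)
{
	int waited = 0;

	for (; p; p = p->child)
		if (p->child && p->has_platform_pm)
			waited++;
	return waited;
}

/*
 * Delay placed in each port's own resume path: every port with something
 * below it waits for its own link, regardless of platform PM methods.
 */
static int links_waited_per_port(const struct toy_port *p)
{
	int waited = 0;

	for (; p; p = p->child)
		if (p->child)
			waited++;
	return waited;
}
```

With A -> B -> C where only A has platform methods, the first helper
counts one link (A to B) and the second counts two, which is the gap
described above.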

If the delay is in pci_pm_resume_noirq() (as in your patch), what
happens with a switch with several Downstream Ports?  I assume that
all the Downstream Ports start their transition out of D3cold
basically simultaneously, so we probably don't need N delays, do we?
It seems a little messy to optimize this in pci_pm_resume_noirq().

The outline of the pci_pm_resume_noirq() part of this patch is:

  pci_pm_resume_noirq
    if (!dev->skip_bus_pm ...)   # <-- condition 1
      pci_pm_default_resume_early
        pci_power_up
          if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
            platform_pci_set_power_state
              pci_platform_pm->set_state
                acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
                  acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
+   if (d3cold)                  # <-- condition 2
+     pci_bridge_wait_for_secondary_bus

Another thing that niggles at me here is that the condition for
calling pci_bridge_wait_for_secondary_bus() is completely different
than the condition for changing the power state.  If we didn't change
the power state, there's no reason to wait, is there?

The outline of the pci_pm_runtime_resume() part of this patch is:

  pci_pm_runtime_resume
    pci_restore_standard_config
      if (dev->current_state != PCI_D0)
        pci_set_power_state(PCI_D0)
          __pci_start_power_transition
            pci_platform_power_transition
              if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
                platform_pci_set_power_state
                  pci_platform_pm->set_state
                    acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
                      acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
              pci_raw_set_power_state
          __pci_complete_power_transition
+   if (d3cold)
+     pci_bridge_wait_for_secondary_bus

In this part, the power state change is inside
pci_restore_standard_config(), which calls pci_set_power_state().
There are many other callers of pci_set_power_state(); can we be sure
that none of them need a delay?

> > But it seems that at least some ACPI firmware doesn't do those
> > delays, so I guess our only alternatives are to always do it in
> > the OS or have some sort of blacklist.  And it doesn't really seem
> > practical to maintain a blacklist.
> 
> I really think this is crystal clear:

I am agreeing with you that the OS needs to do the delays.

> The OS is always responsible for the delays described in the PCIe
> spec.

If the ACPI spec contained this statement, it would be useful, but I
haven't seen it.  It's certainly true that some combination of
firmware and the OS is responsible for the delays :)

> However, if the platform implements some of them say in _ON or _PS0
> methods then it can notify the OS about this by using the _DSM so
> the OS does not need to duplicate all of them.

That makes good sense, but there are other reasons for using that
_DSM, e.g., firmware may know that MID or similar devices are not
really PCI devices and don't need delays anywhere.  So the existence
of the _DSM by itself doesn't convince me that the OS is responsible
for the delays.

> > > > In pci_pm_reset(), we're doing the D0->D3hot->D0 transitions
> > > > specifically to do a reset, so No_Soft_Reset is false.
> > > > Doesn't 6.6.1 say we need at least 100ms here?
> > > 
> > > No since it does not go into D3cold. It just "reset" the thing
> > > if it happens to do internal reset after D3hot -> D0.
> > 
> > Sec 5.8, Figure 5-18 says D3hot->D0uninitialized is a "Soft
> > Reset", which unfortunately is not defined.
> > 
> > My guess is that in sec 5.9, Table 5-13, the 10ms delay is for the
> > D3hot->D0active (i.e., No_Soft_Reset=1) transition, and the
> > D3hot->D0uninitialized (i.e., No_Soft_Reset=0) that does a "soft
> > reset" (whatever that is) probably requires more and we should
> > handle it like a conventional reset to be safe.
> 
> I think it simply means the device functional context is lost (there
> is more in section 5.3.1.4). Linux handles this properly already
> (well at least according to the minimum timings required by the spec)
> and restores the context accordingly after it has waited for the
> 10ms.
> 
> It is the D3cold (where links go to L2 or L3) where we really need
> the delays so that the link gets properly trained before we start
> poking the downstream device.

I'm already speculating above, so I don't think I can contribute
anything useful here.


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
  2019-10-26 14:19   ` Bjorn Helgaas
@ 2019-10-29 20:54   ` Bjorn Helgaas
  2019-10-30 11:33     ` Mika Westerberg
  1 sibling, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-10-29 20:54 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> Currently Linux does not follow PCIe spec regarding the required delays
> after reset. A concrete example is a Thunderbolt add-in-card that
> consists of a PCIe switch and two PCIe endpoints:
> ...

> @@ -1025,15 +1025,11 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
>  	if (state == PCI_D0) {
>  		pci_platform_power_transition(dev, PCI_D0);
>  		/*
> -		 * Mandatory power management transition delays, see
> -		 * PCI Express Base Specification Revision 2.0 Section
> -		 * 6.6.1: Conventional Reset.  Do not delay for
> -		 * devices powered on/off by corresponding bridge,
> -		 * because have already delayed for the bridge.
> +		 * Mandatory power management transition delays are handled
> +		 * in pci_pm_runtime_resume() of the corresponding
> +		 * downstream/root port.
>  		 */
>  		if (dev->runtime_d3cold) {
> -			if (dev->d3cold_delay && !dev->imm_ready)
> -				msleep(dev->d3cold_delay);

This removes the only use of d3cold_delay.  I assume that's
intentional?  If we no longer need it, we might as well remove it from
the pci_dev and remove the places that set it.  It'd be nice if that
could be a separate patch, even if we waited a little longer than
necessary at that one bisection point.

It also removes one of the three uses of imm_ready, leaving only the
two in FLR.  I suspect there are other places we should use imm_ready,
e.g., transitions to/from D1 and D2, but that would be beyond the
scope of this patch.

> +	/*
> +	 * For PCIe downstream and root ports that do not support speeds
> +	 * greater than 5 GT/s need to wait minimum 100 ms. For higher
> +	 * speeds (gen3) we need to wait first for the data link layer to
> +	 * become active.
> +	 *
> +	 * However, 100 ms is the minimum and the PCIe spec says the
> +	 * software must allow at least 1s before it can determine that the
> +	 * device that did not respond is a broken device. There is
> +	 * evidence that 100 ms is not always enough, for example certain
> +	 * Titan Ridge xHCI controller does not always respond to
> +	 * configuration requests if we only wait for 100 ms (see
> +	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> +	 *
> +	 * Therefore we wait for 100 ms and check for the device presence.
> +	 * If it is still not present give it an additional 100 ms.
> +	 */
> +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> +	    pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
> +		return;

Shouldn't this be:

  if (!pcie_downstream_port(dev))
    return;

so we include PCI/PCI-X to PCIe bridges?
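
The difference between the two checks can be shown with a user-space
sketch.  The PCI_EXP_TYPE_* values are redeclared from Linux's
include/uapi/linux/pci_regs.h as I understand them; the two functions
are hypothetical stand-ins that take the port type directly (the kernel
helper takes a struct pci_dev *), so treat this as an illustration
rather than the kernel's code.

```c
#include <stdbool.h>

/*
 * Device/Port Type values of the PCI Express Capabilities register,
 * redeclared here so the sketch builds outside the kernel.
 */
#define PCI_EXP_TYPE_ROOT_PORT   0x4 /* Root Port */
#define PCI_EXP_TYPE_UPSTREAM    0x5 /* Switch Upstream Port */
#define PCI_EXP_TYPE_DOWNSTREAM  0x6 /* Switch Downstream Port */
#define PCI_EXP_TYPE_PCIE_BRIDGE 0x8 /* PCI/PCI-X to PCIe bridge */

/* The check as written in the patch: Root and Switch Downstream Ports only. */
static bool toy_patch_check(int type)
{
	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM;
}

/*
 * Hypothetical stand-in for pcie_downstream_port(): also accepts
 * PCI/PCI-X to PCIe bridges, which have a PCIe link on the secondary side.
 */
static bool toy_pcie_downstream_port(int type)
{
	return type == PCI_EXP_TYPE_ROOT_PORT ||
	       type == PCI_EXP_TYPE_DOWNSTREAM ||
	       type == PCI_EXP_TYPE_PCIE_BRIDGE;
}
```

The only type on which the two disagree is PCI_EXP_TYPE_PCIE_BRIDGE.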


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-29 20:27               ` Bjorn Helgaas
@ 2019-10-30 11:15                 ` Mika Westerberg
  2019-10-31 22:31                   ` Bjorn Helgaas
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-10-30 11:15 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Oct 29, 2019 at 03:27:09PM -0500, Bjorn Helgaas wrote:
> On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> > On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > > The related hardware event is resume in this case. Can you point
> > > > me to the actual point where you want me to put this?
> > > 
> > > "Resume" is a Linux software concept, so of course the PCIe spec
> > > doesn't say anything about it.  The spec talks about delays
> > > related to resets and device power and link state transitions, so
> > > somehow we have to connect the Linux delay with those hardware
> > > events.
> > > 
> > > Since we're talking about a transition from D3cold, this has to be
> > > done via something external to the device such as power
> > > regulators.  For ACPI systems that's probably hidden inside _PS0
> > > or something similar.  That's opaque, but at least it's a hook
> > > that says "here's where we put the device into D0".  I suggested
> > > acpi_pci_set_power_state() as a possibility since I think that's
> > > the lowest-level point where we have the pci_dev so we know the
> > > current state and the new state.
> > 
> > I looked at how we could use acpi_pci_set_power_state() but I don't
> > think it is possible because it is likely that only the root port
> > has the power resource that is used to bring the link to L2 or L3.
> > However, we would need to repeat the delay for each downstream/root
> > port if there are multiple PCIe switches in the topology.
> 
> OK, I think I understand why that's a problem (correct me if I'm
> wrong):
> 
>   We call pci_pm_resume_noirq() for every device, but it only calls
>   acpi_pci_set_power_state() for devices that have _PS0 or _PR0
>   methods.  So if the delay is in acpi_pci_set_power_state() and we
>   have A -> B -> C where only A has _PS0, we would delay for the link
>   to B to come up, but not for the link to C.

Yes, that's correct.

> I do see that we do need both delays.  In acpi_pci_set_power_state()
> when we transition A from D3cold->D0, I assume that single _PS0
> evaluation on A causes B to transition from D3cold->D3hot, which in
> turn causes C to transition from D3cold->D3hot.  Is that your
> understanding, too?

Not exactly :)

It is _ON() that causes the links to be retrained and it also causes
PERST# (reset) to be deasserted for the whole topology, transitioning all
devices into D0uninitialized (default value for PMCSR PowerState field is 0).

> We do know that topology in acpi_pci_set_power_state(), since we have
> the pci_dev for A, so it seems conceivable that we could descend the
> hierarchy and delay for each level.

Right.

> If the delay is in pci_pm_resume_noirq() (as in your patch), what
> happens with a switch with several Downstream Ports?  I assume that
> all the Downstream Ports start their transition out of D3cold
> basically simultaneously, so we probably don't need N delays, do we?

No. Actually Linux already resumes these in parallel because async
suspend is set for them (for system suspend that is).
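
A rough back-of-the-envelope model of why the number of sibling
Downstream Ports matters less when they are resumed asynchronously.
The 100 ms figure is the minimum discussed in the patch; the helper
names and the assumption that every port waits exactly that long are
illustrative only.

```c
/* Assumed per-port wait, taken from the 100 ms minimum in the patch. */
#define LINK_WAIT_MS 100u

/* Wall-clock time if N sibling ports were resumed one after another. */
static unsigned int serial_wait_ms(unsigned int nports)
{
	return nports * LINK_WAIT_MS;
}

/* Wall-clock time if N sibling ports wait concurrently (async resume). */
static unsigned int parallel_wait_ms(unsigned int nports)
{
	return nports ? LINK_WAIT_MS : 0;
}
```

For a switch with eight Downstream Ports this gives 800 ms when the
waits are serialized and 100 ms when they overlap.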

> It seems a little messy to optimize this in pci_pm_resume_noirq().

I agree.

> The outline of the pci_pm_resume_noirq() part of this patch is:
> 
>   pci_pm_resume_noirq
>     if (!dev->skip_bus_pm ...)   # <-- condition 1
>       pci_pm_default_resume_early
>         pci_power_up
>           if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
>             platform_pci_set_power_state
>               pci_platform_pm->set_state
>                 acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
>                   acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> +   if (d3cold)                  # <-- condition 2
> +     pci_bridge_wait_for_secondary_bus
> 
> Another thing that niggles at me here is that the condition for
> calling pci_bridge_wait_for_secondary_bus() is completely different
> than the condition for changing the power state.  If we didn't change
> the power state, there's no reason to wait, is there?

Indeed, if you are talking about the dev->skip_bus_pm check, there is no
point in waiting if we did not change the power state. I would assume that
d3cold is false in that case but we could also do this for clarity:

	if (!dev->skip_bus_pm && d3cold)
		pci_bridge_wait_for_secondary_bus(...)

> The outline of the pci_pm_runtime_resume() part of this patch is:
> 
>   pci_pm_runtime_resume
>     pci_restore_standard_config
>       if (dev->current_state != PCI_D0)
>         pci_set_power_state(PCI_D0)
>           __pci_start_power_transition
>             pci_platform_power_transition
>               if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
>                 platform_pci_set_power_state
>                   pci_platform_pm->set_state
>                     acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
>                       acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
>               pci_raw_set_power_state
>           __pci_complete_power_transition
> +   if (d3cold)
> +     pci_bridge_wait_for_secondary_bus
> 
> In this part, the power state change is inside
> pci_restore_standard_config(), which calls pci_set_power_state().
> There are many other callers of pci_set_power_state(); can we be sure
> that none of them need a delay?

Since we are handling the delay when we resume the downstream port, not
when we resume the device itself, I think the link should be up already
and the device accessible if someone calls pci_set_power_state() for it
(as the parent is always resumed before children).

> > > But it seems that at least some ACPI firmware doesn't do those
> > > delays, so I guess our only alternatives are to always do it in
> > > the OS or have some sort of blacklist.  And it doesn't really seem
> > > practical to maintain a blacklist.
> > 
> > I really think this is crystal clear:
> 
> I am agreeing with you that the OS needs to do the delays.
> 
> > The OS is always responsible for the delays described in the PCIe
> > spec.
> 
> If the ACPI spec contained this statement, it would be useful, but I
> haven't seen it.  It's certainly true that some combination of
> firmware and the OS is responsible for the delays :)
> 
> > However, if the platform implements some of them say in _ON or _PS0
> > methods then it can notify the OS about this by using the _DSM so
> > the OS does not need to duplicate all of them.
> 
> That makes good sense, but there are other reasons for using that
> _DSM, e.g., firmware may know that MID or similar devices are not
> really PCI devices and don't need delays anywhere.  So the existence
> of the _DSM by itself doesn't convince me that the OS is responsible
> for the delays.

Hmm, my interpretation of the specs is that the OS is responsible for
these delays, but if you can't be convinced then how do you propose we
handle this problem? I mean there are two cases already listed in the
changelog of this patch from real systems that need these delays. I don't
think we can just tell people that unfortunately their system will not be
supported by Linux because we are not convinced that the OS should do
these delays. ;-)


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-29 20:54   ` Bjorn Helgaas
@ 2019-10-30 11:33     ` Mika Westerberg
  0 siblings, 0 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-10-30 11:33 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Tue, Oct 29, 2019 at 03:54:56PM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > Currently Linux does not follow PCIe spec regarding the required delays
> > after reset. A concrete example is a Thunderbolt add-in-card that
> > consists of a PCIe switch and two PCIe endpoints:
> > ...
> 
> > @@ -1025,15 +1025,11 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
> >  	if (state == PCI_D0) {
> >  		pci_platform_power_transition(dev, PCI_D0);
> >  		/*
> > -		 * Mandatory power management transition delays, see
> > -		 * PCI Express Base Specification Revision 2.0 Section
> > -		 * 6.6.1: Conventional Reset.  Do not delay for
> > -		 * devices powered on/off by corresponding bridge,
> > -		 * because have already delayed for the bridge.
> > +		 * Mandatory power management transition delays are handled
> > +		 * in pci_pm_runtime_resume() of the corresponding
> > +		 * downstream/root port.
> >  		 */
> >  		if (dev->runtime_d3cold) {
> > -			if (dev->d3cold_delay && !dev->imm_ready)
> > -				msleep(dev->d3cold_delay);
> 
> This removes the only use of d3cold_delay.  I assume that's
> intentional?  If we no longer need it, we might as well remove it from
> the pci_dev and remove the places that set it.  It'd be nice if that
> could be a separate patch, even if we waited a little longer than
> necessary at that one bisection point.

Yes, it is intentional. In the previous version I had a function
pcie_get_downstream_delay() that used both d3cold_delay and imm_ready to
calculate the downstream device delay but you said:

  I'm not sold on the idea that this delay depends on what's *below* the
  bridge.  We're using sec 6.6.1 to justify the delay, and that section
  doesn't say anything about downstream devices.

So I dropped it and use 100 ms instead.

Now that you mention it, I think if we want to continue supporting that
_DSM, we should still take d3cold_delay into account in this patch. There
is also one driver (drivers/mfd/intel-lpss-pci.c) that sets it to 0.

> It also removes one of the three uses of imm_ready, leaving only the
> two in FLR.  I suspect there are other places we should use imm_ready,
> e.g., transitions to/from D1 and D2, but that would be beyond the
> scope of this patch.

Right, I think imm_ready does not apply here. If I understand correctly
it is exactly for D1, D2 and D3hot transitions which we should take into
account in pci_dev_d3_sleep() (which we don't do right now).

> > +	/*
> > +	 * For PCIe downstream and root ports that do not support speeds
> > +	 * greater than 5 GT/s need to wait minimum 100 ms. For higher
> > +	 * speeds (gen3) we need to wait first for the data link layer to
> > +	 * become active.
> > +	 *
> > +	 * However, 100 ms is the minimum and the PCIe spec says the
> > +	 * software must allow at least 1s before it can determine that the
> > +	 * device that did not respond is a broken device. There is
> > +	 * evidence that 100 ms is not always enough, for example certain
> > +	 * Titan Ridge xHCI controller does not always respond to
> > +	 * configuration requests if we only wait for 100 ms (see
> > +	 * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> > +	 *
> > +	 * Therefore we wait for 100 ms and check for the device presence.
> > +	 * If it is still not present give it an additional 100 ms.
> > +	 */
> > +	if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> > +	    pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
> > +		return;
> 
> Shouldn't this be:
> 
>   if (!pcie_downstream_port(dev))
>     return;
> 
> so we include PCI/PCI-X to PCIe bridges?

Yes, I'll change it in v3.
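
For reference, the wait policy spelled out in the quoted comment (wait
100 ms, check for the device, and if it is still absent wait another
100 ms) can be modelled in user space as follows. This is a sketch, not
the patch: wait_for_device(), the callback types, and the fake
sleep/presence helpers used to exercise it are hypothetical names, and
re-checking presence after the second wait is an assumption of the
sketch.

```c
#include <stdbool.h>

typedef void (*sleep_fn)(unsigned int ms, void *ctx);
typedef bool (*present_fn)(void *ctx);

/*
 * Wait 100 ms and probe once; if the device is still absent, wait another
 * 100 ms and probe again.  Returns true if the device was seen.
 */
static bool wait_for_device(sleep_fn do_sleep, present_fn is_present,
			    void *ctx)
{
	do_sleep(100, ctx);
	if (is_present(ctx))
		return true;

	do_sleep(100, ctx);
	return is_present(ctx);
}

/* Fake device with a fake clock, so the policy can be tried without sleeping. */
struct fake_dev {
	unsigned int elapsed_ms;     /* time "slept" so far */
	unsigned int ready_after_ms; /* device responds once this has elapsed */
};

static void fake_sleep(unsigned int ms, void *ctx)
{
	struct fake_dev *d = ctx;

	d->elapsed_ms += ms;
}

static bool fake_present(void *ctx)
{
	struct fake_dev *d = ctx;

	return d->elapsed_ms >= d->ready_after_ms;
}
```

A device that is ready within 100 ms costs a single wait; one that needs
up to 200 ms costs both; anything slower is reported as absent after
200 ms in this model.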


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-30 11:15                 ` Mika Westerberg
@ 2019-10-31 22:31                   ` Bjorn Helgaas
  2019-11-01 11:19                     ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-10-31 22:31 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Wed, Oct 30, 2019 at 01:15:16PM +0200, Mika Westerberg wrote:
> On Tue, Oct 29, 2019 at 03:27:09PM -0500, Bjorn Helgaas wrote:
> > On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> > > On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > > > The related hardware event is resume in this case. Can you point
> > > > > me to the actual point where you want me to put this?
> > > > 
> > > > "Resume" is a Linux software concept, so of course the PCIe spec
> > > > doesn't say anything about it.  The spec talks about delays
> > > > related to resets and device power and link state transitions, so
> > > > somehow we have to connect the Linux delay with those hardware
> > > > events.
> > > > 
> > > > Since we're talking about a transition from D3cold, this has to be
> > > > done via something external to the device such as power
> > > > regulators.  For ACPI systems that's probably hidden inside _PS0
> > > > or something similar.  That's opaque, but at least it's a hook
> > > > that says "here's where we put the device into D0".  I suggested
> > > > acpi_pci_set_power_state() as a possibility since I think that's
> > > > the lowest-level point where we have the pci_dev so we know the
> > > > current state and the new state.
> > > 
> > > I looked at how we could use acpi_pci_set_power_state() but I don't
> > > think it is possible because it is likely that only the root port
> > > has the power resource that is used to bring the link to L2 or L3.
> > > However, we would need to repeat the delay for each downstream/root
> > > port if there are multiple PCIe switches in the topology.
> > 
> > OK, I think I understand why that's a problem (correct me if I'm
> > wrong):
> > 
> >   We call pci_pm_resume_noirq() for every device, but it only calls
> >   acpi_pci_set_power_state() for devices that have _PS0 or _PR0
> >   methods.  So if the delay is in acpi_pci_set_power_state() and we
> >   have A -> B -> C where only A has _PS0, we would delay for the link
> >   to B to come up, but not for the link to C.
> 
> Yes, that's correct.
> 
> > I do see that we do need both delays.  In acpi_pci_set_power_state()
> > when we transition A from D3cold->D0, I assume that single _PS0
> > evaluation on A causes B to transition from D3cold->D3hot, which in
> > turn causes C to transition from D3cold->D3hot.  Is that your
> > understanding, too?
> 
> Not exactly :)
> 
> It is _ON() that causes the links to be retrained and it also causes the
> PERST# (reset) to be deasserted for the whole topology, transitioning all
> devices into D0uninitialized (default value for PMCSR PowerState field is 0).

OK.  I guess the important thing is that a single power-on from D3cold
at any point in the hierarchy can power on the entire subtree rooted
at that point.  So if we have RP -> SUP -> SDP0..SDP7 where SDP0..SDP7
are Switch Downstream Ports, when we evaluate _ON for RP, PERST# will
be deasserted below it, and everything downstream should start the
process of going to D0uninitialized.

And we can't rely on any other hooks like _ON/_PS0 invocations for
SUP, SDPx, etc, where we could do additional delays.
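
A small standalone model of this point, not kernel code: struct port and
both counting functions are hypothetical names. It counts how many links
would get a delay if the wait were tied to the platform hook versus being
done for every downstream port.

```c
#include <stdbool.h>
#include <stddef.h>

struct port {
	bool has_platform_pm;	/* assumed flag: _PS0/_PR0 exist in ACPI */
	bool is_downstream;	/* Root Port or Switch Downstream Port */
};

/* Wait placed inside the platform hook: only ports with ACPI power
 * management methods ever reach it. */
size_t links_waited_in_platform_hook(const struct port *p, size_t n)
{
	size_t waited = 0;

	for (size_t i = 0; i < n; i++)
		if (p[i].has_platform_pm && p[i].is_downstream)
			waited++;
	return waited;
}

/* Wait placed in the per-device resume path: every port with a link
 * below it waits, regardless of ACPI representation. */
size_t links_waited_per_port(const struct port *p, size_t n)
{
	size_t waited = 0;

	for (size_t i = 0; i < n; i++)
		if (p[i].is_downstream)
			waited++;
	return waited;
}
```

For a chain RP -> SUP -> SDP0 where only RP has platform methods, the first
function covers one link and the second covers both.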

> > If the delay is in pci_pm_resume_noirq() (as in your patch), what
> > happens with a switch with several Downstream Ports?  I assume that
> > all the Downstream Ports start their transition out of D3cold
> > basically simultaneously, so we probably don't need N delays, do we?
> 
> No. Actually Linux already resumes these in parallel because async
> suspend is set for them (for system suspend that is).

So I think we have something like this:

  pci_pm_resume_noirq(RP)
    pdev->current_state == PCI_D3cold  # HW actually in D3cold
    _ON(RP)                            # turns on entire hierarchy
    pci_bridge_wait_for_secondary_bus  # waits only for RP -> SUP link

  pci_pm_resume_noirq(SUP)
    pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
    pci_bridge_wait_for_secondary_bus  # no wait (not a downstream port)

  pci_pm_resume_noirq(SDP0)
    pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
    pci_bridge_wait_for_secondary_bus  # waits for SDP0 -> ? link

  ...

  pci_pm_resume_noirq(SDP7)
    pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
    pci_bridge_wait_for_secondary_bus  # waits for SDP7 -> ? link

and we have 1 delay for the Root Port plus 8 delays (one for each
Switch Downstream Port), and as soon as SUP has been resumed,
SDP0..SDP7 can be resumed simultaneously (assuming async is set for
them)?
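
As a rough back-of-the-envelope model of that accounting (hypothetical
function names, and the per-link wait is a parameter rather than a value
taken from the patch):

```c
/* Total wall-clock wait if the Root Port and every Switch Downstream
 * Port were resumed strictly one after another. */
unsigned int resume_wait_serial_ms(unsigned int nr_sdp,
				   unsigned int per_link_ms)
{
	return per_link_ms + nr_sdp * per_link_ms;
}

/* Total wall-clock wait if the Switch Downstream Ports are resumed
 * asynchronously: their waits overlap, so they cost one delay in total. */
unsigned int resume_wait_async_ms(unsigned int nr_sdp,
				  unsigned int per_link_ms)
{
	return per_link_ms + (nr_sdp ? per_link_ms : 0);
}
```

With eight downstream ports and an assumed 100 ms per link, the serial
model gives 900 ms and the async model gives 200 ms, which is why the
async behaviour matters for the total resume time.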

I'm not a huge fan of relying on async because the asynchrony is far
removed from this code and really hard to figure out.  Maybe an
alternative would be to figure out in the pci_pm_resume_noirq(RP) path
how many levels of links to wait for.

Ideally someone expert in PCIe but not in Linux would be able to look
at the local code and verify that it matches the spec.  If verification
requires extensive analysis or someone expert in *both* PCIe and
Linux, it makes maintenance much harder.

> > The outline of the pci_pm_resume_noirq() part of this patch is:
> > 
> >   pci_pm_resume_noirq
> >     if (!dev->skip_bus_pm ...)   # <-- condition 1
> >       pci_pm_default_resume_early
> >         pci_power_up
> >           if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> >             platform_pci_set_power_state
> >               pci_platform_pm->set_state
> >                 acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> >                   acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> > +   if (d3cold)                  # <-- condition 2
> > +     pci_bridge_wait_for_secondary_bus
> > 
> > Another thing that niggles at me here is that the condition for
> > calling pci_bridge_wait_for_secondary_bus() is completely different
> > than the condition for changing the power state.  If we didn't change
> > the power state, there's no reason to wait, is there?
> 
> Indeed, if you are talking about the dev->skip_bus_pm check there is no
> point to wait if we did not change the power state. I would assume that
> d3cold is false in that case but we could also do this for clarity:
> 
> 	if (!dev->skip_bus_pm && d3cold)
> 		pci_bridge_wait_for_secondary_bus(...)

Could the wait go in pci_power_up()?  That would at least connect it
directly with a -> D0 transition.  Or, if that doesn't seem the right
place for it, could we do the following?

  if (!(pci_dev->skip_bus_pm && pm_suspend_no_platform())) {
    pci_pm_default_resume_early(pci_dev);
    if (d3cold)
      pci_bridge_wait_for_secondary_bus(pci_dev);
  }

  pci_fixup_device(pci_fixup_resume_early, pci_dev);
  pcie_pme_root_status_cleanup(pci_dev);

  if (pci_has_legacy_pm_support(pci_dev))
    return pci_legacy_resume_early(dev);
  ...

Either way would also fix the problem that with the current patch, if
the device has legacy PM support, we call pci_legacy_resume_early()
but we don't wait for the secondary bus.
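
The effect of the two placements can be sketched as a standalone truth
table. This is a simplified model with hypothetical names, assuming the
patch's wait sits after the legacy early return as in the outline above.

```c
#include <stdbool.h>

struct resume_case {
	bool skip_bus_pm;
	bool no_platform;	/* stands in for pm_suspend_no_platform() */
	bool was_d3cold;
	bool legacy_pm;		/* driver uses legacy PM callbacks */
};

/* Wait placed near the end of the resume path: the legacy branch has
 * already returned, and the power-state condition is not consulted. */
bool waits_with_late_check(struct resume_case c)
{
	if (c.legacy_pm)
		return false;
	return c.was_d3cold;
}

/* Wait placed right after the power-up, under the same condition that
 * guards the power state change. */
bool waits_with_early_check(struct resume_case c)
{
	if (c.skip_bus_pm && c.no_platform)
		return false;	/* power state untouched, nothing to wait for */
	return c.was_d3cold;
}
```

A bridge coming out of D3cold with a legacy-PM driver waits only in the
second variant; a device whose bus-level PM was skipped waits in neither
once the check is tied to the power-up condition.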

> > The outline of the pci_pm_runtime_resume() part of this patch is:
> > 
> >   pci_pm_runtime_resume
> >     pci_restore_standard_config
> >       if (dev->current_state != PCI_D0)
> >         pci_set_power_state(PCI_D0)
> >           __pci_start_power_transition
> >             pci_platform_power_transition
> >               if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> >                 platform_pci_set_power_state
> >                   pci_platform_pm->set_state
> >                     acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> >                       acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> >               pci_raw_set_power_state
> >           __pci_complete_power_transition
> > +   if (d3cold)
> > +     pci_bridge_wait_for_secondary_bus
> > 
> > In this part, the power state change is inside
> > pci_restore_standard_config(), which calls pci_set_power_state().
> > There are many other callers of pci_set_power_state(); can we be sure
> > that none of them need a delay?
> 
> Since we are handling the delay when we resume the downstream port, not
> when we resume the device itself, I think the link should be up already
> and the device accessible if someone calls pci_set_power_state() for it
> (as the parent is always resumed before children).

Ah, yeah, I guess that since all the calls I see are for non-bridge
devices, there would be no delay for a secondary bus.

This is a tangent, but there are ~140 pci_set_power_state(PCI_D0)
calls, mostly from .resume() methods of drivers with legacy PM.  Are
those even necessary?  It looks like the PCI core does this so the
driver wouldn't need to:

  pci_pm_resume_noirq
    pci_pm_default_resume_early
      pci_power_up
        pci_raw_set_power_state(dev, PCI_D0)   # <-- PCI core

  pci_pm_resume
    if (pci_has_legacy_pm_support(pci_dev))
      pci_legacy_resume(dev)
        drv->resume
	  pci_set_power_state(PCI_D0)          # <-- driver .resume()

> > > > But it seems that at least some ACPI firmware doesn't do those
> > > > delays, so I guess our only alternatives are to always do it in
> > > > the OS or have some sort of blacklist.  And it doesn't really seem
> > > > practical to maintain a blacklist.
> > > 
> > > I really think this is crystal clear:
> > 
> > I am agreeing with you that the OS needs to do the delays.

Did you miss this part?  I said below that the existence of the _DSM
*by itself* doesn't convince me.  But I think the lack of clarity and
the fact that at least some firmware doesn't do it means that the OS
must do it.

> > > The OS is always responsible for the delays described in the PCIe
> > > spec.
> > 
> > If the ACPI spec contained this statement, it would be useful, but I
> > haven't seen it.  It's certainly true that some combination of
> > firmware and the OS is responsible for the delays :)
> > 
> > > However, if the platform implements some of them say in _ON or _PS0
> > > methods then it can notify the OS about this by using the _DSM so
> > > the OS does not need to duplicate all of them.
> > 
> > That makes good sense, but there are other reasons for using that
> > _DSM, e.g., firmware may know that MID or similar devices are not
> > really PCI devices and don't need delays anywhere.  So the existence
> > of the _DSM by itself doesn't convince me that the OS is responsible
> > for the delays.
> 
> Hmm, my interpretation of the specs is that the OS is responsible for these
> delays, but if you can't be convinced then how do you propose we handle this
> problem? I mean there are two cases already listed in the changelog of
> this patch from real systems that need these delays. I don't think we
> can just tell people that unfortunately their system will not be supported
> by Linux because we are not convinced that the OS should do these delays. ;-)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-10-31 22:31                   ` Bjorn Helgaas
@ 2019-11-01 11:19                     ` Mika Westerberg
  2019-11-05  0:00                       ` Bjorn Helgaas
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-11-01 11:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Thu, Oct 31, 2019 at 05:31:44PM -0500, Bjorn Helgaas wrote:
> On Wed, Oct 30, 2019 at 01:15:16PM +0200, Mika Westerberg wrote:
> > On Tue, Oct 29, 2019 at 03:27:09PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Oct 29, 2019 at 01:15:20PM +0200, Mika Westerberg wrote:
> > > > On Mon, Oct 28, 2019 at 03:16:53PM -0500, Bjorn Helgaas wrote:
> > > > > > The related hardware event is resume in this case. Can you point
> > > > > > me to the actual point where you want me to put this?
> > > > > 
> > > > > "Resume" is a Linux software concept, so of course the PCIe spec
> > > > > doesn't say anything about it.  The spec talks about delays
> > > > > related to resets and device power and link state transitions, so
> > > > > somehow we have to connect the Linux delay with those hardware
> > > > > events.
> > > > > 
> > > > > Since we're talking about a transition from D3cold, this has to be
> > > > > done via something external to the device such as power
> > > > > regulators.  For ACPI systems that's probably hidden inside _PS0
> > > > > or something similar.  That's opaque, but at least it's a hook
> > > > > that says "here's where we put the device into D0".  I suggested
> > > > > acpi_pci_set_power_state() as a possibility since I think that's
> > > > > the lowest-level point where we have the pci_dev so we know the
> > > > > current state and the new state.
> > > > 
> > > > I looked at how we could use acpi_pci_set_power_state() but I don't
> > > > think it is possible because it is likely that only the root port
> > > > has the power resource that is used to bring the link to L2 or L3.
> > > > However, we would need to repeat the delay for each downstream/root
> > > > port if there are multiple PCIe switches in the topology.
> > > 
> > > OK, I think I understand why that's a problem (correct me if I'm
> > > wrong):
> > > 
> > >   We call pci_pm_resume_noirq() for every device, but it only calls
> > >   acpi_pci_set_power_state() for devices that have _PS0 or _PR0
> > >   methods.  So if the delay is in acpi_pci_set_power_state() and we
> > >   have A -> B -> C where only A has _PS0, we would delay for the link
> > >   to B to come up, but not for the link to C.
> > 
> > Yes, that's correct.
> > 
> > > I do see that we do need both delays.  In acpi_pci_set_power_state()
> > > when we transition A from D3cold->D0, I assume that single _PS0
> > > evaluation on A causes B to transition from D3cold->D3hot, which in
> > > turn causes C to transition from D3cold->D3hot.  Is that your
> > > understanding, too?
> > 
> > Not exactly :)
> > 
> > It is _ON() that causes the links to be retrained and it also causes the
> > PERST# (reset) to be deasserted for the whole topology, transitioning all
> > devices into D0uninitialized (default value for PMCSR PowerState field is 0).
> 
> OK.  I guess the important thing is that a single power-on from D3cold
> at any point in the hierarchy can power on the entire subtree rooted
> at that point.  So if we have RP -> SUP -> SDP0..SDP7 where SDP0..SDP7
> are Switch Downstream Ports, when we evaluate _ON for RP, PERST# will
> be deasserted below it, and everything downstream should start the
> process of going to D0uninitialized.
> 
> And we can't rely on any other hooks like _ON/_PS0 invocations for
> SUP, SDPx, etc, where we could do additional delays.

From what I've seen, they typically don't even have a representation in ACPI.

> > > If the delay is in pci_pm_resume_noirq() (as in your patch), what
> > > happens with a switch with several Downstream Ports?  I assume that
> > > all the Downstream Ports start their transition out of D3cold
> > > basically simultaneously, so we probably don't need N delays, do we?
> > 
> > No. Actually Linux already resumes these in parallel because async
> > suspend is set for them (for system suspend that is).
> 
> So I think we have something like this:
> 
>   pci_pm_resume_noirq(RP)
>     pdev->current_state == PCI_D3cold  # HW actually in D3cold
>     _ON(RP)                            # turns on entire hierarchy
>     pci_bridge_wait_for_secondary_bus  # waits only for RP -> SUP link
> 
>   pci_pm_resume_noirq(SUP)
>     pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
>     pci_bridge_wait_for_secondary_bus  # no wait (not a downstream port)
> 
>   pci_pm_resume_noirq(SDP0)
>     pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
>     pci_bridge_wait_for_secondary_bus  # waits for SDP0 -> ? link
> 
>   ...
> 
>   pci_pm_resume_noirq(SDP7)
>     pdev->current_state == PCI_D3cold  # HW probably in D0uninitialized
>     pci_bridge_wait_for_secondary_bus  # waits for SDP7 -> ? link
> 
> and we have 1 delay for the Root Port plus 8 delays (one for each
> Switch Downstream Port), and as soon as SUP has been resumed,
> SDP0..SDP7 can be resumed simultaneously (assuming async is set for
> them)?

Yes.

> I'm not a huge fan of relying on async because the asynchrony is far
> removed from this code and really hard to figure out.  Maybe an
> alternative would be to figure out in the pci_pm_resume_noirq(RP) path
> how many levels of links to wait for.

There is a problem with this. For gen3 speeds and beyond we need to wait
for the link (each link) to be activated before we delay. If we do it
only in the root port, it would need to enumerate all the ports and
handle this, which complicates it unnecessarily.

You can also "guess" the delay by waiting for the worst-case time
but I don't think that's something we want to do.
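
The "wait for link active, then delay" sequence can be sketched with a
simulated link. This is a standalone model, not the kernel's
pcie_wait_for_link_delay(); struct sim_link, the function names, and the
timeout, poll and delay parameters are all assumptions for illustration.

```c
#include <stdbool.h>

/* Simulated link that reports active a fixed time after power-on. */
struct sim_link {
	unsigned int active_after_ms;
};

bool sim_link_active(const struct sim_link *link, unsigned int now_ms)
{
	return now_ms >= link->active_after_ms;
}

/*
 * Poll until the link reports active, then add the fixed post-activation
 * delay. Returns the total simulated time in ms, or -1 if the link did
 * not come up within timeout_ms (or poll_ms is 0, which could never
 * make progress).
 */
int wait_for_link_then_delay(const struct sim_link *link,
			     unsigned int timeout_ms,
			     unsigned int poll_ms,
			     unsigned int delay_ms)
{
	unsigned int now = 0;

	if (poll_ms == 0)
		return -1;

	while (!sim_link_active(link, now)) {
		if (now >= timeout_ms)
			return -1;
		now += poll_ms;
	}
	return (int)(now + delay_ms);
}
```

With a link that trains in 30 ms, a 1000 ms timeout, 10 ms polling and a
100 ms delay, the model waits 130 ms in total, whereas the worst-case
"guess" would always wait 1100 ms.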

> Ideally someone expert in PCIe but not in Linux would be able to look
> at the local code and verify that it matches the spec.  If verification
> requires extensive analysis or someone expert in *both* PCIe and
> Linux, it makes maintenance much harder.

Well, whoever reads the code needs to be an expert in both anyway.

> > > The outline of the pci_pm_resume_noirq() part of this patch is:
> > > 
> > >   pci_pm_resume_noirq
> > >     if (!dev->skip_bus_pm ...)   # <-- condition 1
> > >       pci_pm_default_resume_early
> > >         pci_power_up
> > >           if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> > >             platform_pci_set_power_state
> > >               pci_platform_pm->set_state
> > >                 acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> > >                   acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> > > +   if (d3cold)                  # <-- condition 2
> > > +     pci_bridge_wait_for_secondary_bus
> > > 
> > > Another thing that niggles at me here is that the condition for
> > > calling pci_bridge_wait_for_secondary_bus() is completely different
> > > than the condition for changing the power state.  If we didn't change
> > > the power state, there's no reason to wait, is there?
> > 
> > Indeed, if you are talking about the dev->skip_bus_pm check there is no
> > point to wait if we did not change the power state. I would assume that
> > d3cold is false in that case but we could also do this for clarity:
> > 
> > 	if (!dev->skip_bus_pm && d3cold)
> > 		pci_bridge_wait_for_secondary_bus(...)
> 
> Could the wait go in pci_power_up()?  That would at least connect it
> directly with a -> D0 transition.  Or, if that doesn't seem the right
> place for it, could we do the following?
> 
>   if (!(pci_dev->skip_bus_pm && pm_suspend_no_platform())) {
>     pci_pm_default_resume_early(pci_dev);
>     if (d3cold)
>       pci_bridge_wait_for_secondary_bus(pci_dev);
>   }
> 
>   pci_fixup_device(pci_fixup_resume_early, pci_dev);
>   pcie_pme_root_status_cleanup(pci_dev);
> 
>   if (pci_has_legacy_pm_support(pci_dev))
>     return pci_legacy_resume_early(dev);
>   ...

The reason why pci_bridge_wait_for_secondary_bus() is called almost
last is that I figured we want to resume the root/downstream port
completely first before we start delaying for the device downstream.
It needs to be called before port services (pciehp) are resumed, though.

If you think it is fine to do the delay before we have restored
everything I can move it inside pci_power_up() or call it after
pci_pm_default_resume_early() as above. I think at least we should make
sure all the saved registers are restored first so that the link
activation check actually works.

> Either way would also fix the problem that with the current patch, if
> the device has legacy PM support, we call pci_legacy_resume_early()
> but we don't wait for the secondary bus.

True.

> > > The outline of the pci_pm_runtime_resume() part of this patch is:
> > > 
> > >   pci_pm_runtime_resume
> > >     pci_restore_standard_config
> > >       if (dev->current_state != PCI_D0)
> > >         pci_set_power_state(PCI_D0)
> > >           __pci_start_power_transition
> > >             pci_platform_power_transition
> > >               if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> > >                 platform_pci_set_power_state
> > >                   pci_platform_pm->set_state
> > >                     acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> > >                       acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> > >               pci_raw_set_power_state
> > >           __pci_complete_power_transition
> > > +   if (d3cold)
> > > +     pci_bridge_wait_for_secondary_bus
> > > 
> > > In this part, the power state change is inside
> > > pci_restore_standard_config(), which calls pci_set_power_state().
> > > There are many other callers of pci_set_power_state(); can we be sure
> > > that none of them need a delay?
> > 
> > Since we are handling the delay when we resume the downstream port, not
> > when we resume the device itself, I think the link should be up already
> > and the device accessible if someone calls pci_set_power_state() for it
> > (as the parent is always resumed before children).
> 
> Ah, yeah, I guess that since all the calls I see are for non-bridge
> devices, there would be no delay for a secondary bus.
> 
> This is a tangent, but there are ~140 pci_set_power_state(PCI_D0)
> calls, mostly from .resume() methods of drivers with legacy PM.  Are
> those even necessary?  It looks like the PCI core does this so the
> driver wouldn't need to:
> 
>   pci_pm_resume_noirq
>     pci_pm_default_resume_early
>       pci_power_up
>         pci_raw_set_power_state(dev, PCI_D0)   # <-- PCI core
> 
>   pci_pm_resume
>     if (pci_has_legacy_pm_support(pci_dev))
>       pci_legacy_resume(dev)
>         drv->resume
> 	  pci_set_power_state(PCI_D0)          # <-- driver .resume()

I don't think most of them are necessary anymore.

> > > > > But it seems that at least some ACPI firmware doesn't do those
> > > > > delays, so I guess our only alternatives are to always do it in
> > > > > the OS or have some sort of blacklist.  And it doesn't really seem
> > > > > practical to maintain a blacklist.
> > > > 
> > > > I really think this is crystal clear:
> > > 
> > > I am agreeing with you that the OS needs to do the delays.
> 
> Did you miss this part?  I said below that the existence of the _DSM
> *by itself* doesn't convince me.  But I think the lack of clarity and
> the fact that at least some firmware doesn't do it means that the OS
> must do it.

Yes, I missed this part. Sorry about that.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-01 11:19                     ` Mika Westerberg
@ 2019-11-05  0:00                       ` Bjorn Helgaas
  2019-11-05  9:54                         ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-11-05  0:00 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Fri, Nov 01, 2019 at 01:19:18PM +0200, Mika Westerberg wrote:
> On Thu, Oct 31, 2019 at 05:31:44PM -0500, Bjorn Helgaas wrote:
> > On Wed, Oct 30, 2019 at 01:15:16PM +0200, Mika Westerberg wrote:
> > > On Tue, Oct 29, 2019 at 03:27:09PM -0500, Bjorn Helgaas wrote:

> > I'm not a huge fan of relying on async because the asynchrony is far
> > removed from this code and really hard to figure out.  Maybe an
> > alternative would be to figure out in the pci_pm_resume_noirq(RP) path
> > how many levels of links to wait for.
> 
> There is a problem with this. For gen3 speeds and beyond we need to wait
> for the link (each link) to be activated before we delay. If we do it
> only in the root port, it would need to enumerate all the ports and
> handle this, which complicates it unnecessarily.

I agree, that doesn't sound good.  If we're resuming a Downstream
Port, I don't think we should be reading Link Status from other ports
farther downstream.

> > > > The outline of the pci_pm_resume_noirq() part of this patch is:
> > > > 
> > > >   pci_pm_resume_noirq
> > > >     if (!dev->skip_bus_pm ...)   # <-- condition 1
> > > >       pci_pm_default_resume_early
> > > >         pci_power_up
> > > >           if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> > > >             platform_pci_set_power_state
> > > >               pci_platform_pm->set_state
> > > >                 acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> > > >                   acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> > > > +   if (d3cold)                  # <-- condition 2
> > > > +     pci_bridge_wait_for_secondary_bus

> The reason why pci_bridge_wait_for_secondary_bus() is called almost the
> last is that I figured we want to resume the root/downstream port
> completely first before we start delaying for the device downstream.

For understandability, I think the wait needs to go in some function
that contains "PCI_D0", e.g., platform_pci_set_power_state() or
pci_power_up(), so it's connected with the transition from D3cold to
D0.

Since pci_pm_default_resume_early() is the only caller of
pci_power_up(), maybe we should just inline pci_power_up(), e.g.,
something like this:

  static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
  {
    pci_power_t prev_state = pci_dev->current_state;

    if (platform_pci_power_manageable(pci_dev))
      platform_pci_set_power_state(pci_dev, PCI_D0);

    pci_raw_set_power_state(pci_dev, PCI_D0);
    pci_update_current_state(pci_dev, PCI_D0);

    pci_restore_state(pci_dev);
    pci_pme_restore(pci_dev);

    if (prev_state == PCI_D3cold)
      pci_bridge_wait_for_secondary_bus(pci_dev);
  }

I don't understand why we call both platform_pci_set_power_state() and
pci_raw_set_power_state().  I thought platform_pci_set_power_state()
should put the device in D0, so we shouldn't need the PCI_PM_CTRL
update in pci_raw_set_power_state(), although we probably do need
things like pci_restore_bars() and pcie_aspm_pm_state_change().

And in fact, it seems wrong that if platform_pci_set_power_state()
puts the device in D0 and the device lacks a PM capability, we bail
out of pci_raw_set_power_state() before calling pci_restore_bars().

Tangent: I think "pci_pm_default_resume_early" is the wrong name for
this because "default" suggests that this is what we fall back to if a
driver or arch doesn't supply a more specific method.  But here we're
doing mandatory things that cannot be overridden, so I think something
like "pci_pm_core_resume_early()" would be more accurate.

> Need to call it before port services (pciehp) is resumed, though.

I guess this is because pciehp_resume() looks at PCI_EXP_LNKSTA and
will be confused if the link isn't up yet?

> If you think it is fine to do the delay before we have restored
> everything I can move it inside pci_power_up() or call it after
> pci_pm_default_resume_early() as above. I think at least we should make
> sure all the saved registers are restored before so that the link
> activation check actually works.

What needs to be restored to make pcie_wait_for_link_delay() work?
And what event does the restore need to be ordered with?  I could
imagine needing to restore something like Target Link Speed before
waiting, but that sounds racy unless we force a link retrain after
restoring it.

Bjorn

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05  0:00                       ` Bjorn Helgaas
@ 2019-11-05  9:54                         ` Mika Westerberg
  2019-11-05 12:58                           ` Mika Westerberg
  2019-11-05 15:00                           ` Bjorn Helgaas
  0 siblings, 2 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-11-05  9:54 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Mon, Nov 04, 2019 at 06:00:00PM -0600, Bjorn Helgaas wrote:
> > > > > The outline of the pci_pm_resume_noirq() part of this patch is:
> > > > > 
> > > > >   pci_pm_resume_noirq
> > > > >     if (!dev->skip_bus_pm ...)   # <-- condition 1
> > > > >       pci_pm_default_resume_early
> > > > >         pci_power_up
> > > > >           if (platform_pci_power_manageable())   # _PS0 or _PR0 exist?
> > > > >             platform_pci_set_power_state
> > > > >               pci_platform_pm->set_state
> > > > >                 acpi_pci_set_power_state(PCI_D0) # acpi_pci_platform_pm.set_state
> > > > >                   acpi_device_set_power(ACPI_STATE_D0) # <-- eval _PS0
> > > > > +   if (d3cold)                  # <-- condition 2
> > > > > +     pci_bridge_wait_for_secondary_bus
> 
> > The reason why pci_bridge_wait_for_secondary_bus() is called almost the
> > last is that I figured we want to resume the root/downstream port
> > completely first before we start delaying for the device downstream.
> 
> For understandability, I think the wait needs to go in some function
> that contains "PCI_D0", e.g., platform_pci_set_power_state() or
> pci_power_up(), so it's connected with the transition from D3cold to
> D0.
> 
> Since pci_pm_default_resume_early() is the only caller of
> pci_power_up(), maybe we should just inline pci_power_up(), e.g.,
> something like this:
> 
>   static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
>   {
>     pci_power_t prev_state = pci_dev->current_state;
> 
>     if (platform_pci_power_manageable(pci_dev))
>       platform_pci_set_power_state(pci_dev, PCI_D0);
> 
>     pci_raw_set_power_state(pci_dev, PCI_D0);
>     pci_update_current_state(pci_dev, PCI_D0);
> 
>     pci_restore_state(pci_dev);
>     pci_pme_restore(pci_dev);
> 
>     if (prev_state == PCI_D3cold)
>       pci_bridge_wait_for_secondary_bus(pci_dev);
>   }

OK, I'll see if this works.

> I don't understand why we call both platform_pci_set_power_state() and
> pci_raw_set_power_state().

platform_pci_set_power_state() deals with the ACPI methods such as
calling _PS0 after D3hot. To transition the device from D3hot to D0 you
need the PMCSR write which is done in pci_raw_set_power_state().
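
For reference, the PMCSR write amounts to updating the two-bit PowerState
field. The sketch below is standalone: the mask matches
PCI_PM_CTRL_STATE_MASK from pci_regs.h, while the helper names are
hypothetical and the register is modelled as a plain value rather than a
config-space access.

```c
#include <stdint.h>

#define PM_CTRL_STATE_MASK 0x0003u	/* PowerState field, PMCSR bits 1:0 */

enum pm_state {
	STATE_D0    = 0,
	STATE_D1    = 1,
	STATE_D2    = 2,
	STATE_D3HOT = 3,
};

/* Return the PMCSR value with only the PowerState field replaced. */
uint16_t pmcsr_set_state(uint16_t pmcsr, enum pm_state state)
{
	return (uint16_t)((pmcsr & ~PM_CTRL_STATE_MASK) |
			  ((unsigned int)state & PM_CTRL_STATE_MASK));
}

/* Extract the current PowerState field. */
enum pm_state pmcsr_get_state(uint16_t pmcsr)
{
	return (enum pm_state)(pmcsr & PM_CTRL_STATE_MASK);
}
```

A device in D3hot with PME_En set reads 0x0103; writing D0 yields 0x0100,
leaving the other bits alone. D3cold has no encoding in this field, which
is why a device coming out of D3cold reads back as 0 (D0).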

> I thought platform_pci_set_power_state()
> should put the device in D0, so we shouldn't need the PCI_PM_CTRL
> update in pci_raw_set_power_state(), although we probably do need
> things like pci_restore_bars() and pcie_aspm_pm_state_change().
> 
> And in fact, it seems wrong that if platform_pci_set_power_state()
> puts the device in D0 and the device lacks a PM capability, we bail
> out of pci_raw_set_power_state() before calling pci_restore_bars().
> 
> Tangent: I think "pci_pm_default_resume_early" is the wrong name for
> this because "default" suggests that this is what we fall back to if a
> driver or arch doesn't supply a more specific method.  But here we're
> doing mandatory things that cannot be overridden, so I think something
> like "pci_pm_core_resume_early()" would be more accurate.
> 
> > Need to call it before port services (pciehp) is resumed, though.
> 
> I guess this is because pciehp_resume() looks at PCI_EXP_LNKSTA and
> will be confused if the link isn't up yet?

Yes.

> > If you think it is fine to do the delay before we have restored
> > everything I can move it inside pci_power_up() or call it after
> > pci_pm_default_resume_early() as above. I think at least we should make
> > sure all the saved registers are restored before so that the link
> > activation check actually works.
> 
> What needs to be restored to make pcie_wait_for_link_delay() work?

I'm not entirely sure. I think that pci_restore_state() at least should
be called so that the PCIe capability gets restored. Maybe not even
that, because Data Link Layer Link Active always reflects whether the
link is in DL_Active or not, and it does not need to be enabled separately.
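
For reference, checking that bit looks roughly like this. The sketch is
standalone: the masks match PCI_EXP_LNKCAP_DLLLARC and PCI_EXP_LNKSTA_DLLLA
from pci_regs.h, the function names are hypothetical, and the registers are
passed in as plain values rather than read from config space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Capabilities: Data Link Layer Link Active Reporting Capable */
#define LNKCAP_DLLLARC 0x00100000u
/* Link Status: Data Link Layer Link Active (bit 13) */
#define LNKSTA_DLLLA   0x2000u

/* The status bit is only meaningful if the port advertises reporting. */
bool link_active_reporting(uint32_t lnkcap)
{
	return (lnkcap & LNKCAP_DLLLARC) != 0;
}

/* True when the Data Link Layer reports the DL_Active state. */
bool link_is_active(uint16_t lnksta)
{
	return (lnksta & LNKSTA_DLLLA) != 0;
}
```

A caller would test link_active_reporting() first and fall back to a fixed
delay when the port cannot report link state.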

> And what event does the restore need to be ordered with?

Not sure I follow you here.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05  9:54                         ` Mika Westerberg
@ 2019-11-05 12:58                           ` Mika Westerberg
  2019-11-05 20:01                             ` Bjorn Helgaas
  2019-11-05 15:00                           ` Bjorn Helgaas
  1 sibling, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-11-05 12:58 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 11:54:33AM +0200, Mika Westerberg wrote:
> > For understandability, I think the wait needs to go in some function
> > that contains "PCI_D0", e.g., platform_pci_set_power_state() or
> > pci_power_up(), so it's connected with the transition from D3cold to
> > D0.
> > 
> > Since pci_pm_default_resume_early() is the only caller of
> > pci_power_up(), maybe we should just inline pci_power_up(), e.g.,
> > something like this:
> > 
> >   static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
> >   {
> > >     pci_power_t prev_state = pci_dev->current_state;
> > 
> >     if (platform_pci_power_manageable(pci_dev))
> >       platform_pci_set_power_state(pci_dev, PCI_D0);
> > 
> >     pci_raw_set_power_state(pci_dev, PCI_D0);
> >     pci_update_current_state(pci_dev, PCI_D0);
> > 
> >     pci_restore_state(pci_dev);
> >     pci_pme_restore(pci_dev);
> > 
> >     if (prev_state == PCI_D3cold)
> > >       pci_bridge_wait_for_secondary_bus(pci_dev);
> >   }
> 
> OK, I'll see if this works.

Well, I think we want to execute pci_fixup_resume_early before we delay
for the downstream component (same for the runtime resume path). Currently
nobody is using that for root/downstream ports but in theory at least
some port may need it before it works properly. It is also probably a
good idea to disable wake as soon as possible to avoid possible extra
PME messages coming from the port.

I feel that the following is the right place to perform the delay but if
you think we can ignore the above, I will just do what you say and do it
in pci_pm_default_resume_early() (and pci_restore_standard_config() for
runtime path).

[The diff below does not have a check for pci_dev->skip_bus_pm because I
 was planning to move it inside pci_bridge_wait_for_secondary_bus() itself.]

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 08d3bdbc8c04..3c0e52aaef79 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -890,6 +890,7 @@ static int pci_pm_resume_noirq(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	pci_power_t prev_state = pci_dev->current_state;
 
 	if (dev_pm_may_skip_resume(dev))
 		return 0;
@@ -914,6 +915,9 @@ static int pci_pm_resume_noirq(struct device *dev)
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 	pcie_pme_root_status_cleanup(pci_dev);
 
+	if (prev_state == PCI_D3cold)
+		pci_bridge_wait_for_secondary_bus(pci_dev);
+
 	if (pci_has_legacy_pm_support(pci_dev))
 		return 0;
 
@@ -1299,6 +1303,7 @@ static int pci_pm_runtime_resume(struct device *dev)
 {
 	struct pci_dev *pci_dev = to_pci_dev(dev);
 	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
+	pci_power_t prev_state = pci_dev->current_state;
 	int error = 0;
 
 	/*
@@ -1314,6 +1319,9 @@ static int pci_pm_runtime_resume(struct device *dev)
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 	pci_pm_default_resume(pci_dev);
 
+	if (prev_state == PCI_D3cold)
+		pci_bridge_wait_for_secondary_bus(pci_dev);
+
 	if (pm && pm->runtime_resume)
 		error = pm->runtime_resume(dev);
 


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05  9:54                         ` Mika Westerberg
  2019-11-05 12:58                           ` Mika Westerberg
@ 2019-11-05 15:00                           ` Bjorn Helgaas
  2019-11-05 15:28                             ` Mika Westerberg
  1 sibling, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-11-05 15:00 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 11:54:28AM +0200, Mika Westerberg wrote:
> On Mon, Nov 04, 2019 at 06:00:00PM -0600, Bjorn Helgaas wrote:

> > > If you think it is fine to do the delay before we have restored
> > > everything I can move it inside pci_power_up() or call it after
> > > pci_pm_default_resume_early() as above. I think at least we should make
> > > sure all the saved registers are restored before so that the link
> > > activation check actually works.
> > 
> > What needs to be restored to make pcie_wait_for_link_delay() work?
> 
> I'm not entirely sure. I think that pci_restore_state() at least should
> be called so that the PCIe capability gets restored. Maybe not even
> that because Data Link Layer Link Active always reflects the DL_Active
> or not and it does not need to be enabled separately.
> 
> > And what event does the restore need to be ordered with?
> 
> Not sure I follow you here.

You're suggesting that we should restore saved registers first so
pcie_wait_for_link_delay() works.  If the link activation depends on
something being restored and we don't enforce an ordering, the
activation might succeed or fail depending on whether it happens
before or after the restore.  So if there is a dependency, we should
make it explicit to avoid a race like that.

But I'm not saying we *shouldn't* do the restore before the wait; only
that any dependency should be explicit.  Even if there is no actual
dependency it probably makes sense to do the restore first so it can
overlap with the hardware link training, which may reduce the time
pcie_wait_for_link_delay() has to wait when we do call it, e.g.,

  |-----------------|          link activation
     |-----|                   restore state
           |--------|          pcie_wait_for_link_delay()

whereas if we do the restore after waiting for the link to come up, it
probably takes longer:

  |-----------------|          link activation
     |--------------|          pcie_wait_for_link_delay()
                    |-----|    restore state

I actually suspect there *is* a dependency -- we should respect the
Target Link Speed and width so the link resumes in the same
configuration it was before suspend.  And I suspect that may require
an explicit retrain after restoring PCI_EXP_LNKCTL2.

Bjorn


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05 15:00                           ` Bjorn Helgaas
@ 2019-11-05 15:28                             ` Mika Westerberg
  2019-11-05 16:10                               ` Bjorn Helgaas
  0 siblings, 1 reply; 77+ messages in thread
From: Mika Westerberg @ 2019-11-05 15:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 09:00:13AM -0600, Bjorn Helgaas wrote:
> On Tue, Nov 05, 2019 at 11:54:28AM +0200, Mika Westerberg wrote:
> > On Mon, Nov 04, 2019 at 06:00:00PM -0600, Bjorn Helgaas wrote:
> 
> > > > If you think it is fine to do the delay before we have restored
> > > > everything I can move it inside pci_power_up() or call it after
> > > > pci_pm_default_resume_early() as above. I think at least we should make
> > > > sure all the saved registers are restored before so that the link
> > > > activation check actually works.
> > > 
> > > What needs to be restored to make pcie_wait_for_link_delay() work?
> > 
> > I'm not entirely sure. I think that pci_restore_state() at least should
> > be called so that the PCIe capability gets restored. Maybe not even
> > that because Data Link Layer Link Active always reflects the DL_Active
> > or not and it does not need to be enabled separately.
> > 
> > > And what event does the restore need to be ordered with?
> > 
> > Not sure I follow you here.
> 
> You're suggesting that we should restore saved registers first so
> pcie_wait_for_link_delay() works.  If the link activation depends on
> something being restored and we don't enforce an ordering, the
> activation might succeed or fail depending on whether it happens
> before or after the restore.  So if there is a dependency, we should
> make it explicit to avoid a race like that.

OK thanks. By explicit you mean document it in the code, right?

> But I'm not saying we *shouldn't* do the restore before the wait; only
> that any dependency should be explicit.  Even if there is no actual
> dependency it probably makes sense to do the restore first so it can
> overlap with the hardware link training, which may reduce the time
> pcie_wait_for_link_delay() has to wait when we do call it, e.g.,
> 
>   |-----------------|          link activation
>      |-----|                   restore state
>            |--------|          pcie_wait_for_link_delay()
> 
> whereas if we do the restore after waiting for the link to come up, it
> probably takes longer:
> 
>   |-----------------|          link activation
>      |--------------|          pcie_wait_for_link_delay()
>                     |-----|    restore state
> 
> I actually suspect there *is* a dependency -- we should respect the
> Target Link Speed and width so the link resumes in the same
> configuration it was before suspend.  And I suspect that may require
> an explicit retrain after restoring PCI_EXP_LNKCTL2.

According to the PCIe spec the PCI_EXP_LNKCTL2 Target Link Speed is marked
as RWS (S for sticky) so I suspect its value is retained after reset in
the same way as the PME bits, assuming I understood it correctly.


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05 15:28                             ` Mika Westerberg
@ 2019-11-05 16:10                               ` Bjorn Helgaas
  2019-11-06 13:29                                 ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-11-05 16:10 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 05:28:32PM +0200, Mika Westerberg wrote:
> On Tue, Nov 05, 2019 at 09:00:13AM -0600, Bjorn Helgaas wrote:
> > On Tue, Nov 05, 2019 at 11:54:28AM +0200, Mika Westerberg wrote:
> > > On Mon, Nov 04, 2019 at 06:00:00PM -0600, Bjorn Helgaas wrote:
> > 
> > > > > If you think it is fine to do the delay before we have restored
> > > > > everything I can move it inside pci_power_up() or call it after
> > > > > pci_pm_default_resume_early() as above. I think at least we should make
> > > > > sure all the saved registers are restored before so that the link
> > > > > activation check actually works.
> > > > 
> > > > What needs to be restored to make pcie_wait_for_link_delay() work?
> > > 
> > > I'm not entirely sure. I think that pci_restore_state() at least should
> > > be called so that the PCIe capability gets restored. Maybe not even
> > > that because Data Link Layer Link Active always reflects the DL_Active
> > > or not and it does not need to be enabled separately.
> > > 
> > > > And what event does the restore need to be ordered with?
> > > 
> > > Not sure I follow you here.
> > 
> > You're suggesting that we should restore saved registers first so
> > pcie_wait_for_link_delay() works.  If the link activation depends on
> > something being restored and we don't enforce an ordering, the
> > activation might succeed or fail depending on whether it happens
> > before or after the restore.  So if there is a dependency, we should
> > make it explicit to avoid a race like that.
> 
> OK thanks. By explicit you mean document it in the code, right?

So far all we have is a feeling that maybe we ought to restore before
waiting, but I don't really know why.  If there's an actual
dependency, we should chase down specifically what it is and add a
comment or code (e.g., a link retrain) as appropriate.

> > I actually suspect there *is* a dependency -- we should respect the
> > Target Link Speed and width so the link resumes in the same
> > configuration it was before suspend.  And I suspect that may require
> > an explicit retrain after restoring PCI_EXP_LNKCTL2.
> 
> According to the PCIe spec the PCI_EXP_LNKCTL2 Target Link Speed is marked
> as RWS (S for sticky) so I suspect its value is retained after reset in
> the same way as PME bits. Assuming I understood it correctly.

This patch is about coming from D3cold, isn't it?  I don't think we
can assume sticky bits are preserved in D3cold (except maybe when
auxiliary power is enabled).


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05 12:58                           ` Mika Westerberg
@ 2019-11-05 20:01                             ` Bjorn Helgaas
  2019-11-06 13:31                               ` Mika Westerberg
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2019-11-05 20:01 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 02:58:18PM +0200, Mika Westerberg wrote:
> On Tue, Nov 05, 2019 at 11:54:33AM +0200, Mika Westerberg wrote:
> > > For understandability, I think the wait needs to go in some function
> > > that contains "PCI_D0", e.g., platform_pci_set_power_state() or
> > > pci_power_up(), so it's connected with the transition from D3cold to
> > > D0.
> > > 
> > > Since pci_pm_default_resume_early() is the only caller of
> > > pci_power_up(), maybe we should just inline pci_power_up(), e.g.,
> > > something like this:
> > > 
> > >   static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
> > >   {
> > > >     pci_power_t prev_state = pci_dev->current_state;
> > > 
> > >     if (platform_pci_power_manageable(pci_dev))
> > >       platform_pci_set_power_state(pci_dev, PCI_D0);
> > > 
> > >     pci_raw_set_power_state(pci_dev, PCI_D0);
> > >     pci_update_current_state(pci_dev, PCI_D0);
> > > 
> > >     pci_restore_state(pci_dev);
> > >     pci_pme_restore(pci_dev);
> > > 
> > >     if (prev_state == PCI_D3cold)
> > > >       pci_bridge_wait_for_secondary_bus(pci_dev);
> > >   }
> > 
> > OK, I'll see if this works.
> 
> Well, I think we want to execute pci_fixup_resume_early before we delay
> for the downstream component (same for runtime resume path). Currently
> nobody is using that for root/downstream ports but in theory at least
> some port may need it before it works properly. Also probably good idea
> to disable wake as soon as possible to avoid possible extra PME messages
> coming from the port.

OK, I wish we could connect it more closely with the actual power-on,
but I guess that makes sense.

> I feel that the following is the right place to perform the delay but if
> you think we can ignore the above, I will just do what you say and do it
> in pci_pm_default_resume_early() (and pci_restore_standard_config() for
> runtime path).
> 
> [The below diff does not have check for pci_dev->skip_bus_pm because I
>  was planning to move it inside pci_bridge_wait_for_secondary_bus() itself.]

What do you gain by moving it?  IIUC we want them to be the same
condition, and if one is in pci_pm_resume_noirq() and another is
inside pci_bridge_wait_for_secondary_bus(), it's hard to see that
they're connected.  I'd rather have pci_pm_resume_noirq() check it
once, save the result, and test that result before calling
pci_pm_default_resume_early() and pci_bridge_wait_for_secondary_bus().

> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 08d3bdbc8c04..3c0e52aaef79 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -890,6 +890,7 @@ static int pci_pm_resume_noirq(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	pci_power_t prev_state = pci_dev->current_state;
>  
>  	if (dev_pm_may_skip_resume(dev))
>  		return 0;
> @@ -914,6 +915,9 @@ static int pci_pm_resume_noirq(struct device *dev)
>  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
>  	pcie_pme_root_status_cleanup(pci_dev);
>  
> +	if (prev_state == PCI_D3cold)
> +		pci_bridge_wait_for_secondary_bus(pci_dev);
> +
>  	if (pci_has_legacy_pm_support(pci_dev))
>  		return 0;
>  
> @@ -1299,6 +1303,7 @@ static int pci_pm_runtime_resume(struct device *dev)
>  {
>  	struct pci_dev *pci_dev = to_pci_dev(dev);
>  	const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
> +	pci_power_t prev_state = pci_dev->current_state;
>  	int error = 0;
>  
>  	/*
> @@ -1314,6 +1319,9 @@ static int pci_pm_runtime_resume(struct device *dev)
>  	pci_fixup_device(pci_fixup_resume_early, pci_dev);
>  	pci_pm_default_resume(pci_dev);
>  
> +	if (prev_state == PCI_D3cold)
> +		pci_bridge_wait_for_secondary_bus(pci_dev);
> +
>  	if (pm && pm->runtime_resume)
>  		error = pm->runtime_resume(dev);
>  


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05 16:10                               ` Bjorn Helgaas
@ 2019-11-06 13:29                                 ` Mika Westerberg
  0 siblings, 0 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-11-06 13:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 10:10:17AM -0600, Bjorn Helgaas wrote:
> > > I actually suspect there *is* a dependency -- we should respect the
> > > Target Link Speed and width so the link resumes in the same
> > > configuration it was before suspend.  And I suspect that may require
> > > an explicit retrain after restoring PCI_EXP_LNKCTL2.
> > 
> > According to the PCIe spec the PCI_EXP_LNKCTL2 Target Link Speed is marked
> > as RWS (S for sticky) so I suspect its value is retained after reset in
> > the same way as PME bits. Assuming I understood it correctly.
> 
> This patch is about coming from D3cold, isn't it?  I don't think we
> can assume sticky bits are preserved in D3cold (except maybe when
> auxiliary power is enabled).

Indeed, good point. I see some GPU drivers are programming Target Link
Speed, which will not be retained after the hierarchy is put into D3cold
and back. I think this potential problem is not related to the missing
link delays this patch is addressing, though. It already exists in
pci_restore_pcie_state() (where it restores PCI_EXP_LNKCTL2).

I think this can be solved as a separate patch by doing something
like:

  1. In pci_restore_pcie_state() check if the saved Target Link Speed
     differs from what is in the register currently.

  2. Restore the value as we already do now.

  3. If the speed differs, trigger a link retrain.

  4. Restore the rest of the root/downstream port state.

It is not clear if we need to do anything for upstream ports (PCIe 5.0
sec 6.11 talks about doing this on the upstream component, e.g., a
downstream port). After this there will be the link delay (added by this
patch), which takes care of waiting for the downstream component to be
accessible (even after a retrain).

However, I'm not sure how this can be properly tested. One option might
be to hack some downstream port to lower the speed, enter D3cold, then
resume it and see if the Target Link Speed gets updated correctly.


* Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec
  2019-11-05 20:01                             ` Bjorn Helgaas
@ 2019-11-06 13:31                               ` Mika Westerberg
  0 siblings, 0 replies; 77+ messages in thread
From: Mika Westerberg @ 2019-11-06 13:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Rafael J. Wysocki, Len Brown, Lukas Wunner, Keith Busch,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng, Paul Menzel,
	Nicholas Johnson, linux-pci, linux-kernel

On Tue, Nov 05, 2019 at 02:01:05PM -0600, Bjorn Helgaas wrote:
> > I feel that the following is the right place to perform the delay but if
> > you think we can ignore the above, I will just do what you say and do it
> > in pci_pm_default_resume_early() (and pci_restore_standard_config() for
> > runtime path).
> > 
> > [The below diff does not have check for pci_dev->skip_bus_pm because I
> >  was planning to move it inside pci_bridge_wait_for_secondary_bus() itself.]
> 
> What do you gain by moving it?  IIUC we want them to be the same
> condition, and if one is in pci_pm_resume_noirq() and another is
> inside pci_bridge_wait_for_secondary_bus(), it's hard to see that
> they're connected.  I'd rather have pci_pm_resume_noirq() check it
> once, save the result, and test that result before calling
> pci_pm_default_resume_early() and pci_bridge_wait_for_secondary_bus().

Fair enough :)


* 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics?
@ 2020-05-29 18:03               ` Marc MERLIN
       [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
                                   ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-05-29 18:03 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Howdy,

So, I have a Thinkpad P70 with hybrid graphics.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
that one works fine, I can use i915 for the main screen, and nouveau to
display on the external ports (external ports are only wired to nvidia
chip, so it's impossible to use them without turning the nvidia chip
on).

I now have a newer P73, also with the same hybrid graphics (set up as such
in the bios). It runs fine with i915, and I don't need to use an external
display with nouveau for now (it almost works, but I only see the mouse
cursor on the external screen, no window or anything else can get
displayed, very weird).
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)

What I need for now is either nouveau, or bbswitch if it still works, to
turn the nvidia chip off every time I power on/reboot/plug/unplug
external power.
If I don't load the nouveau module, I get this in powertop:
Bad           Runtime PM for PCI Device NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q]
powertop cannot really turn it off and I get
The battery reports a discharge rate of 63.0 W

If I load the nouveau driver, the chip gets turned off (great), but it's
unstable and hard crashes my laptop when I plug/unplug it from power
after a few times.
This is what I got in my logs after the last crash:
intel-lpss 0000:00:15.0: power state changed by ACPI to D3cold
intel-lpss 0000:00:15.1: power state changed by ACPI to D3cold
snd_hda_intel 0000:00:1f.3: PME# enabled
intel-lpss 0000:00:1e.0: power state changed by ACPI to D3cold
snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
xhci_hcd 0000:01:00.2: PME# enabled
nvidia-gpu 0000:01:00.3: PME# enabled
pcieport 0000:05:00.0: PME# enabled
xhci_hcd 0000:2c:00.0: PME# enabled
pcieport 0000:05:02.0: PME# enabled
pcieport 0000:04:00.0: PME# enabled
pcieport 0000:00:1c.0: PME# enabled
pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
nouveau 0000:01:00.0: power state changed by ACPI to D3cold
pcieport 0000:00:01.0: PME# enabled
pcieport 0000:00:01.0: power state changed by ACPI to D3cold

I am using TLP to manage battery use, the driver might not like things getting turned off to save power
(although when it works, I can get the laptop down to 10W)

Any suggestions on my best way to just keep the nvidia chip off reliably?
nouveau? bbswitch? other?
(and before you ask, no, you cannot turn it off in the bios, it's hybrid or nvidia only)


If that helps, here is what I got when I tried to use hybrid graphics to power an external
monitor (just pasting for completeness, I don't need this to work for now)

pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
nouveau: detected PR support, will not use DSM
nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
nouveau 0000:01:00.0: enabling bus mastering
nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
vga_switcheroo: enabled
[TTM] Zone  kernel: Available graphics memory: 32730618 KiB
[TTM] Zone   dma32: Available graphics memory: 2097152 KiB
[TTM] Initializing pool allocator
[TTM] Initializing DMA pool allocator
nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
nouveau 0000:01:00.0: DRM: BIT table 'A' not found
nouveau 0000:01:00.0: DRM: BIT table 'L' not found
nouveau 0000:01:00.0: DRM: TMDS table version 2.0
nouveau 0000:01:00.0: DRM: DCB version 4.1
nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
nouveau 0000:01:00.0: DRM: unknown connector type 48
nouveau 0000:01:00.0: DRM: unknown connector type 48
[drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[drm] Driver supports precise vblank timestamp query.
memmap_init_zone_device initialised 2097152 pages in 16ms
nouveau 0000:01:00.0: DRM: DMEM: registered 8192MB of device memory
nouveau 0000:01:00.0: DRM: allocated 2560x1600 fb: 0x200000, bo 0000000018f13ee1
nouveau 0000:01:00.0: fb1: nouveaudrmfb frame buffer device

sauron:~$ xrandr --setprovideroutputsource 1 0
sauron:~$ xrandr --listactivemonitors 
Monitors: 1
 0: +*eDP-1 3840/382x2160/214+0+0  eDP-1

sauron:~$ xrandr --auto
sauron:~$ xrandr --listactivemonitors 
Monitors: 2
 0: +*eDP-1 3840/382x2160/214+0+0  eDP-1
 1: +HDMI-1-1 2560/641x1600/400+3840+0  HDMI-1-1

Moving to the new window moves the mouse, but no windows get displayed.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics?
       [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
@ 2020-05-29 18:53                   ` Ilia Mirkin
       [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 77+ messages in thread
From: Ilia Mirkin @ 2020-05-29 18:53 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau

On Fri, May 29, 2020 at 2:35 PM Marc MERLIN <marc_nouveau-xnduUnryOU1AfugRpC6u6w@public.gmane.org> wrote:
>
> Howdy,
>
> So, I have a Thinkpad P70 with hybrid graphics.
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
> that one works fine, I can use i915 for the main screen, and nouveau to
> display on the external ports (external ports are only wired to nvidia
> chip, so it's impossible to use them without turning the nvidia chip
> on).
>
> I now got a newer P73 also with the same hybrid graphics (setup as such
> in the bios). It runs fine with i915, and I don't need to use external
> display with nouveau for now (it almost works, but I only see the mouse
> cursor on the external screen, no window or anything else can get
> displayed, very weird).
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
>
> What I need for now is either nouveau, or bbswitch if it still works to
> turn the nvidia chip off every time I power on/reboot/plug/unplug
> external power.
> if I don't load the nouveau module, I get this in powertop:
> Bad           Runtime PM for PCI Device NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q]
> powertop cannot really turn it off and I get
> The battery reports a discharge rate of 63.0 W
>
> If I load the nouveau driver, the chip gets turned off (great), but it's
> unstable and hard crashes my laptop when I plug/unplug it from power
> after a few times.
> This is what I got in my logs after the last crash:
> intel-lpss 0000:00:15.0: power state changed by ACPI to D3cold
> intel-lpss 0000:00:15.1: power state changed by ACPI to D3cold
> snd_hda_intel 0000:00:1f.3: PME# enabled
> intel-lpss 0000:00:1e.0: power state changed by ACPI to D3cold
> snd_hda_intel 0000:00:1f.3: power state changed by ACPI to D3hot
> xhci_hcd 0000:01:00.2: PME# enabled
> nvidia-gpu 0000:01:00.3: PME# enabled
> pcieport 0000:05:00.0: PME# enabled
> xhci_hcd 0000:2c:00.0: PME# enabled
> pcieport 0000:05:02.0: PME# enabled
> pcieport 0000:04:00.0: PME# enabled
> pcieport 0000:00:1c.0: PME# enabled
> pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
> nouveau 0000:01:00.0: power state changed by ACPI to D3cold
> pcieport 0000:00:01.0: PME# enabled
> pcieport 0000:00:01.0: power state changed by ACPI to D3cold
>
> I am using TLP to manage battery use, the driver might not like things getting turned off to save power
> (although when it works, I can get the laptop down to 10W)
>
> Any suggestions on my best way to just keep the nvidia chip off reliably?
> nouveau? bbswitch? other?
> (and before you ask, no, you cannot turn it off in the bios, it's hybrid or nvidia only)
>
>
> If that helps, here is what I got when I tried to use hybrid graphics to power an external
> monitor (just pasting for completeness, I don't need this to work for now)
>
> pci 0000:01:00.0: optimus capabilities: enabled, status dynamic power, hda bios codec supported
> VGA switcheroo: detected Optimus DSM method \_SB_.PCI0.PEG0.PEGP handle
> nouveau: detected PR support, will not use DSM
> nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
> nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
> nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
> nouveau 0000:01:00.0: enabling bus mastering
> nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
> vga_switcheroo: enabled
> [TTM] Zone  kernel: Available graphics memory: 32730618 KiB
> [TTM] Zone   dma32: Available graphics memory: 2097152 KiB
> [TTM] Initializing pool allocator
> [TTM] Initializing DMA pool allocator
> nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
> nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> nouveau 0000:01:00.0: DRM: DCB version 4.1
> nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
> nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
> nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
> nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
> nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
> nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
> nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
> nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
> nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
> nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
> nouveau 0000:01:00.0: DRM: failed to create kernel channel, -22
> nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> nouveau 0000:01:00.0: DRM: unknown connector type 48
> nouveau 0000:01:00.0: DRM: unknown connector type 48
> [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [drm] Driver supports precise vblank timestamp query.
> memmap_init_zone_device initialised 2097152 pages in 16ms
> nouveau 0000:01:00.0: DRM: DMEM: registered 8192MB of device memory
> nouveau 0000:01:00.0: DRM: allocated 2560x1600 fb: 0x200000, bo 0000000018f13ee1
> nouveau 0000:01:00.0: fb1: nouveaudrmfb frame buffer device
>
> sauron:~$ xrandr --setprovideroutputsource 1 0
> sauron:~$ xrandr --listactivemonitors
> Monitors: 1
>  0: +*eDP-1 3840/382x2160/214+0+0  eDP-1
>
> sauron:~$ xrandr --auto
> sauron:~$ xrandr --listactivemonitors
> Monitors: 2
>  0: +*eDP-1 3840/382x2160/214+0+0  eDP-1
>  1: +HDMI-1-1 2560/641x1600/400+3840+0  HDMI-1-1
>
> Moving to the new window moves the mouse, but no windows get displayed.

Do you see anything in dmesg after this is set up? I'd expect some
errors about timeouts or something else.

Which kernel are you using? There have been some turing-specific fixes recently.

Also note that TLP has a problem where it forces the audio
sub-function to always-on which prevents the GPU from suspending.
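
For reference, one way to see whether something has pinned a function on is to read its runtime PM attributes in sysfs. This is only a minimal sketch: it assumes the GPU sits at 0000:01:00.0 as in the log above, so its audio sub-function would be 0000:01:00.1. The helper name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def runtime_pm_state(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Return (control, runtime_status) for one PCI function.

    control is "auto" when runtime PM is allowed and "on" when the
    device is forced to stay powered; runtime_status is e.g. "active"
    or "suspended". Attributes that cannot be read come back as None.
    """
    power_dir = Path(sysfs_root) / bdf / "power"

    def read(name):
        try:
            return (power_dir / name).read_text().strip()
        except OSError:
            return None

    return read("control"), read("runtime_status")


if __name__ == "__main__":
    # 01:00.1 is assumed to be the HD audio function of the GPU at 01:00.0.
    for bdf in ("0000:01:00.0", "0000:01:00.1"):
        control, status = runtime_pm_state(bdf)
        print(f"{bdf}: control={control} runtime_status={status}")
```

If the audio function reports control=on, the GPU it shares a package with would not be expected to runtime suspend.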

Cheers,

  -ilia


* Re: 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics?
       [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-05-29 19:46                       ` Marc MERLIN
       [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
       [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 77+ messages in thread
From: Marc MERLIN @ 2020-05-29 19:46 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

On Fri, May 29, 2020 at 02:53:51PM -0400, Ilia Mirkin wrote:
> > Moving to the new window moves the mouse, but no windows get displayed.
> 
> Do you see anything in dmesg after this is set up? I'd expect some
> errors about timeouts or something else.
 
Nothing other than what I pasted.

> Which kernel are you using? There have been some turing-specific fixes recently.

5.5.11. I can put 5.6 if needed.

> Also note that TLP has a problem where it forces the audio
> sub-function to always-on which prevents the GPU from suspending.

Ah, thanks for that.
I have
#RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau nvidia pcieport radeon"

sauron:~$ lspci |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

So you're saying that I need to blacklist 01:00.1 and without that it hangs
when suspending the powered off nvidia chip, which is what I'm experiencing
as a hang when I unplug power?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics?
       [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
@ 2020-05-30 17:32                           ` Karol Herbst
  0 siblings, 0 replies; 77+ messages in thread
From: Karol Herbst @ 2020-05-30 17:32 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau

On Fri, May 29, 2020 at 9:48 PM Marc MERLIN <marc_nouveau-xnduUnryOU1AfugRpC6u6w@public.gmane.org> wrote:
>
> On Fri, May 29, 2020 at 02:53:51PM -0400, Ilia Mirkin wrote:
> > > Moving to the new window moves the mouse, but no windows get displayed.
> >
> > Do you see anything in dmesg after this is set up? I'd expect some
> > errors about timeouts or something else.
>
> Nothing other than what I pasted.
>
> > Which kernel are you using? There have been some turing-specific fixes recently.
>
> 5.5.11. I can put 5.6 if needed.
>

Please do. 5.5 is EOL, and 5.4 and 5.6 got the runpm fixes in recent releases.

> > Also note that TLP has a problem where it forces the audio
> > sub-function to always-on which prevents the GPU from suspending.
>
> Ah, thanks for that.
> I have
> #RUNTIME_PM_DRIVER_BLACKLIST="amdgpu mei_me nouveau nvidia pcieport radeon"
>
> sauron:~$ lspci |grep -i nvidia
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
>
> So you're saying that I need to blacklist 01:00.1 and without that it hangs
> when suspending the powered off nvidia chip, which is what I'm experiencing
> as a hang when I unplug power?
>

No. TLP forces the audio device to be always on when on AC. The following
sound power settings

SOUND_POWER_SAVE_CONTROLLER=Y
SOUND_POWER_SAVE_ON_AC=1
SOUND_POWER_SAVE_ON_BAT=1

need to be set.
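
To double-check a TLP config against the three values above, something like the following could be used. It is a rough sketch, not part of TLP: the function names are invented here, only plain KEY=VALUE lines are handled, and the config path in the usage note is an assumption that depends on the TLP version.

```python
# Values taken from the settings listed above.
WANTED = {
    "SOUND_POWER_SAVE_CONTROLLER": "Y",
    "SOUND_POWER_SAVE_ON_AC": "1",
    "SOUND_POWER_SAVE_ON_BAT": "1",
}


def parse_tlp_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_sound_settings(text):
    """Return {key: (wanted, found)} for every setting that differs.

    found is None when the key is absent or commented out, i.e. not
    explicitly set in the file.
    """
    found = parse_tlp_config(text)
    return {
        key: (wanted, found.get(key))
        for key, wanted in WANTED.items()
        if found.get(key) != wanted
    }
```

Usage would be along the lines of `missing_sound_settings(open("/etc/tlp.conf").read())` (older TLP releases are assumed to use /etc/default/tlp instead); an empty dict means all three are set as above.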

> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>
> Home page: http://marc.merlins.org/
>


* Re: 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics?
       [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-05-31 18:31                         ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-05-31 18:31 UTC (permalink / raw)
  To: Karol Herbst, Ilia Mirkin; +Cc: nouveau

On Sat, May 30, 2020 at 07:32:16PM +0200, Karol Herbst wrote:
> > 5.5.11. I can put 5.6 if needed.
> 
> please do. 5.5 is EOL and 5.4 and 5.6 got the runpm fixes in recent releases.

Done, just went to 5.6.15, thanks.

> no. It forces the audio device to be always on when on AC. there are
> some sound power settings.
> 
> SOUND_POWER_SAVE_CONTROLLER=Y
> SOUND_POWER_SAVE_ON_AC=1
> SOUND_POWER_SAVE_ON_BAT=1

Thank you Karol and Ilia for the kind help, this totally worked.
I filed a documentation bug at https://github.com/linrunner/TLP/issues/495;
this will hopefully help other people.

In other great news, I was able to mirror my display on HDMI through
nouveau on the new kernel, thank you!

xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x43 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 1 associated providers: 0 name:modesetting
Provider 1: id: 0xf1 cap: 0x2, Sink Output crtcs: 4 outputs: 5 associated providers: 0 name:modesetting
xrandr --setprovideroutputsource 1 0

[42753.806113] nouveau 0000:01:00.0: DRM: allocated 2560x1600 fb: 0x200000, bo 00000000e75d7ede
[42753.806248] nouveau 0000:01:00.0: fb1: nouveaudrmfb frame buffer device
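
For anyone scripting this, the `cap:` field in the `xrandr --listproviders` output above can be decoded as a bitmask. The sketch below assumes the RandR provider capability bits (1, 2, 4, 8), which is consistent with the two provider lines pasted above; the regular expression and function name are illustrative only.

```python
import re

# Provider capability bits, assumed to follow the RandR 1.4 definitions.
CAPABILITIES = (
    (0x1, "Source Output"),
    (0x2, "Sink Output"),
    (0x4, "Source Offload"),
    (0x8, "Sink Offload"),
)

PROVIDER_RE = re.compile(
    r"Provider (?P<index>\d+): id: (?P<id>0x[0-9a-f]+) "
    r"cap: (?P<cap>0x[0-9a-f]+).*name:(?P<name>\S+)"
)


def parse_providers(text):
    """Return one dict per 'Provider N:' line of xrandr --listproviders."""
    providers = []
    for line in text.splitlines():
        match = PROVIDER_RE.search(line)
        if not match:
            continue
        cap = int(match["cap"], 16)
        providers.append({
            "index": int(match["index"]),
            "id": int(match["id"], 16),
            "name": match["name"],
            "caps": [name for bit, name in CAPABILITIES if cap & bit],
        })
    return providers
```

In `xrandr --setprovideroutputsource 1 0`, provider 1 is the one that needs "Sink Output" and provider 0 the one that needs "Source Output", which matches the capabilities listed above.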

Thanks much,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
  2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
@ 2020-08-08 20:22   ` Marc MERLIN
  2020-08-08 20:23     ` Marc MERLIN
                       ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-08-08 20:22 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> This is otherwise similar to pcie_wait_for_link() but allows passing
> custom activation delay in milliseconds.
> 
> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pci.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index e7982af9a5d8..bfd92e018925 100644

Hi Mika,

So, I have a thinkpad P73 with thunderbolt, and while I don't boot
often, my last boots have been unreliable at best (was only able to boot
5.7 once, and 5.8 did not succeed either).

5.6 was working for a while, but couldn't boot it either this morning,
so I had to go back to 5.5. This does not mean 5.5 does not have the
problem, just that it booted this morning, while 5.6 didn't when I
tried.
Once the kernel is booted, the problem does not seem to occur much, or
at all.

Basically, I'm getting the same thing as this person with a P53 (which
is a Lenovo ThinkPad mostly identical to mine)
kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
https://bbs.archlinux.org/viewtopic.php?id=250658

The kernel boots eventually, but it takes minutes, and everything is so
super slow, that I just can't reasonably use the machine.

This shows similar issues with 5.3, 5.4.
https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/

Another report here with 5.6:
https://bugzilla.redhat.com/show_bug.cgi?id=1831899

My current kernel is running your patch above, and I haven't done a lot
of research yet to confirm whether going back to a kernel before it was
merged, fixes the problem. Unfortunately the problem is not consistent,
so it makes things harder to test/debug, especially on my main laptop
that I do all my work on :)

I noticed this older patch of yours:
http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lukas@wunner.de/
This patch is not in my kernel, is it worth adding?

Can I get you more info to help debug this?

If that helps:
sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci
00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
2c:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
52:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)


sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lsusb -t
/:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
/:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
/:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
    |__ Port 2: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
    |__ Port 8: Dev 3, If 3, Class=Video, Driver=uvcvideo, 480M
    |__ Port 8: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
    |__ Port 8: Dev 3, If 2, Class=Video, Driver=uvcvideo, 480M
    |__ Port 8: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
    |__ Port 9: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 12M
    |__ Port 14: Dev 6, If 0, Class=Wireless, Driver=btusb, 12M
    |__ Port 14: Dev 6, If 1, Class=Wireless, Driver=btusb, 12M

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
  2020-08-08 20:22   ` Marc MERLIN
@ 2020-08-08 20:23     ` Marc MERLIN
  2020-08-09 16:31     ` Marc MERLIN
  2020-09-06 18:18       ` Marc MERLIN
  2 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-08-08 20:23 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Keith Busch, Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

I forgot to add that my mostly hanging boots look like this:
https://photos.app.goo.gl/HJvTraYYZbiNTNE39

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay()
  2020-08-08 20:22   ` Marc MERLIN
  2020-08-08 20:23     ` Marc MERLIN
@ 2020-08-09 16:31     ` Marc MERLIN
  2020-09-06 18:18       ` Marc MERLIN
  2 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-08-09 16:31 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Lukas Wunner,
	Alex Williamson, Alexandru Gagniuc, Kai-Heng Feng,
	Matthias Andree, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> Basically, I'm getting the same thing as this person with a P53 (which
> is a Lenovo ThinkPad mostly identical to mine)
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658
 
I had to reboot today and tried my 5.7.11 kernel 6 times.
It never booted and each time got stuck on 
pcieport 0000:00:01.0: PME: Spurious native interrupt!

This is the PCIe port the nvidia chip sits behind; the chip is claimed by
nouveau (I don't use nvidia graphics, but I'm forced to use nouveau to power
the nvidia chip down so that it doesn't drain my batteries).

I ended up being able to boot the 7th time after removing the yubikey in my USB-C 
port, which is also thunderbolt.
PME messages shown below. Let me know if you'd like further data.

Thanks,
Marc

[    4.142484] acpi PNP0A08:00: _OSC: OS now controls [PCIeHotplug PME PCIeCapability LTR DPC]
[    4.151715] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[    4.151727] pci 0000:00:01.0: PME# disabled
[    4.165979] pci 0000:00:14.0: PME# supported from D3hot D3cold
[    4.166000] pci 0000:00:14.0: PME# disabled
[    4.177746] pci 0000:00:16.0: PME# supported from D3hot
[    4.177767] pci 0000:00:16.0: PME# disabled
[    4.180850] pci 0000:00:17.0: PME# supported from D3hot
[    4.180862] pci 0000:00:17.0: PME# disabled
[    4.183830] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
[    4.183847] pci 0000:00:1b.0: PME# disabled
[    4.189643] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[    4.189660] pci 0000:00:1c.0: PME# disabled
[    4.193085] pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
[    4.193101] pci 0000:00:1c.5: PME# disabled
[    4.196462] pci 0000:00:1c.7: PME# supported from D0 D3hot D3cold
[    4.196478] pci 0000:00:1c.7: PME# disabled
[    4.206057] pci 0000:00:1f.3: PME# supported from D3hot D3cold
[    4.206079] pci 0000:00:1f.3: PME# disabled
[    4.214993] pci 0000:00:1f.6: PME# supported from D0 D3hot D3cold
[    4.215015] pci 0000:00:1f.6: PME# disabled
[    4.217978] pci 0000:01:00.0: PME# supported from D0 D3hot
[    4.217991] pci 0000:01:00.0: PME# disabled
[    4.219129] pci 0000:01:00.2: PME# supported from D0 D3hot
[    4.219142] pci 0000:01:00.2: PME# disabled
[    4.219578] pci 0000:01:00.3: PME# supported from D0 D3hot
[    4.219591] pci 0000:01:00.3: PME# disabled
[    4.221398] pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.221433] pci 0000:04:00.0: PME# disabled
[    4.222282] pci 0000:05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.222297] pci 0000:05:00.0: PME# disabled
[    4.222792] pci 0000:05:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.222806] pci 0000:05:01.0: PME# disabled
[    4.223289] pci 0000:05:02.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.223304] pci 0000:05:02.0: PME# disabled
[    4.223839] pci 0000:05:04.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.223854] pci 0000:05:04.0: PME# disabled
[    4.224645] pci 0000:06:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.224661] pci 0000:06:00.0: PME# disabled
[    4.225644] pci 0000:2c:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    4.225661] pci 0000:2c:00.0: PME# disabled
[    4.227557] pci 0000:52:00.0: PME# supported from D0 D3hot D3cold
[    4.227708] pci 0000:52:00.0: PME# disabled
[    4.229139] pci 0000:54:00.0: PME# supported from D1 D2 D3hot D3cold
[    4.229155] pci 0000:54:00.0: PME# disabled
[    7.238126] pcieport 0000:00:01.0: PME: Signaling with IRQ 122
[    7.239208] pcieport 0000:00:1b.0: PME: Signaling with IRQ 123
[    7.239861] pcieport 0000:00:1c.0: PME: Signaling with IRQ 124
[    7.241522] pcieport 0000:00:1c.5: PME: Signaling with IRQ 125
[    7.242499] pcieport 0000:00:1c.7: PME: Signaling with IRQ 126
[    7.401422] pcieport 0000:05:01.0: PME# enabled
[    7.401868] pcieport 0000:05:04.0: PME# enabled
[    8.985668] xhci_hcd 0000:01:00.2: PME# enabled
[    8.988738] xhci_hcd 0000:2c:00.0: PME# enabled
[    9.008649] pcieport 0000:05:02.0: PME# enabled
[   12.378450] nvidia-gpu 0000:01:00.3: PME# enabled
[   25.610848] thunderbolt 0000:06:00.0: PME# enabled
[   25.628766] pcieport 0000:05:00.0: PME# enabled
[   25.648762] pcieport 0000:04:00.0: PME# enabled
[   25.668889] pcieport 0000:00:1c.0: PME# enabled
[  179.608847] nvidia-gpu 0000:01:00.3: PME# disabled
[  179.608873] pcieport 0000:00:01.0: PME: Spurious native interrupt!
[  183.359454] nvidia-gpu 0000:01:00.3: PME# enabled
[  183.396832] nvidia-gpu 0000:01:00.3: PME# disabled
[  183.396859] pcieport 0000:00:01.0: PME: Spurious native interrupt!
[  187.147398] nvidia-gpu 0000:01:00.3: PME# enabled
[  187.184826] nvidia-gpu 0000:01:00.3: PME# disabled
[  187.184852] pcieport 0000:00:01.0: PME: Spurious native interrupt!
[  190.932329] nvidia-gpu 0000:01:00.3: PME# enabled
[  190.972359] nvidia-gpu 0000:01:00.3: PME# disabled
[  190.972366] pcieport 0000:00:01.0: PME: Spurious native interrupt!
[  192.351330] snd_hda_intel 0000:00:1f.3: PME# enabled
[  192.468365] snd_hda_intel 0000:00:1f.3: PME# disabled
[  192.736342] xhci_hcd 0000:01:00.2: PME# disabled
[  194.296431] pcieport 0000:00:1c.0: PME# disabled
[  194.432427] pcieport 0000:04:00.0: PME# disabled
[  194.432591] pcieport 0000:05:01.0: PME# disabled
[  194.432771] pcieport 0000:05:00.0: PME# disabled
[  194.432943] pcieport 0000:05:02.0: PME# disabled
[  194.433102] pcieport 0000:05:04.0: PME# disabled
[  194.556299] pcieport 0000:05:01.0: PME# enabled
[  194.556417] pcieport 0000:05:04.0: PME# enabled
[  195.560440] thunderbolt 0000:06:00.0: PME# disabled
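
To make the pattern in a log like the one above easier to see, the spurious-interrupt lines can be pulled out and the gaps between them computed. A small sketch, assuming dmesg lines with the usual `[seconds.micros]` timestamp prefix; the function names are made up for illustration.

```python
import re

LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+pcieport (?P<bdf>[0-9a-f:.]+): "
    r"PME: Spurious native interrupt!"
)


def spurious_pme_times(dmesg_text):
    """Map each port to the timestamps (seconds) of its spurious PME lines."""
    times = {}
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            times.setdefault(match["bdf"], []).append(float(match["ts"]))
    return times


def intervals(timestamps):
    """Gaps between consecutive timestamps, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(timestamps, timestamps[1:])]


if __name__ == "__main__":
    import sys

    # Example: dmesg | python3 this_script.py
    for bdf, stamps in spurious_pme_times(sys.stdin.read()).items():
        print(bdf, len(stamps), intervals(stamps))
```

For the four spurious-interrupt lines in the log above, the gaps come out at about 3.79 s each, every time right after "nvidia-gpu 0000:01:00.3: PME# disabled".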


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-06 18:18       ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-09-06 18:18 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Bjorn Helgaas, nouveau, Rafael J. Wysocki, Len Brown,
	Lukas Wunner, Keith Busch, Alex Williamson, Alexandru Gagniuc,
	Kai-Heng Feng, Matthias Andree, Paul Menzel, Nicholas Johnson,
	linux-pci, linux-kernel

Ok, I have an update to this problem. I added the nouveau list because
I can't quite tell if the issue is:
- the PCIe changes that went in 5.6 I think (or 5.5?), referenced below

- a new issue with thunderbolt on thinkpad P73, that seems to be
  triggered if I have a USB-C yubikey in the port. With 5.7, my issues
  went away if I removed the USB key during boot, showing an interaction
  between nouveau and thunderbolt

- changes in the nouveau driver. Mika told me the PCIe regression
  "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
  to be fixed in 5.8, but I still get a hang of 4 minutes or so during boot,
  and with 5.8, removing the USB key didn't make the boot any faster

I don't otherwise use the nvidia chip I so wish I didn't have, I only
use intel graphics on that laptop, but I must apparently use the nouveau
driver to manage the nvidia chip so that it's turned off and not
burning 60W doing nothing.

lspci is in the quoted message below, I won't copy it here again, but
here's the nvidia bit:
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

Here are 5 boots, 4 on 5.8.5:

dmesg.1_hang_but_no_warning.txt https://pastebin.com/Y5NaH08n
Boot hung for quite a while, but no clear output

dmesg.2_pme_spurious.txt https://pastebin.com/dX19aCpj
[    8.185808] nvidia-gpu 0000:01:00.3: runtime IRQ mapping not provided by arch
[    8.185989] nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
[    8.188986] nvidia-gpu 0000:01:00.3: enabling bus mastering
[   11.936507] nvidia-gpu 0000:01:00.3: PME# enabled
[   11.975985] nvidia-gpu 0000:01:00.3: PME# disabled
[   11.976011] pcieport 0000:00:01.0: PME: Spurious native interrupt!

dmesg.3_usb_key_yanked.txt https://pastebin.com/m7QLnCZt
I yanked the USB key during boot. That seemed to help unlock things with
5.7, but not with 5.8. It hangs in a loop of:
[   11.262854] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[   11.262863] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
[   11.262869] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[   11.262874] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
[   11.262880] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[   11.262885] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
[   11.262890] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
[   11.262895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
[   11.262900] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
[   11.262906] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
[   11.262911] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
[   11.262916] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[   11.262921] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
[   11.262926] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
[   11.262931] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
[   11.262937] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
[   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled

dmesg.4_5.5_boot_fine.txt https://pastebin.com/WXgQTUYP
Reference boot with 5.5; it works fine, no issues.

dmesg.5_no_key_still_hang.txt https://pastebin.com/kcT8Ras0
Unfortunately, booting without the USB-C key in the thunderbolt port did not
make this boot any faster. It looks different, though:
[    6.723454] pcieport 0000:00:01.0: runtime IRQ mapping not provided by arch
[    6.723598] pcieport 0000:00:01.0: PME: Signaling with IRQ 122
[    6.724011] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)
[    6.724016] pcieport 0000:00:01.0: saving config space at offset 0x4 (reading 0x100407)
[    6.724021] pcieport 0000:00:01.0: saving config space at offset 0x8 (reading 0x604000d)
[    6.724025] pcieport 0000:00:01.0: saving config space at offset 0xc (reading 0x810000)
[    6.724029] pcieport 0000:00:01.0: saving config space at offset 0x10 (reading 0x0)
[    6.724033] pcieport 0000:00:01.0: saving config space at offset 0x14 (reading 0x0)
[    6.724037] pcieport 0000:00:01.0: saving config space at offset 0x18 (reading 0x10100)
[    6.724041] pcieport 0000:00:01.0: saving config space at offset 0x1c (reading 0x20002020)
[    6.724046] pcieport 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
[    6.724050] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
[    6.724054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
[    6.724058] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
[    6.724062] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
[    6.724066] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
[    6.724070] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
[    6.724074] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
[    6.724129] pcieport 0000:00:1b.0: runtime IRQ mapping not provided by arch
[    6.724650] pcieport 0000:00:1b.0: PME: Signaling with IRQ 123
[    6.725021] pcieport 0000:00:1b.0: saving config space at offset 0x0 (reading 0xa3408086)
[    6.725026] pcieport 0000:00:1b.0: saving config space at offset 0x4 (reading 0x100407)
[    6.725031] pcieport 0000:00:1b.0: saving config space at offset 0x8 (reading 0x60400f0)
[    6.725035] pcieport 0000:00:1b.0: saving config space at offset 0xc (reading 0x810000)
[    6.725040] pcieport 0000:00:1b.0: saving config space at offset 0x10 (reading 0x0)
[    6.725044] pcieport 0000:00:1b.0: saving config space at offset 0x14 (reading 0x0)
[    6.725049] pcieport 0000:00:1b.0: saving config space at offset 0x18 (reading 0x20200)
[    6.725053] pcieport 0000:00:1b.0: saving config space at offset 0x1c (reading 0x200000f0)
[    6.725058] pcieport 0000:00:1b.0: saving config space at offset 0x20 (reading 0xce30ce30)
[    6.725062] pcieport 0000:00:1b.0: saving config space at offset 0x24 (reading 0x1fff1)
[    6.725067] pcieport 0000:00:1b.0: saving config space at offset 0x28 (reading 0x0)
[    6.725071] pcieport 0000:00:1b.0: saving config space at offset 0x2c (reading 0x0)
[    6.725075] pcieport 0000:00:1b.0: saving config space at offset 0x30 (reading 0x0)
[    6.725080] pcieport 0000:00:1b.0: saving config space at offset 0x34 (reading 0x40)
[    6.725084] pcieport 0000:00:1b.0: saving config space at offset 0x38 (reading 0x0)
[    6.725089] pcieport 0000:00:1b.0: saving config space at offset 0x3c (reading 0x201ff)
[    6.725154] pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
[    6.725284] pcieport 0000:00:1c.0: PME: Signaling with IRQ 124
[    6.725580] pcieport 0000:00:1c.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[    6.726086] pci_bus 0000:04: dev 00, created physical slot 0
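
As an aside for reading the "saving config space" lines above: the dwords at offsets 0x0 and 0x8 of the standard PCI config header pack the vendor/device IDs and the class code/revision. A minimal decoding sketch (function names are illustrative; only these two offsets, which are common to endpoint and bridge headers, are handled):

```python
def decode_id_dword(value):
    """Split the dword at config offset 0x0 into (vendor_id, device_id)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def decode_class_dword(value):
    """Split the dword at config offset 0x8 into (class_code, revision).

    class_code is the 24-bit base class / subclass / prog-if value.
    """
    return (value >> 8) & 0xFFFFFF, value & 0xFF


if __name__ == "__main__":
    # Values copied from the 0000:01:00.3 and 0000:00:01.0 lines above.
    for name, id_dword, class_dword in (
        ("0000:01:00.3", 0x1AD910DE, 0x0C8000A1),
        ("0000:00:01.0", 0x19018086, 0x0604000D),
    ):
        vendor, device = decode_id_dword(id_dword)
        class_code, rev = decode_class_dword(class_dword)
        print(f"{name}: {vendor:04x}:{device:04x} "
              f"class {class_code:06x} rev {rev:02x}")
```

This gives vendor 10de (NVIDIA), class 0c8000, rev a1 for 01:00.3 and vendor 8086 (Intel), class 060400 (PCI bridge), rev 0d for 00:01.0, in line with the lspci output quoted in this thread.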

Any idea what's going on?

Thanks,
Marc

On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> > This is otherwise similar to pcie_wait_for_link() but allows passing
> > custom activation delay in milliseconds.
> > 
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >  drivers/pci/pci.c | 21 ++++++++++++++++++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e7982af9a5d8..bfd92e018925 100644
> 
> Hi Mika,
> 
> So, I have a thinkpad P73 with thunderbolt, and while I don't boot
> often, my last boots have been unreliable at best (was only able to boot
> 5.7 once, and 5.8 did not succeed either).
> 
> 5.6 was working for a while, but couldn't boot it either this morning,
> so I had to go back to 5.5. This does not mean 5.5 does not have the
> problem, just that it booted this morning, while 5.6 didn't when I
> tried.
> Once the kernel is booted, the problem does not seem to occur much, or
> at all.
> 
> Basically, I'm getting the same thing as this person with a P53 (which
> is a Lenovo ThinkPad mostly identical to mine)
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658
> 
> The kernel boots eventually, but it takes minutes, and everything is so
> super slow, that I just can't reasonably use the machine.
> 
> This shows similar issues with 5.3, 5.4.
> https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
> 
> Another report here with 5.6:
> https://bugzilla.redhat.com/show_bug.cgi?id=1831899
> 
> My current kernel is running your patch above, and I haven't done a lot
> of research yet to confirm whether going back to a kernel before it was
> merged, fixes the problem. Unfortunately the problem is not consistent,
> so it makes things harder to test/debug, especially on my main laptop
> that I do all my work on :)
> 
> I noticed this older patch of yours:
> http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lukas@wunner.de/
> This patch is not in my kernel, is it worth adding?
> 
> Can I get you more info to help debug this?
> 
> If that helps:
> sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci
> 00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
> 00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
> 00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
> 00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
> 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
> 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
> 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
> 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
> 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
> 00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
> 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
> 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
> 00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
> 00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
> 00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
> 00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
> 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> 04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
> 2c:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
> 52:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
> 54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
> 
> 
> sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lsusb -t
> /:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
> /:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
> /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
> /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
>     |__ Port 2: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
>     |__ Port 8: Dev 3, If 3, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 8: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 8: Dev 3, If 2, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 8: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 9: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 12M
>     |__ Port 14: Dev 6, If 0, Class=Wireless, Driver=btusb, 12M
>     |__ Port 14: Dev 6, If 1, Class=Wireless, Driver=btusb, 12M
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>  
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-06 18:18       ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-09-06 18:18 UTC (permalink / raw)
  To: Mika Westerberg
  Cc: Kai-Heng Feng, Paul Menzel, Nicholas Johnson,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Rafael J. Wysocki,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Keith Busch,
	Alex Williamson, Alexandru Gagniuc,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, Bjorn Helgaas, Matthias Andree,
	Len Brown

Ok, I have an update to this problem. I added the nouveau list because
I can't quite tell if the issue is:
- the PCIe changes that went in 5.6 I think (or 5.5?), referenced below

- a new issue with thunderbolt on thinkpad P73, that seems to be
  triggered if I have a USB-C yubikey in the port. With 5.7, my issues
  went away if I removed the USB key during boot, suggesting an interaction
  between nouveau and thunderbolt

- changes in the nouveau driver. Mika told me the PCIe regression
  "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
  to be fixed in 5.8, but I still get a hang of about 4 minutes during
  boot, and with 5.8, removing the USB key didn't make the boot any faster

I don't otherwise use the nvidia chip (I so wish I didn't have it); I only
use intel graphics on that laptop, but I apparently must use the nouveau
driver to manage the nvidia chip so that it's turned off and not
burning 60W doing nothing.

lspci is in the quoted message below, I won't copy it here again, but
here's the nvidia bit:
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

Here are 5 boots, 4 on 5.8.5:

dmesg.1_hang_but_no_warning.txt https://pastebin.com/Y5NaH08n
Boot hung for quite a while, but no clear output

dmesg.2_pme_spurious.txt https://pastebin.com/dX19aCpj
[    8.185808] nvidia-gpu 0000:01:00.3: runtime IRQ mapping not provided by arch
[    8.185989] nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
[    8.188986] nvidia-gpu 0000:01:00.3: enabling bus mastering
[   11.936507] nvidia-gpu 0000:01:00.3: PME# enabled
[   11.975985] nvidia-gpu 0000:01:00.3: PME# disabled
[   11.976011] pcieport 0000:00:01.0: PME: Spurious native interrupt!

dmesg.3_usb_key_yanked.txt https://pastebin.com/m7QLnCZt
I yanked the USB key during boot, that seemed to help unlock things with
5.7, but did not with 5.8. It's hung on a loop of:
[   11.262854] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[   11.262863] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
[   11.262869] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[   11.262874] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
[   11.262880] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[   11.262885] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
[   11.262890] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
[   11.262895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
[   11.262900] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
[   11.262906] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
[   11.262911] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
[   11.262916] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[   11.262921] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
[   11.262926] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
[   11.262931] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
[   11.262937] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
[   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled

dmesg.4_5.5_boot_fine.txt https://pastebin.com/WXgQTUYP
Reference boot with 5.5: it works fine, no issues.

dmesg.5_no_key_still_hang.txt https://pastebin.com/kcT8Ras0
Unfortunately, booting without the USB-C key in the thunderbolt port did
not make this boot any faster, although the log looks different:
[    6.723454] pcieport 0000:00:01.0: runtime IRQ mapping not provided by arch
[    6.723598] pcieport 0000:00:01.0: PME: Signaling with IRQ 122
[    6.724011] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)
[    6.724016] pcieport 0000:00:01.0: saving config space at offset 0x4 (reading 0x100407)
[    6.724021] pcieport 0000:00:01.0: saving config space at offset 0x8 (reading 0x604000d)
[    6.724025] pcieport 0000:00:01.0: saving config space at offset 0xc (reading 0x810000)
[    6.724029] pcieport 0000:00:01.0: saving config space at offset 0x10 (reading 0x0)
[    6.724033] pcieport 0000:00:01.0: saving config space at offset 0x14 (reading 0x0)
[    6.724037] pcieport 0000:00:01.0: saving config space at offset 0x18 (reading 0x10100)
[    6.724041] pcieport 0000:00:01.0: saving config space at offset 0x1c (reading 0x20002020)
[    6.724046] pcieport 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
[    6.724050] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
[    6.724054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
[    6.724058] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
[    6.724062] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
[    6.724066] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
[    6.724070] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
[    6.724074] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
[    6.724129] pcieport 0000:00:1b.0: runtime IRQ mapping not provided by arch
[    6.724650] pcieport 0000:00:1b.0: PME: Signaling with IRQ 123
[    6.725021] pcieport 0000:00:1b.0: saving config space at offset 0x0 (reading 0xa3408086)
[    6.725026] pcieport 0000:00:1b.0: saving config space at offset 0x4 (reading 0x100407)
[    6.725031] pcieport 0000:00:1b.0: saving config space at offset 0x8 (reading 0x60400f0)
[    6.725035] pcieport 0000:00:1b.0: saving config space at offset 0xc (reading 0x810000)
[    6.725040] pcieport 0000:00:1b.0: saving config space at offset 0x10 (reading 0x0)
[    6.725044] pcieport 0000:00:1b.0: saving config space at offset 0x14 (reading 0x0)
[    6.725049] pcieport 0000:00:1b.0: saving config space at offset 0x18 (reading 0x20200)
[    6.725053] pcieport 0000:00:1b.0: saving config space at offset 0x1c (reading 0x200000f0)
[    6.725058] pcieport 0000:00:1b.0: saving config space at offset 0x20 (reading 0xce30ce30)
[    6.725062] pcieport 0000:00:1b.0: saving config space at offset 0x24 (reading 0x1fff1)
[    6.725067] pcieport 0000:00:1b.0: saving config space at offset 0x28 (reading 0x0)
[    6.725071] pcieport 0000:00:1b.0: saving config space at offset 0x2c (reading 0x0)
[    6.725075] pcieport 0000:00:1b.0: saving config space at offset 0x30 (reading 0x0)
[    6.725080] pcieport 0000:00:1b.0: saving config space at offset 0x34 (reading 0x40)
[    6.725084] pcieport 0000:00:1b.0: saving config space at offset 0x38 (reading 0x0)
[    6.725089] pcieport 0000:00:1b.0: saving config space at offset 0x3c (reading 0x201ff)
[    6.725154] pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
[    6.725284] pcieport 0000:00:1c.0: PME: Signaling with IRQ 124
[    6.725580] pcieport 0000:00:1c.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
[    6.726086] pci_bus 0000:04: dev 00, created physical slot 0

Any idea what's going on?

Thanks,
Marc

On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> > This is otherwise similar to pcie_wait_for_link() but allows passing
> > custom activation delay in milliseconds.
> > 
> > Signed-off-by: Mika Westerberg <mika.westerberg-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > ---
> >  drivers/pci/pci.c | 21 ++++++++++++++++++---
> >  1 file changed, 18 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index e7982af9a5d8..bfd92e018925 100644
> 
> Hi Mika,
> 
> So, I have a thinkpad P73 with thunderbolt, and while I don't boot
> often, my last boots have been unreliable at best (was only able to boot
> 5.7 once, and 5.8 did not succeed either).
> 
> 5.6 was working for a while, but couldn't boot it either this morning,
> so I had to go back to 5.5. This does not mean 5.5 does not have the
> problem, just that it booted this morning, while 5.6 didn't when I
> tried.
> Once the kernel is booted, the problem does not seem to occur much, or
> at all.
> 
> Basically, I'm getting the same thing as this person with a P53 (which
> is a mostly identical Lenovo ThinkPad to mine):
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> https://bbs.archlinux.org/viewtopic.php?id=250658
> 
> The kernel boots eventually, but it takes minutes, and everything is so
> super slow, that I just can't reasonably use the machine.
> 
> This shows similar issues with 5.3, 5.4.
> https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
> 
> Another report here with 5.6:
> https://bugzilla.redhat.com/show_bug.cgi?id=1831899
> 
> My current kernel is running your patch above, and I haven't done a lot
> of research yet to confirm whether going back to a kernel before it was
> merged, fixes the problem. Unfortunately the problem is not consistent,
> so it makes things harder to test/debug, especially on my main laptop
> that I do all my work on :)
> 
> I noticed this older patch of yours:
> http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lukas-JFq808J9C/izQB+pC5nmwQ@public.gmane.org/
> This patch is not in my kernel, is it worth adding?
> 
> Can I get you more info to help debug this?
> 
> If that helps:
> sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci
> 00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
> 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
> 00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
> 00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
> 00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
> 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
> 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
> 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
> 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
> 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
> 00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
> 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
> 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
> 00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
> 00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
> 00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
> 00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
> 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> 04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> 06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
> 2c:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
> 52:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
> 54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
> 
> 
> sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lsusb -t
> /:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
> /:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
> /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
> /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
>     |__ Port 2: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
>     |__ Port 8: Dev 3, If 3, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 8: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 8: Dev 3, If 2, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 8: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
>     |__ Port 9: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 12M
>     |__ Port 14: Dev 6, If 0, Class=Wireless, Driver=btusb, 12M
>     |__ Port 14: Dev 6, If 1, Class=Wireless, Driver=btusb, 12M
> 
> Thanks,
> Marc
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>  
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
  2020-09-06 18:18       ` Marc MERLIN
  (?)
@ 2020-09-06 18:26       ` Matthias Andree
  -1 siblings, 0 replies; 77+ messages in thread
From: Matthias Andree @ 2020-09-06 18:26 UTC (permalink / raw)
  To: Marc MERLIN, Mika Westerberg
  Cc: Bjorn Helgaas, nouveau, Rafael J. Wysocki, Len Brown,
	Lukas Wunner, Keith Busch, Alex Williamson, Alexandru Gagniuc,
	Kai-Heng Feng, Paul Menzel, Nicholas Johnson, linux-pci,
	linux-kernel

Please everyone stop Cc:ing me on this discussion, I have no interest
and nothing to contribute here.

I have set an invalid Reply-To: just in case...



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-07 19:14         ` Karol Herbst
  0 siblings, 0 replies; 77+ messages in thread
From: Karol Herbst @ 2020-09-07 19:14 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Mika Westerberg, Kai-Heng Feng, Paul Menzel, Nicholas Johnson,
	nouveau, Rafael J. Wysocki, LKML, Keith Busch, Alex Williamson,
	Alexandru Gagniuc, Linux PCI, Bjorn Helgaas, Matthias Andree,
	Len Brown

On Sun, Sep 6, 2020 at 8:52 PM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> Ok, I have an update to this problem. I added the nouveau list because
> I can't quite tell if the issue is:
> - the PCIe changes that went in 5.6 I think (or 5.5?), referenced below
>
> - a new issue with thunderbolt on thinkpad P73, that seems to be
>   triggered if I have a USB-C yubikey in the port. With 5.7, my issues
>   went away if I removed the USB key during boot, suggesting an interaction
>   between nouveau and thunderbolt
>
> - changes in the nouveau driver. Mika told me the PCIe regression
>   "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
>   to be fixed in 5.8, but I still get a hang of about 4 minutes during
>   boot, and with 5.8, removing the USB key didn't make the boot any faster
>

That's the root port the GPU is attached to, no? I saw that message on
the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
seem to come from the root port.

> I don't otherwise use the nvidia chip (I so wish I didn't have it); I only
> use intel graphics on that laptop, but I apparently must use the nouveau
> driver to manage the nvidia chip so that it's turned off and not
> burning 60W doing nothing.
>

Well, you'd also need it when attaching external displays.

> lspci is in the quoted message below, I won't copy it here again, but
> here's the nvidia bit:
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
>
> Here are 5 boots, 4 on 5.8.5:
>
> dmesg.1_hang_but_no_warning.txt https://pastebin.com/Y5NaH08n
> Boot hung for quite a while, but no clear output
>
> dmesg.2_pme_spurious.txt https://pastebin.com/dX19aCpj
> [    8.185808] nvidia-gpu 0000:01:00.3: runtime IRQ mapping not provided by arch
> [    8.185989] nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
> [    8.188986] nvidia-gpu 0000:01:00.3: enabling bus mastering
> [   11.936507] nvidia-gpu 0000:01:00.3: PME# enabled
> [   11.975985] nvidia-gpu 0000:01:00.3: PME# disabled
> [   11.976011] pcieport 0000:00:01.0: PME: Spurious native interrupt!
>
> dmesg.3_usb_key_yanked.txt https://pastebin.com/m7QLnCZt
> I yanked the USB key during boot, that seemed to help unlock things with
> 5.7, but did not with 5.8. It's hung on a loop of:
> [   11.262854] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
> [   11.262863] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
> [   11.262869] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
> [   11.262874] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
> [   11.262880] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
> [   11.262885] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
> [   11.262890] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
> [   11.262895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
> [   11.262900] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
> [   11.262906] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
> [   11.262911] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
> [   11.262916] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
> [   11.262921] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
> [   11.262926] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
> [   11.262931] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
> [   11.262937] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
> [   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
> [   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled
>

Hmm, interesting. I've heard some reports that the Nvidia USB-C/UCSI
driver is a bit broken and can cause various issues. Would you mind
blacklisting i2c-nvidia-gpu and typec_nvidia (and verifying that they
don't get loaded) to see if that helps?
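
As a sketch of that check (not something posted by anyone in this thread):
the helper below builds modprobe.d "blacklist" lines for the two modules
named above and reports which of them still appear in /proc/modules-style
text. The function names are made up for illustration, and the file the
blacklist lines go into (some .conf file under /etc/modprobe.d/) is left to
the reader. It relies on the kernel listing module names with '-' replaced
by '_'.

```python
from pathlib import Path

# Module names taken from the suggestion above.
SUSPECT_MODULES = ("i2c-nvidia-gpu", "typec_nvidia")


def normalize(name):
    """Loaded module names are reported with '-' replaced by '_'."""
    return name.replace("-", "_")


def blacklist_conf(modules=SUSPECT_MODULES):
    """Return modprobe.d 'blacklist' lines, one per module."""
    return "".join("blacklist %s\n" % normalize(m) for m in modules)


def still_loaded(proc_modules_text, modules=SUSPECT_MODULES):
    """Return the suspect modules that appear in /proc/modules-style text."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [m for m in modules if normalize(m) in loaded]


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    print(blacklist_conf(), end="")
    if proc_modules.exists():
        print("still loaded:", still_loaded(proc_modules.read_text()))
```

An empty "still loaded" list after a reboot would confirm that neither
module was loaded. Note that a "blacklist" line only stops automatic
loading by alias; a module requested explicitly by name is still loaded.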

> dmesg.4_5.5_boot_fine.txt https://pastebin.com/WXgQTUYP
> Reference boot with 5.5: it works fine, no issues.
>
> dmesg.5_no_key_still_hang.txt https://pastebin.com/kcT8Ras0
> Unfortunately, booting without the USB-C key in the thunderbolt port did
> not make this boot any faster, although the log looks different:
> [    6.723454] pcieport 0000:00:01.0: runtime IRQ mapping not provided by arch
> [    6.723598] pcieport 0000:00:01.0: PME: Signaling with IRQ 122
> [    6.724011] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)
> [    6.724016] pcieport 0000:00:01.0: saving config space at offset 0x4 (reading 0x100407)
> [    6.724021] pcieport 0000:00:01.0: saving config space at offset 0x8 (reading 0x604000d)
> [    6.724025] pcieport 0000:00:01.0: saving config space at offset 0xc (reading 0x810000)
> [    6.724029] pcieport 0000:00:01.0: saving config space at offset 0x10 (reading 0x0)
> [    6.724033] pcieport 0000:00:01.0: saving config space at offset 0x14 (reading 0x0)
> [    6.724037] pcieport 0000:00:01.0: saving config space at offset 0x18 (reading 0x10100)
> [    6.724041] pcieport 0000:00:01.0: saving config space at offset 0x1c (reading 0x20002020)
> [    6.724046] pcieport 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
> [    6.724050] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
> [    6.724054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
> [    6.724058] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
> [    6.724062] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
> [    6.724066] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
> [    6.724070] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
> [    6.724074] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
> [    6.724129] pcieport 0000:00:1b.0: runtime IRQ mapping not provided by arch
> [    6.724650] pcieport 0000:00:1b.0: PME: Signaling with IRQ 123
> [    6.725021] pcieport 0000:00:1b.0: saving config space at offset 0x0 (reading 0xa3408086)
> [    6.725026] pcieport 0000:00:1b.0: saving config space at offset 0x4 (reading 0x100407)
> [    6.725031] pcieport 0000:00:1b.0: saving config space at offset 0x8 (reading 0x60400f0)
> [    6.725035] pcieport 0000:00:1b.0: saving config space at offset 0xc (reading 0x810000)
> [    6.725040] pcieport 0000:00:1b.0: saving config space at offset 0x10 (reading 0x0)
> [    6.725044] pcieport 0000:00:1b.0: saving config space at offset 0x14 (reading 0x0)
> [    6.725049] pcieport 0000:00:1b.0: saving config space at offset 0x18 (reading 0x20200)
> [    6.725053] pcieport 0000:00:1b.0: saving config space at offset 0x1c (reading 0x200000f0)
> [    6.725058] pcieport 0000:00:1b.0: saving config space at offset 0x20 (reading 0xce30ce30)
> [    6.725062] pcieport 0000:00:1b.0: saving config space at offset 0x24 (reading 0x1fff1)
> [    6.725067] pcieport 0000:00:1b.0: saving config space at offset 0x28 (reading 0x0)
> [    6.725071] pcieport 0000:00:1b.0: saving config space at offset 0x2c (reading 0x0)
> [    6.725075] pcieport 0000:00:1b.0: saving config space at offset 0x30 (reading 0x0)
> [    6.725080] pcieport 0000:00:1b.0: saving config space at offset 0x34 (reading 0x40)
> [    6.725084] pcieport 0000:00:1b.0: saving config space at offset 0x38 (reading 0x0)
> [    6.725089] pcieport 0000:00:1b.0: saving config space at offset 0x3c (reading 0x201ff)
> [    6.725154] pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
> [    6.725284] pcieport 0000:00:1c.0: PME: Signaling with IRQ 124
> [    6.725580] pcieport 0000:00:1c.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
> [    6.726086] pci_bus 0000:04: dev 00, created physical slot 0
>
> Any idea what's going on?
>
> Thanks,
> Marc
>
> On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> > On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> > > This is otherwise similar to pcie_wait_for_link() but allows passing
> > > custom activation delay in milliseconds.
> > >
> > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > ---
> > >  drivers/pci/pci.c | 21 ++++++++++++++++++---
> > >  1 file changed, 18 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index e7982af9a5d8..bfd92e018925 100644
> >
> > Hi Mika,
> >
> > So, I have a thinkpad P73 with thunderbolt, and while I don't boot
> > often, my last boots have been unreliable at best (was only able to boot
> > 5.7 once, and 5.8 did not succeed either).
> >
> > 5.6 was working for a while, but couldn't boot it either this morning,
> > so I had to go back to 5.5. This does not mean 5.5 does not have the
> > problem, just that it booted this morning, while 5.6 didn't when I
> > tried.
> > Once the kernel is booted, the problem does not seem to occur much, or
> > at all.
> >
> > Basically, I'm getting the same thing as this person with a P53 (which
> > is a mostly identical Lenovo ThinkPad to mine):
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > https://bbs.archlinux.org/viewtopic.php?id=250658
> >
> > The kernel boots eventually, but it takes minutes, and everything is so
> > super slow, that I just can't reasonably use the machine.
> >
> > This shows similar issues with 5.3, 5.4.
> > https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
> >
> > Another report here with 5.6:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1831899
> >
> > My current kernel is running your patch above, and I haven't done a lot
> > of research yet to confirm whether going back to a kernel before it was
> > merged, fixes the problem. Unfortunately the problem is not consistent,
> > so it makes things harder to test/debug, especially on my main laptop
> > that I do all my work on :)
> >
> > I noticed this older patch of yours:
> > http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lukas@wunner.de/
> > This patch is not in my kernel, is it worth adding?
> >
> > Can I get you more info to help debug this?
> >
> > If that helps:
> > sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci
> > 00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
> > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
> > 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
> > 00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
> > 00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
> > 00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
> > 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
> > 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
> > 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
> > 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
> > 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
> > 00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
> > 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
> > 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
> > 00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
> > 00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
> > 00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
> > 00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
> > 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> > 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
> > 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
> > 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> > 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> > 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> > 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> > 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> > 04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
> > 2c:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
> > 52:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
> > 54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
> >
> >
> > sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lsusb -t
> > /:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
> > /:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> > /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
> > /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> > /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
> > /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
> >     |__ Port 2: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
> >     |__ Port 8: Dev 3, If 3, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 8: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 8: Dev 3, If 2, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 8: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 9: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 12M
> >     |__ Port 14: Dev 6, If 0, Class=Wireless, Driver=btusb, 12M
> >     |__ Port 14: Dev 6, If 1, Class=Wireless, Driver=btusb, 12M
> >
> > Thanks,
> > Marc
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> >
> > Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
>
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-07 19:14         ` Karol Herbst
  0 siblings, 0 replies; 77+ messages in thread
From: Karol Herbst @ 2020-09-07 19:14 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Paul Menzel, Nicholas Johnson, nouveau, Rafael J. Wysocki, LKML,
	Keith Busch, Alex Williamson, Kai-Heng Feng, Alexandru Gagniuc,
	Linux PCI, Bjorn Helgaas, Matthias Andree, Mika Westerberg,
	Len Brown

On Sun, Sep 6, 2020 at 8:52 PM Marc MERLIN <marc_nouveau-xnduUnryOU1AfugRpC6u6w@public.gmane.org> wrote:
>
> Ok, I have an update to this problem. I added the nouveau list because
> I can't quite tell if the issue is:
> - the PCIe changes that went in 5.6 I think (or 5.5?), referenced below
>
> - a new issue with thunderbolt on thinkpad P73, that seems to be
>   triggered if I have a USB-C yubikey in the port. With 5.7, my issues
>   went away if I removed the USB key during boot, showing an interaction
>   between nouveau and thunderbolt
>
> - changes in the nouveau driver. Mika told me the PCIe regression
>   "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
>   to be fixed in 5.8, but I still get a hang of 4 minutes or so during
>   boot, and with 5.8, removing the USB key didn't make the boot faster
>

that's the root port the GPU is attached to, no? I saw that message on
the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
seem to come from the root port.

> I don't otherwise use the nvidia chip, which I wish I didn't have. I only
> use intel graphics on that laptop, but I apparently must use the nouveau
> driver to manage the nvidia chip so that it's turned off and not
> burning 60W doing nothing.
>

Well, you'd also need it when attaching external displays.

> lspci is in the quoted message below, I won't copy it here again, but
> here's the nvidia bit:
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
>
> Here are 5 boots, 4 on 5.8.5:
>
> dmesg.1_hang_but_no_warning.txt https://pastebin.com/Y5NaH08n
> Boot hung for quite a while, but no clear output
>
> dmesg.2_pme_spurious.txt https://pastebin.com/dX19aCpj
> [    8.185808] nvidia-gpu 0000:01:00.3: runtime IRQ mapping not provided by arch
> [    8.185989] nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
> [    8.188986] nvidia-gpu 0000:01:00.3: enabling bus mastering
> [   11.936507] nvidia-gpu 0000:01:00.3: PME# enabled
> [   11.975985] nvidia-gpu 0000:01:00.3: PME# disabled
> [   11.976011] pcieport 0000:00:01.0: PME: Spurious native interrupt!
>
> dmesg.3_usb_key_yanked.txt https://pastebin.com/m7QLnCZt
> I yanked the USB key during boot, that seemed to help unlock things with
> 5.7, but did not with 5.8. It's hung on a loop of:
> [   11.262854] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
> [   11.262863] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
> [   11.262869] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
> [   11.262874] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
> [   11.262880] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
> [   11.262885] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
> [   11.262890] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
> [   11.262895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
> [   11.262900] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
> [   11.262906] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
> [   11.262911] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
> [   11.262916] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
> [   11.262921] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
> [   11.262926] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
> [   11.262931] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
> [   11.262937] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
> [   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
> [   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled
>

mhh, interesting. I heard some random comments that the Nvidia
USB-C/UCSI driver is a bit broken and can cause various issues. Mind
blacklisting i2c-nvidia-gpu and typec_nvidia (and verify they don't
get loaded) and see if that helps?

> dmesg.4_5.5_boot_fine.txt https://pastebin.com/WXgQTUYP
> reference boot with 5.5, it works fine, no issues
>
> dmesg.5_no_key_still_hang.txt https://pastebin.com/kcT8Ras0
> Unfortunately, booting without the USB-C key in the thunderbolt port did
> not make this boot any faster. The log looks different, though:
> [    6.723454] pcieport 0000:00:01.0: runtime IRQ mapping not provided by arch
> [    6.723598] pcieport 0000:00:01.0: PME: Signaling with IRQ 122
> [    6.724011] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)
> [    6.724016] pcieport 0000:00:01.0: saving config space at offset 0x4 (reading 0x100407)
> [    6.724021] pcieport 0000:00:01.0: saving config space at offset 0x8 (reading 0x604000d)
> [    6.724025] pcieport 0000:00:01.0: saving config space at offset 0xc (reading 0x810000)
> [    6.724029] pcieport 0000:00:01.0: saving config space at offset 0x10 (reading 0x0)
> [    6.724033] pcieport 0000:00:01.0: saving config space at offset 0x14 (reading 0x0)
> [    6.724037] pcieport 0000:00:01.0: saving config space at offset 0x18 (reading 0x10100)
> [    6.724041] pcieport 0000:00:01.0: saving config space at offset 0x1c (reading 0x20002020)
> [    6.724046] pcieport 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
> [    6.724050] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
> [    6.724054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
> [    6.724058] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
> [    6.724062] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
> [    6.724066] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
> [    6.724070] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
> [    6.724074] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
> [    6.724129] pcieport 0000:00:1b.0: runtime IRQ mapping not provided by arch
> [    6.724650] pcieport 0000:00:1b.0: PME: Signaling with IRQ 123
> [    6.725021] pcieport 0000:00:1b.0: saving config space at offset 0x0 (reading 0xa3408086)
> [    6.725026] pcieport 0000:00:1b.0: saving config space at offset 0x4 (reading 0x100407)
> [    6.725031] pcieport 0000:00:1b.0: saving config space at offset 0x8 (reading 0x60400f0)
> [    6.725035] pcieport 0000:00:1b.0: saving config space at offset 0xc (reading 0x810000)
> [    6.725040] pcieport 0000:00:1b.0: saving config space at offset 0x10 (reading 0x0)
> [    6.725044] pcieport 0000:00:1b.0: saving config space at offset 0x14 (reading 0x0)
> [    6.725049] pcieport 0000:00:1b.0: saving config space at offset 0x18 (reading 0x20200)
> [    6.725053] pcieport 0000:00:1b.0: saving config space at offset 0x1c (reading 0x200000f0)
> [    6.725058] pcieport 0000:00:1b.0: saving config space at offset 0x20 (reading 0xce30ce30)
> [    6.725062] pcieport 0000:00:1b.0: saving config space at offset 0x24 (reading 0x1fff1)
> [    6.725067] pcieport 0000:00:1b.0: saving config space at offset 0x28 (reading 0x0)
> [    6.725071] pcieport 0000:00:1b.0: saving config space at offset 0x2c (reading 0x0)
> [    6.725075] pcieport 0000:00:1b.0: saving config space at offset 0x30 (reading 0x0)
> [    6.725080] pcieport 0000:00:1b.0: saving config space at offset 0x34 (reading 0x40)
> [    6.725084] pcieport 0000:00:1b.0: saving config space at offset 0x38 (reading 0x0)
> [    6.725089] pcieport 0000:00:1b.0: saving config space at offset 0x3c (reading 0x201ff)
> [    6.725154] pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
> [    6.725284] pcieport 0000:00:1c.0: PME: Signaling with IRQ 124
> [    6.725580] pcieport 0000:00:1c.0: pciehp: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
> [    6.726086] pci_bus 0000:04: dev 00, created physical slot 0
>
> Any idea what's going on?
>
> Thanks,
> Marc
>
> On Sat, Aug 08, 2020 at 01:22:02PM -0700, Marc MERLIN wrote:
> > On Fri, Oct 04, 2019 at 03:39:46PM +0300, Mika Westerberg wrote:
> > > This is otherwise similar to pcie_wait_for_link() but allows passing
> > > custom activation delay in milliseconds.
> > >
> > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > ---
> > >  drivers/pci/pci.c | 21 ++++++++++++++++++---
> > >  1 file changed, 18 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index e7982af9a5d8..bfd92e018925 100644
> >
> > Hi Mika,
> >
> > So, I have a thinkpad P73 with thunderbolt, and while I don't boot
> > often, my last boots have been unreliable at best (was only able to boot
> > 5.7 once, and 5.8 did not succeed either).
> >
> > 5.6 was working for a while, but couldn't boot it either this morning,
> > so I had to go back to 5.5. This does not mean 5.5 does not have the
> > problem, just that it booted this morning, while 5.6 didn't when I
> > tried.
> > Once the kernel is booted, the problem does not seem to occur much, or
> > at all.
> >
> > Basically, I'm getting the same thing as this person with a P53 (which
> > is a mostly identical Lenovo ThinkPad to mine):
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > kernel: pcieport 0000:00:01.0: PME: Spurious native interrupt!
> > https://bbs.archlinux.org/viewtopic.php?id=250658
> >
> > The kernel boots eventually, but it takes minutes, and everything is so
> > super slow, that I just can't reasonably use the machine.
> >
> > This shows similar issues with 5.3, 5.4.
> > https://forum.proxmox.com/threads/pme-spurious-native-interrupt-kernel-meldungen.62850/
> >
> > Another report here with 5.6:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1831899
> >
> > My current kernel is running your patch above, and I haven't done a lot
> > of research yet to confirm whether going back to a kernel before it was
> > merged, fixes the problem. Unfortunately the problem is not consistent,
> > so it makes things harder to test/debug, especially on my main laptop
> > that I do all my work on :)
> >
> > I noticed this older patch of yours:
> > http://patchwork.ozlabs.org/project/linux-pci/patch/0113014581dbe2d1f938813f1783905bd81b79db.1560079442.git.lukas@wunner.de/
> > This patch is not in my kernel, is it worth adding?
> >
> > Can I get you more info to help debug this?
> >
> > If that helps:
> > sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lspci
> > 00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
> > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
> > 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
> > 00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
> > 00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
> > 00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
> > 00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
> > 00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
> > 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
> > 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
> > 00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
> > 00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
> > 00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
> > 00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
> > 00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
> > 00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
> > 00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
> > 00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
> > 00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
> > 00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
> > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
> > 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
> > 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> > 01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
> > 01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
> > 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> > 02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> > 04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
> > 06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
> > 2c:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
> > 52:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
> > 54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)
> >
> >
> > sauron:/usr/src/linux-5.7.11-amd64-preempt-sysrq-20190816/drivers/pci# lsusb -t
> > /:  Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 10000M
> > /:  Bus 05.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> > /:  Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 10000M
> > /:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
> > /:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
> > /:  Bus 01.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
> >     |__ Port 2: Dev 2, If 0, Class=Human Interface Device, Driver=usbhid, 12M
> >     |__ Port 8: Dev 3, If 3, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 8: Dev 3, If 1, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 8: Dev 3, If 2, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 8: Dev 3, If 0, Class=Video, Driver=uvcvideo, 480M
> >     |__ Port 9: Dev 4, If 0, Class=Vendor Specific Class, Driver=, 12M
> >     |__ Port 14: Dev 6, If 0, Class=Wireless, Driver=btusb, 12M
> >     |__ Port 14: Dev 6, If 1, Class=Wireless, Driver=btusb, 12M
> >
> > Thanks,
> > Marc
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> >
> > Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
>
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-07 20:58           ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-09-07 20:58 UTC (permalink / raw)
  To: Karol Herbst
  Cc: Mika Westerberg, Kai-Heng Feng, Nicholas Johnson, nouveau, LKML,
	Linux PCI, Bjorn Helgaas, Len Brown

On Mon, Sep 07, 2020 at 09:14:03PM +0200, Karol Herbst wrote:
> > - changes in the nouveau driver. Mika told me the PCIe regression
> >   "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
> >   to be fixed in 5.8, but I still get a hang of 4 minutes or so during
> >   boot, and with 5.8, removing the USB key didn't make the boot faster
> 
> that's the root port the GPU is attached to, no? I saw that message on
> the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
> seem to come from the root port.

Hi Karol, thanks for your answer.
 
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
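
The same topology can be cross-checked against the "saving config space"
readings quoted in this thread. What follows is only a rough sketch, not
something taken from the kernel: it assumes the standard PCI header layout,
where the dword at offset 0x0 holds the vendor ID (low 16 bits) and device
ID (high 16 bits), and the dword at offset 0x18 of a bridge holds the
primary, secondary and subordinate bus numbers in its three low bytes. The
function names are made up for illustration.

```python
def decode_ids(dword):
    """Split the dword at config offset 0x0 into vendor and device ID."""
    return {"vendor": dword & 0xFFFF, "device": (dword >> 16) & 0xFFFF}


def decode_bridge_buses(dword):
    """Split the dword at config offset 0x18 of a bridge (type 1 header)
    into primary, secondary and subordinate bus numbers."""
    return {
        "primary": dword & 0xFF,
        "secondary": (dword >> 8) & 0xFF,
        "subordinate": (dword >> 16) & 0xFF,
    }


# Readings copied from the dmesg excerpts in this thread.
print(decode_bridge_buses(0x10100))    # 00:01.0, offset 0x18
print(decode_bridge_buses(0x510400))   # 00:1c.0, offset 0x18
print({k: hex(v) for k, v in decode_ids(0x1ad910de).items()})  # 01:00.3
```

For 00:01.0 the secondary bus decodes to 1, which is the bus the 01:00.x
NVIDIA functions sit on, and for 00:1c.0 the range is bus 0x04 through 0x51,
which covers the Thunderbolt bridges.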

> Well, you'd also need it when attaching external displays.
 
Indeed. I just don't need that on this laptop, but I'm familiar with the
not-so-seamless procedure to turn on both GPUs and mirror the intel one
into the nvidia one for external output.

> > [   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
> > [   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled
> 
> mhh, interesting. I heard some random comments that the Nvidia
> USB-C/UCSI driver is a bit broken and can cause various issues. Mind
> blacklisting i2c-nvidia-gpu and typec_nvidia (and verify they don't
> get loaded) and see if that helps?

Right, this one:
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
Sure, I'll blacklist it. Ok, just did that, removed from initrd,
rebooted, and it was no better.

From initrd (before root gets mounted), I have this:
nouveau              1961984  0
mxm_wmi                16384  1 nouveau
hwmon                  32768  1 nouveau
ttm                   102400  1 nouveau
wmi                    32768  2 nouveau,mxm_wmi
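
A listing like the one above can also be checked mechanically rather than
by eye. This is a minimal sketch, assuming lsmod or /proc/modules style
text where the first column is the module name. The helper names are
hypothetical, and the two module names are simply the ones suggested
earlier in the thread; whether they match the real module names on this
kernel is not confirmed here. Hyphens are folded to underscores because
loaded modules are listed with underscores.

```python
def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style text."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header of lsmod.
        if not fields or fields[0] == "Module":
            continue
        names.add(fields[0].replace("-", "_"))
    return names


def still_loaded(lsmod_text, unwanted):
    """Return, sorted, the unwanted modules that appear in the listing."""
    normalized = {name.replace("-", "_") for name in unwanted}
    return sorted(normalized & loaded_modules(lsmod_text))


# The initrd listing from this message.
INITRD_LISTING = """\
nouveau              1961984  0
mxm_wmi                16384  1 nouveau
hwmon                  32768  1 nouveau
ttm                   102400  1 nouveau
wmi                    32768  2 nouveau,mxm_wmi
"""

print(still_loaded(INITRD_LISTING, ["i2c-nvidia-gpu", "typec_nvidia"]))
```

An empty result means neither name shows up in the listing. On a running
system the same check could be fed the contents of /proc/modules instead of
the pasted text.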

I still got a 2-minute hang, and a nouveau probe error:
[  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12


Here's what it looks like:
[    9.693230] hid: raw HID events driver (C) Jiri Kosina
[    9.694988] usbcore: registered new interface driver usbhid
[    9.694989] usbhid: USB HID core driver
[    9.696700] hid-generic 0003:1050:0200.0001: hiddev0,hidraw0: USB HID v1.00 Device [Yubico Yubico Gnubby (gnubby1)] on usb-0000:00:14.0-2/input0
[    9.784456] Console: switching to colour frame buffer device 240x67
[    9.816297] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
[   25.087400] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[   25.087414] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
[   25.087419] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x8800006)
[   25.087424] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
[   25.087430] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc100000)
[   25.087435] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc140000)
[   25.087440] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
[   25.087445] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
[   25.087450] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
[   25.087455] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
[   25.087460] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.087466] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   25.087471] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.087476] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.087481] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.087486] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   25.087571] thunderbolt 0000:06:00.0: PME# enabled
[   25.105353] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.105364] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
[   25.105370] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.105375] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.105380] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.105384] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.105389] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
[   25.105394] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
[   25.105399] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
[   25.105404] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
[   25.105409] pcieport 0000:05:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.105413] pcieport 0000:05:00.0: saving config space at offset 0x2c (reading 0x0)
[   25.105418] pcieport 0000:05:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.105423] pcieport 0000:05:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.105428] pcieport 0000:05:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.105432] pcieport 0000:05:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.105517] pcieport 0000:05:00.0: PME# enabled
[   25.125367] pcieport 0000:04:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.125378] pcieport 0000:04:00.0: saving config space at offset 0x4 (reading 0x100007)
[   25.125383] pcieport 0000:04:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.125388] pcieport 0000:04:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.125393] pcieport 0000:04:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.125398] pcieport 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.125403] pcieport 0000:04:00.0: saving config space at offset 0x18 (reading 0x510504)
[   25.125407] pcieport 0000:04:00.0: saving config space at offset 0x1c (reading 0x5141)
[   25.125412] pcieport 0000:04:00.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.125417] pcieport 0000:04:00.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.125422] pcieport 0000:04:00.0: saving config space at offset 0x28 (reading 0x60)
[   25.125427] pcieport 0000:04:00.0: saving config space at offset 0x2c (reading 0x60)
[   25.125431] pcieport 0000:04:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.125436] pcieport 0000:04:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.125441] pcieport 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.125446] pcieport 0000:04:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.125528] pcieport 0000:04:00.0: PME# enabled
[   25.145423] pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0xa3388086)
[   25.145437] pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100407)
[   25.145445] pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x60400f0)
[   25.145453] pcieport 0000:00:1c.0: saving config space at offset 0xc (reading 0x810000)
[   25.145460] pcieport 0000:00:1c.0: saving config space at offset 0x10 (reading 0x0)
[   25.145464] pcieport 0000:00:1c.0: saving config space at offset 0x14 (reading 0x0)
[   25.145469] pcieport 0000:00:1c.0: saving config space at offset 0x18 (reading 0x510400)
[   25.145476] pcieport 0000:00:1c.0: saving config space at offset 0x1c (reading 0x20006040)
[   25.145484] pcieport 0000:00:1c.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.145488] pcieport 0000:00:1c.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.145493] pcieport 0000:00:1c.0: saving config space at offset 0x28 (reading 0x60)
[   25.145497] pcieport 0000:00:1c.0: saving config space at offset 0x2c (reading 0x60)
[   25.145502] pcieport 0000:00:1c.0: saving config space at offset 0x30 (reading 0x0)
[   25.145506] pcieport 0000:00:1c.0: saving config space at offset 0x34 (reading 0x40)
[   25.145510] pcieport 0000:00:1c.0: saving config space at offset 0x38 (reading 0x0)
[   25.145515] pcieport 0000:00:1c.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.145604] pcieport 0000:00:1c.0: PME# enabled
[   26.265697] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
[   45.468365] random: crng init done
[  105.032727] usb 1-2: USB disconnect, device number 2  <= I removed a usb key, didn't help
[  128.495144] async_tx: api initialized (async)
[  128.514820] device-mapper: uevent: version 1.0.3
[  128.518186] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[  144.869445] e1000e 0000:00:1f.6 eth0: NIC Link is Down
[  172.851384] BTRFS: device label btrfs_pool4 devid 1 transid 78270 /dev/sdb4 scanned by btrfs (1293)
[  172.851648] BTRFS: device label btrfs_pool3 devid 1 transid 27410 /dev/sda5 scanned by btrfs (1293)
[  172.852030] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1293)
[  172.852224] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1293)
[  189.124291] nouveau 0000:01:00.0: disp ctor failed, -12
[  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12

The next boot looks similar:
[   25.161759] pcieport 0000:00:1c.0: PME# enabled
[   26.297810] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
[  128.427270] async_tx: api initialized (async)
[  128.446525] device-mapper: uevent: version 1.0.3
[  128.446691] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[  128.458120] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  138.507373] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  144.793573] e1000e 0000:00:1f.6 eth0: NIC Link is Down
[  159.627780] random: crng init done
[  171.814064] process '/usr/bin/fstype' started with executable stack
[  181.949989] BTRFS: device label btrfs_boot devid 1 transid 394687 /dev/mapper/cryptroot scanned by btrfs (1063)
[  181.953437] BTRFS: device label btrfs_pool4 devid 1 transid 78270 /dev/sdb4 scanned by btrfs (1063)
[  181.956989] BTRFS: device label btrfs_pool3 devid 1 transid 27410 /dev/sda5 scanned by btrfs (1063)
[  181.960473] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1063)
[  181.964097] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1063)
[  188.733645] nouveau 0000:01:00.0: disp ctor failed, -12
[  188.740653] nouveau: probe of 0000:01:00.0 failed with error -12
[  188.901070] PM: Image not found (code -22)
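
The stall in both boots shows up as a jump between consecutive kernel timestamps. A minimal sketch for finding such jumps, assuming dmesg output in the default `[seconds.microseconds] message` format; the name `largest_gaps` is hypothetical and not part of any tool mentioned in this thread:

```python
import re

# Matches the default dmesg prefix, e.g. "[   26.297810] message".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gaps(lines, top=3):
    """Return up to `top` (gap_seconds, before_msg, after_msg), largest gap first."""
    stamped = []
    for line in lines:
        m = TS_RE.match(line)
        if m:  # lines without a timestamp prefix are skipped
            stamped.append((float(m.group(1)), m.group(2)))
    gaps = [
        (cur_t - prev_t, prev_msg, cur_msg)
        for (prev_t, prev_msg), (cur_t, cur_msg) in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Four lines copied from the second boot log above.
SAMPLE = """\
[   25.161759] pcieport 0000:00:1c.0: PME# enabled
[   26.297810] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
[  128.427270] async_tx: api initialized (async)
[  128.446525] device-mapper: uevent: version 1.0.3
""".splitlines()

if __name__ == "__main__":
    for gap, before, after in largest_gaps(SAMPLE):
        print(f"{gap:10.3f}s  between: {before!r} and {after!r}")
```

On the sample lines the largest gap is the roughly 102 seconds between the D3cold transition and the async_tx message.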

Does that help?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-07 23:51             ` Karol Herbst
  0 siblings, 0 replies; 77+ messages in thread
From: Karol Herbst @ 2020-09-07 23:51 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Mika Westerberg, Kai-Heng Feng, Nicholas Johnson, nouveau, LKML,
	Linux PCI, Bjorn Helgaas, Len Brown, Lyude Paul, Ben Skeggs

On Mon, Sep 7, 2020 at 10:58 PM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> On Mon, Sep 07, 2020 at 09:14:03PM +0200, Karol Herbst wrote:
> > > - changes in the nouveau driver. Mika told me the PCIe regression
> > >   "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
> > >   to be fixed in 5.8, but I still get a 4mn hang or so during boot and
> > >   with 5.8, removing the USB key, didn't help make the boot faster
> >
> > that's the root port the GPU is attached to, no? I saw that message on
> > the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
> > seem to come from the root port.
>
> Hi Karol, thanks for your answer.
>
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
>
> > Well, you'd also need it when attaching external displays.
>
> Indeed. I just don't need that on this laptop, but I'm familiar with the not
> so seamless procedure to turn on both GPUs, and mirror the intel one into
> the nvidia one for external output.
>
> > > [   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
> > > [   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled
> >
> > mhh, interesting. I heard some random comments that the Nvidia
> > USB-C/UCSI driver is a bit broken and can cause various issues. Mind
> > blacklisting i2c-nvidia-gpu and typec_nvidia (and verify they don't
> > get loaded) and see if that helps?
>
> Right, this one:
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> Sure, I'll blacklist it. Ok, just did that, removed from initrd,
> rebooted, and it was no better.
>
> From initrd (before root gets mounted), I have this:
> nouveau              1961984  0
> mxm_wmi                16384  1 nouveau
> hwmon                  32768  1 nouveau
> ttm                   102400  1 nouveau
> wmi                    32768  2 nouveau,mxm_wmi
>
> I still got a 2mn hang. and a nouveau probe error
> [  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12
>
>
> Here's what it looks like:
> [    9.693230] hid: raw HID events driver (C) Jiri Kosina
> [    9.694988] usbcore: registered new interface driver usbhid
> [    9.694989] usbhid: USB HID core driver
> [    9.696700] hid-generic 0003:1050:0200.0001: hiddev0,hidraw0: USB HID v1.00 Device [Yubico Yubico Gnubby (gnubby1)] on usb-0000:00:14.0-2/input0
> [    9.784456] Console: switching to colour frame buffer device 240x67
> [    9.816297] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
> [   25.087400] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
> [   25.087414] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
> [   25.087419] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x8800006)
> [   25.087424] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
> [   25.087430] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc100000)
> [   25.087435] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc140000)
> [   25.087440] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
> [   25.087445] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
> [   25.087450] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
> [   25.087455] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
> [   25.087460] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
> [   25.087466] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
> [   25.087471] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
> [   25.087476] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
> [   25.087481] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
> [   25.087486] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
> [   25.087571] thunderbolt 0000:06:00.0: PME# enabled
> [   25.105353] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
> [   25.105364] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
> [   25.105370] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
> [   25.105375] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
> [   25.105380] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
> [   25.105384] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
> [   25.105389] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
> [   25.105394] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
> [   25.105399] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
> [   25.105404] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
> [   25.105409] pcieport 0000:05:00.0: saving config space at offset 0x28 (reading 0x0)
> [   25.105413] pcieport 0000:05:00.0: saving config space at offset 0x2c (reading 0x0)
> [   25.105418] pcieport 0000:05:00.0: saving config space at offset 0x30 (reading 0x0)
> [   25.105423] pcieport 0000:05:00.0: saving config space at offset 0x34 (reading 0x80)
> [   25.105428] pcieport 0000:05:00.0: saving config space at offset 0x38 (reading 0x0)
> [   25.105432] pcieport 0000:05:00.0: saving config space at offset 0x3c (reading 0x201ff)
> [   25.105517] pcieport 0000:05:00.0: PME# enabled
> [   25.125367] pcieport 0000:04:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
> [   25.125378] pcieport 0000:04:00.0: saving config space at offset 0x4 (reading 0x100007)
> [   25.125383] pcieport 0000:04:00.0: saving config space at offset 0x8 (reading 0x6040006)
> [   25.125388] pcieport 0000:04:00.0: saving config space at offset 0xc (reading 0x10020)
> [   25.125393] pcieport 0000:04:00.0: saving config space at offset 0x10 (reading 0x0)
> [   25.125398] pcieport 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
> [   25.125403] pcieport 0000:04:00.0: saving config space at offset 0x18 (reading 0x510504)
> [   25.125407] pcieport 0000:04:00.0: saving config space at offset 0x1c (reading 0x5141)
> [   25.125412] pcieport 0000:04:00.0: saving config space at offset 0x20 (reading 0xcc10b400)
> [   25.125417] pcieport 0000:04:00.0: saving config space at offset 0x24 (reading 0x3ff10001)
> [   25.125422] pcieport 0000:04:00.0: saving config space at offset 0x28 (reading 0x60)
> [   25.125427] pcieport 0000:04:00.0: saving config space at offset 0x2c (reading 0x60)
> [   25.125431] pcieport 0000:04:00.0: saving config space at offset 0x30 (reading 0x0)
> [   25.125436] pcieport 0000:04:00.0: saving config space at offset 0x34 (reading 0x80)
> [   25.125441] pcieport 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
> [   25.125446] pcieport 0000:04:00.0: saving config space at offset 0x3c (reading 0x201ff)
> [   25.125528] pcieport 0000:04:00.0: PME# enabled
> [   25.145423] pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0xa3388086)
> [   25.145437] pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100407)
> [   25.145445] pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x60400f0)
> [   25.145453] pcieport 0000:00:1c.0: saving config space at offset 0xc (reading 0x810000)
> [   25.145460] pcieport 0000:00:1c.0: saving config space at offset 0x10 (reading 0x0)
> [   25.145464] pcieport 0000:00:1c.0: saving config space at offset 0x14 (reading 0x0)
> [   25.145469] pcieport 0000:00:1c.0: saving config space at offset 0x18 (reading 0x510400)
> [   25.145476] pcieport 0000:00:1c.0: saving config space at offset 0x1c (reading 0x20006040)
> [   25.145484] pcieport 0000:00:1c.0: saving config space at offset 0x20 (reading 0xcc10b400)
> [   25.145488] pcieport 0000:00:1c.0: saving config space at offset 0x24 (reading 0x3ff10001)
> [   25.145493] pcieport 0000:00:1c.0: saving config space at offset 0x28 (reading 0x60)
> [   25.145497] pcieport 0000:00:1c.0: saving config space at offset 0x2c (reading 0x60)
> [   25.145502] pcieport 0000:00:1c.0: saving config space at offset 0x30 (reading 0x0)
> [   25.145506] pcieport 0000:00:1c.0: saving config space at offset 0x34 (reading 0x40)
> [   25.145510] pcieport 0000:00:1c.0: saving config space at offset 0x38 (reading 0x0)
> [   25.145515] pcieport 0000:00:1c.0: saving config space at offset 0x3c (reading 0x201ff)
> [   25.145604] pcieport 0000:00:1c.0: PME# enabled
> [   26.265697] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
> [   45.468365] random: crng init done
> [  105.032727] usb 1-2: USB disconnect, device number 2  <= I removed a usb key, didn't help
> [  128.495144] async_tx: api initialized (async)
> [  128.514820] device-mapper: uevent: version 1.0.3
> [  128.518186] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
> [  144.869445] e1000e 0000:00:1f.6 eth0: NIC Link is Down
> [  172.851384] BTRFS: device label btrfs_pool4 devid 1 transid 78270 /dev/sdb4 scanned by btrfs (1293)
> [  172.851648] BTRFS: device label btrfs_pool3 devid 1 transid 27410 /dev/sda5 scanned by btrfs (1293)
> [  172.852030] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1293)
> [  172.852224] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1293)
> [  189.124291] nouveau 0000:01:00.0: disp ctor failed, -12
> [  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12
>
> The next boot looks similar:
> [   25.161759] pcieport 0000:00:1c.0: PME# enabled
> [   26.297810] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
> [  128.427270] async_tx: api initialized (async)
> [  128.446525] device-mapper: uevent: version 1.0.3
> [  128.446691] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
> [  128.458120] random: cryptsetup: uninitialized urandom read (4 bytes read)
> [  138.507373] random: cryptsetup: uninitialized urandom read (4 bytes read)
> [  144.793573] e1000e 0000:00:1f.6 eth0: NIC Link is Down
> [  159.627780] random: crng init done
> [  171.814064] process '/usr/bin/fstype' started with executable stack
> [  181.949989] BTRFS: device label btrfs_boot devid 1 transid 394687 /dev/mapper/cryptroot scanned by btrfs (1063)
> [  181.953437] BTRFS: device label btrfs_pool4 devid 1 transid 78270 /dev/sdb4 scanned by btrfs (1063)
> [  181.956989] BTRFS: device label btrfs_pool3 devid 1 transid 27410 /dev/sda5 scanned by btrfs (1063)
> [  181.960473] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1063)
> [  181.964097] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1063)
> [  188.733645] nouveau 0000:01:00.0: disp ctor failed, -12
> [  188.740653] nouveau: probe of 0000:01:00.0 failed with error -12

Oh, I somehow missed that "disp ctor failed" message. I think that
might explain why things hang a bit. Off the top of my head I
am not sure if that's something known or something new, but just in
case I CCed Lyude and Ben. I think booting with
nouveau.debug=disp=trace could already show something relevant.
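
Before collecting the trace, it may be worth confirming that the option actually reached the kernel command line. A minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; the helper names `has_kernel_param` and `read_cmdline` are hypothetical:

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole whitespace-separated token."""
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text()


if __name__ == "__main__":
    wanted = "nouveau.debug=disp=trace"
    print(wanted, "present:", has_kernel_param(read_cmdline(), wanted))
```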

> [  188.901070] PM: Image not found (code -22)
>
> Does that help?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
>


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-07 23:51             ` Karol Herbst
  0 siblings, 0 replies; 77+ messages in thread
From: Karol Herbst @ 2020-09-07 23:51 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Nicholas Johnson, nouveau, LKML, Kai-Heng Feng, Ben Skeggs,
	Linux PCI, Bjorn Helgaas, Mika Westerberg, Len Brown

On Mon, Sep 7, 2020 at 10:58 PM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> On Mon, Sep 07, 2020 at 09:14:03PM +0200, Karol Herbst wrote:
> > > - changes in the nouveau driver. Mika told me the PCIe regression
> > >   "pcieport 0000:00:01.0: PME: Spurious native interrupt!" is supposed
> > >   to be fixed in 5.8, but I still get a 4mn hang or so during boot and
> > >   with 5.8, removing the USB key, didn't help make the boot faster
> >
> > that's the root port the GPU is attached to, no? I saw that message on
> > the Thinkpad P1G2 when runtime resuming the Nvidia GPU, but it does
> > seem to come from the root port.
>
> Hi Karol, thanks for your answer.
>
> 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
>
> > Well, you'd also need it when attaching external displays.
>
> Indeed. I just don't need that on this laptop, but I'm familiar with the not
> so seamless procedure to turn on both GPUs, and mirror the intel one into
> the nvidia one for external output.
>
> > > [   11.262985] nvidia-gpu 0000:01:00.3: PME# enabled
> > > [   11.303060] nvidia-gpu 0000:01:00.3: PME# disabled
> >
> > mhh, interesting. I heard some random comments that the Nvidia
> > USB-C/UCSI driver is a bit broken and can cause various issues. Mind
> > blacklisting i2c-nvidia-gpu and typec_nvidia (and verify they don't
> > get loaded) and see if that helps?
>
> Right, this one:
> 01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
> Sure, I'll blacklist it. Ok, just did that, removed from initrd,
> rebooted, and it was no better.
>
> From initrd (before root gets mounted), I have this:
> nouveau              1961984  0
> mxm_wmi                16384  1 nouveau
> hwmon                  32768  1 nouveau
> ttm                   102400  1 nouveau
> wmi                    32768  2 nouveau,mxm_wmi
>
> I still got a 2mn hang. and a nouveau probe error
> [  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12
>
>
> Here's what it looks like:
> [    9.693230] hid: raw HID events driver (C) Jiri Kosina
> [    9.694988] usbcore: registered new interface driver usbhid
> [    9.694989] usbhid: USB HID core driver
> [    9.696700] hid-generic 0003:1050:0200.0001: hiddev0,hidraw0: USB HID v1.00 Device [Yubico Yubico Gnubby (gnubby1)] on usb-0000:00:14.0-2/input0
> [    9.784456] Console: switching to colour frame buffer device 240x67
> [    9.816297] i915 0000:00:02.0: fb0: i915drmfb frame buffer device
> [   25.087400] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
> [   25.087414] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
> [   25.087419] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x8800006)
> [   25.087424] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
> [   25.087430] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc100000)
> [   25.087435] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc140000)
> [   25.087440] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
> [   25.087445] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
> [   25.087450] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
> [   25.087455] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
> [   25.087460] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
> [   25.087466] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
> [   25.087471] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
> [   25.087476] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
> [   25.087481] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
> [   25.087486] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
> [   25.087571] thunderbolt 0000:06:00.0: PME# enabled
> [   25.105353] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
> [   25.105364] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
> [   25.105370] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
> [   25.105375] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
> [   25.105380] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
> [   25.105384] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
> [   25.105389] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
> [   25.105394] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
> [   25.105399] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
> [   25.105404] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
> [   25.105409] pcieport 0000:05:00.0: saving config space at offset 0x28 (reading 0x0)
> [   25.105413] pcieport 0000:05:00.0: saving config space at offset 0x2c (reading 0x0)
> [   25.105418] pcieport 0000:05:00.0: saving config space at offset 0x30 (reading 0x0)
> [   25.105423] pcieport 0000:05:00.0: saving config space at offset 0x34 (reading 0x80)
> [   25.105428] pcieport 0000:05:00.0: saving config space at offset 0x38 (reading 0x0)
> [   25.105432] pcieport 0000:05:00.0: saving config space at offset 0x3c (reading 0x201ff)
> [   25.105517] pcieport 0000:05:00.0: PME# enabled
> [   25.125367] pcieport 0000:04:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
> [   25.125378] pcieport 0000:04:00.0: saving config space at offset 0x4 (reading 0x100007)
> [   25.125383] pcieport 0000:04:00.0: saving config space at offset 0x8 (reading 0x6040006)
> [   25.125388] pcieport 0000:04:00.0: saving config space at offset 0xc (reading 0x10020)
> [   25.125393] pcieport 0000:04:00.0: saving config space at offset 0x10 (reading 0x0)
> [   25.125398] pcieport 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
> [   25.125403] pcieport 0000:04:00.0: saving config space at offset 0x18 (reading 0x510504)
> [   25.125407] pcieport 0000:04:00.0: saving config space at offset 0x1c (reading 0x5141)
> [   25.125412] pcieport 0000:04:00.0: saving config space at offset 0x20 (reading 0xcc10b400)
> [   25.125417] pcieport 0000:04:00.0: saving config space at offset 0x24 (reading 0x3ff10001)
> [   25.125422] pcieport 0000:04:00.0: saving config space at offset 0x28 (reading 0x60)
> [   25.125427] pcieport 0000:04:00.0: saving config space at offset 0x2c (reading 0x60)
> [   25.125431] pcieport 0000:04:00.0: saving config space at offset 0x30 (reading 0x0)
> [   25.125436] pcieport 0000:04:00.0: saving config space at offset 0x34 (reading 0x80)
> [   25.125441] pcieport 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
> [   25.125446] pcieport 0000:04:00.0: saving config space at offset 0x3c (reading 0x201ff)
> [   25.125528] pcieport 0000:04:00.0: PME# enabled
> [   25.145423] pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0xa3388086)
> [   25.145437] pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100407)
> [   25.145445] pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x60400f0)
> [   25.145453] pcieport 0000:00:1c.0: saving config space at offset 0xc (reading 0x810000)
> [   25.145460] pcieport 0000:00:1c.0: saving config space at offset 0x10 (reading 0x0)
> [   25.145464] pcieport 0000:00:1c.0: saving config space at offset 0x14 (reading 0x0)
> [   25.145469] pcieport 0000:00:1c.0: saving config space at offset 0x18 (reading 0x510400)
> [   25.145476] pcieport 0000:00:1c.0: saving config space at offset 0x1c (reading 0x20006040)
> [   25.145484] pcieport 0000:00:1c.0: saving config space at offset 0x20 (reading 0xcc10b400)
> [   25.145488] pcieport 0000:00:1c.0: saving config space at offset 0x24 (reading 0x3ff10001)
> [   25.145493] pcieport 0000:00:1c.0: saving config space at offset 0x28 (reading 0x60)
> [   25.145497] pcieport 0000:00:1c.0: saving config space at offset 0x2c (reading 0x60)
> [   25.145502] pcieport 0000:00:1c.0: saving config space at offset 0x30 (reading 0x0)
> [   25.145506] pcieport 0000:00:1c.0: saving config space at offset 0x34 (reading 0x40)
> [   25.145510] pcieport 0000:00:1c.0: saving config space at offset 0x38 (reading 0x0)
> [   25.145515] pcieport 0000:00:1c.0: saving config space at offset 0x3c (reading 0x201ff)
> [   25.145604] pcieport 0000:00:1c.0: PME# enabled
> [   26.265697] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
> [   45.468365] random: crng init done
> [  105.032727] usb 1-2: USB disconnect, device number 2  <= I removed a usb key, didn't help
> [  128.495144] async_tx: api initialized (async)
> [  128.514820] device-mapper: uevent: version 1.0.3
> [  128.518186] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> [  144.869445] e1000e 0000:00:1f.6 eth0: NIC Link is Down
> [  172.851384] BTRFS: device label btrfs_pool4 devid 1 transid 78270 /dev/sdb4 scanned by btrfs (1293)
> [  172.851648] BTRFS: device label btrfs_pool3 devid 1 transid 27410 /dev/sda5 scanned by btrfs (1293)
> [  172.852030] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1293)
> [  172.852224] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1293)
> [  189.124291] nouveau 0000:01:00.0: disp ctor failed, -12
> [  189.124530] nouveau: probe of 0000:01:00.0 failed with error -12
>
> The next boot looks similar:
> [   25.161759] pcieport 0000:00:1c.0: PME# enabled
> [   26.297810] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
> [  128.427270] async_tx: api initialized (async)
> [  128.446525] device-mapper: uevent: version 1.0.3
> [  128.446691] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
> [  128.458120] random: cryptsetup: uninitialized urandom read (4 bytes read)
> [  138.507373] random: cryptsetup: uninitialized urandom read (4 bytes read)
> [  144.793573] e1000e 0000:00:1f.6 eth0: NIC Link is Down
> [  159.627780] random: crng init done
> [  171.814064] process '/usr/bin/fstype' started with executable stack
> [  181.949989] BTRFS: device label btrfs_boot devid 1 transid 394687 /dev/mapper/cryptroot scanned by btrfs (1063)
> [  181.953437] BTRFS: device label btrfs_pool4 devid 1 transid 78270 /dev/sdb4 scanned by btrfs (1063)
> [  181.956989] BTRFS: device label btrfs_pool3 devid 1 transid 27410 /dev/sda5 scanned by btrfs (1063)
> [  181.960473] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1063)
> [  181.964097] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1063)
> [  188.733645] nouveau 0000:01:00.0: disp ctor failed, -12
> [  188.740653] nouveau: probe of 0000:01:00.0 failed with error -12

Oh, I somehow missed that "disp ctor failed" message. I think that
might explain why things hang for a while. Off the top of my head I
am not sure if that's something known or something new, but just in
case I CCed Lyude and Ben. I also think booting with
nouveau.debug=disp=trace could already show something relevant.

> [  188.901070] PM: Image not found (code -22)
>
> Does that help?
>
> Thanks,
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
  2020-09-07 23:51             ` Karol Herbst
  (?)
@ 2020-09-08  0:29             ` Marc MERLIN
  2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
  2020-09-13 20:15                 ` Marc MERLIN
  -1 siblings, 2 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-09-08  0:29 UTC (permalink / raw)
  To: Karol Herbst
  Cc: Mika Westerberg, nouveau, LKML, Linux PCI, Lyude Paul, Ben Skeggs

On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> oh, I somehow missed that "disp ctor failed" message. I think that
> might explain why things are a bit hanging. From the top of my head I
> am not sure if that's something known or something new. But just in
> case I CCed Lyude and Ben. And I think booting with
> nouveau.debug=disp=trace could already show something relevant.

Thanks.
I've added that to my boot for next time I reboot.

I'm moving some folks to Bcc now, and let's remove the lists other than
nouveau on followups (lkml and pci). I'm just putting a warning here
so that it shows up in other list archives and anyone finding this
later knows that they should look in the nouveau archives for further
updates/resolution.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
@ 2020-09-13 20:15                 ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-09-13 20:15 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, LKML, Ben Skeggs

On Mon, Sep 07, 2020 at 05:29:35PM -0700, Marc MERLIN wrote:
> On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> > oh, I somehow missed that "disp ctor failed" message. I think that
> > might explain why things are a bit hanging. From the top of my head I
> > am not sure if that's something known or something new. But just in
> > case I CCed Lyude and Ben. And I think booting with
> > nouveau.debug=disp=trace could already show something relevant.
> 
> Thanks.
> I've added that to my boot for next time I reboot.
> 
> I'm moving some folks to Bcc now, and let's remove the lists other than
> nouveau on followups (lkml and pci). I'm just putting a warning here
> so that it shows up in other list archives and anyone finding this
> later knows that they should look in the nouveau archives for further
> updates/resolution.

Hi, I didn't hear back on this issue. Do you need the nouveau.debug=disp=trace
output, or are you already working on the "disp ctor failed" issue?

Thanks
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73)
       [not found]                 ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
@ 2020-09-19 23:18                   ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-09-19 23:18 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, Ben Skeggs

On Sun, Sep 13, 2020 at 01:15:45PM -0700, Marc MERLIN wrote:
> On Mon, Sep 07, 2020 at 05:29:35PM -0700, Marc MERLIN wrote:
> > On Tue, Sep 08, 2020 at 01:51:19AM +0200, Karol Herbst wrote:
> > > oh, I somehow missed that "disp ctor failed" message. I think that
> > > might explain why things are a bit hanging. From the top of my head I
> > > am not sure if that's something known or something new. But just in
> > > case I CCed Lyude and Ben. And I think booting with
> > > nouveau.debug=disp=trace could already show something relevant.
> > 
> > Thanks.
> > I've added that to my boot for next time I reboot.
> > 
> > I'm moving some folks to Bcc now, and let's remove the lists other than
> > nouveau on followups (lkml and pci). I'm just putting a warning here
> > so that it shows up in other list archives and anyone finding this
> > later knows that they should look in the nouveau archives for further
> > updates/resolution.
> 
> Hi, I didn't hear back on this issue. Did you need the nouveau.debug=disp=trace
> or are you already working on the "disp ctor failed" issue?

I rebooted with the option you asked for:
BOOT_IMAGE=/vmlinuz-5.8.5-amd64-preempt-sysrq-20190817 root=/dev/mapper/cryptroot ro rootflags=subvol=root cryptopts=source=/dev/nvme0n1p7,keyscript=/sbin/cryptgetpw usbcore.autosuspend=1 pcie_aspm=force resume=/dev/dm-1 acpi_backlight=vendor nouveau.debug=disp=trace

[    8.371448] nouveau: detected PR support, will not use DSM
[    8.371458] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
[    8.371463] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
[    8.371510] Console: switching to colour dummy device 80x25
[    8.371542] i915 0000:00:02.0: vgaarb: deactivate vga console
[    8.371574] nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
[    8.373522] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    8.374215] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=mem
[    8.377328] i915 0000:00:02.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    8.472037] nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c

note that I still get a 3mn hang at boot here

[  188.334912] nouveau 0000:01:00.0: disp: destroy running...
[  188.341741] nouveau 0000:01:00.0: disp: destroy completed in 1us
[  188.344559] nouveau 0000:01:00.0: disp ctor failed, -12
[  188.347708] nouveau: probe of 0000:01:00.0 failed with error -12
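
For anyone reading this later: kernel drivers report failures as negative
errno values, so the -12 above (and the -22 in the earlier "PM: Image not
found" line) can be looked up by name. A minimal sketch, assuming Python on a
Linux box; the helper name describe_kernel_error is made up for illustration
and is not part of any kernel or nouveau tooling:

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return code such as -12 to its errno name and message."""
    number = abs(code)
    # errno.errorcode maps the numeric value to the symbolic name, e.g. 12 -> "ENOMEM".
    name = errno.errorcode.get(number, "unknown")
    return f"{code}: {name} ({os.strerror(number)})"


# The two codes that show up in the logs in this thread.
for code in (-12, -22):
    print(describe_kernel_error(code))
```

This only names the error code; it does not say which allocation inside the
disp constructor failed, which is what the disp=trace output is meant to help with.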

As a reminder:
sauron:~# lspci |grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)

full boot still looks like this:
[    9.812614] Console: switching to colour frame buffer device 240x67
[    9.844351] i915 0000:00:02.0: fb0: i915drmfb frame buffer device

16 seconds here? Why?

[   25.107472] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[   25.107503] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
[   25.107509] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x8800006)
[   25.107514] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
[   25.107520] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc100000)
[   25.107525] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc140000)
[   25.107530] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
[   25.107535] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
[   25.107540] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
[   25.107545] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
[   25.107550] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.107556] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   25.107561] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.107566] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.107571] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.107576] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   25.107661] thunderbolt 0000:06:00.0: PME# enabled
[   25.125418] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.125448] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
[   25.125454] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.125459] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.125464] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.125469] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.125474] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
[   25.125478] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
[   25.125483] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
[   25.125488] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
[   25.125493] pcieport 0000:05:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.125498] pcieport 0000:05:00.0: saving config space at offset 0x2c (reading 0x0)
[   25.125503] pcieport 0000:05:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.125508] pcieport 0000:05:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.125512] pcieport 0000:05:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.125517] pcieport 0000:05:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.125603] pcieport 0000:05:00.0: PME# enabled
[   25.145407] pcieport 0000:04:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.145426] pcieport 0000:04:00.0: saving config space at offset 0x4 (reading 0x100007)
[   25.145431] pcieport 0000:04:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.145436] pcieport 0000:04:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.145441] pcieport 0000:04:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.145446] pcieport 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.145451] pcieport 0000:04:00.0: saving config space at offset 0x18 (reading 0x510504)
[   25.145456] pcieport 0000:04:00.0: saving config space at offset 0x1c (reading 0x5141)
[   25.145461] pcieport 0000:04:00.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.145466] pcieport 0000:04:00.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.145471] pcieport 0000:04:00.0: saving config space at offset 0x28 (reading 0x60)
[   25.145476] pcieport 0000:04:00.0: saving config space at offset 0x2c (reading 0x60)
[   25.145481] pcieport 0000:04:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.145485] pcieport 0000:04:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.145490] pcieport 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.145495] pcieport 0000:04:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.145578] pcieport 0000:04:00.0: PME# enabled
[   25.165654] pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0xa3388086)
[   25.165667] pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100407)
[   25.165676] pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x60400f0)
[   25.165684] pcieport 0000:00:1c.0: saving config space at offset 0xc (reading 0x810000)
[   25.165692] pcieport 0000:00:1c.0: saving config space at offset 0x10 (reading 0x0)
[   25.165699] pcieport 0000:00:1c.0: saving config space at offset 0x14 (reading 0x0)
[   25.165704] pcieport 0000:00:1c.0: saving config space at offset 0x18 (reading 0x510400)
[   25.165711] pcieport 0000:00:1c.0: saving config space at offset 0x1c (reading 0x20006040)
[   25.165724] pcieport 0000:00:1c.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.165731] pcieport 0000:00:1c.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.165736] pcieport 0000:00:1c.0: saving config space at offset 0x28 (reading 0x60)
[   25.165740] pcieport 0000:00:1c.0: saving config space at offset 0x2c (reading 0x60)
[   25.165745] pcieport 0000:00:1c.0: saving config space at offset 0x30 (reading 0x0)
[   25.165749] pcieport 0000:00:1c.0: saving config space at offset 0x34 (reading 0x40)
[   25.165754] pcieport 0000:00:1c.0: saving config space at offset 0x38 (reading 0x0)
[   25.165758] pcieport 0000:00:1c.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.165849] pcieport 0000:00:1c.0: PME# enabled
[   26.293697] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold

then 2mn lost here.

[  128.473799] async_tx: api initialized (async)
[  128.492893] device-mapper: uevent: version 1.0.3
[  128.493134] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
[  128.504534] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  138.554741] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  144.837421] e1000e 0000:00:1f.6 eth0: NIC Link is Down
[  188.334912] nouveau 0000:01:00.0: disp: destroy running...
[  188.341741] nouveau 0000:01:00.0: disp: destroy completed in 1us
[  188.344559] nouveau 0000:01:00.0: disp ctor failed, -12
[  188.347708] nouveau: probe of 0000:01:00.0 failed with error -12
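
The stalls called out above can also be found mechanically by comparing the
printk timestamps of consecutive lines. This is only a rough sketch, assuming
the log has been saved as plain dmesg text; the function name find_gaps and
the 5 second threshold are arbitrary choices for illustration:

```python
import re

# Matches the "[  188.334912]" prefix that printk puts on each dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def find_gaps(lines, threshold=5.0):
    """Return (gap_seconds, previous_line, next_line) for each silence longer than threshold."""
    gaps = []
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip commentary and blank lines without a timestamp
        now = float(match.group(1))
        if prev_time is not None and now - prev_time > threshold:
            gaps.append((round(now - prev_time, 3), prev_line, line))
        prev_time, prev_line = now, line
    return gaps


# A few lines taken from the boot log in this message.
sample = [
    "[    9.844351] i915 0000:00:02.0: fb0: i915drmfb frame buffer device",
    "[   25.107472] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)",
    "[   26.293697] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold",
    "[  128.473799] async_tx: api initialized (async)",
]

for gap, before, after in find_gaps(sample):
    print(f"{gap:8.3f}s between:\n  {before}\n  {after}")
```

On the sample lines it flags the pause before the thunderbolt config save and
the long silence after the root port goes to D3cold. It only shows where the
time went, not which driver was blocked during it.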

Full lspci if it's helpful:

00:00.0 Host bridge: Intel Corporation Device 3e20 (rev 0d)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 0d)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile) (rev 02)
00:04.0 Signal processing controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem (rev 0d)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 (rev f0)
00:1c.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1c.7 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #8 (rev f0)
00:1e.0 Communication controller: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Cannon Lake LPC Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU104 HD Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU104 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
04:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:01.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:02.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
05:04.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] (rev 06)
06:00.0 System peripheral: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] (rev 06)
2c:00.0 USB controller: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] (rev 06)
52:00.0 Network controller: Intel Corporation Wi-Fi 6 AX200 (rev 1a)
54:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader (rev 01)

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
       [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
@ 2020-12-26 11:12                   ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-12-26 11:12 UTC (permalink / raw)
  To: nouveau; +Cc: Mika Westerberg, LKML, Linux PCI

This started with 5.5 and hasn't gotten better since then, despite some reports
I tried to send.

As per my previous message:
I have a Thinkpad P70 with hybrid graphics.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
that one works fine, I can use i915 for the main screen, and nouveau to
display on the external ports (external ports are only wired to nvidia
chip, so it's impossible to use them without turning the nvidia chip
on).
 
I now have a newer P73, also with hybrid graphics (set up as such
in the BIOS). It runs fine with i915, and I don't need to use an external
display with nouveau for now (it almost works, but I only see the mouse
cursor on the external screen, no window or anything else gets
displayed, very weird).
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
 

After boot, when it gets the right trigger (not sure which one), it
loops on this every 2 seconds, mostly forever.

I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.

Boot hangs look like this:
[   10.659209] Console: switching to colour frame buffer device 240x67
[   10.732353] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[   12.101203] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[   12.101212] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
[   12.101217] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[   12.101223] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
[   12.101228] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[   12.101234] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
[   12.101239] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
[   12.101244] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
[   12.101249] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
[   12.101254] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
[   12.101259] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
[   12.101265] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[   12.101270] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
[   12.101275] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
[   12.101280] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
[   12.101285] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[   12.101333] nvidia-gpu 0000:01:00.3: PME# enabled
[   25.151246] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[   25.151260] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
[   25.151265] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x8800006)
[   25.151270] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
[   25.151276] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc100000)
[   25.151281] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc140000)
[   25.151286] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
[   25.151291] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
[   25.151296] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
[   25.151301] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
[   25.151306] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.151311] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   25.151316] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.151322] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.151327] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.151332] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   25.151416] thunderbolt 0000:06:00.0: PME# enabled
[   25.169204] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.169214] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
[   25.169219] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.169224] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.169229] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.169233] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.169238] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
[   25.169243] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
[   25.169248] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
[   25.169253] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
[   25.169258] pcieport 0000:05:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.169263] pcieport 0000:05:00.0: saving config space at offset 0x2c (reading 0x0)
[   25.169268] pcieport 0000:05:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.169272] pcieport 0000:05:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.169277] pcieport 0000:05:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.169282] pcieport 0000:05:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.169367] pcieport 0000:05:00.0: PME# enabled
[   25.189195] pcieport 0000:04:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.189206] pcieport 0000:04:00.0: saving config space at offset 0x4 (reading 0x100007)
[   25.189212] pcieport 0000:04:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.189216] pcieport 0000:04:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.189221] pcieport 0000:04:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.189226] pcieport 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.189231] pcieport 0000:04:00.0: saving config space at offset 0x18 (reading 0x510504)
[   25.189236] pcieport 0000:04:00.0: saving config space at offset 0x1c (reading 0x5141)
[   25.189241] pcieport 0000:04:00.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.189246] pcieport 0000:04:00.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.189251] pcieport 0000:04:00.0: saving config space at offset 0x28 (reading 0x60)
[   25.189255] pcieport 0000:04:00.0: saving config space at offset 0x2c (reading 0x60)
[   25.189260] pcieport 0000:04:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.189265] pcieport 0000:04:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.189270] pcieport 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.189274] pcieport 0000:04:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.189358] pcieport 0000:04:00.0: PME# enabled
[   25.209257] pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0xa3388086)
[   25.209271] pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100407)
[   25.209279] pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x60400f0)
[   25.209287] pcieport 0000:00:1c.0: saving config space at offset 0xc (reading 0x810000)
[   25.209291] pcieport 0000:00:1c.0: saving config space at offset 0x10 (reading 0x0)
[   25.209299] pcieport 0000:00:1c.0: saving config space at offset 0x14 (reading 0x0)
[   25.209303] pcieport 0000:00:1c.0: saving config space at offset 0x18 (reading 0x510400)
[   25.209311] pcieport 0000:00:1c.0: saving config space at offset 0x1c (reading 0x20006040)
[   25.209324] pcieport 0000:00:1c.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.209329] pcieport 0000:00:1c.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.209333] pcieport 0000:00:1c.0: saving config space at offset 0x28 (reading 0x60)
[   25.209338] pcieport 0000:00:1c.0: saving config space at offset 0x2c (reading 0x60)
[   25.209342] pcieport 0000:00:1c.0: saving config space at offset 0x30 (reading 0x0)
[   25.209346] pcieport 0000:00:1c.0: saving config space at offset 0x34 (reading 0x40)
[   25.209351] pcieport 0000:00:1c.0: saving config space at offset 0x38 (reading 0x0)
[   25.209355] pcieport 0000:00:1c.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.209447] pcieport 0000:00:1c.0: PME# enabled
[   26.341460] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
[  129.257560] async_tx: api initialized (async)
[  129.280335] device-mapper: uevent: version 1.0.3
[  129.280466] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[  129.293087] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  139.346041] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  145.633300] e1000e 0000:00:1f.6 eth0: NIC Link is Down
[  149.384146] random: crng init done
[  161.435256] process '/usr/bin/fstype' started with executable stack
[  171.578236] BTRFS: device label btrfs_boot devid 1 transid 575473 /dev/mapper/cryptroot scanned by btrfs (1069)
[  171.583482] BTRFS: device label btrfs_pool4 devid 1 transid 117379 /dev/sdb4 scanned by btrfs (1069)
[  171.588979] BTRFS: device label btrfs_pool3 devid 1 transid 40487 /dev/sda5 scanned by btrfs (1069)
[  171.594484] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1069)
[  171.600437] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1069)
[  182.799968] PM: Image not found (code -22)
[  189.304662] nouveau 0000:01:00.0: pmu: firmware unavailable
[  189.312455] nouveau 0000:01:00.0: disp: destroy running...
[  189.316552] nouveau 0000:01:00.0: disp: destroy completed in 1us
[  189.320326] nouveau 0000:01:00.0: disp ctor failed, -12
[  189.324214] nouveau: probe of 0000:01:00.0 failed with error -12

At runtime, it later gets into a loop like this, and that murders
battery life if I'm not plugged in:
[2140771.370888] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[2140771.370895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
[2140771.370899] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[2140771.370902] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
[2140771.370905] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[2140771.370908] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
[2140771.370912] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
[2140771.370915] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
[2140771.370918] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
[2140771.370921] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
[2140771.370924] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
[2140771.370927] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[2140771.370930] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
[2140771.370933] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
[2140771.370936] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
[2140771.370939] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[2140771.370970] nvidia-gpu 0000:01:00.3: PME# enabled
[2140771.389882] pci 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
[2140771.389891] pci 0000:01:00.0: saving config space at offset 0x4 (reading 0x100403)
[2140771.389896] pci 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
[2140771.389899] pci 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
[2140771.389903] pci 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
[2140771.389907] pci 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
[2140771.389910] pci 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
[2140771.389914] pci 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
[2140771.389918] pci 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
[2140771.389922] pci 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
[2140771.389925] pci 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
[2140771.389928] pci 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[2140771.389932] pci 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
[2140771.389935] pci 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
[2140771.389939] pci 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
[2140771.389943] pci 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
[2140771.390027] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)
[2140771.390030] pcieport 0000:00:01.0: saving config space at offset 0x4 (reading 0x100407)
[2140771.390033] pcieport 0000:00:01.0: saving config space at offset 0x8 (reading 0x604000d)
[2140771.390036] pcieport 0000:00:01.0: saving config space at offset 0xc (reading 0x810000)
[2140771.390038] pcieport 0000:00:01.0: saving config space at offset 0x10 (reading 0x0)
[2140771.390041] pcieport 0000:00:01.0: saving config space at offset 0x14 (reading 0x0)
[2140771.390044] pcieport 0000:00:01.0: saving config space at offset 0x18 (reading 0x10100)
[2140771.390046] pcieport 0000:00:01.0: saving config space at offset 0x1c (reading 0x2020)
[2140771.390049] pcieport 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
[2140771.390051] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
[2140771.390054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
[2140771.390056] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
[2140771.390059] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
[2140771.390061] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
[2140771.390064] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
[2140771.390067] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
[2140771.390125] pcieport 0000:00:01.0: PME# enabled

Thanks for any help
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
@ 2020-12-26 11:12                   ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-12-26 11:12 UTC (permalink / raw)
  To: nouveau; +Cc: Linux PCI, Mika Westerberg, LKML

This started with 5.5 and hasn't gotten better since then, despite some reports
I tried to send.

As per my previous message:
I have a Thinkpad P70 with hybrid graphics.
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
that one works fine, I can use i915 for the main screen, and nouveau to
display on the external ports (external ports are only wired to nvidia
chip, so it's impossible to use them without turning the nvidia chip
on).
 
I now got a newer P73 also with the same hybrid graphics (set up as such
in the BIOS). It runs fine with i915, and I don't need to use external
display with nouveau for now (it almost works, but I only see the mouse
cursor on the external screen, no window or anything else can get
displayed, very weird).
01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
 

After boot, when it gets the right trigger (not sure which one), it
loops on this every 2 seconds, mostly forever.

I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.

Boot hangs look like this:
[   10.659209] Console: switching to colour frame buffer device 240x67
[   10.732353] i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device
[   12.101203] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[   12.101212] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
[   12.101217] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[   12.101223] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
[   12.101228] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[   12.101234] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
[   12.101239] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
[   12.101244] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
[   12.101249] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
[   12.101254] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
[   12.101259] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
[   12.101265] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[   12.101270] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
[   12.101275] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
[   12.101280] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
[   12.101285] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[   12.101333] nvidia-gpu 0000:01:00.3: PME# enabled
[   25.151246] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)
[   25.151260] thunderbolt 0000:06:00.0: saving config space at offset 0x4 (reading 0x100406)
[   25.151265] thunderbolt 0000:06:00.0: saving config space at offset 0x8 (reading 0x8800006)
[   25.151270] thunderbolt 0000:06:00.0: saving config space at offset 0xc (reading 0x20)
[   25.151276] thunderbolt 0000:06:00.0: saving config space at offset 0x10 (reading 0xcc100000)
[   25.151281] thunderbolt 0000:06:00.0: saving config space at offset 0x14 (reading 0xcc140000)
[   25.151286] thunderbolt 0000:06:00.0: saving config space at offset 0x18 (reading 0x0)
[   25.151291] thunderbolt 0000:06:00.0: saving config space at offset 0x1c (reading 0x0)
[   25.151296] thunderbolt 0000:06:00.0: saving config space at offset 0x20 (reading 0x0)
[   25.151301] thunderbolt 0000:06:00.0: saving config space at offset 0x24 (reading 0x0)
[   25.151306] thunderbolt 0000:06:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.151311] thunderbolt 0000:06:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   25.151316] thunderbolt 0000:06:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.151322] thunderbolt 0000:06:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.151327] thunderbolt 0000:06:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.151332] thunderbolt 0000:06:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   25.151416] thunderbolt 0000:06:00.0: PME# enabled
[   25.169204] pcieport 0000:05:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.169214] pcieport 0000:05:00.0: saving config space at offset 0x4 (reading 0x100407)
[   25.169219] pcieport 0000:05:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.169224] pcieport 0000:05:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.169229] pcieport 0000:05:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.169233] pcieport 0000:05:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.169238] pcieport 0000:05:00.0: saving config space at offset 0x18 (reading 0x60605)
[   25.169243] pcieport 0000:05:00.0: saving config space at offset 0x1c (reading 0x1f1)
[   25.169248] pcieport 0000:05:00.0: saving config space at offset 0x20 (reading 0xcc10cc10)
[   25.169253] pcieport 0000:05:00.0: saving config space at offset 0x24 (reading 0x1fff1)
[   25.169258] pcieport 0000:05:00.0: saving config space at offset 0x28 (reading 0x0)
[   25.169263] pcieport 0000:05:00.0: saving config space at offset 0x2c (reading 0x0)
[   25.169268] pcieport 0000:05:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.169272] pcieport 0000:05:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.169277] pcieport 0000:05:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.169282] pcieport 0000:05:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.169367] pcieport 0000:05:00.0: PME# enabled
[   25.189195] pcieport 0000:04:00.0: saving config space at offset 0x0 (reading 0x15ea8086)
[   25.189206] pcieport 0000:04:00.0: saving config space at offset 0x4 (reading 0x100007)
[   25.189212] pcieport 0000:04:00.0: saving config space at offset 0x8 (reading 0x6040006)
[   25.189216] pcieport 0000:04:00.0: saving config space at offset 0xc (reading 0x10020)
[   25.189221] pcieport 0000:04:00.0: saving config space at offset 0x10 (reading 0x0)
[   25.189226] pcieport 0000:04:00.0: saving config space at offset 0x14 (reading 0x0)
[   25.189231] pcieport 0000:04:00.0: saving config space at offset 0x18 (reading 0x510504)
[   25.189236] pcieport 0000:04:00.0: saving config space at offset 0x1c (reading 0x5141)
[   25.189241] pcieport 0000:04:00.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.189246] pcieport 0000:04:00.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.189251] pcieport 0000:04:00.0: saving config space at offset 0x28 (reading 0x60)
[   25.189255] pcieport 0000:04:00.0: saving config space at offset 0x2c (reading 0x60)
[   25.189260] pcieport 0000:04:00.0: saving config space at offset 0x30 (reading 0x0)
[   25.189265] pcieport 0000:04:00.0: saving config space at offset 0x34 (reading 0x80)
[   25.189270] pcieport 0000:04:00.0: saving config space at offset 0x38 (reading 0x0)
[   25.189274] pcieport 0000:04:00.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.189358] pcieport 0000:04:00.0: PME# enabled
[   25.209257] pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0xa3388086)
[   25.209271] pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100407)
[   25.209279] pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x60400f0)
[   25.209287] pcieport 0000:00:1c.0: saving config space at offset 0xc (reading 0x810000)
[   25.209291] pcieport 0000:00:1c.0: saving config space at offset 0x10 (reading 0x0)
[   25.209299] pcieport 0000:00:1c.0: saving config space at offset 0x14 (reading 0x0)
[   25.209303] pcieport 0000:00:1c.0: saving config space at offset 0x18 (reading 0x510400)
[   25.209311] pcieport 0000:00:1c.0: saving config space at offset 0x1c (reading 0x20006040)
[   25.209324] pcieport 0000:00:1c.0: saving config space at offset 0x20 (reading 0xcc10b400)
[   25.209329] pcieport 0000:00:1c.0: saving config space at offset 0x24 (reading 0x3ff10001)
[   25.209333] pcieport 0000:00:1c.0: saving config space at offset 0x28 (reading 0x60)
[   25.209338] pcieport 0000:00:1c.0: saving config space at offset 0x2c (reading 0x60)
[   25.209342] pcieport 0000:00:1c.0: saving config space at offset 0x30 (reading 0x0)
[   25.209346] pcieport 0000:00:1c.0: saving config space at offset 0x34 (reading 0x40)
[   25.209351] pcieport 0000:00:1c.0: saving config space at offset 0x38 (reading 0x0)
[   25.209355] pcieport 0000:00:1c.0: saving config space at offset 0x3c (reading 0x201ff)
[   25.209447] pcieport 0000:00:1c.0: PME# enabled
[   26.341460] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold
[  129.257560] async_tx: api initialized (async)
[  129.280335] device-mapper: uevent: version 1.0.3
[  129.280466] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[  129.293087] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  139.346041] random: cryptsetup: uninitialized urandom read (4 bytes read)
[  145.633300] e1000e 0000:00:1f.6 eth0: NIC Link is Down
[  149.384146] random: crng init done
[  161.435256] process '/usr/bin/fstype' started with executable stack
[  171.578236] BTRFS: device label btrfs_boot devid 1 transid 575473 /dev/mapper/cryptroot scanned by btrfs (1069)
[  171.583482] BTRFS: device label btrfs_pool4 devid 1 transid 117379 /dev/sdb4 scanned by btrfs (1069)
[  171.588979] BTRFS: device label btrfs_pool3 devid 1 transid 40487 /dev/sda5 scanned by btrfs (1069)
[  171.594484] BTRFS: device fsid de9694f8-9c0d-4e9d-bd12-57adc4381cd7 devid 1 transid 41 /dev/sda3 scanned by btrfs (1069)
[  171.600437] BTRFS: device fsid 23e1398d-e462-41aa-b85e-f574906ddc03 devid 1 transid 585 /dev/nvme0n1p4 scanned by btrfs (1069)
[  182.799968] PM: Image not found (code -22)
[  189.304662] nouveau 0000:01:00.0: pmu: firmware unavailable
[  189.312455] nouveau 0000:01:00.0: disp: destroy running...
[  189.316552] nouveau 0000:01:00.0: disp: destroy completed in 1us
[  189.320326] nouveau 0000:01:00.0: disp ctor failed, -12
[  189.324214] nouveau: probe of 0000:01:00.0 failed with error -12
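
To see where the boot actually stalls, the timestamps in a log like the
one above can be diffed. The following is only a rough sketch: the
function name largest_gaps is made up for illustration, and the sample
lines are copied from the excerpt. On those lines the largest pause is
about 103 seconds, between the D3cold message at 26.34 and the async_tx
message at 129.26.

```python
import re

# Matches the "[  seconds.micros]" prefix that dmesg puts on each line.
TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gaps(lines, top=3):
    """Return the `top` largest pauses as (gap_seconds, text_before, text_after)."""
    stamped = []
    for line in lines:
        m = TS.match(line)
        if m:
            stamped.append((float(m.group(1)), m.group(2)))
    # Pair every stamped line with its successor and measure the pause.
    gaps = [
        (round(b[0] - a[0], 6), a[1], b[1])
        for a, b in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]


# Sample lines taken from the boot log above.
sample = [
    "[   12.101333] nvidia-gpu 0000:01:00.3: PME# enabled",
    "[   25.151246] thunderbolt 0000:06:00.0: saving config space at offset 0x0 (reading 0x15eb8086)",
    "[   26.341460] pcieport 0000:00:1c.0: power state changed by ACPI to D3cold",
    "[  129.257560] async_tx: api initialized (async)",
]

for gap, before, after in largest_gaps(sample):
    print(f"{gap:10.3f}s between {before!r} and {after!r}")
```

Feeding it the full output of dmesg (one line per list element) would
rank every pause, which may help tell a PCI delay from, say, a wait for
the cryptsetup passphrase.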

At runtime, it later gets into a loop like this, and that murders
battery life if I'm not plugged in:
[2140771.370888] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)
[2140771.370895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)
[2140771.370899] nvidia-gpu 0000:01:00.3: saving config space at offset 0x8 (reading 0xc8000a1)
[2140771.370902] nvidia-gpu 0000:01:00.3: saving config space at offset 0xc (reading 0x800000)
[2140771.370905] nvidia-gpu 0000:01:00.3: saving config space at offset 0x10 (reading 0xce054000)
[2140771.370908] nvidia-gpu 0000:01:00.3: saving config space at offset 0x14 (reading 0x0)
[2140771.370912] nvidia-gpu 0000:01:00.3: saving config space at offset 0x18 (reading 0x0)
[2140771.370915] nvidia-gpu 0000:01:00.3: saving config space at offset 0x1c (reading 0x0)
[2140771.370918] nvidia-gpu 0000:01:00.3: saving config space at offset 0x20 (reading 0x0)
[2140771.370921] nvidia-gpu 0000:01:00.3: saving config space at offset 0x24 (reading 0x0)
[2140771.370924] nvidia-gpu 0000:01:00.3: saving config space at offset 0x28 (reading 0x0)
[2140771.370927] nvidia-gpu 0000:01:00.3: saving config space at offset 0x2c (reading 0x229b17aa)
[2140771.370930] nvidia-gpu 0000:01:00.3: saving config space at offset 0x30 (reading 0x0)
[2140771.370933] nvidia-gpu 0000:01:00.3: saving config space at offset 0x34 (reading 0x68)
[2140771.370936] nvidia-gpu 0000:01:00.3: saving config space at offset 0x38 (reading 0x0)
[2140771.370939] nvidia-gpu 0000:01:00.3: saving config space at offset 0x3c (reading 0x4ff)
[2140771.370970] nvidia-gpu 0000:01:00.3: PME# enabled
[2140771.389882] pci 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
[2140771.389891] pci 0000:01:00.0: saving config space at offset 0x4 (reading 0x100403)
[2140771.389896] pci 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
[2140771.389899] pci 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
[2140771.389903] pci 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
[2140771.389907] pci 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
[2140771.389910] pci 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
[2140771.389914] pci 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
[2140771.389918] pci 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
[2140771.389922] pci 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
[2140771.389925] pci 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
[2140771.389928] pci 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[2140771.389932] pci 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
[2140771.389935] pci 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
[2140771.389939] pci 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
[2140771.389943] pci 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
[2140771.390027] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)
[2140771.390030] pcieport 0000:00:01.0: saving config space at offset 0x4 (reading 0x100407)
[2140771.390033] pcieport 0000:00:01.0: saving config space at offset 0x8 (reading 0x604000d)
[2140771.390036] pcieport 0000:00:01.0: saving config space at offset 0xc (reading 0x810000)
[2140771.390038] pcieport 0000:00:01.0: saving config space at offset 0x10 (reading 0x0)
[2140771.390041] pcieport 0000:00:01.0: saving config space at offset 0x14 (reading 0x0)
[2140771.390044] pcieport 0000:00:01.0: saving config space at offset 0x18 (reading 0x10100)
[2140771.390046] pcieport 0000:00:01.0: saving config space at offset 0x1c (reading 0x2020)
[2140771.390049] pcieport 0000:00:01.0: saving config space at offset 0x20 (reading 0xce00cd00)
[2140771.390051] pcieport 0000:00:01.0: saving config space at offset 0x24 (reading 0xb1f1a001)
[2140771.390054] pcieport 0000:00:01.0: saving config space at offset 0x28 (reading 0x0)
[2140771.390056] pcieport 0000:00:01.0: saving config space at offset 0x2c (reading 0x0)
[2140771.390059] pcieport 0000:00:01.0: saving config space at offset 0x30 (reading 0x0)
[2140771.390061] pcieport 0000:00:01.0: saving config space at offset 0x34 (reading 0x88)
[2140771.390064] pcieport 0000:00:01.0: saving config space at offset 0x38 (reading 0x0)
[2140771.390067] pcieport 0000:00:01.0: saving config space at offset 0x3c (reading 0x201ff)
[2140771.390125] pcieport 0000:00:01.0: PME# enabled
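
For readers unfamiliar with these lines: the dword saved at offset 0x0
holds the PCI vendor ID in its low 16 bits and the device ID in its high
16 bits. A small sketch of decoding it from the log follows. The helper
names and the two-entry vendor table are illustrative only, not a real
PCI ID database.

```python
import re

# Pulls the PCI address, config offset and value out of one log line.
LINE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"saving config space at offset (?P<off>0x[0-9a-f]+) "
    r"\(reading (?P<val>0x[0-9a-f]+)\)"
)

# Tiny illustrative table, not a full PCI ID database.
VENDORS = {0x10DE: "NVIDIA", 0x8086: "Intel"}


def decode_id_dword(value):
    """Split the dword at offset 0x0 into (vendor_id, device_id)."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF


def ids_from_log(lines):
    """Map each PCI address to (vendor_id, device_id) using its offset-0x0 line."""
    found = {}
    for line in lines:
        m = LINE.search(line)
        if m and int(m.group("off"), 16) == 0:
            found[m.group("bdf")] = decode_id_dword(int(m.group("val"), 16))
    return found


# Lines copied from the runtime loop above.
sample = [
    "[2140771.370888] nvidia-gpu 0000:01:00.3: saving config space at offset 0x0 (reading 0x1ad910de)",
    "[2140771.370895] nvidia-gpu 0000:01:00.3: saving config space at offset 0x4 (reading 0x100406)",
    "[2140771.389882] pci 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)",
    "[2140771.390027] pcieport 0000:00:01.0: saving config space at offset 0x0 (reading 0x19018086)",
]

for bdf, (vendor, device) in ids_from_log(sample).items():
    name = VENDORS.get(vendor, "unknown")
    print(f"{bdf}: vendor {vendor:#06x} ({name}), device {device:#06x}")
```

Read this way, the loop touches two NVIDIA functions (01:00.3 and
01:00.0) and the Intel root port above them (00:01.0).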

Thanks for any help
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
@ 2020-12-27 18:28                     ` Ilia Mirkin
  0 siblings, 0 replies; 77+ messages in thread
From: Ilia Mirkin @ 2020-12-27 18:28 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Linux PCI, Mika Westerberg, LKML

On Sun, Dec 27, 2020 at 12:03 PM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> This started with 5.5 and hasn't gotten better since then, despite some reports
> I tried to send.
>
> As per my previous message:
> I have a Thinkpad P70 with hybrid graphics.
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
> that one works fine, I can use i915 for the main screen, and nouveau to
> display on the external ports (external ports are only wired to nvidia
> chip, so it's impossible to use them without turning the nvidia chip
> on).
>
> I now got a newer P73 also with the same hybrid graphics (setup as such
> in the bios). It runs fine with i915, and I don't need to use external
> display with nouveau for now (it almost works, but I only see the mouse
> cursor on the external screen, no window or anything else can get
> displayed, very weird).
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)

Display offload usually requires acceleration -- the copies are done
using the DMA engine. Please make sure that you have firmware
available (and a new enough mesa). The errors suggest that you don't
have firmware available at the time that nouveau loads. Depending on
your setup, that might mean the firmware has to be built into the
kernel, or available in initramfs. (Or just regular filesystem if you
don't use a complicated boot sequence. But many people go with distro
defaults, which do have this complexity.)

>
>
> After boot, when it gets the right trigger (not sure which one), it
> loops on this every 2 seconds, mostly forever.

The gpu suspends with runtime pm. And then gets woken up for some
reason (could be something quite silly, like lspci, or could be
something explicitly checking connectors, etc). Repeat.
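
One way to put a number on that suspend/wake cycle is to measure the
time between consecutive "PME# enabled" lines per device in the kernel
log. This is a sketch only: pme_intervals is a made-up helper, and the
second timestamp in the sample is invented for illustration (the thread
shows just one iteration of the loop).

```python
import re
from collections import defaultdict

# "[ts] <driver> <pci-address>: PME# enabled"
PME = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\S+ (\S+): PME# enabled")


def pme_intervals(lines):
    """Return {pci_address: [seconds between consecutive 'PME# enabled' lines]}."""
    stamps = defaultdict(list)
    for line in lines:
        m = PME.match(line)
        if m:
            stamps[m.group(2)].append(float(m.group(1)))
    return {
        dev: [round(b - a, 6) for a, b in zip(ts, ts[1:])]
        for dev, ts in stamps.items()
    }


sample = [
    # Taken from the report.
    "[2140771.370970] nvidia-gpu 0000:01:00.3: PME# enabled",
    "[2140771.390125] pcieport 0000:00:01.0: PME# enabled",
    # Hypothetical next iteration, two seconds later.
    "[2140773.370970] nvidia-gpu 0000:01:00.3: PME# enabled",
]

for dev, deltas in pme_intervals(sample).items():
    print(dev, deltas)
```

A steady interval would point at something polling on a timer, while
irregular intervals would fit occasional tools such as lspci.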

Cheers,

  -ilia

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
  2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
@ 2020-12-29 15:51                   ` Marc MERLIN
  2020-12-26 11:12                   ` Marc MERLIN
  2020-12-29 15:51                   ` Marc MERLIN
  2 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2020-12-29 15:51 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > After boot, when it gets the right trigger (not sure which one), it
> > loops on this every 2 seconds, mostly forever.
> 
> The gpu suspends with runtime pm. And then gets woken up for some
> reason (could be something quite silly, like lspci, or could be
> something explicitly checking connectors, etc). Repeat.

Ah, fair point.  Could it be powertop even?
How would I go about tracing that?
Sounds like this would be a problem with all chips if userspace is able
to wake them up every second or two with a probe. Now I wonder what
broken userspace I have that could be doing this.
 
> Display offload usually requires acceleration -- the copies are done
> using the DMA engine. Please make sure that you have firmware
> available (and a new enough mesa). The errors suggest that you don't
> have firmware available at the time that nouveau loads. Depending on
> your setup, that might mean the firmware has to be built into the
> kernel, or available in initramfs. (Or just regular filesystem if you
> don't use a complicated boot sequence. But many people go with distro
> defaults, which do have this complexity.)

Hi Ilia, thanks for your answer.

Do you think that could be a reason why the boot would hang for 2 full minutes at every
boot ever since I upgraded to 5.5?

Also, without wanting to sound like a full newbie, where is that
firmware you're talking about? In my kernel source?

Here's what I do have:
sauron:/usr/local/bin# dpkggrep nouveau
libdrm-nouveau2:amd64				install
xserver-xorg-video-nouveau			install

no nouveau-firmware package in debian:
sauron:/usr/local/bin# apt-cache search nouveau
bumblebee - NVIDIA Optimus support for Linux
libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services -- runtime
xfonts-jmk - Jim Knoble's character-cell fonts for X
xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver

No firmware file on my disk:
sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ /lib/firmware/ |grep nouveau
/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
sauron:/usr/local/bin# 

The kernel module is in my initrd:
sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep nouveau
drwxr-xr-x   1 root     root            0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
-rw-r--r--   1 root     root      3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
17+1 records in
17+1 records out
52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s

What am I supposed to do/check next?

Note that ultimately I only need nouveau not to hang my boot 2mn and do
PM so that the nvidia chip goes to sleep since I don't use it.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
@ 2020-12-29 16:33                     ` Ilia Mirkin
  0 siblings, 0 replies; 77+ messages in thread
From: Ilia Mirkin @ 2020-12-29 16:33 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Tue, Dec 29, 2020 at 10:52 AM Marc MERLIN <marc_nouveau@merlins.org> wrote:
>
> On Sat, Dec 26, 2020 at 03:12:09AM -0800, Ilia Mirkin wrote:
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > > loops on this every 2 seconds, mostly forever.
> >
> > The gpu suspends with runtime pm. And then gets woken up for some
> > reason (could be something quite silly, like lspci, or could be
> > something explicitly checking connectors, etc). Repeat.
>
> Ah, fair point.  Could it be powertop even?
> How would I go about tracing that?
> Sounds like this would be a problem with all chips if userspace is able
> to wake them up every second or two with a probe. Now I wonder what
> broken userspace I have that could be doing this.

Well, it's a theory. Some userspace helpfully prevents the GPU from
suspending entirely by messing with the attached audio device;
unfortunately I don't remember its name. It's very common and meant to
help... oh well.

>
> > Display offload usually requires acceleration -- the copies are done
> > using the DMA engine. Please make sure that you have firmware
> > available (and a new enough mesa). The errors suggest that you don't
> > have firmware available at the time that nouveau loads. Depending on
> > your setup, that might mean the firmware has to be built into the
> > kernel, or available in initramfs. (Or just regular filesystem if you
> > don't use a complicated boot sequence. But many people go with distro
> > defaults, which do have this complexity.)
>
> Hi Ilia, thanks for your answer.
>
> Do you think that could be a reason why the boot would hang for 2 full minutes at every
> boot ever since I upgraded to 5.5?

I'd have to check, but I'm guessing TU104 acceleration became a thing
in 5.5. I would also not be very surprised if the code didn't handle
failure extremely gracefully - there definitely have been problems
with that in the past.

>
> Also, without wanting to sound like a full newbie, where is that
> firmware you're talking about? In my kernel source?
>
> Here's what I do have:
> sauron:/usr/local/bin# dpkggrep nouveau
> libdrm-nouveau2:amd64                           install
> xserver-xorg-video-nouveau                      install
>
> no nouveau-firmware package in debian:
> sauron:/usr/local/bin# apt-cache search nouveau
> bumblebee - NVIDIA Optimus support for Linux
> libdrm-nouveau2 - Userspace interface to nouveau-specific kernel DRM services -- runtime
> xfonts-jmk - Jim Knoble's character-cell fonts for X
> xserver-xorg-video-nouveau - X.Org X server -- Nouveau display driver
>
> No firmware file on my disk:
> sauron:/usr/local/bin# find /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/ /lib/firmware/ |grep nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> /lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> sauron:/usr/local/bin#
>
> The kernel module is in my initrd:
> sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep nouveau
> drwxr-xr-x   1 root     root            0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> -rw-r--r--   1 root     root      3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> 17+1 records in
> 17+1 records out
> 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s

I think that gets you out of "full newbie" land...

>
> What am I supposed to do/check next?
>
> Note that ultimately I only need nouveau not to hang my boot 2mn and do
> PM so that the nvidia chip goes to sleep since I don't use it.

I'm not extremely familiar with debian packaging, but the firmware is
provided by NVIDIA and shipped as part of linux-firmware:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia

This needs to be available at /lib/firmware/nvidia when nouveau loads.
Based on your email above, it's most likely that it would load from
the initrd - so make sure it's in there.
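
A minimal sketch of that check, assuming a Debian-style system where
lsinitramfs (from initramfs-tools) is available to list the initrd
contents. The helper name has_nvidia_firmware is made up for illustration;
it only greps a file listing read from stdin:

```shell
#!/bin/sh
# Hypothetical helper: reads an initramfs file listing on stdin and
# succeeds if it contains firmware files for the given NVIDIA chip
# directory (e.g. tu104). The path pattern assumes the usual
# lib/firmware/nvidia/<chip>/ layout from linux-firmware.
has_nvidia_firmware() {
    chip="$1"
    grep -q "lib/firmware/nvidia/$chip/"
}
```

Usage would be something like
lsinitramfs /boot/initrd.img-"$(uname -r)" | has_nvidia_firmware tu104 && echo present
and if nothing is printed, the firmware is not in the initrd.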

Of course now that I read your email a bit more carefully, it seems
your issue is with the "saving config space" messages. I'm not sure
I've seen those before. Perhaps you have some sort of debug enabled.
I'd find where in the kernel they are being produced, and what the
conditions for it are. But the failure to load firmware isn't great --
not 100% sure if it impacts runpm or not.

I just double-checked, TU10x accel came in via
afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
Initial TU10x support came in v5.0. So that doesn't line up with your
timeline.

Anyways, I'd definitely sort the firmware situation out, but it may
not be the cause of your problem.

Cheers,

  -ilia

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
       [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2020-12-29 17:47                       ` Marc MERLIN
       [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
  2020-12-30 12:16                       ` ael
  1 sibling, 1 reply; 77+ messages in thread
From: Marc MERLIN @ 2020-12-29 17:47 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

(removed other lists, since it's likely not a linux-PCI problem)

On Tue, Dec 29, 2020 at 11:33:16AM -0500, Ilia Mirkin wrote:
> > Sounds like this would be a problem with all chips if userspace is able
> > to wake them up every second or two with a probe. Now I wonder what
> > broken userspace I have that could be doing this.
> 
> Well, it's a theory. Some userspace helpfully prevents the GPU from
> suspending entirely by messing with the attached audio device;
> unfortunately I don't remember its name. It's very common and meant to
> help... oh well.

Are you thinking about tlp maybe?  https://linrunner.de/tlp/
I submitted a blacklist patch so that it works ok-ish on my laptop now.
(when the nvidia chip is unhappy, it happily uses 70W on batteries with
1.3h of runtime. When everything is ok, I can go down to about 12W/9H)

> > Do you think that could be a reason why the boot would hang for 2 full minutes at every
> > boot ever since I upgraded to 5.5?
> 
> I'd have to check, but I'm guessing TU104 acceleration became a thing
> in 5.5. I would also not be very surprised if the code didn't handle
> failure extremely gracefully - there definitely have been problems
> with that in the past.

Ah, then the timing checks out. That's exciting, at least now I have a
lead as to why I'm having problems. This was the same time a PCI PM
change went in, and I mistakenly thought it was to blame.

> > The kernel module is in my initrd:
> > sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep nouveau
> > drwxr-xr-x   1 root     root            0 Nov 30 15:40 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau
> > -rw-r--r--   1 root     root      3691385 Nov 30 15:35 usr/lib/modules/5.9.11-amd64-preempt-sysrq-20190817/kernel/drivers/gpu/drm/nouveau/nouveau.ko
> > 17+1 records in
> > 17+1 records out
> > 52566778 bytes (53 MB, 50 MiB) copied, 1.69708 s, 31.0 MB/s
> 
> I think that gets you out of "full newbie" land...

:)  (ok, I have been using linux since 1993, but stuff changes so much
all the time, that sometimes I feel like a newbie all over again)
In my days, we didn't complain about systemd vs sysvinit, we had rc.local
and it was good enough :-D

> > Note that ultimately I only need nouveau not to hang my boot 2mn and do
> > PM so that the nvidia chip goes to sleep since I don't use it.
> 
> I'm not extremely familiar with debian packaging, but the firmware is
> provided by NVIDIA and shipped as part of linux-firmware:
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia
 
Ah, it comes from outside just like intel firmware, thanks.
Also, I was looking for nouveau, not nvidia:
sauron:/usr/local/bin# dd if=/boot/initrd.img-5.9.11-amd64-preempt-sysrq-20190817 bs=2966528  skip=1 | gunzip | cpio -tdv | grep tu104
shows no match

Good news is that debian did package it (they have multiple firmware
packages)
sauron:~# dpkggrep firmware | awk '{print $1}' | xargs apt-get install -y
sauron:~# dpkg -S /lib/firmware/nvidia/tu104
firmware-misc-nonfree: /lib/firmware/nvidia/tu104

update-initramfs -v -c -k 5.9.11-amd64-preempt-sysrq-20190817

Ok, I should be in business after next reboot, thank you.

> Of course now that I read your email a bit more carefully, it seems
> your issue is with the "saving config space" messages. I'm not sure
> I've seen those before. Perhaps you have some sort of debug enabled.
> I'd find where in the kernel they are being produced, and what the
> conditions for it are. But the failure to load firmware isn't great --
> not 100% sure if it impacts runpm or not.
 
Yes, I have 'nouveau.debug=disp=trace'
Someone on this list asked me to add this a few months back.

> I just double-checked, TU10x accel came in via
> afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
> Initial TU10x support came in v5.0. So that doesn't line up with your
> timeline.

You know, I said 5.5, maybe it was 5.6 now, it's been a little while
since those issues started.

Now we know I was missing the required firmware, it's a good place to
start, so I'll start there, thank you very much for the pointers.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
       [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2020-12-29 17:47                       ` Marc MERLIN
@ 2020-12-30 12:16                       ` ael
  1 sibling, 0 replies; 77+ messages in thread
From: ael @ 2020-12-30 12:16 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Tue, Dec 29, 2020 at 11:33:16AM -0500, Ilia Mirkin wrote:
> On Tue, Dec 29, 2020 at 10:52 AM Marc MERLIN <marc_nouveau-xnduUnryOU1AfugRpC6u6w@public.gmane.org> wrote:
> 
> I'm not extremely familiar with debian packaging, but the firmware is
> provided by NVIDIA and shipped as part of linux-firmware:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/nvidia

I think it may be  firmware-misc-nonfree.

ael

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
       [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
@ 2021-01-04 11:49                           ` Marc MERLIN
       [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
  0 siblings, 1 reply; 77+ messages in thread
From: Marc MERLIN @ 2021-01-04 11:49 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau

On Tue, Dec 29, 2020 at 09:47:50AM -0800, Marc MERLIN wrote:
> > Of course now that I read your email a bit more carefully, it seems
> > your issue is with the "saving config space" messages. I'm not sure
> > I've seen those before. Perhaps you have some sort of debug enabled.
> > I'd find where in the kernel they are being produced, and what the
> > conditions for it are. But the failure to load firmware isn't great --
> > not 100% sure if it impacts runpm or not.
>  
> Yes, I have 'nouveau.debug=disp=trace'
> Someone on this list asked me to add this a few months back.
> 
> > I just double-checked, TU10x accel came in via
> > afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
> > Initial TU10x support came in v5.0. So that doesn't line up with your
> > timeline.
> 
> You know, I said 5.5, maybe it was 5.6 now, it's been a little while
> since those issues started.
> 
> Now we know I was missing the required firmware, it's a good place to
> start, so I'll start there, thank you very much for the pointers.

Sorry for the delay. I rebooted and everything worked great.
No hang at boot.
As for the PME loop I've been seeing, it hasn't happened so far.

I can't comment on whether firmware should be required for the kernel to
boot properly, but if it's at all possible, please try to make the
driver fall back or shut down if the firmware is absent as opposed to
hanging the boot 2mn.

Also some drivers give a better clue that their firmware is missing
and where to get it from. Adding a printk to help users could be a good
idea.

Below is the boot with firmware present.

Thanks for your help
Marc

sauron:~$ grep nouveau /var/log/dmesg 
[   11.016605] nouveau: detected PR support, will not use DSM
[   11.025191] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
[   11.071823] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
[   11.111588] nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
[   11.203598] nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
[   11.203921] nouveau 0000:01:00.0: pmu: firmware unavailable
[   11.204229] nouveau 0000:01:00.0: enabling bus mastering
[   11.204543] nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
[   11.215524] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
[   11.215525] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[   11.215527] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[   11.215527] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[   11.215528] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[   11.215529] nouveau 0000:01:00.0: DRM: DCB version 4.1
[   11.215530] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
[   11.215531] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
[   11.215532] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
[   11.215532] nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
[   11.215533] nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
[   11.215533] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
[   11.215534] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
[   11.215534] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
[   11.215535] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
[   11.215535] nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
[   11.216166] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[   11.526753] nouveau 0000:01:00.0: DRM: unknown connector type 48
[   11.527077] nouveau 0000:01:00.0: DRM: unknown connector type 48
[   11.552051] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
[   11.554239] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
[   11.555822] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
[   11.556054] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
[   11.556060] nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
[   18.887229] nouveau 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
[   18.887231] nouveau 0000:01:00.0: saving config space at offset 0x4 (reading 0x100407)
[   18.887233] nouveau 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
[   18.887235] nouveau 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
[   18.887237] nouveau 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
[   18.887239] nouveau 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
[   18.887241] nouveau 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
[   18.887243] nouveau 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
[   18.887245] nouveau 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
[   18.887247] nouveau 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
[   18.887249] nouveau 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
[   18.887251] nouveau 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   18.887253] nouveau 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
[   18.887255] nouveau 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
[   18.887257] nouveau 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
[   18.887259] nouveau 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   18.887311] nouveau 0000:01:00.0: power state changed by ACPI to D3cold
[   42.094494] nouveau 0000:01:00.0: power state changed by ACPI to D0
[   42.094663] nouveau 0000:01:00.0: restoring config space at offset 0x3c (was 0x100, writing 0x1ff)
[   42.094679] nouveau 0000:01:00.0: restoring config space at offset 0x30 (was 0x0, writing 0xfff80000)
[   42.094699] nouveau 0000:01:00.0: restoring config space at offset 0x24 (was 0x1, writing 0x2001)
[   42.094721] nouveau 0000:01:00.0: restoring config space at offset 0x1c (was 0xc, writing 0xb000000c)
[   42.094738] nouveau 0000:01:00.0: restoring config space at offset 0x14 (was 0xc, writing 0xa000000c)
[   42.094769] nouveau 0000:01:00.0: restoring config space at offset 0x10 (was 0x0, writing 0xcd000000)
[   42.094792] nouveau 0000:01:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
[   42.538785] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops nv50_audio_component_bind_ops [nouveau])

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
       [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
@ 2021-01-04 13:28                               ` Karol Herbst
       [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 77+ messages in thread
From: Karol Herbst @ 2021-01-04 13:28 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau

Hmm, that PCI config stuff really should not happen all the time, but
it doesn't appear to either. The other thing I don't know is how well
runpm works with tools like TLP when there is not only an audio device
but also the USB functions, since all the subdevices have to stay
powered down in order for the GPU to stay powered down.

The firmware stuff is also just a functional problem, so you won't get
display offloading, but it shouldn't drain your battery as long as
nothing is connected. I'd check with "grep .
/sys/bus/pci/devices/*/power/runtime_status" if all subdevices of the
GPU are powered down, and check which one gets enabled regularly or
something.
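
Roughly like this, as a sketch only: it assumes the GPU functions all sit
under slot 0000:01:00 as in the logs in this thread. The function name and
the SYSFS_PCI override are made up for illustration (the override is only
there so it can be tried against a fake directory tree):

```shell
#!/bin/sh
# Print "<device> <runtime_status>" for every function of one PCI slot.
# SYSFS_PCI defaults to the real sysfs location.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

gpu_runtime_status() {
    slot="$1"    # e.g. 0000:01:00 (domain:bus:device, without the function)
    for f in "$SYSFS_PCI/$slot".*/power/runtime_status; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev=${f#"$SYSFS_PCI"/}           # strip the sysfs prefix
        dev=${dev%%/*}                   # keep only the device name
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Calling gpu_runtime_status 0000:01:00 every second or so and watching which
function flips from "suspended" to "active" should point at the subdevice
that keeps waking things up.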

On Mon, Jan 4, 2021 at 12:50 PM Marc MERLIN <marc_nouveau-xnduUnryOU1AfugRpC6u6w@public.gmane.org> wrote:
>
> On Tue, Dec 29, 2020 at 09:47:50AM -0800, Marc MERLIN wrote:
> > > Of course now that I read your email a bit more carefully, it seems
> > > your issue is with the "saving config space" messages. I'm not sure
> > > I've seen those before. Perhaps you have some sort of debug enabled.
> > > I'd find where in the kernel they are being produced, and what the
> > > conditions for it are. But the failure to load firmware isn't great --
> > > not 100% sure if it impacts runpm or not.
> >
> > Yes, I have 'nouveau.debug=disp=trace'
> > Someone on this list asked me to add this a few months back.
> >
> > > I just double-checked, TU10x accel came in via
> > > afa3b96b058d87c2c44d1c83dadb2ba6998d03ce, which was first in v5.6.
> > > Initial TU10x support came in v5.0. So that doesn't line up with your
> > > timeline.
> >
> > You know, I said 5.5, maybe it was 5.6 now, it's been a little while
> > since those issues started.
> >
> > Now we know I was missing the required firmware, it's a good place to
> > start, so I'll start there, thank you very much for the pointers.
>
> Sorry for the delay. I rebooted and everything worked great.
> No hang at boot.
> As for the PME loop I've been seeing, it hasn't happened so far.
>
> I can't comment on whether firmware should be required for the kernel to
> boot properly, but if it's at all possible, please try to make the
> driver fall back or shut down if the firmware is absent as opposed to
> hanging the boot 2mn.
>
> Also some drivers give a better clue that their firmware is missing
> and where to get it from. Adding a printk to help users could be a good
> idea.
>
> Below is the boot with firmware present.
>
> Thanks for your help
> Marc
>
> sauron:~$ grep nouveau /var/log/dmesg
> [   11.016605] nouveau: detected PR support, will not use DSM
> [   11.025191] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
> [   11.071823] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
> [   11.111588] nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
> [   11.203598] nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
> [   11.203921] nouveau 0000:01:00.0: pmu: firmware unavailable
> [   11.204229] nouveau 0000:01:00.0: enabling bus mastering
> [   11.204543] nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
> [   11.215524] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
> [   11.215525] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> [   11.215527] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> [   11.215527] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> [   11.215528] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> [   11.215529] nouveau 0000:01:00.0: DRM: DCB version 4.1
> [   11.215530] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
> [   11.215531] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
> [   11.215532] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
> [   11.215532] nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
> [   11.215533] nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
> [   11.215533] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
> [   11.215534] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
> [   11.215534] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
> [   11.215535] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
> [   11.215535] nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
> [   11.216166] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> [   11.526753] nouveau 0000:01:00.0: DRM: unknown connector type 48
> [   11.527077] nouveau 0000:01:00.0: DRM: unknown connector type 48
> [   11.552051] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> [   11.554239] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> [   11.555822] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> [   11.556054] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
> [   11.556060] nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
> [   18.887229] nouveau 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
> [   18.887231] nouveau 0000:01:00.0: saving config space at offset 0x4 (reading 0x100407)
> [   18.887233] nouveau 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
> [   18.887235] nouveau 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
> [   18.887237] nouveau 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
> [   18.887239] nouveau 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
> [   18.887241] nouveau 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
> [   18.887243] nouveau 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
> [   18.887245] nouveau 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
> [   18.887247] nouveau 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
> [   18.887249] nouveau 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
> [   18.887251] nouveau 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
> [   18.887253] nouveau 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
> [   18.887255] nouveau 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
> [   18.887257] nouveau 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
> [   18.887259] nouveau 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
> [   18.887311] nouveau 0000:01:00.0: power state changed by ACPI to D3cold
> [   42.094494] nouveau 0000:01:00.0: power state changed by ACPI to D0
> [   42.094663] nouveau 0000:01:00.0: restoring config space at offset 0x3c (was 0x100, writing 0x1ff)
> [   42.094679] nouveau 0000:01:00.0: restoring config space at offset 0x30 (was 0x0, writing 0xfff80000)
> [   42.094699] nouveau 0000:01:00.0: restoring config space at offset 0x24 (was 0x1, writing 0x2001)
> [   42.094721] nouveau 0000:01:00.0: restoring config space at offset 0x1c (was 0xc, writing 0xb000000c)
> [   42.094738] nouveau 0000:01:00.0: restoring config space at offset 0x14 (was 0xc, writing 0xa000000c)
> [   42.094769] nouveau 0000:01:00.0: restoring config space at offset 0x10 (was 0x0, writing 0xcd000000)
> [   42.094792] nouveau 0000:01:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
> [   42.538785] snd_hda_intel 0000:01:00.1: bound 0000:01:00.0 (ops nv50_audio_component_bind_ops [nouveau])
>
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
> _______________________________________________
> Nouveau mailing list
> Nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau
>

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
       [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2021-01-07 11:49                                   ` Marc MERLIN
  0 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2021-01-07 11:49 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau

On Mon, Jan 04, 2021 at 02:28:37PM +0100, Karol Herbst wrote:
> mhh, that PCI config stuff should really not happen all the time, but
> it also doesn't appear to. The other thing I really don't know is, how
> well the runpm works with tools like TLP if there isn't only an audio
> device, but also the USB stuff and all the subdevices have to be
> turned off all the time in order for the GPU to stay powered down.
> 
> The firmware stuff is also just a functional problem, so you won't get
> display offloading, but it shouldn't drain your battery as long as
> nothing is connected. I'd check with "grep .
> /sys/bus/pci/devices/*/power/runtime_status" if all subdevices of the
> GPU are powered down, and check which one gets enabled regularly or
> something.

Well, all I can say is that without the firmware, my boot hung 2mn every
single time (I sent details in the logs upthread).

The battery draw issue was inconsistent. I haven't quite found what
triggers it yet.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
@ 2021-01-27 21:33                     ` Bjorn Helgaas
  0 siblings, 0 replies; 77+ messages in thread
From: Bjorn Helgaas @ 2021-01-27 21:33 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

Hi Marc, I appreciate your persistence on this.  I am frankly
surprised that you've put up with this so long.

On Sat, Dec 26, 2020 at 03:12:09AM -0800, Marc MERLIN wrote:
> This started with 5.5 and hasn't gotten better since then, despite
> some reports I tried to send.
> 
> As per my previous message:
> I have a Thinkpad P70 with hybrid graphics.
> 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
> that one works fine, I can use i915 for the main screen, and nouveau to
> display on the external ports (external ports are only wired to nvidia
> chip, so it's impossible to use them without turning the nvidia chip
> on).
>  
> I now got a newer P73 also with the same hybrid graphics (setup as such
> in the bios). It runs fine with i915, and I don't need to use external
> display with nouveau for now (it almost works, but I only see the mouse
> cursor on the external screen, no window or anything else can get
> displayed, very weird).
> 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
>  
> 
> after boot, when it gets the right trigger (not sure which ones), it
> loops on this every 2 seconds, mostly forever.
> 
> I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.

IIUC there are basically two problems:

  1) A 2 minute delay during boot
  2) Some sort of event every 2 seconds that kills your battery life

Your machine doesn't sound unusual, and I haven't seen a flood of
similar reports, so maybe there's something unusual about your config.
But I really don't have any guesses for either one.

It sounds like v5.5 worked fine and you first noticed the slow boot
problem in v5.8.  We *could* try to bisect it, but I know that's a lot
of work on your part.

Grasping for any ideas for the boot delay; could you boot with
"initcall_debug" and collect your "lsmod" output?  I notice async_tx
in some of your logs, but I have no idea what it is.  It's from
crypto, so possibly somewhat unusual?
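
With "initcall_debug" the kernel logs one "initcall ... returned ... after
N usecs" line per init function, so a long stall should stand out. A rough
sketch for picking out the slow ones from a saved log follows; it assumes
that line format, and the function name slow_initcalls and its default
threshold of one second are made up for illustration.

```shell
# Read kernel log text on stdin; print initcalls that took at least
# $1 microseconds (default 1000000, i.e. one second), slowest first.
slow_initcalls() {
    min_usecs=${1:-1000000}
    awk -v min="$min_usecs" '
        /initcall .* returned .* after [0-9]+ usecs/ {
            # Locate the function name and the duration by their keywords,
            # so an optional "[module]" field does not shift the columns.
            for (i = 1; i <= NF; i++) {
                if ($i == "initcall") name = $(i + 1)
                if ($i == "after")    usecs = $(i + 1)
            }
            if (usecs + 0 >= min) print usecs, name
        }' | sort -rn
}
```

For example, "dmesg | slow_initcalls" on a boot with initcall_debug would
list only the init functions that ran for a second or more.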

Bjorn

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
  2021-01-27 21:33                     ` Bjorn Helgaas
@ 2021-01-28 20:59                       ` Bjorn Helgaas
  -1 siblings, 0 replies; 77+ messages in thread
From: Bjorn Helgaas @ 2021-01-28 20:59 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Wed, Jan 27, 2021 at 03:33:02PM -0600, Bjorn Helgaas wrote:
> On Sat, Dec 26, 2020 at 03:12:09AM -0800, Marc MERLIN wrote:
> > This started with 5.5 and hasn't gotten better since then, despite
> > some reports I tried to send.
> > 
> > As per my previous message:
> > I have a Thinkpad P70 with hybrid graphics.
> > 01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M600M] (rev a2)
> > that one works fine, I can use i915 for the main screen, and nouveau to
> > display on the external ports (external ports are only wired to nvidia
> > chip, so it's impossible to use them without turning the nvidia chip
> > on).
> >  
> > I now got a newer P73 also with the same hybrid graphics (setup as such
> > in the bios). It runs fine with i915, and I don't need to use external
> > display with nouveau for now (it almost works, but I only see the mouse
> > cursor on the external screen, no window or anything else can get
> > displayed, very weird).
> > 01:00.0 VGA compatible controller: NVIDIA Corporation TU104GLM [Quadro RTX 4000 Mobile / Max-Q] (rev a1)
> >  
> > 
> > after boot, when it gets the right trigger (not sure which ones), it
> > loops on this every 2 seconds, mostly forever.
> > 
> > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
> 
> IIUC there are basically two problems:
> 
>   1) A 2 minute delay during boot
>   2) Some sort of event every 2 seconds that kills your battery life
> 
> Your machine doesn't sound unusual, and I haven't seen a flood of
> similar reports, so maybe there's something unusual about your config.
> But I really don't have any guesses for either one.
> 
> It sounds like v5.5 worked fine and you first noticed the slow boot
> problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> of work on your part.
> 
> Grasping for any ideas for the boot delay; could you boot with
> "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> in some of your logs, but I have no idea what it is.  It's from
> crypto, so possibly somewhat unusual?

Another random thought: is there any chance the boot delay could be
related to crypto waiting for entropy?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
  2021-01-27 21:33                     ` Bjorn Helgaas
@ 2021-01-29  0:56                       ` Marc MERLIN
  -1 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2021-01-29  0:56 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> Hi Marc, I appreciate your persistence on this.  I am frankly
> surprised that you've put up with this so long.
 
Well, been using linux for 27 years, but also it's not like I have much
of a choice outside of switching to windows, as tempting as it's getting
sometimes ;)

> > after boot, when it gets the right trigger (not sure which ones), it
> > loops on this every 2 seconds, mostly forever.
> > 
> > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
> 
> IIUC there are basically two problems:
> 
>   1) A 2 minute delay during boot
> Another random thought: is there any chance the boot delay could be
> related to crypto waiting for entropy?

So, the 2mn hang went away after I added the nouveau firmware to the initrd.
The only problem is that the nouveau driver does not give a very good
clue as to what's going on and what to do.
For comparison, the Intel iwlwifi driver is very clear about which
firmware it's trying to load, whether that failed, and exactly which
firmware file you need to find on the internet (filename).

>   2) Some sort of event every 2 seconds that kills your battery life
> Your machine doesn't sound unusual, and I haven't seen a flood of
> similar reports, so maybe there's something unusual about your config.
> But I really don't have any guesses for either one.

Honestly, there are not too many ThinkPad P73s running Linux out there. I
wouldn't be surprised if it's only a handful or two.

> It sounds like v5.5 worked fine and you first noticed the slow boot
> problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> of work on your part.

I've done that in the past. To be honest, now that it works after I added
the firmware that nouveau started needing (and didn't need before), the
hang at boot is gone for sure.
The PCI PM wakeup issues on battery still happen sometimes, but they
are much rarer now.

> Grasping for any ideas for the boot delay; could you boot with
> "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> in some of your logs, but I have no idea what it is.  It's from
> crypto, so possibly somewhat unusual?

Is this still needed? I think if nouveau did a better job of helping
the user correct the issue when firmware is missing (I think Intel even
gives a URL in printk), that would probably be what's needed for the
most part.

[   12.832547] async_tx: api initialized (async) comes from ./crypto/async_tx/async_tx.c

Thanks for your answer, let me know if there is anything else useful I
can give, I think I'm otherwise mostly ok now.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
  2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
@ 2021-01-29 21:20                         ` Bjorn Helgaas
  -1 siblings, 0 replies; 77+ messages in thread
From: Bjorn Helgaas @ 2021-01-29 21:20 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Thu, Jan 28, 2021 at 04:56:26PM -0800, Marc MERLIN wrote:
> On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> > Hi Marc, I appreciate your persistence on this.  I am frankly
> > surprised that you've put up with this so long.
>  
> Well, been using linux for 27 years, but also it's not like I have much
> of a choice outside of switching to windows, as tempting as it's getting
> sometimes ;)
> 
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this every 2 seconds, mostly forever.
> > > 
> > > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
> > 
> > IIUC there are basically two problems:
> > 
> >   1) A 2 minute delay during boot
> > Another random thought: is there any chance the boot delay could be
> > related to crypto waiting for entropy?
> 
> So, the 2mn hang went away after I added the nouveau firmware in initrd.
> The only problem is that the nouveau driver does not give a very good
> clue as to what's going on and what to do.
>
> For comparison the intel iwlwifi driver is very clear about firmware
> it's trying to load, if it can't and what exact firmware you need to
> find on the internet (filename)

I guess you're referring to this in iwl_request_firmware()?

  IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n"); 

How can we fix this in nouveau so we don't have to debug this again?
I don't really know how firmware loading works, but "git grep -A5
request_firmware drivers/gpu/drm/nouveau/" shows that we generally
print something when request_firmware() fails.

But I didn't notice those messages in your logs, so I'm probably
barking up the wrong tree.
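
In the meantime, one way to spot the problem by hand might be to compare
the firmware files a module declares with what is actually installed. The
sketch below assumes a POSIX shell and a firmware tree that may hold
plain, .xz or .zst files; the function name missing_firmware is made up
for illustration.

```shell
# Read firmware paths (relative to the firmware directory) on stdin and
# print those with no matching file, also accepting compressed variants.
# $1: firmware directory (optional; defaults to /lib/firmware)
missing_firmware() {
    fwdir=${1:-/lib/firmware}
    while IFS= read -r fw; do
        [ -n "$fw" ] || continue
        for candidate in "$fwdir/$fw" "$fwdir/$fw.xz" "$fwdir/$fw.zst"; do
            # Found one variant: move on to the next firmware path.
            [ -e "$candidate" ] && continue 2
        done
        printf 'missing: %s\n' "$fw"
    done
}
```

It could be fed with "modinfo -F firmware nouveau | missing_firmware".
Note that this only looks at the running root filesystem; to check an
initrd, it would have to be pointed at the firmware directory of an
unpacked copy.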

> >   2) Some sort of event every 2 seconds that kills your battery life
> > Your machine doesn't sound unusual, and I haven't seen a flood of
> > similar reports, so maybe there's something unusual about your config.
> > But I really don't have any guesses for either one.
> 
> Honestly, there are not too many ThinkPad P73 running linux out there. I
> wouldn't be surprised if it's only a handful or two.
> 
> > It sounds like v5.5 worked fine and you first noticed the slow boot
> > problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> > of work on your part.
> 
> I've done that in the past, to be honest now that it works after I added
> the firmware that nouveau started needing, and didn't need before, the
> hang at boot is gone for sure.
> The PCI PM wakeup issues on batteries happen sometimes still, but they
> are much more rare now.

So maybe the wakeups are related to having vs not having the nouveau
firmware?  I'm still curious about that, and it smells like a bug to
me, but probably something to do with nouveau where I have no hope of
debugging it.

> > Grasping for any ideas for the boot delay; could you boot with
> > "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> > in some of your logs, but I have no idea what it is.  It's from
> > crypto, so possibly somewhat unusual?
> 
> Is this still needed? I think if nouveau does a better job of helping
> the user correct the issue if firmware is missing (I think intel even
> gives a URL in printk), that would probably be what's needed for the
> most part.

Nope, don't bother with this, thanks.

Bjorn

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [Nouveau] 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
@ 2021-01-29 21:20                         ` Bjorn Helgaas
  0 siblings, 0 replies; 77+ messages in thread
From: Bjorn Helgaas @ 2021-01-29 21:20 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Thu, Jan 28, 2021 at 04:56:26PM -0800, Marc MERLIN wrote:
> On Wed, Jan 27, 2021 at 03:33:00PM -0600, Bjorn Helgaas wrote:
> > Hi Marc, I appreciate your persistence on this.  I am frankly
> > surprised that you've put up with this so long.
>  
> Well, been using linux for 27 years, but also it's not like I have much
> of a choice outside of switching to windows, as tempting as it's getting
> sometimes ;)
> 
> > > after boot, when it gets the right trigger (not sure which ones), it
> > > loops on this evern 2 seconds, mostly forever.
> > > 
> > > I'm not sure if it's nouveau's fault or the kernel's PCI PME's fault, or something else.
> > 
> > IIUC there are basically two problems:
> > 
> >   1) A 2 minute delay during boot
> > Another random thought: is there any chance the boot delay could be
> > related to crypto waiting for entropy?
> 
> So, the 2mn hang went away after I added the nouveau firwmare in initrd.
> The only problem is that the nouveau driver does not give a very good
> clue as to what's going on and what to do.
>
> For comparison the intel iwlwifi driver is very clear about firmware
> it's trying to load, if it can't and what exact firmware you need to
> find on the internet (filename)

I guess you're referring to this in iwl_request_firmware()?

  IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n"); 

How can we fix this in nouveau so we don't have the debug this again?
I don't really know how firmware loading works, but "git grep -A5
request_firmware drivers/gpu/drm/nouveau/" shows that we generally
print something when request_firmware() fails.

But I didn't notice those messages in your logs, so I'm probably
barking up the wrong tree.

> >   2) Some sort of event every 2 seconds that kills your battery life
> > Your machine doesn't sound unusual, and I haven't seen a flood of
> > similar reports, so maybe there's something unusual about your config.
> > But I really don't have any guesses for either one.
> 
> Honestly, there are not too many thinpad P73 running linux out there. I
> wouldn't be surprised if it's only a handful or two.
> 
> > It sounds like v5.5 worked fine and you first noticed the slow boot
> > problem in v5.8.  We *could* try to bisect it, but I know that's a lot
> > of work on your part.
> 
> I've done that in the past, to be honest now that it works after I added
> the firmware that nouveau started needing, and didn't need before, the
> hang at boot is gone for sure.
> The PCI PM wakeup issues on batteries happen sometimes still, but they
> are much more rare now.

So maybe the wakeups are related to having vs not having the nouveau
firmware?  I'm still curious about that, and it smells like a bug to
me, but probably something to do with nouveau where I have no hope of
debugging it.

> > Grasping for any ideas for the boot delay; could you boot with
> > "initcall_debug" and collect your "lsmod" output?  I notice async_tx
> > in some of your logs, but I have no idea what it is.  It's from
> > crypto, so possibly somewhat unusual?
> 
> Is this still needed? I think if nouveau does a better job of helping
> the user correct the issue if firmware is missing (I think intel even
> gives a URL in printk), that would probably be what's needed for the
> most part.

Nope, don't bother with this, thanks.

Bjorn
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile)
  2021-01-29 21:20                         ` [Nouveau] " Bjorn Helgaas
@ 2021-01-30  2:04                           ` Marc MERLIN
  -1 siblings, 0 replies; 77+ messages in thread
From: Marc MERLIN @ 2021-01-30  2:04 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Mika Westerberg, LKML, Linux PCI

On Fri, Jan 29, 2021 at 03:20:32PM -0600, Bjorn Helgaas wrote:
> > For comparison the intel iwlwifi driver is very clear about firmware
> > it's trying to load, if it can't and what exact firmware you need to
> > find on the internet (filename)
> 
> I guess you're referring to this in iwl_request_firmware()?
> 
>   IWL_ERR(drv, "check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git\n"); 
 
Yes :)

> How can we fix this in nouveau so we don't have to debug this again?
> I don't really know how firmware loading works, but "git grep -A5
> request_firmware drivers/gpu/drm/nouveau/" shows that we generally
> print something when request_firmware() fails.

Well, have a look at https://pastebin.com/dX19aCpj
Do you see any warning whatsoever?

> But I didn't notice those messages in your logs, so I'm probably
> barking up the wrong tree.

You're not. It seems that newer kernels are a bit better:
[  189.304662] nouveau 0000:01:00.0: pmu: firmware unavailable
[  189.312455] nouveau 0000:01:00.0: disp: destroy running...
[  189.316552] nouveau 0000:01:00.0: disp: destroy completed in 1us
[  189.320326] nouveau 0000:01:00.0: disp ctor failed, -12
[  189.324214] nouveau: probe of 0000:01:00.0 failed with error -12

So, it probably got better, but that message was only displayed after the
2mn hang, which having the firmware prevents.

Any developer with the right hardware can probably easily
reproduce this by removing the firmware and looking at the boot
messages.
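
For anyone reproducing this, here is a minimal sketch of picking the relevant
lines out of saved dmesg output. It is plain Python, not part of nouveau or
any kernel tool. The two patterns are taken from the messages quoted above;
the function name and the wording of the hints are made up for illustration,
and a match is only a hint, not proof that a hang is firmware-related.

```python
import re

# Patterns copied from the log lines quoted in this message. Other kernel
# versions may word these messages differently (assumption).
FIRMWARE_UNAVAILABLE = re.compile(r"nouveau (\S+): (\w+): firmware unavailable")
PROBE_FAILED = re.compile(r"nouveau: probe of (\S+) failed with error (-\d+)")


def nouveau_firmware_hints(dmesg_text):
    """Return one short hint per nouveau firmware/probe failure line found."""
    hints = []
    for line in dmesg_text.splitlines():
        m = FIRMWARE_UNAVAILABLE.search(line)
        if m:
            # group(1) is the PCI address, group(2) the subdevice (e.g. pmu)
            hints.append(f"{m.group(1)}: {m.group(2)} reported firmware unavailable")
            continue
        m = PROBE_FAILED.search(line)
        if m:
            hints.append(f"{m.group(1)}: probe failed with error {m.group(2)}")
    return hints


if __name__ == "__main__":
    sample = (
        "[  189.304662] nouveau 0000:01:00.0: pmu: firmware unavailable\n"
        "[  189.320326] nouveau 0000:01:00.0: disp ctor failed, -12\n"
        "[  189.324214] nouveau: probe of 0000:01:00.0 failed with error -12\n"
    )
    for hint in nouveau_firmware_hints(sample):
        print(hint)
```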

At the very least, it should print something clearer, like "driver will not
function properly", and a URL to where one can get the firmware would be
awesome.

> So maybe the wakeups are related to having vs not having the nouveau
> firmware?  I'm still curious about that, and it smells like a bug to
> me, but probably something to do with nouveau where I have no hope of
> debugging it.
 
Right. Honestly, given the time I've lost with this, and now that it
seems gone with the firmware, I'm happy to leave well enough alone :)

I'm not sure how you are involved with the driver, but are you able to
help improve the dmesg output?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
  2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
  (?)
@ 2021-05-05 21:42                           ` Marc MERLIN
  2021-05-06 14:50                             ` Bjorn Helgaas
  -1 siblings, 1 reply; 77+ messages in thread
From: Marc MERLIN @ 2021-05-05 21:42 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Mika Westerberg

Howdy,
I upgraded my ThinkPad P73 from 5.9 to 5.12, and I now get this new
bug at boot (although the system does continue booting and the display works,
since I use i915 for display and only use nouveau for PM).

Short:
[   18.561181] WARNING: CPU: 15 PID: 220 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
[   18.561300] Modules linked in: dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
[   18.561636] CPU: 15 PID: 220 Comm: kworker/15:2 Tainted: G     U            5.12.1-amd64-preempt-sysrq-20190817 #1
[   18.561707] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
[   18.561765] Workqueue: pm pm_runtime_work
[   18.561799] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
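
For readers unfamiliar with the notation: in
nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 the first number is the offset into
the function and the second is the function's length, both in hex, followed
by the module name in brackets. A small sketch of pulling that apart (the
helper name is hypothetical; this is not a kernel tool):

```python
import re

# symbol+0xOFFSET/0xSIZE, optionally followed by " [module]"
SYMBOL = re.compile(
    r"(?P<name>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_symbol(text):
    """Split a 'symbol+0xoff/0xsize [module]' reference into its parts."""
    m = SYMBOL.search(text)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total length of the function
        "module": m.group("module"),           # None if built into the kernel
    }


if __name__ == "__main__":
    line = "RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]"
    print(parse_symbol(line))
```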

Despite the warning, the chip seems to go to sleep on battery, and powertop
shows encouragingly low battery use (my lowest yet with any kernel):
The battery reports a discharge rate of 10.7 W
The power consumed was 230 J

So it seems that what I need from nouveau (power management) is working.

Full warning below with logs


Long:
[    0.000000] Linux version 5.12.1-amd64-preempt-sysrq-20190817 (root@sauron.svh.merlins.org) (gcc (Debian 10.2.1-3) 10.2.1 20201224, GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP PREEMPT Wed May 5 13:05:02 PDT 2021
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.12.1-amd64-preempt-sysrq-20190817 root=/dev/mapper/cryptroot ro rootflags=subvol=root cryptopts=source=/dev/nvme0n1p7,keyscript=/sbin/cryptgetpw usbcore.autosuspend=1 pcie_aspm=force resume=/dev/dm-1 acpi_backlight=vendor nouveau.debug=disp=trace
[    8.672663] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
[    8.677434] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
[    8.691872] nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
[    8.789240] nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
[    8.789605] nouveau 0000:01:00.0: pmu: firmware unavailable
[    8.789897] nouveau 0000:01:00.0: enabling bus mastering
[    8.789978] nouveau 0000:01:00.0: disp: preinit running...
[    8.789981] nouveau 0000:01:00.0: disp: preinit completed in 0us
[    8.789997] nouveau 0000:01:00.0: disp: fini running...
[    8.789999] nouveau 0000:01:00.0: disp: fini completed in 0us
[    8.790189] nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
[    8.800113] nouveau 0000:01:00.0: disp: init running...
[    8.800116] nouveau 0000:01:00.0: disp: init skipped, engine has no users
[    8.800118] nouveau 0000:01:00.0: disp: init completed in 2us
[    8.801512] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
[    8.801515] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
[    8.801517] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
[    8.801520] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
[    8.801521] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
[    8.801525] nouveau 0000:01:00.0: DRM: DCB version 4.1
[    8.801527] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
[    8.801529] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
[    8.801531] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
[    8.801533] nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
[    8.801535] nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
[    8.801537] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
[    8.801539] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
[    8.801541] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
[    8.801543] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
[    8.801543] nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
[    8.802234] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
[    8.802255] nouveau 0000:01:00.0: disp: init running...
[    8.802257] nouveau 0000:01:00.0: disp: one-time init running...
[    8.802259] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: type 06 loc 0 or 2 link 2 con 0 edid 6 bus 0 head f
[    8.802265] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: bios dp 42 13 00 00
[    8.802268] nouveau 0000:01:00.0: disp: outp 01:0002:0f42: type 02 loc 0 or 2 link 1 con 1 edid 5 bus 1 head f
[    8.802272] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: type 06 loc 0 or 1 link 1 con 2 edid 3 bus 2 head f
[    8.802276] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: bios dp 42 13 00 00
[    8.802279] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: type 06 loc 0 or 4 link 1 con 3 edid 7 bus 3 head f
[    8.802283] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: bios dp 42 13 00 00
[    8.802285] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: type 06 loc 0 or 4 link 2 con 4 edid 8 bus 4 head f
[    8.802290] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: bios dp 42 13 00 00
[    8.802293] nouveau 0000:01:00.0: disp: conn 00:0047: type 47 loc 0 hpd 08 dp 0 di 0 sr 0 lcdid 0
[    8.802298] nouveau 0000:01:00.0: disp: conn 00:0047: func 52 (HPD)
[    8.802300] nouveau 0000:01:00.0: disp: conn 01:0161: type 61 loc 1 hpd 04 dp 0 di 0 sr 0 lcdid 0
[    8.802305] nouveau 0000:01:00.0: disp: conn 01:0161: func 51 (HPD)
[    8.802307] nouveau 0000:01:00.0: disp: conn 02:0248: type 48 loc 2 hpd 01 dp 0 di 0 sr 0 lcdid 0
[    8.802311] nouveau 0000:01:00.0: disp: conn 02:0248: func 07 (HPD)
[    8.802313] nouveau 0000:01:00.0: disp: conn 03:0348: type 48 loc 3 hpd 10 dp 0 di 0 sr 0 lcdid 0
[    8.802317] nouveau 0000:01:00.0: disp: conn 03:0348: func 5e (HPD)
[    8.802319] nouveau 0000:01:00.0: disp: conn 04:0471: type 71 loc 4 hpd 20 dp 0 di 0 sr 0 lcdid 0
[    8.802324] nouveau 0000:01:00.0: disp: conn 04:0471: func 5f (HPD)
[    8.802329] nouveau 0000:01:00.0: disp: Window(s): 8 (000000ff)
[    8.802334] nouveau 0000:01:00.0: disp:   Head(s): 4 (0f)
[    8.802338] nouveau 0000:01:00.0: disp: head-0: ctor
[    8.802341] nouveau 0000:01:00.0: disp: head-1: ctor
[    8.802345] nouveau 0000:01:00.0: disp: head-2: ctor
[    8.802348] nouveau 0000:01:00.0: disp: head-3: ctor
[    8.802352] nouveau 0000:01:00.0: disp:    SOR(s): 4 (0f)
[    8.802356] nouveau 0000:01:00.0: disp: SOR-0: ctor
[    8.802360] nouveau 0000:01:00.0: disp: SOR-1: ctor
[    8.802364] nouveau 0000:01:00.0: disp: SOR-2: ctor
[    8.802367] nouveau 0000:01:00.0: disp: SOR-3: ctor
[    8.802387] nouveau 0000:01:00.0: disp: one-time init completed in 129us
[    8.802440] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: no route
[    9.112902] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: aux power -> always
[    9.112987] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: aux power -> demand
[    9.113021] nouveau 0000:01:00.0: disp: outp 01:0002:0f42: no route
[    9.113034] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: no route
[    9.113059] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: aux power -> always
[    9.113093] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: aux power -> demand
[    9.113119] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: no route
[    9.113141] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: aux power -> always
[    9.113175] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: aux power -> demand
[    9.113202] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: no route
[    9.113224] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: aux power -> always
[    9.113258] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: aux power -> demand
[    9.113665] nouveau 0000:01:00.0: disp: init completed in 311407us
[    9.205451] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
[    9.205682] nouveau 0000:01:00.0: disp: supervisor 1: 00000000
[    9.205707] nouveau 0000:01:00.0: disp: head-0: 00000000
[    9.205720] nouveau 0000:01:00.0: disp: head-1: 00000000
[    9.205732] nouveau 0000:01:00.0: disp: head-2: 00000000
[    9.205742] nouveau 0000:01:00.0: disp: head-3: 00000000
[    9.205751] nouveau 0000:01:00.0: disp: Core:
[    9.205764] nouveau 0000:01:00.0: disp: 	0200: 7efebfff -> 00000001
[    9.205781] nouveau 0000:01:00.0: disp: 	0208: 00000000 -> f0000000
[    9.205795] nouveau 0000:01:00.0: disp: 	020c: 00000000 -> 00001000
[    9.205810] nouveau 0000:01:00.0: disp: 	0210: 00000000              
[    9.205824] nouveau 0000:01:00.0: disp: 	0214: 00000000              
[    9.205837] nouveau 0000:01:00.0: disp: 	0218: 00000000              
[    9.205851] nouveau 0000:01:00.0: disp: 	021c: 00000000              
[    9.205862] nouveau 0000:01:00.0: disp: Core - SOR 0:
[    9.205874] nouveau 0000:01:00.0: disp: 	0300: 00000100              
[    9.205889] nouveau 0000:01:00.0: disp: 	0304: 00000000              
[    9.205903] nouveau 0000:01:00.0: disp: 	0308: 00000000              
[    9.205918] nouveau 0000:01:00.0: disp: 	030c: 00000000              
[    9.205928] nouveau 0000:01:00.0: disp: Core - SOR 1:
[    9.205940] nouveau 0000:01:00.0: disp: 	0320: 00000100              
[    9.205954] nouveau 0000:01:00.0: disp: 	0324: 00000000              
[    9.205967] nouveau 0000:01:00.0: disp: 	0328: 00000000              
[    9.205981] nouveau 0000:01:00.0: disp: 	032c: 00000000              
[    9.205991] nouveau 0000:01:00.0: disp: Core - SOR 2:
[    9.206003] nouveau 0000:01:00.0: disp: 	0340: 00000100              
[    9.206017] nouveau 0000:01:00.0: disp: 	0344: 00000000              
[    9.206030] nouveau 0000:01:00.0: disp: 	0348: 00000000              
[    9.206044] nouveau 0000:01:00.0: disp: 	034c: 00000000              
[    9.206054] nouveau 0000:01:00.0: disp: Core - SOR 3:
[    9.206065] nouveau 0000:01:00.0: disp: 	0360: 00000100              
[    9.206078] nouveau 0000:01:00.0: disp: 	0364: 00000000              
[    9.206091] nouveau 0000:01:00.0: disp: 	0368: 00000000              
[    9.206104] nouveau 0000:01:00.0: disp: 	036c: 00000000              
[    9.206115] nouveau 0000:01:00.0: disp: Core - WINDOW 0:
[    9.206127] nouveau 0000:01:00.0: disp: 	1000: 0000000f -> 00000000
[    9.206142] nouveau 0000:01:00.0: disp: 	1004: 000003b7 -> 0000000f
[    9.206156] nouveau 0000:01:00.0: disp: 	1008: 00000000              
[    9.206171] nouveau 0000:01:00.0: disp: 	100c: 04000400              
[    9.206186] nouveau 0000:01:00.0: disp: 	1010: 00100000 -> 00117fff
[    9.206197] nouveau 0000:01:00.0: disp: Core - WINDOW 1:
[    9.206209] nouveau 0000:01:00.0: disp: 	1080: 0000000f -> 00000000
[    9.206223] nouveau 0000:01:00.0: disp: 	1084: 000003b7 -> 0000000f
[    9.206237] nouveau 0000:01:00.0: disp: 	1088: 00000000              
[    9.206250] nouveau 0000:01:00.0: disp: 	108c: 04000400              
[    9.206265] nouveau 0000:01:00.0: disp: 	1090: 00100000 -> 00117fff
[    9.206275] nouveau 0000:01:00.0: disp: Core - WINDOW 2:
[    9.206287] nouveau 0000:01:00.0: disp: 	1100: 0000000f -> 00000001
[    9.206300] nouveau 0000:01:00.0: disp: 	1104: 000003b7 -> 0000000f
[    9.206313] nouveau 0000:01:00.0: disp: 	1108: 00000000              
[    9.206327] nouveau 0000:01:00.0: disp: 	110c: 04000400              
[    9.206341] nouveau 0000:01:00.0: disp: 	1110: 00100000 -> 00117fff
[    9.206351] nouveau 0000:01:00.0: disp: Core - WINDOW 3:
[    9.206362] nouveau 0000:01:00.0: disp: 	1180: 0000000f -> 00000001
[    9.206375] nouveau 0000:01:00.0: disp: 	1184: 000003b7 -> 0000000f
[    9.206389] nouveau 0000:01:00.0: disp: 	1188: 00000000              
[    9.206403] nouveau 0000:01:00.0: disp: 	118c: 04000400              
[    9.206417] nouveau 0000:01:00.0: disp: 	1190: 00100000 -> 00117fff
[    9.206427] nouveau 0000:01:00.0: disp: Core - WINDOW 4:
[    9.206440] nouveau 0000:01:00.0: disp: 	1200: 0000000f -> 00000002
[    9.206455] nouveau 0000:01:00.0: disp: 	1204: 000003b7 -> 0000000f
[    9.206469] nouveau 0000:01:00.0: disp: 	1208: 00000000              
[    9.206481] nouveau 0000:01:00.0: disp: 	120c: 04000400              
[    9.206495] nouveau 0000:01:00.0: disp: 	1210: 00100000 -> 00117fff
[    9.206505] nouveau 0000:01:00.0: disp: Core - WINDOW 5:
[    9.206517] nouveau 0000:01:00.0: disp: 	1280: 0000000f -> 00000002
[    9.206531] nouveau 0000:01:00.0: disp: 	1284: 000003b7 -> 0000000f
[    9.206544] nouveau 0000:01:00.0: disp: 	1288: 00000000              
[    9.206558] nouveau 0000:01:00.0: disp: 	128c: 04000400              
[    9.206571] nouveau 0000:01:00.0: disp: 	1290: 00100000 -> 00117fff
[    9.206582] nouveau 0000:01:00.0: disp: Core - WINDOW 6:
[    9.206594] nouveau 0000:01:00.0: disp: 	1300: 0000000f -> 00000003
[    9.206607] nouveau 0000:01:00.0: disp: 	1304: 000003b7 -> 0000000f
[    9.206620] nouveau 0000:01:00.0: disp: 	1308: 00000000              
[    9.206635] nouveau 0000:01:00.0: disp: 	130c: 04000400              
[    9.206650] nouveau 0000:01:00.0: disp: 	1310: 00100000 -> 00117fff
[    9.206660] nouveau 0000:01:00.0: disp: Core - WINDOW 7:
[    9.206672] nouveau 0000:01:00.0: disp: 	1380: 0000000f -> 00000003
[    9.206685] nouveau 0000:01:00.0: disp: 	1384: 000003b7 -> 0000000f
[    9.206699] nouveau 0000:01:00.0: disp: 	1388: 00000000              
[    9.206713] nouveau 0000:01:00.0: disp: 	138c: 04000400              
[    9.206727] nouveau 0000:01:00.0: disp: 	1390: 00100000 -> 00117fff
[    9.206737] nouveau 0000:01:00.0: disp: Core - HEAD 0:
[    9.206748] nouveau 0000:01:00.0: disp: 	2000: 00000000              
[    9.206762] nouveau 0000:01:00.0: disp: 	2004: fc000040              
[    9.206776] nouveau 0000:01:00.0: disp: 	2008: 00000180              
[    9.206790] nouveau 0000:01:00.0: disp: 	200c: 00000000              
[    9.206804] nouveau 0000:01:00.0: disp: 	2014: 00000011              
[    9.206818] nouveau 0000:01:00.0: disp: 	2018: 00000000              
[    9.206832] nouveau 0000:01:00.0: disp: 	201c: 00000000              
[    9.206846] nouveau 0000:01:00.0: disp: 	2020: 00000000              
[    9.206860] nouveau 0000:01:00.0: disp: 	2028: 00000000              
[    9.206874] nouveau 0000:01:00.0: disp: 	202c: 04000400              
[    9.206889] nouveau 0000:01:00.0: disp: 	2030: 00001000              
[    9.206903] nouveau 0000:01:00.0: disp: 	2038: 00000001              
[    9.206918] nouveau 0000:01:00.0: disp: 	203c: 00000005              
[    9.206933] nouveau 0000:01:00.0: disp: 	2048: 00000000              
[    9.206947] nouveau 0000:01:00.0: disp: 	204c: 00000000              
[    9.206960] nouveau 0000:01:00.0: disp: 	2050: 00000000              
[    9.206973] nouveau 0000:01:00.0: disp: 	2054: 00000000              
[    9.206986] nouveau 0000:01:00.0: disp: 	2058: 00000000              
[    9.206999] nouveau 0000:01:00.0: disp: 	205c: 00000000              
[    9.207013] nouveau 0000:01:00.0: disp: 	2060: 00000000              
[    9.207027] nouveau 0000:01:00.0: disp: 	2064: 00050008              
[    9.207041] nouveau 0000:01:00.0: disp: 	2068: 00000000              
[    9.207055] nouveau 0000:01:00.0: disp: 	206c: 00010003              
[    9.207069] nouveau 0000:01:00.0: disp: 	2070: 00030004              
[    9.207083] nouveau 0000:01:00.0: disp: 	2074: 00000001              
[    9.207098] nouveau 0000:01:00.0: disp: 	2078: 00000000              
[    9.207112] nouveau 0000:01:00.0: disp: 	207c: 00000000              
[    9.207127] nouveau 0000:01:00.0: disp: 	2080: 00000000              
[    9.207141] nouveau 0000:01:00.0: disp: 	2088: 00000000              
[    9.207156] nouveau 0000:01:00.0: disp: 	2090: 00000000              
[    9.207170] nouveau 0000:01:00.0: disp: 	209c: 000000e9              
[    9.207185] nouveau 0000:01:00.0: disp: 	20a0: 000002ff              
[    9.207200] nouveau 0000:01:00.0: disp: 	20a4: 00000000              
[    9.207212] nouveau 0000:01:00.0: disp: 	20a8: 00000000              
[    9.207225] nouveau 0000:01:00.0: disp: 	20ac: 00000000              
[    9.207239] nouveau 0000:01:00.0: disp: 	218c: 00000000              
[    9.207252] nouveau 0000:01:00.0: disp: 	2194: 00000000              
[    9.207266] nouveau 0000:01:00.0: disp: 	2198: 00000000              
[    9.207279] nouveau 0000:01:00.0: disp: 	219c: 00000000              
[    9.207292] nouveau 0000:01:00.0: disp: 	21a0: 00000000              
[    9.207307] nouveau 0000:01:00.0: disp: 	21a4: 00000000              
[    9.207320] nouveau 0000:01:00.0: disp: 	2214: 00000000              
[    9.207332] nouveau 0000:01:00.0: disp: 	2218: 00010002              
[    9.207343] nouveau 0000:01:00.0: disp: Core - HEAD 1:
[    9.207355] nouveau 0000:01:00.0: disp: 	2400: 00000000              
[    9.207369] nouveau 0000:01:00.0: disp: 	2404: fc000040              
[    9.207382] nouveau 0000:01:00.0: disp: 	2408: 00000180              
[    9.207396] nouveau 0000:01:00.0: disp: 	240c: 00000000              
[    9.207410] nouveau 0000:01:00.0: disp: 	2414: 00000011              
[    9.207425] nouveau 0000:01:00.0: disp: 	2418: 00000000              
[    9.207438] nouveau 0000:01:00.0: disp: 	241c: 00000000              
[    9.207451] nouveau 0000:01:00.0: disp: 	2420: 00000000              
[    9.207463] nouveau 0000:01:00.0: disp: 	2428: 00000000              
[    9.207476] nouveau 0000:01:00.0: disp: 	242c: 04000400              
[    9.207490] nouveau 0000:01:00.0: disp: 	2430: 00001000              
[    9.207504] nouveau 0000:01:00.0: disp: 	2438: 00000001              
[    9.207518] nouveau 0000:01:00.0: disp: 	243c: 00000005              
[    9.207531] nouveau 0000:01:00.0: disp: 	2448: 00000000              
[    9.207545] nouveau 0000:01:00.0: disp: 	244c: 00000000              
[    9.207559] nouveau 0000:01:00.0: disp: 	2450: 00000000              
[    9.207573] nouveau 0000:01:00.0: disp: 	2454: 00000000              
[    9.207587] nouveau 0000:01:00.0: disp: 	2458: 00000000              
[    9.207600] nouveau 0000:01:00.0: disp: 	245c: 00000000              
[    9.207613] nouveau 0000:01:00.0: disp: 	2460: 00000000              
[    9.207626] nouveau 0000:01:00.0: disp: 	2464: 00050008              
[    9.207640] nouveau 0000:01:00.0: disp: 	2468: 00000000              
[    9.207654] nouveau 0000:01:00.0: disp: 	246c: 00010003              
[    9.207668] nouveau 0000:01:00.0: disp: 	2470: 00030004              
[    9.207681] nouveau 0000:01:00.0: disp: 	2474: 00000001              
[    9.207695] nouveau 0000:01:00.0: disp: 	2478: 00000000              
[    9.207709] nouveau 0000:01:00.0: disp: 	247c: 00000000              
[    9.207724] nouveau 0000:01:00.0: disp: 	2480: 00000000              
[    9.207738] nouveau 0000:01:00.0: disp: 	2488: 00000000              
[    9.207753] nouveau 0000:01:00.0: disp: 	2490: 00000000              
[    9.207766] nouveau 0000:01:00.0: disp: 	249c: 000000e9              
[    9.207781] nouveau 0000:01:00.0: disp: 	24a0: 000002ff              
[    9.207794] nouveau 0000:01:00.0: disp: 	24a4: 00000000              
[    9.207807] nouveau 0000:01:00.0: disp: 	24a8: 00000000              
[    9.207821] nouveau 0000:01:00.0: disp: 	24ac: 00000000              
[    9.207834] nouveau 0000:01:00.0: disp: 	258c: 00000000              
[    9.207848] nouveau 0000:01:00.0: disp: 	2594: 00000000              
[    9.207861] nouveau 0000:01:00.0: disp: 	2598: 00000000              
[    9.207875] nouveau 0000:01:00.0: disp: 	259c: 00000000              
[    9.207888] nouveau 0000:01:00.0: disp: 	25a0: 00000000              
[    9.207901] nouveau 0000:01:00.0: disp: 	25a4: 00000000              
[    9.207914] nouveau 0000:01:00.0: disp: 	2614: 00000000              
[    9.207927] nouveau 0000:01:00.0: disp: 	2618: 00010002              
[    9.207937] nouveau 0000:01:00.0: disp: Core - HEAD 2:
[    9.207949] nouveau 0000:01:00.0: disp: 	2800: 00000000              
[    9.207963] nouveau 0000:01:00.0: disp: 	2804: fc000040              
[    9.207976] nouveau 0000:01:00.0: disp: 	2808: 00000180              
[    9.207991] nouveau 0000:01:00.0: disp: 	280c: 00000000              
[    9.208004] nouveau 0000:01:00.0: disp: 	2814: 00000011              
[    9.208019] nouveau 0000:01:00.0: disp: 	2818: 00000000              
[    9.208031] nouveau 0000:01:00.0: disp: 	281c: 00000000              
[    9.208044] nouveau 0000:01:00.0: disp: 	2820: 00000000              
[    9.208058] nouveau 0000:01:00.0: disp: 	2828: 00000000              
[    9.208071] nouveau 0000:01:00.0: disp: 	282c: 04000400              
[    9.208085] nouveau 0000:01:00.0: disp: 	2830: 00001000              
[    9.208099] nouveau 0000:01:00.0: disp: 	2838: 00000001              
[    9.208113] nouveau 0000:01:00.0: disp: 	283c: 00000005              
[    9.208126] nouveau 0000:01:00.0: disp: 	2848: 00000000              
[    9.208140] nouveau 0000:01:00.0: disp: 	284c: 00000000              
[    9.208153] nouveau 0000:01:00.0: disp: 	2850: 00000000              
[    9.208165] nouveau 0000:01:00.0: disp: 	2854: 00000000              
[    9.208178] nouveau 0000:01:00.0: disp: 	2858: 00000000              
[    9.208191] nouveau 0000:01:00.0: disp: 	285c: 00000000              
[    9.208205] nouveau 0000:01:00.0: disp: 	2860: 00000000              
[    9.208218] nouveau 0000:01:00.0: disp: 	2864: 00050008              
[    9.208232] nouveau 0000:01:00.0: disp: 	2868: 00000000              
[    9.208246] nouveau 0000:01:00.0: disp: 	286c: 00010003              
[    9.208259] nouveau 0000:01:00.0: disp: 	2870: 00030004              
[    9.208274] nouveau 0000:01:00.0: disp: 	2874: 00000001              
[    9.208289] nouveau 0000:01:00.0: disp: 	2878: 00000000              
[    9.208303] nouveau 0000:01:00.0: disp: 	287c: 00000000              
[    9.208318] nouveau 0000:01:00.0: disp: 	2880: 00000000              
[    9.208332] nouveau 0000:01:00.0: disp: 	2888: 00000000              
[    9.208345] nouveau 0000:01:00.0: disp: 	2890: 00000000              
[    9.208358] nouveau 0000:01:00.0: disp: 	289c: 000000e9              
[    9.208371] nouveau 0000:01:00.0: disp: 	28a0: 000002ff              
[    9.208385] nouveau 0000:01:00.0: disp: 	28a4: 00000000              
[    9.208398] nouveau 0000:01:00.0: disp: 	28a8: 00000000              
[    9.208412] nouveau 0000:01:00.0: disp: 	28ac: 00000000              
[    9.208425] nouveau 0000:01:00.0: disp: 	298c: 00000000              
[    9.208439] nouveau 0000:01:00.0: disp: 	2994: 00000000              
[    9.208452] nouveau 0000:01:00.0: disp: 	2998: 00000000              
[    9.208465] nouveau 0000:01:00.0: disp: 	299c: 00000000              
[    9.208478] nouveau 0000:01:00.0: disp: 	29a0: 00000000              
[    9.208491] nouveau 0000:01:00.0: disp: 	29a4: 00000000              
[    9.208504] nouveau 0000:01:00.0: disp: 	2a14: 00000000              
[    9.208517] nouveau 0000:01:00.0: disp: 	2a18: 00010002              
[    9.208528] nouveau 0000:01:00.0: disp: Core - HEAD 3:
[    9.208540] nouveau 0000:01:00.0: disp: 	2c00: 00000000              
[    9.208554] nouveau 0000:01:00.0: disp: 	2c04: fc000040              
[    9.208568] nouveau 0000:01:00.0: disp: 	2c08: 00000180              
[    9.208583] nouveau 0000:01:00.0: disp: 	2c0c: 00000000              
[    9.208597] nouveau 0000:01:00.0: disp: 	2c14: 00000011              
[    9.208610] nouveau 0000:01:00.0: disp: 	2c18: 00000000              
[    9.208623] nouveau 0000:01:00.0: disp: 	2c1c: 00000000              
[    9.208636] nouveau 0000:01:00.0: disp: 	2c20: 00000000              
[    9.208650] nouveau 0000:01:00.0: disp: 	2c28: 00000000              
[    9.208664] nouveau 0000:01:00.0: disp: 	2c2c: 04000400              
[    9.208677] nouveau 0000:01:00.0: disp: 	2c30: 00001000              
[    9.208691] nouveau 0000:01:00.0: disp: 	2c38: 00000001              
[    9.208722] nouveau 0000:01:00.0: disp: 	2c3c: 00000005              
[    9.208736] nouveau 0000:01:00.0: disp: 	2c48: 00000000              
[    9.208750] nouveau 0000:01:00.0: disp: 	2c4c: 00000000              
[    9.208764] nouveau 0000:01:00.0: disp: 	2c50: 00000000              
[    9.208777] nouveau 0000:01:00.0: disp: 	2c54: 00000000              
[    9.208790] nouveau 0000:01:00.0: disp: 	2c58: 00000000              
[    9.208803] nouveau 0000:01:00.0: disp: 	2c5c: 00000000              
[    9.208815] nouveau 0000:01:00.0: disp: 	2c60: 00000000              
[    9.208829] nouveau 0000:01:00.0: disp: 	2c64: 00050008              
[    9.208842] nouveau 0000:01:00.0: disp: 	2c68: 00000000              
[    9.208856] nouveau 0000:01:00.0: disp: 	2c6c: 00010003              
[    9.208870] nouveau 0000:01:00.0: disp: 	2c70: 00030004              
[    9.208884] nouveau 0000:01:00.0: disp: 	2c74: 00000001              
[    9.208897] nouveau 0000:01:00.0: disp: 	2c78: 00000000              
[    9.208911] nouveau 0000:01:00.0: disp: 	2c7c: 00000000              
[    9.208925] nouveau 0000:01:00.0: disp: 	2c80: 00000000              
[    9.208940] nouveau 0000:01:00.0: disp: 	2c88: 00000000              
[    9.208954] nouveau 0000:01:00.0: disp: 	2c90: 00000000              
[    9.208969] nouveau 0000:01:00.0: disp: 	2c9c: 000000e9              
[    9.208984] nouveau 0000:01:00.0: disp: 	2ca0: 000002ff              
[    9.208999] nouveau 0000:01:00.0: disp: 	2ca4: 00000000              
[    9.209014] nouveau 0000:01:00.0: disp: 	2ca8: 00000000              
[    9.209029] nouveau 0000:01:00.0: disp: 	2cac: 00000000              
[    9.209043] nouveau 0000:01:00.0: disp: 	2d8c: 00000000              
[    9.209058] nouveau 0000:01:00.0: disp: 	2d94: 00000000              
[    9.209073] nouveau 0000:01:00.0: disp: 	2d98: 00000000              
[    9.209087] nouveau 0000:01:00.0: disp: 	2d9c: 00000000              
[    9.209099] nouveau 0000:01:00.0: disp: 	2da0: 00000000              
[    9.209112] nouveau 0000:01:00.0: disp: 	2da4: 00000000              
[    9.209126] nouveau 0000:01:00.0: disp: 	2e14: 00000000              
[    9.209139] nouveau 0000:01:00.0: disp: 	2e18: 00010002              
[    9.209388] nouveau 0000:01:00.0: disp: supervisor 2: 00000010
[    9.209413] nouveau 0000:01:00.0: disp: head-0: 00000000
[    9.209426] nouveau 0000:01:00.0: disp: head-1: 00000000
[    9.209437] nouveau 0000:01:00.0: disp: head-2: 00000000
[    9.209448] nouveau 0000:01:00.0: disp: head-3: 00000000
[    9.209619] nouveau 0000:01:00.0: disp: supervisor 3: 00000010
[    9.209643] nouveau 0000:01:00.0: disp: head-0: 00000000
[    9.209656] nouveau 0000:01:00.0: disp: head-1: 00000000
[    9.209668] nouveau 0000:01:00.0: disp: head-2: 00000000
[    9.209679] nouveau 0000:01:00.0: disp: head-3: 00000000
[    9.210852] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
[    9.210885] nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
[    9.212755] usb 1-8: new high-speed USB device number 3 using xhci_hcd
[    9.296013] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
[    9.382897] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
[   18.460917] nouveau 0000:01:00.0: disp: suspend running...
[   18.461005] nouveau 0000:01:00.0: disp: suspend completed in 41us
[   18.561101] ------------[ cut here ]------------
[   18.561138] nouveau 0000:01:00.0: timeout
[   18.561181] WARNING: CPU: 15 PID: 220 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
[   18.561300] Modules linked in: dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
[   18.561636] CPU: 15 PID: 220 Comm: kworker/15:2 Tainted: G     U            5.12.1-amd64-preempt-sysrq-20190817 #1
[   18.561707] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
[   18.561765] Workqueue: pm pm_runtime_work
[   18.561799] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
[   18.561874] Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 eb 5c 43 e2 4c 89 e2 48 c7 c7 ef 95 33 c1 48 89 c6 e8 c4 b2 6e e2 <0f> 0b 85 db b8 00 00 00 00 0f 4e c3 48 8b 4c 24 28 65 48 2b 0c 25
[   18.561995] RSP: 0018:ffffb518007a7b08 EFLAGS: 00010286
[   18.562035] RAX: 0000000000000000 RBX: ffffffffffffff92 RCX: 0000000000000003
[   18.562086] RDX: 0000000000000850 RSI: 0000000000000001 RDI: ffffffffa4b25bac
[   18.562136] RBP: ffff89e351f0a058 R08: 0000000000000003 R09: 0000000000000001
[   18.562187] R10: 0000000000aaaaaa R11: ffffb51821e14440 R12: ffff89e34291c5a0
[   18.562238] R13: 0000000000000000 R14: ffff89e355782e00 R15: ffff89e3524cb000
[   18.562289] FS:  0000000000000000(0000) GS:ffff89f25c7c0000(0000) knlGS:0000000000000000
[   18.562345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   18.562388] CR2: 000055ec245a00a8 CR3: 0000000545410006 CR4: 00000000003706e0
[   18.562439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   18.562491] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   18.562542] Call Trace:
[   18.562569]  gm200_acr_hsfw_boot+0xc4/0x168 [nouveau]
[   18.562636]  nvkm_acr_hsf_boot+0xad/0x115 [nouveau]
[   18.565673]  nvkm_acr_fini+0x22/0x30 [nouveau]
[   18.568732]  nvkm_subdev_fini+0xb8/0xff [nouveau]
[   18.571775]  nvkm_device_fini+0x8b/0x178 [nouveau]
[   18.574834]  nvkm_udevice_fini+0x34/0x55 [nouveau]
[   18.577872]  nvkm_object_fini+0xeb/0x1d6 [nouveau]
[   18.580862]  nvkm_object_fini+0x8d/0x1d6 [nouveau]
[   18.584095]  nouveau_do_suspend+0x1fe/0x26f [nouveau]
[   18.587135]  nouveau_pmops_runtime_suspend+0x46/0x82 [nouveau]
[   18.590097]  pci_pm_runtime_suspend+0x5e/0x155
[   18.593013]  ? pci_pm_thaw_noirq+0x62/0x62
[   18.595914]  ? pci_pm_thaw_noirq+0x62/0x62
[   18.598802]  __rpm_callback+0x75/0xdb
[   18.601654]  ? pci_pm_thaw_noirq+0x62/0x62
[   18.604491]  rpm_callback+0x55/0x6b
[   18.607317]  rpm_suspend+0x2a6/0x4af
[   18.610117]  ? __raw_spin_unlock_irq+0x8/0x17
[   18.612901]  ? finish_task_switch.isra.0+0x136/0x214
[   18.615673]  pm_runtime_work+0x77/0x81
[   18.618428]  process_one_work+0x1ea/0x2e0
[   18.621156]  worker_thread+0x19c/0x240
[   18.624140]  ? rescuer_thread+0x294/0x294
[   18.626886]  kthread+0x10c/0x114
[   18.629567]  ? kthread_create_worker_on_cpu+0x65/0x65
[   18.632253]  ret_from_fork+0x1f/0x30
[   18.634949] ---[ end trace a858a74de695aa08 ]---
[   18.637620] nouveau 0000:01:00.0: acr: unload binary failed
[   18.913087] nouveau 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
[   18.913091] nouveau 0000:01:00.0: saving config space at offset 0x4 (reading 0x100407)
[   18.913093] nouveau 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
[   18.913095] nouveau 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
[   18.913097] nouveau 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
[   18.913099] nouveau 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
[   18.913102] nouveau 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
[   18.913104] nouveau 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
[   18.913106] nouveau 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
[   18.913108] nouveau 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
[   18.913111] nouveau 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
[   18.913113] nouveau 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
[   18.913115] nouveau 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
[   18.913117] nouveau 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
[   18.913119] nouveau 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
[   18.913122] nouveau 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
[   18.913179] nouveau 0000:01:00.0: power state changed by ACPI to D3cold
[   43.064748] nouveau 0000:01:00.0: power state changed by ACPI to D0
[   43.064836] nouveau 0000:01:00.0: restoring config space at offset 0x3c (was 0x100, writing 0x1ff)
[   43.064845] nouveau 0000:01:00.0: restoring config space at offset 0x30 (was 0x0, writing 0xfff80000)
[   43.064853] nouveau 0000:01:00.0: restoring config space at offset 0x24 (was 0x1, writing 0x2001)
[   43.064860] nouveau 0000:01:00.0: restoring config space at offset 0x1c (was 0xc, writing 0xb000000c)
[   43.064868] nouveau 0000:01:00.0: restoring config space at offset 0x14 (was 0xc, writing 0xa000000c)
[   43.064874] nouveau 0000:01:00.0: restoring config space at offset 0x10 (was 0x0, writing 0xcd000000)
[   43.064883] nouveau 0000:01:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
[   43.065008] nouveau 0000:01:00.0: disp: preinit running...
[   43.065038] nouveau 0000:01:00.0: disp: preinit completed in 0us
[   43.065200] nouveau 0000:01:00.0: disp: fini running...
[   43.065226] nouveau 0000:01:00.0: disp: fini completed in 2us
[   43.073510] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 00000000003b1000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [01ffedf000 unknown]
[   43.073579] nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 0000000000000000 engine 0e [sec2] client 16 [HUB/SEC] reason 00 [PDE] on channel -1 [01ffe5d000 unknown]
[   43.073616] nouveau 0000:01:00.0: fifo: runlist 3: scheduled for recovery
[   43.073636] nouveau 0000:01:00.0: fifo: engine 3: scheduled for recovery
[   43.173456] ------------[ cut here ]------------
[   43.173477] nouveau 0000:01:00.0: timeout
[   43.173533] WARNING: CPU: 9 PID: 1468 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
[   43.173614] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 btrtl videobuf2_common btbcm btintel videodev bluetooth mc ecdh_generic ecc iwlmvm mac80211 libarc4 mei_hdcp x86_pkg_temp_thermal intel_powerclamp kvm_intel nls_utf8 snd_hda_codec_conexant snd_hda_codec_generic nls_cp437 kvm snd_hda_intel(+) vfat snd_intel_dspcfg iwlwifi fat irqbypass snd_hda_codec squashfs input_leds joydev rapl deflate serio_raw intel_cstate efi_pstore pcspkr snd_hda_core iTCO_wdt wmi_bmof intel_wmi_thunderbolt tpm_crb cfg80211 iTCO_vendor_support ee1004 8250_dw snd_hwdep processor_thermal_device processor_thermal_rfim ucsi_ccg(+) snd_pcm sg ucsi_acpi thinkpad_acpi nvidiafb typec_ucsi vgastate mei_me processor_thermal_mbox typec intel_pch_thermal fb_ddc tpm_tis intel_soc_dts_iosf snd_timer nvram roles tpm_tis_core platform_profile ledtrig_audio snd soundcore rfkill int3403_thermal ac int340x_thermal_zone evdev int3400_thermal acpi_thermal_rel acpi_pad loop configs coretemp
[   43.173670]  msr fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfs_ssc ip_tables x_tables autofs4 essiv authenc dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
[   43.173970] CPU: 9 PID: 1468 Comm: kworker/9:3 Tainted: G     U  W         5.12.1-amd64-preempt-sysrq-20190817 #1
[   43.174001] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
[   43.174022] Workqueue: pm pm_runtime_work
[   43.174038] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
[   43.174296] Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 eb 5c 43 e2 4c 89 e2 48 c7 c7 ef 95 33 c1 48 89 c6 e8 c4 b2 6e e2 <0f> 0b 85 db b8 00 00 00 00 0f 4e c3 48 8b 4c 24 28 65 48 2b 0c 25
[   43.174336] RSP: 0018:ffffb51800eb39f0 EFLAGS: 00010286
[   43.174351] RAX: 0000000000000000 RBX: ffffffffffffff92 RCX: 0000000000000027
[   43.174370] RDX: 0000000000000027 RSI: 0000000000000001 RDI: ffff89f25c658590
[   43.174388] RBP: ffff89e351f09898 R08: 0000000000000003 R09: 0000000000000001
[   43.174407] R10: 0000000000aaaaaa R11: ffffb5182251c420 R12: ffff89e34291c5a0
[   43.178020] R13: 0000000000000000 R14: ffff89e355782e00 R15: ffff89e3524cb000
[   43.180638] FS:  0000000000000000(0000) GS:ffff89f25c640000(0000) knlGS:0000000000000000
[   43.183418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   43.186150] CR2: 00007f2e4be0e1af CR3: 0000000109928001 CR4: 00000000003706e0
[   43.188876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   43.191778] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   43.194511] Call Trace:
[   43.197753]  gm200_acr_hsfw_boot+0xc4/0x168 [nouveau]
[   43.203012]  nvkm_acr_hsf_boot+0xad/0x115 [nouveau]
[   43.205781]  tu102_acr_init+0x16/0x2d [nouveau]
[   43.208502]  nvkm_acr_load+0x62/0x135 [nouveau]
[   43.211256]  ? timekeeping_get_ns+0x1c/0x32
[   43.216266]  nvkm_subdev_init+0x100/0x175 [nouveau]
[   43.222767]  nvkm_device_init+0x150/0x203 [nouveau]
[   43.230884]  nvkm_udevice_init+0x31/0x4b [nouveau]
[   43.234889]  nvkm_object_init+0x75/0x15f [nouveau]
[   43.237646]  nvkm_object_init+0x9e/0x15f [nouveau]
[   43.240283]  nvkm_object_init+0x9e/0x15f [nouveau]
[   43.242977]  nouveau_do_resume+0x4b/0x170 [nouveau]
[   43.245737]  nouveau_pmops_runtime_resume+0x76/0x12d [nouveau]
[   43.248416]  pci_pm_runtime_resume+0x75/0x80
[   43.251095]  ? pci_pm_restore+0x7a/0x7a
[   43.253750]  ? pci_pm_restore+0x7a/0x7a
[   43.256355]  __rpm_callback+0x75/0xdb
[   43.259020]  ? pci_pm_restore+0x7a/0x7a
[   43.261687]  rpm_callback+0x55/0x6b
[   43.264269]  ? pci_pm_restore+0x7a/0x7a
[   43.267104]  rpm_resume+0x376/0x47d
[   43.269799]  ? __schedule+0x5de/0x632
[   43.272370]  __pm_runtime_resume+0x5a/0x76
[   43.277743]  ? pci_pm_restore+0x7a/0x7a
[   43.281006]  rpm_get_suppliers+0x39/0x70
[   43.283602]  ? pci_pm_restore+0x7a/0x7a
[   43.286254]  __rpm_callback+0x59/0xdb
[   43.288886]  ? pci_pm_restore+0x7a/0x7a
[   43.296391]  rpm_callback+0x55/0x6b
[   43.300273]  ? pci_pm_restore+0x7a/0x7a
[   43.302811]  rpm_resume+0x376/0x47d
[   43.305372]  ? try_to_wake_up+0x1e8/0x2df
[   43.307844]  pm_runtime_work+0x5f/0x81
[   43.310390]  process_one_work+0x1ea/0x2e0
[   43.312937]  worker_thread+0x19c/0x240
[   43.315389]  ? rescuer_thread+0x294/0x294
[   43.317920]  kthread+0x10c/0x114
[   43.320392]  ? kthread_create_worker_on_cpu+0x65/0x65
[   43.322938]  ret_from_fork+0x1f/0x30
[   43.325469] ---[ end trace a858a74de695aa09 ]---
[   43.327909] nouveau 0000:01:00.0: acr: AHESASC binary failed
[   43.330611] nouveau 0000:01:00.0: acr: init failed, -110
[   43.333198] nouveau 0000:01:00.0: disp: fini running...
[   43.335614] nouveau 0000:01:00.0: disp: fini completed in 23us
[   43.340415] nouveau 0000:01:00.0: disp: fini running...
[   43.344006] nouveau 0000:01:00.0: disp: fini completed in 1us
[   43.346565] nouveau 0000:01:00.0: init failed with -110
[   43.349003] nouveau: systemd-udevd[290]:00000000:00000080: init failed with -110
[   43.351417] nouveau: DRM-master:00000000:00000000: init failed with -110
[   43.354505] nouveau: DRM-master:00000000:00000000: init failed with -110
[   43.362121] nouveau 0000:01:00.0: DRM: Client resume failed with error: -110
[   43.368650] nouveau 0000:01:00.0: DRM: resume failed with: -110
[   43.374973] snd_hda_intel 0000:01:00.1: runtime IRQ mapping not provided by arch
[   43.375016] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
[   43.377906] snd_hda_intel 0000:01:00.1: Disabling MSI
[   43.380469] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
[   43.383361] snd_hda_intel 0000:01:00.1: VGA controller is disabled
[   43.386078] snd_hda_intel 0000:01:00.1: Delaying initialization
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


* Re: [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
  2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
@ 2021-05-06 14:50                             ` Bjorn Helgaas
  2021-05-25  3:13                               ` Ben Skeggs
  0 siblings, 1 reply; 77+ messages in thread
From: Bjorn Helgaas @ 2021-05-06 14:50 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: nouveau, Mika Westerberg, Ben Skeggs

[+cc Ben]

Hi Marc,

Thanks for paying attention to these things.  I added Ben (who
probably would see this via nouveau@lists.freedesktop.org anyway).
I don't see a PCI issue here, but the nouveau timeout, which I know
nothing about, does look like it could be interesting.

On Wed, May 05, 2021 at 02:42:27PM -0700, Marc MERLIN wrote:
> Howdy,
> I upgraded my thinkpad P73 from 5.9 to 5.12, and I now get this new
> bug at boot (although the system does continue booting and display works
> since I use i915 for display and only use nouveau for PM)
> 
> Short:
> [   18.561181] WARNING: CPU: 15 PID: 220 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> [   18.561300] Modules linked in: dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
> [   18.561636] CPU: 15 PID: 220 Comm: kworker/15:2 Tainted: G     U            5.12.1-amd64-preempt-sysrq-20190817 #1
> [   18.561707] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
> [   18.561765] Workqueue: pm pm_runtime_work
> [   18.561799] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> 
> Despite the warning, the chip seems to go to sleep on battery, and powertop
> shows encouragingly low battery use (my lowest yet on any kernel):
> The battery reports a discharge rate of 10.7 W
> The power consumed was 230 J
> 
> So it seems that what I need from nouveau is working (power management)
> 
> Full warning below with logs
> 
> 
> Long:
> [    0.000000] Linux version 5.12.1-amd64-preempt-sysrq-20190817 (root@sauron.svh.merlins.org) (gcc (Debian 10.2.1-3) 10.2.1 20201224, GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP PREEMPT Wed May 5 13:05:02 PDT 2021
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.12.1-amd64-preempt-sysrq-20190817 root=/dev/mapper/cryptroot ro rootflags=subvol=root cryptopts=source=/dev/nvme0n1p7,keyscript=/sbin/cryptgetpw usbcore.autosuspend=1 pcie_aspm=force resume=/dev/dm-1 acpi_backlight=vendor nouveau.debug=disp=trace
> [    8.672663] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
> [    8.677434] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
> [    8.691872] nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
> [    8.789240] nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
> [    8.789605] nouveau 0000:01:00.0: pmu: firmware unavailable
> [    8.789897] nouveau 0000:01:00.0: enabling bus mastering
> [    8.789978] nouveau 0000:01:00.0: disp: preinit running...
> [    8.789981] nouveau 0000:01:00.0: disp: preinit completed in 0us
> [    8.789997] nouveau 0000:01:00.0: disp: fini running...
> [    8.789999] nouveau 0000:01:00.0: disp: fini completed in 0us
> [    8.790189] nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
> [    8.800113] nouveau 0000:01:00.0: disp: init running...
> [    8.800116] nouveau 0000:01:00.0: disp: init skipped, engine has no users
> [    8.800118] nouveau 0000:01:00.0: disp: init completed in 2us
> [    8.801512] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
> [    8.801515] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> [    8.801517] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> [    8.801520] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> [    8.801521] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> [    8.801525] nouveau 0000:01:00.0: DRM: DCB version 4.1
> [    8.801527] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
> [    8.801529] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
> [    8.801531] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
> [    8.801533] nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
> [    8.801535] nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
> [    8.801537] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
> [    8.801539] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
> [    8.801541] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
> [    8.801543] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
> [    8.801543] nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
> [    8.802234] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> [    8.802255] nouveau 0000:01:00.0: disp: init running...
> [    8.802257] nouveau 0000:01:00.0: disp: one-time init running...
> [    8.802259] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: type 06 loc 0 or 2 link 2 con 0 edid 6 bus 0 head f
> [    8.802265] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: bios dp 42 13 00 00
> [    8.802268] nouveau 0000:01:00.0: disp: outp 01:0002:0f42: type 02 loc 0 or 2 link 1 con 1 edid 5 bus 1 head f
> [    8.802272] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: type 06 loc 0 or 1 link 1 con 2 edid 3 bus 2 head f
> [    8.802276] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: bios dp 42 13 00 00
> [    8.802279] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: type 06 loc 0 or 4 link 1 con 3 edid 7 bus 3 head f
> [    8.802283] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: bios dp 42 13 00 00
> [    8.802285] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: type 06 loc 0 or 4 link 2 con 4 edid 8 bus 4 head f
> [    8.802290] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: bios dp 42 13 00 00
> [    8.802293] nouveau 0000:01:00.0: disp: conn 00:0047: type 47 loc 0 hpd 08 dp 0 di 0 sr 0 lcdid 0
> [    8.802298] nouveau 0000:01:00.0: disp: conn 00:0047: func 52 (HPD)
> [    8.802300] nouveau 0000:01:00.0: disp: conn 01:0161: type 61 loc 1 hpd 04 dp 0 di 0 sr 0 lcdid 0
> [    8.802305] nouveau 0000:01:00.0: disp: conn 01:0161: func 51 (HPD)
> [    8.802307] nouveau 0000:01:00.0: disp: conn 02:0248: type 48 loc 2 hpd 01 dp 0 di 0 sr 0 lcdid 0
> [    8.802311] nouveau 0000:01:00.0: disp: conn 02:0248: func 07 (HPD)
> [    8.802313] nouveau 0000:01:00.0: disp: conn 03:0348: type 48 loc 3 hpd 10 dp 0 di 0 sr 0 lcdid 0
> [    8.802317] nouveau 0000:01:00.0: disp: conn 03:0348: func 5e (HPD)
> [    8.802319] nouveau 0000:01:00.0: disp: conn 04:0471: type 71 loc 4 hpd 20 dp 0 di 0 sr 0 lcdid 0
> [    8.802324] nouveau 0000:01:00.0: disp: conn 04:0471: func 5f (HPD)
> [    8.802329] nouveau 0000:01:00.0: disp: Window(s): 8 (000000ff)
> [    8.802334] nouveau 0000:01:00.0: disp:   Head(s): 4 (0f)
> [    8.802338] nouveau 0000:01:00.0: disp: head-0: ctor
> [    8.802341] nouveau 0000:01:00.0: disp: head-1: ctor
> [    8.802345] nouveau 0000:01:00.0: disp: head-2: ctor
> [    8.802348] nouveau 0000:01:00.0: disp: head-3: ctor
> [    8.802352] nouveau 0000:01:00.0: disp:    SOR(s): 4 (0f)
> [    8.802356] nouveau 0000:01:00.0: disp: SOR-0: ctor
> [    8.802360] nouveau 0000:01:00.0: disp: SOR-1: ctor
> [    8.802364] nouveau 0000:01:00.0: disp: SOR-2: ctor
> [    8.802367] nouveau 0000:01:00.0: disp: SOR-3: ctor
> [    8.802387] nouveau 0000:01:00.0: disp: one-time init completed in 129us
> [    8.802440] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: no route
> [    9.112902] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: aux power -> always
> [    9.112987] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: aux power -> demand
> [    9.113021] nouveau 0000:01:00.0: disp: outp 01:0002:0f42: no route
> [    9.113034] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: no route
> [    9.113059] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: aux power -> always
> [    9.113093] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: aux power -> demand
> [    9.113119] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: no route
> [    9.113141] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: aux power -> always
> [    9.113175] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: aux power -> demand
> [    9.113202] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: no route
> [    9.113224] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: aux power -> always
> [    9.113258] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: aux power -> demand
> [    9.113665] nouveau 0000:01:00.0: disp: init completed in 311407us
> [    9.205451] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> [    9.205682] nouveau 0000:01:00.0: disp: supervisor 1: 00000000
> [    9.205707] nouveau 0000:01:00.0: disp: head-0: 00000000
> [    9.205720] nouveau 0000:01:00.0: disp: head-1: 00000000
> [    9.205732] nouveau 0000:01:00.0: disp: head-2: 00000000
> [    9.205742] nouveau 0000:01:00.0: disp: head-3: 00000000
> [    9.205751] nouveau 0000:01:00.0: disp: Core:
> [    9.205764] nouveau 0000:01:00.0: disp: 	0200: 7efebfff -> 00000001
> [    9.205781] nouveau 0000:01:00.0: disp: 	0208: 00000000 -> f0000000
> [    9.205795] nouveau 0000:01:00.0: disp: 	020c: 00000000 -> 00001000
> [    9.205810] nouveau 0000:01:00.0: disp: 	0210: 00000000              
> [    9.205824] nouveau 0000:01:00.0: disp: 	0214: 00000000              
> [    9.205837] nouveau 0000:01:00.0: disp: 	0218: 00000000              
> [    9.205851] nouveau 0000:01:00.0: disp: 	021c: 00000000              
> [    9.205862] nouveau 0000:01:00.0: disp: Core - SOR 0:
> [    9.205874] nouveau 0000:01:00.0: disp: 	0300: 00000100              
> [    9.205889] nouveau 0000:01:00.0: disp: 	0304: 00000000              
> [    9.205903] nouveau 0000:01:00.0: disp: 	0308: 00000000              
> [    9.205918] nouveau 0000:01:00.0: disp: 	030c: 00000000              
> [    9.205928] nouveau 0000:01:00.0: disp: Core - SOR 1:
> [    9.205940] nouveau 0000:01:00.0: disp: 	0320: 00000100              
> [    9.205954] nouveau 0000:01:00.0: disp: 	0324: 00000000              
> [    9.205967] nouveau 0000:01:00.0: disp: 	0328: 00000000              
> [    9.205981] nouveau 0000:01:00.0: disp: 	032c: 00000000              
> [    9.205991] nouveau 0000:01:00.0: disp: Core - SOR 2:
> [    9.206003] nouveau 0000:01:00.0: disp: 	0340: 00000100              
> [    9.206017] nouveau 0000:01:00.0: disp: 	0344: 00000000              
> [    9.206030] nouveau 0000:01:00.0: disp: 	0348: 00000000              
> [    9.206044] nouveau 0000:01:00.0: disp: 	034c: 00000000              
> [    9.206054] nouveau 0000:01:00.0: disp: Core - SOR 3:
> [    9.206065] nouveau 0000:01:00.0: disp: 	0360: 00000100              
> [    9.206078] nouveau 0000:01:00.0: disp: 	0364: 00000000              
> [    9.206091] nouveau 0000:01:00.0: disp: 	0368: 00000000              
> [    9.206104] nouveau 0000:01:00.0: disp: 	036c: 00000000              
> [    9.206115] nouveau 0000:01:00.0: disp: Core - WINDOW 0:
> [    9.206127] nouveau 0000:01:00.0: disp: 	1000: 0000000f -> 00000000
> [    9.206142] nouveau 0000:01:00.0: disp: 	1004: 000003b7 -> 0000000f
> [    9.206156] nouveau 0000:01:00.0: disp: 	1008: 00000000              
> [    9.206171] nouveau 0000:01:00.0: disp: 	100c: 04000400              
> [    9.206186] nouveau 0000:01:00.0: disp: 	1010: 00100000 -> 00117fff
> [    9.206197] nouveau 0000:01:00.0: disp: Core - WINDOW 1:
> [    9.206209] nouveau 0000:01:00.0: disp: 	1080: 0000000f -> 00000000
> [    9.206223] nouveau 0000:01:00.0: disp: 	1084: 000003b7 -> 0000000f
> [    9.206237] nouveau 0000:01:00.0: disp: 	1088: 00000000              
> [    9.206250] nouveau 0000:01:00.0: disp: 	108c: 04000400              
> [    9.206265] nouveau 0000:01:00.0: disp: 	1090: 00100000 -> 00117fff
> [    9.206275] nouveau 0000:01:00.0: disp: Core - WINDOW 2:
> [    9.206287] nouveau 0000:01:00.0: disp: 	1100: 0000000f -> 00000001
> [    9.206300] nouveau 0000:01:00.0: disp: 	1104: 000003b7 -> 0000000f
> [    9.206313] nouveau 0000:01:00.0: disp: 	1108: 00000000              
> [    9.206327] nouveau 0000:01:00.0: disp: 	110c: 04000400              
> [    9.206341] nouveau 0000:01:00.0: disp: 	1110: 00100000 -> 00117fff
> [    9.206351] nouveau 0000:01:00.0: disp: Core - WINDOW 3:
> [    9.206362] nouveau 0000:01:00.0: disp: 	1180: 0000000f -> 00000001
> [    9.206375] nouveau 0000:01:00.0: disp: 	1184: 000003b7 -> 0000000f
> [    9.206389] nouveau 0000:01:00.0: disp: 	1188: 00000000              
> [    9.206403] nouveau 0000:01:00.0: disp: 	118c: 04000400              
> [    9.206417] nouveau 0000:01:00.0: disp: 	1190: 00100000 -> 00117fff
> [    9.206427] nouveau 0000:01:00.0: disp: Core - WINDOW 4:
> [    9.206440] nouveau 0000:01:00.0: disp: 	1200: 0000000f -> 00000002
> [    9.206455] nouveau 0000:01:00.0: disp: 	1204: 000003b7 -> 0000000f
> [    9.206469] nouveau 0000:01:00.0: disp: 	1208: 00000000              
> [    9.206481] nouveau 0000:01:00.0: disp: 	120c: 04000400              
> [    9.206495] nouveau 0000:01:00.0: disp: 	1210: 00100000 -> 00117fff
> [    9.206505] nouveau 0000:01:00.0: disp: Core - WINDOW 5:
> [    9.206517] nouveau 0000:01:00.0: disp: 	1280: 0000000f -> 00000002
> [    9.206531] nouveau 0000:01:00.0: disp: 	1284: 000003b7 -> 0000000f
> [    9.206544] nouveau 0000:01:00.0: disp: 	1288: 00000000              
> [    9.206558] nouveau 0000:01:00.0: disp: 	128c: 04000400              
> [    9.206571] nouveau 0000:01:00.0: disp: 	1290: 00100000 -> 00117fff
> [    9.206582] nouveau 0000:01:00.0: disp: Core - WINDOW 6:
> [    9.206594] nouveau 0000:01:00.0: disp: 	1300: 0000000f -> 00000003
> [    9.206607] nouveau 0000:01:00.0: disp: 	1304: 000003b7 -> 0000000f
> [    9.206620] nouveau 0000:01:00.0: disp: 	1308: 00000000              
> [    9.206635] nouveau 0000:01:00.0: disp: 	130c: 04000400              
> [    9.206650] nouveau 0000:01:00.0: disp: 	1310: 00100000 -> 00117fff
> [    9.206660] nouveau 0000:01:00.0: disp: Core - WINDOW 7:
> [    9.206672] nouveau 0000:01:00.0: disp: 	1380: 0000000f -> 00000003
> [    9.206685] nouveau 0000:01:00.0: disp: 	1384: 000003b7 -> 0000000f
> [    9.206699] nouveau 0000:01:00.0: disp: 	1388: 00000000              
> [    9.206713] nouveau 0000:01:00.0: disp: 	138c: 04000400              
> [    9.206727] nouveau 0000:01:00.0: disp: 	1390: 00100000 -> 00117fff
> [    9.206737] nouveau 0000:01:00.0: disp: Core - HEAD 0:
> [    9.206748] nouveau 0000:01:00.0: disp: 	2000: 00000000              
> [    9.206762] nouveau 0000:01:00.0: disp: 	2004: fc000040              
> [    9.206776] nouveau 0000:01:00.0: disp: 	2008: 00000180              
> [    9.206790] nouveau 0000:01:00.0: disp: 	200c: 00000000              
> [    9.206804] nouveau 0000:01:00.0: disp: 	2014: 00000011              
> [    9.206818] nouveau 0000:01:00.0: disp: 	2018: 00000000              
> [    9.206832] nouveau 0000:01:00.0: disp: 	201c: 00000000              
> [    9.206846] nouveau 0000:01:00.0: disp: 	2020: 00000000              
> [    9.206860] nouveau 0000:01:00.0: disp: 	2028: 00000000              
> [    9.206874] nouveau 0000:01:00.0: disp: 	202c: 04000400              
> [    9.206889] nouveau 0000:01:00.0: disp: 	2030: 00001000              
> [    9.206903] nouveau 0000:01:00.0: disp: 	2038: 00000001              
> [    9.206918] nouveau 0000:01:00.0: disp: 	203c: 00000005              
> [    9.206933] nouveau 0000:01:00.0: disp: 	2048: 00000000              
> [    9.206947] nouveau 0000:01:00.0: disp: 	204c: 00000000              
> [    9.206960] nouveau 0000:01:00.0: disp: 	2050: 00000000              
> [    9.206973] nouveau 0000:01:00.0: disp: 	2054: 00000000              
> [    9.206986] nouveau 0000:01:00.0: disp: 	2058: 00000000              
> [    9.206999] nouveau 0000:01:00.0: disp: 	205c: 00000000              
> [    9.207013] nouveau 0000:01:00.0: disp: 	2060: 00000000              
> [    9.207027] nouveau 0000:01:00.0: disp: 	2064: 00050008              
> [    9.207041] nouveau 0000:01:00.0: disp: 	2068: 00000000              
> [    9.207055] nouveau 0000:01:00.0: disp: 	206c: 00010003              
> [    9.207069] nouveau 0000:01:00.0: disp: 	2070: 00030004              
> [    9.207083] nouveau 0000:01:00.0: disp: 	2074: 00000001              
> [    9.207098] nouveau 0000:01:00.0: disp: 	2078: 00000000              
> [    9.207112] nouveau 0000:01:00.0: disp: 	207c: 00000000              
> [    9.207127] nouveau 0000:01:00.0: disp: 	2080: 00000000              
> [    9.207141] nouveau 0000:01:00.0: disp: 	2088: 00000000              
> [    9.207156] nouveau 0000:01:00.0: disp: 	2090: 00000000              
> [    9.207170] nouveau 0000:01:00.0: disp: 	209c: 000000e9              
> [    9.207185] nouveau 0000:01:00.0: disp: 	20a0: 000002ff              
> [    9.207200] nouveau 0000:01:00.0: disp: 	20a4: 00000000              
> [    9.207212] nouveau 0000:01:00.0: disp: 	20a8: 00000000              
> [    9.207225] nouveau 0000:01:00.0: disp: 	20ac: 00000000              
> [    9.207239] nouveau 0000:01:00.0: disp: 	218c: 00000000              
> [    9.207252] nouveau 0000:01:00.0: disp: 	2194: 00000000              
> [    9.207266] nouveau 0000:01:00.0: disp: 	2198: 00000000              
> [    9.207279] nouveau 0000:01:00.0: disp: 	219c: 00000000              
> [    9.207292] nouveau 0000:01:00.0: disp: 	21a0: 00000000              
> [    9.207307] nouveau 0000:01:00.0: disp: 	21a4: 00000000              
> [    9.207320] nouveau 0000:01:00.0: disp: 	2214: 00000000              
> [    9.207332] nouveau 0000:01:00.0: disp: 	2218: 00010002              
> [    9.207343] nouveau 0000:01:00.0: disp: Core - HEAD 1:
> [    9.207355] nouveau 0000:01:00.0: disp: 	2400: 00000000              
> [    9.207369] nouveau 0000:01:00.0: disp: 	2404: fc000040              
> [    9.207382] nouveau 0000:01:00.0: disp: 	2408: 00000180              
> [    9.207396] nouveau 0000:01:00.0: disp: 	240c: 00000000              
> [    9.207410] nouveau 0000:01:00.0: disp: 	2414: 00000011              
> [    9.207425] nouveau 0000:01:00.0: disp: 	2418: 00000000              
> [    9.207438] nouveau 0000:01:00.0: disp: 	241c: 00000000              
> [    9.207451] nouveau 0000:01:00.0: disp: 	2420: 00000000              
> [    9.207463] nouveau 0000:01:00.0: disp: 	2428: 00000000              
> [    9.207476] nouveau 0000:01:00.0: disp: 	242c: 04000400              
> [    9.207490] nouveau 0000:01:00.0: disp: 	2430: 00001000              
> [    9.207504] nouveau 0000:01:00.0: disp: 	2438: 00000001              
> [    9.207518] nouveau 0000:01:00.0: disp: 	243c: 00000005              
> [    9.207531] nouveau 0000:01:00.0: disp: 	2448: 00000000              
> [    9.207545] nouveau 0000:01:00.0: disp: 	244c: 00000000              
> [    9.207559] nouveau 0000:01:00.0: disp: 	2450: 00000000              
> [    9.207573] nouveau 0000:01:00.0: disp: 	2454: 00000000              
> [    9.207587] nouveau 0000:01:00.0: disp: 	2458: 00000000              
> [    9.207600] nouveau 0000:01:00.0: disp: 	245c: 00000000              
> [    9.207613] nouveau 0000:01:00.0: disp: 	2460: 00000000              
> [    9.207626] nouveau 0000:01:00.0: disp: 	2464: 00050008              
> [    9.207640] nouveau 0000:01:00.0: disp: 	2468: 00000000              
> [    9.207654] nouveau 0000:01:00.0: disp: 	246c: 00010003              
> [    9.207668] nouveau 0000:01:00.0: disp: 	2470: 00030004              
> [    9.207681] nouveau 0000:01:00.0: disp: 	2474: 00000001              
> [    9.207695] nouveau 0000:01:00.0: disp: 	2478: 00000000              
> [    9.207709] nouveau 0000:01:00.0: disp: 	247c: 00000000              
> [    9.207724] nouveau 0000:01:00.0: disp: 	2480: 00000000              
> [    9.207738] nouveau 0000:01:00.0: disp: 	2488: 00000000              
> [    9.207753] nouveau 0000:01:00.0: disp: 	2490: 00000000              
> [    9.207766] nouveau 0000:01:00.0: disp: 	249c: 000000e9              
> [    9.207781] nouveau 0000:01:00.0: disp: 	24a0: 000002ff              
> [    9.207794] nouveau 0000:01:00.0: disp: 	24a4: 00000000              
> [    9.207807] nouveau 0000:01:00.0: disp: 	24a8: 00000000              
> [    9.207821] nouveau 0000:01:00.0: disp: 	24ac: 00000000              
> [    9.207834] nouveau 0000:01:00.0: disp: 	258c: 00000000              
> [    9.207848] nouveau 0000:01:00.0: disp: 	2594: 00000000              
> [    9.207861] nouveau 0000:01:00.0: disp: 	2598: 00000000              
> [    9.207875] nouveau 0000:01:00.0: disp: 	259c: 00000000              
> [    9.207888] nouveau 0000:01:00.0: disp: 	25a0: 00000000              
> [    9.207901] nouveau 0000:01:00.0: disp: 	25a4: 00000000              
> [    9.207914] nouveau 0000:01:00.0: disp: 	2614: 00000000              
> [    9.207927] nouveau 0000:01:00.0: disp: 	2618: 00010002              
> [    9.207937] nouveau 0000:01:00.0: disp: Core - HEAD 2:
> [    9.207949] nouveau 0000:01:00.0: disp: 	2800: 00000000              
> [    9.207963] nouveau 0000:01:00.0: disp: 	2804: fc000040              
> [    9.207976] nouveau 0000:01:00.0: disp: 	2808: 00000180              
> [    9.207991] nouveau 0000:01:00.0: disp: 	280c: 00000000              
> [    9.208004] nouveau 0000:01:00.0: disp: 	2814: 00000011              
> [    9.208019] nouveau 0000:01:00.0: disp: 	2818: 00000000              
> [    9.208031] nouveau 0000:01:00.0: disp: 	281c: 00000000              
> [    9.208044] nouveau 0000:01:00.0: disp: 	2820: 00000000              
> [    9.208058] nouveau 0000:01:00.0: disp: 	2828: 00000000              
> [    9.208071] nouveau 0000:01:00.0: disp: 	282c: 04000400              
> [    9.208085] nouveau 0000:01:00.0: disp: 	2830: 00001000              
> [    9.208099] nouveau 0000:01:00.0: disp: 	2838: 00000001              
> [    9.208113] nouveau 0000:01:00.0: disp: 	283c: 00000005              
> [    9.208126] nouveau 0000:01:00.0: disp: 	2848: 00000000              
> [    9.208140] nouveau 0000:01:00.0: disp: 	284c: 00000000              
> [    9.208153] nouveau 0000:01:00.0: disp: 	2850: 00000000              
> [    9.208165] nouveau 0000:01:00.0: disp: 	2854: 00000000              
> [    9.208178] nouveau 0000:01:00.0: disp: 	2858: 00000000              
> [    9.208191] nouveau 0000:01:00.0: disp: 	285c: 00000000              
> [    9.208205] nouveau 0000:01:00.0: disp: 	2860: 00000000              
> [    9.208218] nouveau 0000:01:00.0: disp: 	2864: 00050008              
> [    9.208232] nouveau 0000:01:00.0: disp: 	2868: 00000000              
> [    9.208246] nouveau 0000:01:00.0: disp: 	286c: 00010003              
> [    9.208259] nouveau 0000:01:00.0: disp: 	2870: 00030004              
> [    9.208274] nouveau 0000:01:00.0: disp: 	2874: 00000001              
> [    9.208289] nouveau 0000:01:00.0: disp: 	2878: 00000000              
> [    9.208303] nouveau 0000:01:00.0: disp: 	287c: 00000000              
> [    9.208318] nouveau 0000:01:00.0: disp: 	2880: 00000000              
> [    9.208332] nouveau 0000:01:00.0: disp: 	2888: 00000000              
> [    9.208345] nouveau 0000:01:00.0: disp: 	2890: 00000000              
> [    9.208358] nouveau 0000:01:00.0: disp: 	289c: 000000e9              
> [    9.208371] nouveau 0000:01:00.0: disp: 	28a0: 000002ff              
> [    9.208385] nouveau 0000:01:00.0: disp: 	28a4: 00000000              
> [    9.208398] nouveau 0000:01:00.0: disp: 	28a8: 00000000              
> [    9.208412] nouveau 0000:01:00.0: disp: 	28ac: 00000000              
> [    9.208425] nouveau 0000:01:00.0: disp: 	298c: 00000000              
> [    9.208439] nouveau 0000:01:00.0: disp: 	2994: 00000000              
> [    9.208452] nouveau 0000:01:00.0: disp: 	2998: 00000000              
> [    9.208465] nouveau 0000:01:00.0: disp: 	299c: 00000000              
> [    9.208478] nouveau 0000:01:00.0: disp: 	29a0: 00000000              
> [    9.208491] nouveau 0000:01:00.0: disp: 	29a4: 00000000              
> [    9.208504] nouveau 0000:01:00.0: disp: 	2a14: 00000000              
> [    9.208517] nouveau 0000:01:00.0: disp: 	2a18: 00010002              
> [    9.208528] nouveau 0000:01:00.0: disp: Core - HEAD 3:
> [    9.208540] nouveau 0000:01:00.0: disp: 	2c00: 00000000              
> [    9.208554] nouveau 0000:01:00.0: disp: 	2c04: fc000040              
> [    9.208568] nouveau 0000:01:00.0: disp: 	2c08: 00000180              
> [    9.208583] nouveau 0000:01:00.0: disp: 	2c0c: 00000000              
> [    9.208597] nouveau 0000:01:00.0: disp: 	2c14: 00000011              
> [    9.208610] nouveau 0000:01:00.0: disp: 	2c18: 00000000              
> [    9.208623] nouveau 0000:01:00.0: disp: 	2c1c: 00000000              
> [    9.208636] nouveau 0000:01:00.0: disp: 	2c20: 00000000              
> [    9.208650] nouveau 0000:01:00.0: disp: 	2c28: 00000000              
> [    9.208664] nouveau 0000:01:00.0: disp: 	2c2c: 04000400              
> [    9.208677] nouveau 0000:01:00.0: disp: 	2c30: 00001000              
> [    9.208691] nouveau 0000:01:00.0: disp: 	2c38: 00000001              
> [    9.208722] nouveau 0000:01:00.0: disp: 	2c3c: 00000005              
> [    9.208736] nouveau 0000:01:00.0: disp: 	2c48: 00000000              
> [    9.208750] nouveau 0000:01:00.0: disp: 	2c4c: 00000000              
> [    9.208764] nouveau 0000:01:00.0: disp: 	2c50: 00000000              
> [    9.208777] nouveau 0000:01:00.0: disp: 	2c54: 00000000              
> [    9.208790] nouveau 0000:01:00.0: disp: 	2c58: 00000000              
> [    9.208803] nouveau 0000:01:00.0: disp: 	2c5c: 00000000              
> [    9.208815] nouveau 0000:01:00.0: disp: 	2c60: 00000000              
> [    9.208829] nouveau 0000:01:00.0: disp: 	2c64: 00050008              
> [    9.208842] nouveau 0000:01:00.0: disp: 	2c68: 00000000              
> [    9.208856] nouveau 0000:01:00.0: disp: 	2c6c: 00010003              
> [    9.208870] nouveau 0000:01:00.0: disp: 	2c70: 00030004              
> [    9.208884] nouveau 0000:01:00.0: disp: 	2c74: 00000001              
> [    9.208897] nouveau 0000:01:00.0: disp: 	2c78: 00000000              
> [    9.208911] nouveau 0000:01:00.0: disp: 	2c7c: 00000000              
> [    9.208925] nouveau 0000:01:00.0: disp: 	2c80: 00000000              
> [    9.208940] nouveau 0000:01:00.0: disp: 	2c88: 00000000              
> [    9.208954] nouveau 0000:01:00.0: disp: 	2c90: 00000000              
> [    9.208969] nouveau 0000:01:00.0: disp: 	2c9c: 000000e9              
> [    9.208984] nouveau 0000:01:00.0: disp: 	2ca0: 000002ff              
> [    9.208999] nouveau 0000:01:00.0: disp: 	2ca4: 00000000              
> [    9.209014] nouveau 0000:01:00.0: disp: 	2ca8: 00000000              
> [    9.209029] nouveau 0000:01:00.0: disp: 	2cac: 00000000              
> [    9.209043] nouveau 0000:01:00.0: disp: 	2d8c: 00000000              
> [    9.209058] nouveau 0000:01:00.0: disp: 	2d94: 00000000              
> [    9.209073] nouveau 0000:01:00.0: disp: 	2d98: 00000000              
> [    9.209087] nouveau 0000:01:00.0: disp: 	2d9c: 00000000              
> [    9.209099] nouveau 0000:01:00.0: disp: 	2da0: 00000000              
> [    9.209112] nouveau 0000:01:00.0: disp: 	2da4: 00000000              
> [    9.209126] nouveau 0000:01:00.0: disp: 	2e14: 00000000              
> [    9.209139] nouveau 0000:01:00.0: disp: 	2e18: 00010002              
> [    9.209388] nouveau 0000:01:00.0: disp: supervisor 2: 00000010
> [    9.209413] nouveau 0000:01:00.0: disp: head-0: 00000000
> [    9.209426] nouveau 0000:01:00.0: disp: head-1: 00000000
> [    9.209437] nouveau 0000:01:00.0: disp: head-2: 00000000
> [    9.209448] nouveau 0000:01:00.0: disp: head-3: 00000000
> [    9.209619] nouveau 0000:01:00.0: disp: supervisor 3: 00000010
> [    9.209643] nouveau 0000:01:00.0: disp: head-0: 00000000
> [    9.209656] nouveau 0000:01:00.0: disp: head-1: 00000000
> [    9.209668] nouveau 0000:01:00.0: disp: head-2: 00000000
> [    9.209679] nouveau 0000:01:00.0: disp: head-3: 00000000
> [    9.210852] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
> [    9.210885] nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
> [    9.212755] usb 1-8: new high-speed USB device number 3 using xhci_hcd
> [    9.296013] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> [    9.382897] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> [   18.460917] nouveau 0000:01:00.0: disp: suspend running...
> [   18.461005] nouveau 0000:01:00.0: disp: suspend completed in 41us
> [   18.561101] ------------[ cut here ]------------
> [   18.561138] nouveau 0000:01:00.0: timeout
> [   18.561181] WARNING: CPU: 15 PID: 220 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> [   18.561300] Modules linked in: dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
> [   18.561636] CPU: 15 PID: 220 Comm: kworker/15:2 Tainted: G     U            5.12.1-amd64-preempt-sysrq-20190817 #1
> [   18.561707] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
> [   18.561765] Workqueue: pm pm_runtime_work
> [   18.561799] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> [   18.561874] Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 eb 5c 43 e2 4c 89 e2 48 c7 c7 ef 95 33 c1 48 89 c6 e8 c4 b2 6e e2 <0f> 0b 85 db b8 00 00 00 00 0f 4e c3 48 8b 4c 24 28 65 48 2b 0c 25
> [   18.561995] RSP: 0018:ffffb518007a7b08 EFLAGS: 00010286
> [   18.562035] RAX: 0000000000000000 RBX: ffffffffffffff92 RCX: 0000000000000003
> [   18.562086] RDX: 0000000000000850 RSI: 0000000000000001 RDI: ffffffffa4b25bac
> [   18.562136] RBP: ffff89e351f0a058 R08: 0000000000000003 R09: 0000000000000001
> [   18.562187] R10: 0000000000aaaaaa R11: ffffb51821e14440 R12: ffff89e34291c5a0
> [   18.562238] R13: 0000000000000000 R14: ffff89e355782e00 R15: ffff89e3524cb000
> [   18.562289] FS:  0000000000000000(0000) GS:ffff89f25c7c0000(0000) knlGS:0000000000000000
> [   18.562345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   18.562388] CR2: 000055ec245a00a8 CR3: 0000000545410006 CR4: 00000000003706e0
> [   18.562439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   18.562491] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   18.562542] Call Trace:
> [   18.562569]  gm200_acr_hsfw_boot+0xc4/0x168 [nouveau]
> [   18.562636]  nvkm_acr_hsf_boot+0xad/0x115 [nouveau]
> [   18.565673]  nvkm_acr_fini+0x22/0x30 [nouveau]
> [   18.568732]  nvkm_subdev_fini+0xb8/0xff [nouveau]
> [   18.571775]  nvkm_device_fini+0x8b/0x178 [nouveau]
> [   18.574834]  nvkm_udevice_fini+0x34/0x55 [nouveau]
> [   18.577872]  nvkm_object_fini+0xeb/0x1d6 [nouveau]
> [   18.580862]  nvkm_object_fini+0x8d/0x1d6 [nouveau]
> [   18.584095]  nouveau_do_suspend+0x1fe/0x26f [nouveau]
> [   18.587135]  nouveau_pmops_runtime_suspend+0x46/0x82 [nouveau]
> [   18.590097]  pci_pm_runtime_suspend+0x5e/0x155
> [   18.593013]  ? pci_pm_thaw_noirq+0x62/0x62
> [   18.595914]  ? pci_pm_thaw_noirq+0x62/0x62
> [   18.598802]  __rpm_callback+0x75/0xdb
> [   18.601654]  ? pci_pm_thaw_noirq+0x62/0x62
> [   18.604491]  rpm_callback+0x55/0x6b
> [   18.607317]  rpm_suspend+0x2a6/0x4af
> [   18.610117]  ? __raw_spin_unlock_irq+0x8/0x17
> [   18.612901]  ? finish_task_switch.isra.0+0x136/0x214
> [   18.615673]  pm_runtime_work+0x77/0x81
> [   18.618428]  process_one_work+0x1ea/0x2e0
> [   18.621156]  worker_thread+0x19c/0x240
> [   18.624140]  ? rescuer_thread+0x294/0x294
> [   18.626886]  kthread+0x10c/0x114
> [   18.629567]  ? kthread_create_worker_on_cpu+0x65/0x65
> [   18.632253]  ret_from_fork+0x1f/0x30
> [   18.634949] ---[ end trace a858a74de695aa08 ]---
> [   18.637620] nouveau 0000:01:00.0: acr: unload binary failed
> [   18.913087] nouveau 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
> [   18.913091] nouveau 0000:01:00.0: saving config space at offset 0x4 (reading 0x100407)
> [   18.913093] nouveau 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
> [   18.913095] nouveau 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
> [   18.913097] nouveau 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
> [   18.913099] nouveau 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
> [   18.913102] nouveau 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
> [   18.913104] nouveau 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
> [   18.913106] nouveau 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
> [   18.913108] nouveau 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
> [   18.913111] nouveau 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
> [   18.913113] nouveau 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
> [   18.913115] nouveau 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
> [   18.913117] nouveau 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
> [   18.913119] nouveau 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
> [   18.913122] nouveau 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
> [   18.913179] nouveau 0000:01:00.0: power state changed by ACPI to D3cold
> [   43.064748] nouveau 0000:01:00.0: power state changed by ACPI to D0
> [   43.064836] nouveau 0000:01:00.0: restoring config space at offset 0x3c (was 0x100, writing 0x1ff)
> [   43.064845] nouveau 0000:01:00.0: restoring config space at offset 0x30 (was 0x0, writing 0xfff80000)
> [   43.064853] nouveau 0000:01:00.0: restoring config space at offset 0x24 (was 0x1, writing 0x2001)
> [   43.064860] nouveau 0000:01:00.0: restoring config space at offset 0x1c (was 0xc, writing 0xb000000c)
> [   43.064868] nouveau 0000:01:00.0: restoring config space at offset 0x14 (was 0xc, writing 0xa000000c)
> [   43.064874] nouveau 0000:01:00.0: restoring config space at offset 0x10 (was 0x0, writing 0xcd000000)
> [   43.064883] nouveau 0000:01:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
> [   43.065008] nouveau 0000:01:00.0: disp: preinit running...
> [   43.065038] nouveau 0000:01:00.0: disp: preinit completed in 0us
> [   43.065200] nouveau 0000:01:00.0: disp: fini running...
> [   43.065226] nouveau 0000:01:00.0: disp: fini completed in 2us
> [   43.073510] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 00000000003b1000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [01ffedf000 unknown]
> [   43.073579] nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 0000000000000000 engine 0e [sec2] client 16 [HUB/SEC] reason 00 [PDE] on channel -1 [01ffe5d000 unknown]
> [   43.073616] nouveau 0000:01:00.0: fifo: runlist 3: scheduled for recovery
> [   43.073636] nouveau 0000:01:00.0: fifo: engine 3: scheduled for recovery
> [   43.173456] ------------[ cut here ]------------
> [   43.173477] nouveau 0000:01:00.0: timeout
> [   43.173533] WARNING: CPU: 9 PID: 1468 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> [   43.173614] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 btrtl videobuf2_common btbcm btintel videodev bluetooth mc ecdh_generic ecc iwlmvm mac80211 libarc4 mei_hdcp x86_pkg_temp_thermal intel_powerclamp kvm_intel nls_utf8 snd_hda_codec_conexant snd_hda_codec_generic nls_cp437 kvm snd_hda_intel(+) vfat snd_intel_dspcfg iwlwifi fat irqbypass snd_hda_codec squashfs input_leds joydev rapl deflate serio_raw intel_cstate efi_pstore pcspkr snd_hda_core iTCO_wdt wmi_bmof intel_wmi_thunderbolt tpm_crb cfg80211 iTCO_vendor_support ee1004 8250_dw snd_hwdep processor_thermal_device processor_thermal_rfim ucsi_ccg(+) snd_pcm sg ucsi_acpi thinkpad_acpi nvidiafb typec_ucsi vgastate mei_me processor_thermal_mbox typec intel_pch_thermal fb_ddc tpm_tis intel_soc_dts_iosf snd_timer nvram roles tpm_tis_core platform_profile ledtrig_audio snd soundcore rfkill int3403_thermal ac int340x_thermal_zone evdev int3400_thermal acpi_thermal_rel acpi_pad loop configs coretemp
> [   43.173670]  msr fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfs_ssc ip_tables x_tables autofs4 essiv authenc dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
> [   43.173970] CPU: 9 PID: 1468 Comm: kworker/9:3 Tainted: G     U  W         5.12.1-amd64-preempt-sysrq-20190817 #1
> [   43.174001] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
> [   43.174022] Workqueue: pm pm_runtime_work
> [   43.174038] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> [   43.174296] Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 eb 5c 43 e2 4c 89 e2 48 c7 c7 ef 95 33 c1 48 89 c6 e8 c4 b2 6e e2 <0f> 0b 85 db b8 00 00 00 00 0f 4e c3 48 8b 4c 24 28 65 48 2b 0c 25
> [   43.174336] RSP: 0018:ffffb51800eb39f0 EFLAGS: 00010286
> [   43.174351] RAX: 0000000000000000 RBX: ffffffffffffff92 RCX: 0000000000000027
> [   43.174370] RDX: 0000000000000027 RSI: 0000000000000001 RDI: ffff89f25c658590
> [   43.174388] RBP: ffff89e351f09898 R08: 0000000000000003 R09: 0000000000000001
> [   43.174407] R10: 0000000000aaaaaa R11: ffffb5182251c420 R12: ffff89e34291c5a0
> [   43.178020] R13: 0000000000000000 R14: ffff89e355782e00 R15: ffff89e3524cb000
> [   43.180638] FS:  0000000000000000(0000) GS:ffff89f25c640000(0000) knlGS:0000000000000000
> [   43.183418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   43.186150] CR2: 00007f2e4be0e1af CR3: 0000000109928001 CR4: 00000000003706e0
> [   43.188876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   43.191778] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   43.194511] Call Trace:
> [   43.197753]  gm200_acr_hsfw_boot+0xc4/0x168 [nouveau]
> [   43.203012]  nvkm_acr_hsf_boot+0xad/0x115 [nouveau]
> [   43.205781]  tu102_acr_init+0x16/0x2d [nouveau]
> [   43.208502]  nvkm_acr_load+0x62/0x135 [nouveau]
> [   43.211256]  ? timekeeping_get_ns+0x1c/0x32
> [   43.216266]  nvkm_subdev_init+0x100/0x175 [nouveau]
> [   43.222767]  nvkm_device_init+0x150/0x203 [nouveau]
> [   43.230884]  nvkm_udevice_init+0x31/0x4b [nouveau]
> [   43.234889]  nvkm_object_init+0x75/0x15f [nouveau]
> [   43.237646]  nvkm_object_init+0x9e/0x15f [nouveau]
> [   43.240283]  nvkm_object_init+0x9e/0x15f [nouveau]
> [   43.242977]  nouveau_do_resume+0x4b/0x170 [nouveau]
> [   43.245737]  nouveau_pmops_runtime_resume+0x76/0x12d [nouveau]
> [   43.248416]  pci_pm_runtime_resume+0x75/0x80
> [   43.251095]  ? pci_pm_restore+0x7a/0x7a
> [   43.253750]  ? pci_pm_restore+0x7a/0x7a
> [   43.256355]  __rpm_callback+0x75/0xdb
> [   43.259020]  ? pci_pm_restore+0x7a/0x7a
> [   43.261687]  rpm_callback+0x55/0x6b
> [   43.264269]  ? pci_pm_restore+0x7a/0x7a
> [   43.267104]  rpm_resume+0x376/0x47d
> [   43.269799]  ? __schedule+0x5de/0x632
> [   43.272370]  __pm_runtime_resume+0x5a/0x76
> [   43.277743]  ? pci_pm_restore+0x7a/0x7a
> [   43.281006]  rpm_get_suppliers+0x39/0x70
> [   43.283602]  ? pci_pm_restore+0x7a/0x7a
> [   43.286254]  __rpm_callback+0x59/0xdb
> [   43.288886]  ? pci_pm_restore+0x7a/0x7a
> [   43.296391]  rpm_callback+0x55/0x6b
> [   43.300273]  ? pci_pm_restore+0x7a/0x7a
> [   43.302811]  rpm_resume+0x376/0x47d
> [   43.305372]  ? try_to_wake_up+0x1e8/0x2df
> [   43.307844]  pm_runtime_work+0x5f/0x81
> [   43.310390]  process_one_work+0x1ea/0x2e0
> [   43.312937]  worker_thread+0x19c/0x240
> [   43.315389]  ? rescuer_thread+0x294/0x294
> [   43.317920]  kthread+0x10c/0x114
> [   43.320392]  ? kthread_create_worker_on_cpu+0x65/0x65
> [   43.322938]  ret_from_fork+0x1f/0x30
> [   43.325469] ---[ end trace a858a74de695aa09 ]---
> [   43.327909] nouveau 0000:01:00.0: acr: AHESASC binary failed
> [   43.330611] nouveau 0000:01:00.0: acr: init failed, -110
> [   43.333198] nouveau 0000:01:00.0: disp: fini running...
> [   43.335614] nouveau 0000:01:00.0: disp: fini completed in 23us
> [   43.340415] nouveau 0000:01:00.0: disp: fini running...
> [   43.344006] nouveau 0000:01:00.0: disp: fini completed in 1us
> [   43.346565] nouveau 0000:01:00.0: init failed with -110
> [   43.349003] nouveau: systemd-udevd[290]:00000000:00000080: init failed with -110
> [   43.351417] nouveau: DRM-master:00000000:00000000: init failed with -110
> [   43.354505] nouveau: DRM-master:00000000:00000000: init failed with -110
> [   43.362121] nouveau 0000:01:00.0: DRM: Client resume failed with error: -110
> [   43.368650] nouveau 0000:01:00.0: DRM: resume failed with: -110
> [   43.374973] snd_hda_intel 0000:01:00.1: runtime IRQ mapping not provided by arch
> [   43.375016] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
> [   43.377906] snd_hda_intel 0000:01:00.1: Disabling MSI
> [   43.380469] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
> [   43.383361] snd_hda_intel 0000:01:00.1: VGA controller is disabled
> [   43.386078] snd_hda_intel 0000:01:00.1: Delaying initialization
> -- 
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
>  
> Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
  2021-05-06 14:50                             ` Bjorn Helgaas
@ 2021-05-25  3:13                               ` Ben Skeggs
  0 siblings, 0 replies; 77+ messages in thread
From: Ben Skeggs @ 2021-05-25  3:13 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Mika Westerberg, Ben Skeggs, ML nouveau

On Fri, 7 May 2021 at 00:50, Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Ben]
>
> Hi Marc,
>
> Thanks for paying attention to these things.  I added Ben (who
> probably would see this via nouveau@lists.freedesktop.org anyway).
> I don't see a PCI issue here, but the nouveau timeout, which I know
> nothing about, does look like it could be interesting.
This is likely from a bug that snuck into linux-firmware. I recently sent a
patch[1] that will probably solve this.

Ben.

[1] https://lore.kernel.org/linux-firmware/20210518063631.5072-1-bskeggs@redhat.com/T/#u
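
For anyone triaging a similar report, the failure lines can be pulled out of a
quoted dmesg dump with a few lines of Python. This is only a rough sketch, not
part of any nouveau or kernel tooling: the helper name find_failures, the
regular expressions, and the hard-coded errno table are assumptions made for
illustration. The one fact it relies on is that errno 110 on Linux is
ETIMEDOUT, which matches the "timeout" message from the falcon wait.

```python
import re

# Linux errno value seen in this log (asm-generic/errno.h). Hard-coded as an
# assumption so the sketch behaves the same on non-Linux hosts.
LINUX_ERRNO = {110: "ETIMEDOUT"}

# Matches an optionally quoted dmesg line: "> > [   43.346565] message"
LINE_RE = re.compile(r"^(?:>\s?)*\[\s*(\d+\.\d+)\]\s(.*)$")
# Matches a kernel error code at the end of a message, e.g. "with -110"
ERR_RE = re.compile(r"-(\d+)\s*$")


def find_failures(log_text):
    """Return (timestamp, message, errno_name) for WARNING and failure lines."""
    hits = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # not a timestamped kernel log line
        stamp, msg = float(m.group(1)), m.group(2)
        if "WARNING:" not in msg and "failed" not in msg:
            continue
        err = ERR_RE.search(msg)
        name = LINUX_ERRNO.get(int(err.group(1))) if err else None
        hits.append((stamp, msg, name))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "> [   43.327909] nouveau 0000:01:00.0: acr: AHESASC binary failed",
        "> [   43.330611] nouveau 0000:01:00.0: acr: init failed, -110",
        "> [   43.333198] nouveau 0000:01:00.0: disp: fini running...",
    ])
    for stamp, msg, name in find_failures(sample):
        print(stamp, name, msg)
```

Run against the full log in the quoted message, this would list the two
nvkm_falcon_v1_wait_for_halt warnings and the chain of "failed" lines that
follow the resume at 43s.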

>
> On Wed, May 05, 2021 at 02:42:27PM -0700, Marc MERLIN wrote:
> > Howdy,
> > I upgraded my ThinkPad P73 from 5.9 to 5.12, and I now get this new
> > bug at boot (although the system continues booting and the display works,
> > since I use i915 for display and only use nouveau for PM)
> >
> > Short:
> > [   18.561181] WARNING: CPU: 15 PID: 220 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> > [   18.561300] Modules linked in: dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
> > [   18.561636] CPU: 15 PID: 220 Comm: kworker/15:2 Tainted: G     U            5.12.1-amd64-preempt-sysrq-20190817 #1
> > [   18.561707] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
> > [   18.561765] Workqueue: pm pm_runtime_work
> > [   18.561799] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> >
> > Despite the warning, the chip seems to go to sleep on battery; powertop
> > shows an encouragingly low battery use (my lowest yet of any kernel):
> > The battery reports a discharge rate of 10.7 W
> > The power consumed was 230 J
> >
> > So it seems that what I need from nouveau is working (power management)
> >
> > Full warning below with logs
> >
> >
> > Long:
> > [    0.000000] Linux version 5.12.1-amd64-preempt-sysrq-20190817 (root@sauron.svh.merlins.org) (gcc (Debian 10.2.1-3) 10.2.1 20201224, GNU ld (GNU Binutils for Debian) 2.35.1) #1 SMP PREEMPT Wed May 5 13:05:02 PDT 2021
> > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.12.1-amd64-preempt-sysrq-20190817 root=/dev/mapper/cryptroot ro rootflags=subvol=root cryptopts=source=/dev/nvme0n1p7,keyscript=/sbin/cryptgetpw usbcore.autosuspend=1 pcie_aspm=force resume=/dev/dm-1 acpi_backlight=vendor nouveau.debug=disp=trace
> > [    8.672663] nouveau 0000:01:00.0: runtime IRQ mapping not provided by arch
> > [    8.677434] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
> > [    8.691872] nouveau 0000:01:00.0: NVIDIA TU104 (164000a1)
> > [    8.789240] nouveau 0000:01:00.0: bios: version 90.04.4d.00.2c
> > [    8.789605] nouveau 0000:01:00.0: pmu: firmware unavailable
> > [    8.789897] nouveau 0000:01:00.0: enabling bus mastering
> > [    8.789978] nouveau 0000:01:00.0: disp: preinit running...
> > [    8.789981] nouveau 0000:01:00.0: disp: preinit completed in 0us
> > [    8.789997] nouveau 0000:01:00.0: disp: fini running...
> > [    8.789999] nouveau 0000:01:00.0: disp: fini completed in 0us
> > [    8.790189] nouveau 0000:01:00.0: fb: 8192 MiB GDDR6
> > [    8.800113] nouveau 0000:01:00.0: disp: init running...
> > [    8.800116] nouveau 0000:01:00.0: disp: init skipped, engine has no users
> > [    8.800118] nouveau 0000:01:00.0: disp: init completed in 2us
> > [    8.801512] nouveau 0000:01:00.0: DRM: VRAM: 8192 MiB
> > [    8.801515] nouveau 0000:01:00.0: DRM: GART: 536870912 MiB
> > [    8.801517] nouveau 0000:01:00.0: DRM: BIT table 'A' not found
> > [    8.801520] nouveau 0000:01:00.0: DRM: BIT table 'L' not found
> > [    8.801521] nouveau 0000:01:00.0: DRM: TMDS table version 2.0
> > [    8.801525] nouveau 0000:01:00.0: DRM: DCB version 4.1
> > [    8.801527] nouveau 0000:01:00.0: DRM: DCB outp 00: 02800f66 04600020
> > [    8.801529] nouveau 0000:01:00.0: DRM: DCB outp 01: 02011f52 00020010
> > [    8.801531] nouveau 0000:01:00.0: DRM: DCB outp 02: 01022f36 04600010
> > [    8.801533] nouveau 0000:01:00.0: DRM: DCB outp 03: 04033f76 04600010
> > [    8.801535] nouveau 0000:01:00.0: DRM: DCB outp 04: 04044f86 04600020
> > [    8.801537] nouveau 0000:01:00.0: DRM: DCB conn 00: 00020047
> > [    8.801539] nouveau 0000:01:00.0: DRM: DCB conn 01: 00010161
> > [    8.801541] nouveau 0000:01:00.0: DRM: DCB conn 02: 00001248
> > [    8.801543] nouveau 0000:01:00.0: DRM: DCB conn 03: 01000348
> > [    8.801543] nouveau 0000:01:00.0: DRM: DCB conn 04: 02000471
> > [    8.802234] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
> > [    8.802255] nouveau 0000:01:00.0: disp: init running...
> > [    8.802257] nouveau 0000:01:00.0: disp: one-time init running...
> > [    8.802259] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: type 06 loc 0 or 2 link 2 con 0 edid 6 bus 0 head f
> > [    8.802265] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: bios dp 42 13 00 00
> > [    8.802268] nouveau 0000:01:00.0: disp: outp 01:0002:0f42: type 02 loc 0 or 2 link 1 con 1 edid 5 bus 1 head f
> > [    8.802272] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: type 06 loc 0 or 1 link 1 con 2 edid 3 bus 2 head f
> > [    8.802276] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: bios dp 42 13 00 00
> > [    8.802279] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: type 06 loc 0 or 4 link 1 con 3 edid 7 bus 3 head f
> > [    8.802283] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: bios dp 42 13 00 00
> > [    8.802285] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: type 06 loc 0 or 4 link 2 con 4 edid 8 bus 4 head f
> > [    8.802290] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: bios dp 42 13 00 00
> > [    8.802293] nouveau 0000:01:00.0: disp: conn 00:0047: type 47 loc 0 hpd 08 dp 0 di 0 sr 0 lcdid 0
> > [    8.802298] nouveau 0000:01:00.0: disp: conn 00:0047: func 52 (HPD)
> > [    8.802300] nouveau 0000:01:00.0: disp: conn 01:0161: type 61 loc 1 hpd 04 dp 0 di 0 sr 0 lcdid 0
> > [    8.802305] nouveau 0000:01:00.0: disp: conn 01:0161: func 51 (HPD)
> > [    8.802307] nouveau 0000:01:00.0: disp: conn 02:0248: type 48 loc 2 hpd 01 dp 0 di 0 sr 0 lcdid 0
> > [    8.802311] nouveau 0000:01:00.0: disp: conn 02:0248: func 07 (HPD)
> > [    8.802313] nouveau 0000:01:00.0: disp: conn 03:0348: type 48 loc 3 hpd 10 dp 0 di 0 sr 0 lcdid 0
> > [    8.802317] nouveau 0000:01:00.0: disp: conn 03:0348: func 5e (HPD)
> > [    8.802319] nouveau 0000:01:00.0: disp: conn 04:0471: type 71 loc 4 hpd 20 dp 0 di 0 sr 0 lcdid 0
> > [    8.802324] nouveau 0000:01:00.0: disp: conn 04:0471: func 5f (HPD)
> > [    8.802329] nouveau 0000:01:00.0: disp: Window(s): 8 (000000ff)
> > [    8.802334] nouveau 0000:01:00.0: disp:   Head(s): 4 (0f)
> > [    8.802338] nouveau 0000:01:00.0: disp: head-0: ctor
> > [    8.802341] nouveau 0000:01:00.0: disp: head-1: ctor
> > [    8.802345] nouveau 0000:01:00.0: disp: head-2: ctor
> > [    8.802348] nouveau 0000:01:00.0: disp: head-3: ctor
> > [    8.802352] nouveau 0000:01:00.0: disp:    SOR(s): 4 (0f)
> > [    8.802356] nouveau 0000:01:00.0: disp: SOR-0: ctor
> > [    8.802360] nouveau 0000:01:00.0: disp: SOR-1: ctor
> > [    8.802364] nouveau 0000:01:00.0: disp: SOR-2: ctor
> > [    8.802367] nouveau 0000:01:00.0: disp: SOR-3: ctor
> > [    8.802387] nouveau 0000:01:00.0: disp: one-time init completed in 129us
> > [    8.802440] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: no route
> > [    9.112902] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: aux power -> always
> > [    9.112987] nouveau 0000:01:00.0: disp: outp 00:0006:0f82: aux power -> demand
> > [    9.113021] nouveau 0000:01:00.0: disp: outp 01:0002:0f42: no route
> > [    9.113034] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: no route
> > [    9.113059] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: aux power -> always
> > [    9.113093] nouveau 0000:01:00.0: disp: outp 02:0006:0f41: aux power -> demand
> > [    9.113119] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: no route
> > [    9.113141] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: aux power -> always
> > [    9.113175] nouveau 0000:01:00.0: disp: outp 03:0006:0f44: aux power -> demand
> > [    9.113202] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: no route
> > [    9.113224] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: aux power -> always
> > [    9.113258] nouveau 0000:01:00.0: disp: outp 04:0006:0f84: aux power -> demand
> > [    9.113665] nouveau 0000:01:00.0: disp: init completed in 311407us
> > [    9.205451] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> > [    9.205682] nouveau 0000:01:00.0: disp: supervisor 1: 00000000
> > [    9.205707] nouveau 0000:01:00.0: disp: head-0: 00000000
> > [    9.205720] nouveau 0000:01:00.0: disp: head-1: 00000000
> > [    9.205732] nouveau 0000:01:00.0: disp: head-2: 00000000
> > [    9.205742] nouveau 0000:01:00.0: disp: head-3: 00000000
> > [    9.205751] nouveau 0000:01:00.0: disp: Core:
> > [    9.205764] nouveau 0000:01:00.0: disp:    0200: 7efebfff -> 00000001
> > [    9.205781] nouveau 0000:01:00.0: disp:    0208: 00000000 -> f0000000
> > [    9.205795] nouveau 0000:01:00.0: disp:    020c: 00000000 -> 00001000
> > [    9.205810] nouveau 0000:01:00.0: disp:    0210: 00000000
> > [    9.205824] nouveau 0000:01:00.0: disp:    0214: 00000000
> > [    9.205837] nouveau 0000:01:00.0: disp:    0218: 00000000
> > [    9.205851] nouveau 0000:01:00.0: disp:    021c: 00000000
> > [    9.205862] nouveau 0000:01:00.0: disp: Core - SOR 0:
> > [    9.205874] nouveau 0000:01:00.0: disp:    0300: 00000100
> > [    9.205889] nouveau 0000:01:00.0: disp:    0304: 00000000
> > [    9.205903] nouveau 0000:01:00.0: disp:    0308: 00000000
> > [    9.205918] nouveau 0000:01:00.0: disp:    030c: 00000000
> > [    9.205928] nouveau 0000:01:00.0: disp: Core - SOR 1:
> > [    9.205940] nouveau 0000:01:00.0: disp:    0320: 00000100
> > [    9.205954] nouveau 0000:01:00.0: disp:    0324: 00000000
> > [    9.205967] nouveau 0000:01:00.0: disp:    0328: 00000000
> > [    9.205981] nouveau 0000:01:00.0: disp:    032c: 00000000
> > [    9.205991] nouveau 0000:01:00.0: disp: Core - SOR 2:
> > [    9.206003] nouveau 0000:01:00.0: disp:    0340: 00000100
> > [    9.206017] nouveau 0000:01:00.0: disp:    0344: 00000000
> > [    9.206030] nouveau 0000:01:00.0: disp:    0348: 00000000
> > [    9.206044] nouveau 0000:01:00.0: disp:    034c: 00000000
> > [    9.206054] nouveau 0000:01:00.0: disp: Core - SOR 3:
> > [    9.206065] nouveau 0000:01:00.0: disp:    0360: 00000100
> > [    9.206078] nouveau 0000:01:00.0: disp:    0364: 00000000
> > [    9.206091] nouveau 0000:01:00.0: disp:    0368: 00000000
> > [    9.206104] nouveau 0000:01:00.0: disp:    036c: 00000000
> > [    9.206115] nouveau 0000:01:00.0: disp: Core - WINDOW 0:
> > [    9.206127] nouveau 0000:01:00.0: disp:    1000: 0000000f -> 00000000
> > [    9.206142] nouveau 0000:01:00.0: disp:    1004: 000003b7 -> 0000000f
> > [    9.206156] nouveau 0000:01:00.0: disp:    1008: 00000000
> > [    9.206171] nouveau 0000:01:00.0: disp:    100c: 04000400
> > [    9.206186] nouveau 0000:01:00.0: disp:    1010: 00100000 -> 00117fff
> > [    9.206197] nouveau 0000:01:00.0: disp: Core - WINDOW 1:
> > [    9.206209] nouveau 0000:01:00.0: disp:    1080: 0000000f -> 00000000
> > [    9.206223] nouveau 0000:01:00.0: disp:    1084: 000003b7 -> 0000000f
> > [    9.206237] nouveau 0000:01:00.0: disp:    1088: 00000000
> > [    9.206250] nouveau 0000:01:00.0: disp:    108c: 04000400
> > [    9.206265] nouveau 0000:01:00.0: disp:    1090: 00100000 -> 00117fff
> > [    9.206275] nouveau 0000:01:00.0: disp: Core - WINDOW 2:
> > [    9.206287] nouveau 0000:01:00.0: disp:    1100: 0000000f -> 00000001
> > [    9.206300] nouveau 0000:01:00.0: disp:    1104: 000003b7 -> 0000000f
> > [    9.206313] nouveau 0000:01:00.0: disp:    1108: 00000000
> > [    9.206327] nouveau 0000:01:00.0: disp:    110c: 04000400
> > [    9.206341] nouveau 0000:01:00.0: disp:    1110: 00100000 -> 00117fff
> > [    9.206351] nouveau 0000:01:00.0: disp: Core - WINDOW 3:
> > [    9.206362] nouveau 0000:01:00.0: disp:    1180: 0000000f -> 00000001
> > [    9.206375] nouveau 0000:01:00.0: disp:    1184: 000003b7 -> 0000000f
> > [    9.206389] nouveau 0000:01:00.0: disp:    1188: 00000000
> > [    9.206403] nouveau 0000:01:00.0: disp:    118c: 04000400
> > [    9.206417] nouveau 0000:01:00.0: disp:    1190: 00100000 -> 00117fff
> > [    9.206427] nouveau 0000:01:00.0: disp: Core - WINDOW 4:
> > [    9.206440] nouveau 0000:01:00.0: disp:    1200: 0000000f -> 00000002
> > [    9.206455] nouveau 0000:01:00.0: disp:    1204: 000003b7 -> 0000000f
> > [    9.206469] nouveau 0000:01:00.0: disp:    1208: 00000000
> > [    9.206481] nouveau 0000:01:00.0: disp:    120c: 04000400
> > [    9.206495] nouveau 0000:01:00.0: disp:    1210: 00100000 -> 00117fff
> > [    9.206505] nouveau 0000:01:00.0: disp: Core - WINDOW 5:
> > [    9.206517] nouveau 0000:01:00.0: disp:    1280: 0000000f -> 00000002
> > [    9.206531] nouveau 0000:01:00.0: disp:    1284: 000003b7 -> 0000000f
> > [    9.206544] nouveau 0000:01:00.0: disp:    1288: 00000000
> > [    9.206558] nouveau 0000:01:00.0: disp:    128c: 04000400
> > [    9.206571] nouveau 0000:01:00.0: disp:    1290: 00100000 -> 00117fff
> > [    9.206582] nouveau 0000:01:00.0: disp: Core - WINDOW 6:
> > [    9.206594] nouveau 0000:01:00.0: disp:    1300: 0000000f -> 00000003
> > [    9.206607] nouveau 0000:01:00.0: disp:    1304: 000003b7 -> 0000000f
> > [    9.206620] nouveau 0000:01:00.0: disp:    1308: 00000000
> > [    9.206635] nouveau 0000:01:00.0: disp:    130c: 04000400
> > [    9.206650] nouveau 0000:01:00.0: disp:    1310: 00100000 -> 00117fff
> > [    9.206660] nouveau 0000:01:00.0: disp: Core - WINDOW 7:
> > [    9.206672] nouveau 0000:01:00.0: disp:    1380: 0000000f -> 00000003
> > [    9.206685] nouveau 0000:01:00.0: disp:    1384: 000003b7 -> 0000000f
> > [    9.206699] nouveau 0000:01:00.0: disp:    1388: 00000000
> > [    9.206713] nouveau 0000:01:00.0: disp:    138c: 04000400
> > [    9.206727] nouveau 0000:01:00.0: disp:    1390: 00100000 -> 00117fff
> > [    9.206737] nouveau 0000:01:00.0: disp: Core - HEAD 0:
> > [    9.206748] nouveau 0000:01:00.0: disp:    2000: 00000000
> > [    9.206762] nouveau 0000:01:00.0: disp:    2004: fc000040
> > [    9.206776] nouveau 0000:01:00.0: disp:    2008: 00000180
> > [    9.206790] nouveau 0000:01:00.0: disp:    200c: 00000000
> > [    9.206804] nouveau 0000:01:00.0: disp:    2014: 00000011
> > [    9.206818] nouveau 0000:01:00.0: disp:    2018: 00000000
> > [    9.206832] nouveau 0000:01:00.0: disp:    201c: 00000000
> > [    9.206846] nouveau 0000:01:00.0: disp:    2020: 00000000
> > [    9.206860] nouveau 0000:01:00.0: disp:    2028: 00000000
> > [    9.206874] nouveau 0000:01:00.0: disp:    202c: 04000400
> > [    9.206889] nouveau 0000:01:00.0: disp:    2030: 00001000
> > [    9.206903] nouveau 0000:01:00.0: disp:    2038: 00000001
> > [    9.206918] nouveau 0000:01:00.0: disp:    203c: 00000005
> > [    9.206933] nouveau 0000:01:00.0: disp:    2048: 00000000
> > [    9.206947] nouveau 0000:01:00.0: disp:    204c: 00000000
> > [    9.206960] nouveau 0000:01:00.0: disp:    2050: 00000000
> > [    9.206973] nouveau 0000:01:00.0: disp:    2054: 00000000
> > [    9.206986] nouveau 0000:01:00.0: disp:    2058: 00000000
> > [    9.206999] nouveau 0000:01:00.0: disp:    205c: 00000000
> > [    9.207013] nouveau 0000:01:00.0: disp:    2060: 00000000
> > [    9.207027] nouveau 0000:01:00.0: disp:    2064: 00050008
> > [    9.207041] nouveau 0000:01:00.0: disp:    2068: 00000000
> > [    9.207055] nouveau 0000:01:00.0: disp:    206c: 00010003
> > [    9.207069] nouveau 0000:01:00.0: disp:    2070: 00030004
> > [    9.207083] nouveau 0000:01:00.0: disp:    2074: 00000001
> > [    9.207098] nouveau 0000:01:00.0: disp:    2078: 00000000
> > [    9.207112] nouveau 0000:01:00.0: disp:    207c: 00000000
> > [    9.207127] nouveau 0000:01:00.0: disp:    2080: 00000000
> > [    9.207141] nouveau 0000:01:00.0: disp:    2088: 00000000
> > [    9.207156] nouveau 0000:01:00.0: disp:    2090: 00000000
> > [    9.207170] nouveau 0000:01:00.0: disp:    209c: 000000e9
> > [    9.207185] nouveau 0000:01:00.0: disp:    20a0: 000002ff
> > [    9.207200] nouveau 0000:01:00.0: disp:    20a4: 00000000
> > [    9.207212] nouveau 0000:01:00.0: disp:    20a8: 00000000
> > [    9.207225] nouveau 0000:01:00.0: disp:    20ac: 00000000
> > [    9.207239] nouveau 0000:01:00.0: disp:    218c: 00000000
> > [    9.207252] nouveau 0000:01:00.0: disp:    2194: 00000000
> > [    9.207266] nouveau 0000:01:00.0: disp:    2198: 00000000
> > [    9.207279] nouveau 0000:01:00.0: disp:    219c: 00000000
> > [    9.207292] nouveau 0000:01:00.0: disp:    21a0: 00000000
> > [    9.207307] nouveau 0000:01:00.0: disp:    21a4: 00000000
> > [    9.207320] nouveau 0000:01:00.0: disp:    2214: 00000000
> > [    9.207332] nouveau 0000:01:00.0: disp:    2218: 00010002
> > [    9.207343] nouveau 0000:01:00.0: disp: Core - HEAD 1:
> > [    9.207355] nouveau 0000:01:00.0: disp:    2400: 00000000
> > [    9.207369] nouveau 0000:01:00.0: disp:    2404: fc000040
> > [    9.207382] nouveau 0000:01:00.0: disp:    2408: 00000180
> > [    9.207396] nouveau 0000:01:00.0: disp:    240c: 00000000
> > [    9.207410] nouveau 0000:01:00.0: disp:    2414: 00000011
> > [    9.207425] nouveau 0000:01:00.0: disp:    2418: 00000000
> > [    9.207438] nouveau 0000:01:00.0: disp:    241c: 00000000
> > [    9.207451] nouveau 0000:01:00.0: disp:    2420: 00000000
> > [    9.207463] nouveau 0000:01:00.0: disp:    2428: 00000000
> > [    9.207476] nouveau 0000:01:00.0: disp:    242c: 04000400
> > [    9.207490] nouveau 0000:01:00.0: disp:    2430: 00001000
> > [    9.207504] nouveau 0000:01:00.0: disp:    2438: 00000001
> > [    9.207518] nouveau 0000:01:00.0: disp:    243c: 00000005
> > [    9.207531] nouveau 0000:01:00.0: disp:    2448: 00000000
> > [    9.207545] nouveau 0000:01:00.0: disp:    244c: 00000000
> > [    9.207559] nouveau 0000:01:00.0: disp:    2450: 00000000
> > [    9.207573] nouveau 0000:01:00.0: disp:    2454: 00000000
> > [    9.207587] nouveau 0000:01:00.0: disp:    2458: 00000000
> > [    9.207600] nouveau 0000:01:00.0: disp:    245c: 00000000
> > [    9.207613] nouveau 0000:01:00.0: disp:    2460: 00000000
> > [    9.207626] nouveau 0000:01:00.0: disp:    2464: 00050008
> > [    9.207640] nouveau 0000:01:00.0: disp:    2468: 00000000
> > [    9.207654] nouveau 0000:01:00.0: disp:    246c: 00010003
> > [    9.207668] nouveau 0000:01:00.0: disp:    2470: 00030004
> > [    9.207681] nouveau 0000:01:00.0: disp:    2474: 00000001
> > [    9.207695] nouveau 0000:01:00.0: disp:    2478: 00000000
> > [    9.207709] nouveau 0000:01:00.0: disp:    247c: 00000000
> > [    9.207724] nouveau 0000:01:00.0: disp:    2480: 00000000
> > [    9.207738] nouveau 0000:01:00.0: disp:    2488: 00000000
> > [    9.207753] nouveau 0000:01:00.0: disp:    2490: 00000000
> > [    9.207766] nouveau 0000:01:00.0: disp:    249c: 000000e9
> > [    9.207781] nouveau 0000:01:00.0: disp:    24a0: 000002ff
> > [    9.207794] nouveau 0000:01:00.0: disp:    24a4: 00000000
> > [    9.207807] nouveau 0000:01:00.0: disp:    24a8: 00000000
> > [    9.207821] nouveau 0000:01:00.0: disp:    24ac: 00000000
> > [    9.207834] nouveau 0000:01:00.0: disp:    258c: 00000000
> > [    9.207848] nouveau 0000:01:00.0: disp:    2594: 00000000
> > [    9.207861] nouveau 0000:01:00.0: disp:    2598: 00000000
> > [    9.207875] nouveau 0000:01:00.0: disp:    259c: 00000000
> > [    9.207888] nouveau 0000:01:00.0: disp:    25a0: 00000000
> > [    9.207901] nouveau 0000:01:00.0: disp:    25a4: 00000000
> > [    9.207914] nouveau 0000:01:00.0: disp:    2614: 00000000
> > [    9.207927] nouveau 0000:01:00.0: disp:    2618: 00010002
> > [    9.207937] nouveau 0000:01:00.0: disp: Core - HEAD 2:
> > [    9.207949] nouveau 0000:01:00.0: disp:    2800: 00000000
> > [    9.207963] nouveau 0000:01:00.0: disp:    2804: fc000040
> > [    9.207976] nouveau 0000:01:00.0: disp:    2808: 00000180
> > [    9.207991] nouveau 0000:01:00.0: disp:    280c: 00000000
> > [    9.208004] nouveau 0000:01:00.0: disp:    2814: 00000011
> > [    9.208019] nouveau 0000:01:00.0: disp:    2818: 00000000
> > [    9.208031] nouveau 0000:01:00.0: disp:    281c: 00000000
> > [    9.208044] nouveau 0000:01:00.0: disp:    2820: 00000000
> > [    9.208058] nouveau 0000:01:00.0: disp:    2828: 00000000
> > [    9.208071] nouveau 0000:01:00.0: disp:    282c: 04000400
> > [    9.208085] nouveau 0000:01:00.0: disp:    2830: 00001000
> > [    9.208099] nouveau 0000:01:00.0: disp:    2838: 00000001
> > [    9.208113] nouveau 0000:01:00.0: disp:    283c: 00000005
> > [    9.208126] nouveau 0000:01:00.0: disp:    2848: 00000000
> > [    9.208140] nouveau 0000:01:00.0: disp:    284c: 00000000
> > [    9.208153] nouveau 0000:01:00.0: disp:    2850: 00000000
> > [    9.208165] nouveau 0000:01:00.0: disp:    2854: 00000000
> > [    9.208178] nouveau 0000:01:00.0: disp:    2858: 00000000
> > [    9.208191] nouveau 0000:01:00.0: disp:    285c: 00000000
> > [    9.208205] nouveau 0000:01:00.0: disp:    2860: 00000000
> > [    9.208218] nouveau 0000:01:00.0: disp:    2864: 00050008
> > [    9.208232] nouveau 0000:01:00.0: disp:    2868: 00000000
> > [    9.208246] nouveau 0000:01:00.0: disp:    286c: 00010003
> > [    9.208259] nouveau 0000:01:00.0: disp:    2870: 00030004
> > [    9.208274] nouveau 0000:01:00.0: disp:    2874: 00000001
> > [    9.208289] nouveau 0000:01:00.0: disp:    2878: 00000000
> > [    9.208303] nouveau 0000:01:00.0: disp:    287c: 00000000
> > [    9.208318] nouveau 0000:01:00.0: disp:    2880: 00000000
> > [    9.208332] nouveau 0000:01:00.0: disp:    2888: 00000000
> > [    9.208345] nouveau 0000:01:00.0: disp:    2890: 00000000
> > [    9.208358] nouveau 0000:01:00.0: disp:    289c: 000000e9
> > [    9.208371] nouveau 0000:01:00.0: disp:    28a0: 000002ff
> > [    9.208385] nouveau 0000:01:00.0: disp:    28a4: 00000000
> > [    9.208398] nouveau 0000:01:00.0: disp:    28a8: 00000000
> > [    9.208412] nouveau 0000:01:00.0: disp:    28ac: 00000000
> > [    9.208425] nouveau 0000:01:00.0: disp:    298c: 00000000
> > [    9.208439] nouveau 0000:01:00.0: disp:    2994: 00000000
> > [    9.208452] nouveau 0000:01:00.0: disp:    2998: 00000000
> > [    9.208465] nouveau 0000:01:00.0: disp:    299c: 00000000
> > [    9.208478] nouveau 0000:01:00.0: disp:    29a0: 00000000
> > [    9.208491] nouveau 0000:01:00.0: disp:    29a4: 00000000
> > [    9.208504] nouveau 0000:01:00.0: disp:    2a14: 00000000
> > [    9.208517] nouveau 0000:01:00.0: disp:    2a18: 00010002
> > [    9.208528] nouveau 0000:01:00.0: disp: Core - HEAD 3:
> > [    9.208540] nouveau 0000:01:00.0: disp:    2c00: 00000000
> > [    9.208554] nouveau 0000:01:00.0: disp:    2c04: fc000040
> > [    9.208568] nouveau 0000:01:00.0: disp:    2c08: 00000180
> > [    9.208583] nouveau 0000:01:00.0: disp:    2c0c: 00000000
> > [    9.208597] nouveau 0000:01:00.0: disp:    2c14: 00000011
> > [    9.208610] nouveau 0000:01:00.0: disp:    2c18: 00000000
> > [    9.208623] nouveau 0000:01:00.0: disp:    2c1c: 00000000
> > [    9.208636] nouveau 0000:01:00.0: disp:    2c20: 00000000
> > [    9.208650] nouveau 0000:01:00.0: disp:    2c28: 00000000
> > [    9.208664] nouveau 0000:01:00.0: disp:    2c2c: 04000400
> > [    9.208677] nouveau 0000:01:00.0: disp:    2c30: 00001000
> > [    9.208691] nouveau 0000:01:00.0: disp:    2c38: 00000001
> > [    9.208722] nouveau 0000:01:00.0: disp:    2c3c: 00000005
> > [    9.208736] nouveau 0000:01:00.0: disp:    2c48: 00000000
> > [    9.208750] nouveau 0000:01:00.0: disp:    2c4c: 00000000
> > [    9.208764] nouveau 0000:01:00.0: disp:    2c50: 00000000
> > [    9.208777] nouveau 0000:01:00.0: disp:    2c54: 00000000
> > [    9.208790] nouveau 0000:01:00.0: disp:    2c58: 00000000
> > [    9.208803] nouveau 0000:01:00.0: disp:    2c5c: 00000000
> > [    9.208815] nouveau 0000:01:00.0: disp:    2c60: 00000000
> > [    9.208829] nouveau 0000:01:00.0: disp:    2c64: 00050008
> > [    9.208842] nouveau 0000:01:00.0: disp:    2c68: 00000000
> > [    9.208856] nouveau 0000:01:00.0: disp:    2c6c: 00010003
> > [    9.208870] nouveau 0000:01:00.0: disp:    2c70: 00030004
> > [    9.208884] nouveau 0000:01:00.0: disp:    2c74: 00000001
> > [    9.208897] nouveau 0000:01:00.0: disp:    2c78: 00000000
> > [    9.208911] nouveau 0000:01:00.0: disp:    2c7c: 00000000
> > [    9.208925] nouveau 0000:01:00.0: disp:    2c80: 00000000
> > [    9.208940] nouveau 0000:01:00.0: disp:    2c88: 00000000
> > [    9.208954] nouveau 0000:01:00.0: disp:    2c90: 00000000
> > [    9.208969] nouveau 0000:01:00.0: disp:    2c9c: 000000e9
> > [    9.208984] nouveau 0000:01:00.0: disp:    2ca0: 000002ff
> > [    9.208999] nouveau 0000:01:00.0: disp:    2ca4: 00000000
> > [    9.209014] nouveau 0000:01:00.0: disp:    2ca8: 00000000
> > [    9.209029] nouveau 0000:01:00.0: disp:    2cac: 00000000
> > [    9.209043] nouveau 0000:01:00.0: disp:    2d8c: 00000000
> > [    9.209058] nouveau 0000:01:00.0: disp:    2d94: 00000000
> > [    9.209073] nouveau 0000:01:00.0: disp:    2d98: 00000000
> > [    9.209087] nouveau 0000:01:00.0: disp:    2d9c: 00000000
> > [    9.209099] nouveau 0000:01:00.0: disp:    2da0: 00000000
> > [    9.209112] nouveau 0000:01:00.0: disp:    2da4: 00000000
> > [    9.209126] nouveau 0000:01:00.0: disp:    2e14: 00000000
> > [    9.209139] nouveau 0000:01:00.0: disp:    2e18: 00010002
> > [    9.209388] nouveau 0000:01:00.0: disp: supervisor 2: 00000010
> > [    9.209413] nouveau 0000:01:00.0: disp: head-0: 00000000
> > [    9.209426] nouveau 0000:01:00.0: disp: head-1: 00000000
> > [    9.209437] nouveau 0000:01:00.0: disp: head-2: 00000000
> > [    9.209448] nouveau 0000:01:00.0: disp: head-3: 00000000
> > [    9.209619] nouveau 0000:01:00.0: disp: supervisor 3: 00000010
> > [    9.209643] nouveau 0000:01:00.0: disp: head-0: 00000000
> > [    9.209656] nouveau 0000:01:00.0: disp: head-1: 00000000
> > [    9.209668] nouveau 0000:01:00.0: disp: head-2: 00000000
> > [    9.209679] nouveau 0000:01:00.0: disp: head-3: 00000000
> > [    9.210852] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
> > [    9.210885] nouveau 0000:01:00.0: DRM: Disabling PCI power management to avoid bug
> > [    9.212755] usb 1-8: new high-speed USB device number 3 using xhci_hcd
> > [    9.296013] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> > [    9.382897] nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
> > [   18.460917] nouveau 0000:01:00.0: disp: suspend running...
> > [   18.461005] nouveau 0000:01:00.0: disp: suspend completed in 41us
> > [   18.561101] ------------[ cut here ]------------
> > [   18.561138] nouveau 0000:01:00.0: timeout
> > [   18.561181] WARNING: CPU: 15 PID: 220 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> > [   18.561300] Modules linked in: dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
> > [   18.561636] CPU: 15 PID: 220 Comm: kworker/15:2 Tainted: G     U            5.12.1-amd64-preempt-sysrq-20190817 #1
> > [   18.561707] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
> > [   18.561765] Workqueue: pm pm_runtime_work
> > [   18.561799] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> > [   18.561874] Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 eb 5c 43 e2 4c 89 e2 48 c7 c7 ef 95 33 c1 48 89 c6 e8 c4 b2 6e e2 <0f> 0b 85 db b8 00 00 00 00 0f 4e c3 48 8b 4c 24 28 65 48 2b 0c 25
> > [   18.561995] RSP: 0018:ffffb518007a7b08 EFLAGS: 00010286
> > [   18.562035] RAX: 0000000000000000 RBX: ffffffffffffff92 RCX: 0000000000000003
> > [   18.562086] RDX: 0000000000000850 RSI: 0000000000000001 RDI: ffffffffa4b25bac
> > [   18.562136] RBP: ffff89e351f0a058 R08: 0000000000000003 R09: 0000000000000001
> > [   18.562187] R10: 0000000000aaaaaa R11: ffffb51821e14440 R12: ffff89e34291c5a0
> > [   18.562238] R13: 0000000000000000 R14: ffff89e355782e00 R15: ffff89e3524cb000
> > [   18.562289] FS:  0000000000000000(0000) GS:ffff89f25c7c0000(0000) knlGS:0000000000000000
> > [   18.562345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   18.562388] CR2: 000055ec245a00a8 CR3: 0000000545410006 CR4: 00000000003706e0
> > [   18.562439] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   18.562491] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   18.562542] Call Trace:
> > [   18.562569]  gm200_acr_hsfw_boot+0xc4/0x168 [nouveau]
> > [   18.562636]  nvkm_acr_hsf_boot+0xad/0x115 [nouveau]
> > [   18.565673]  nvkm_acr_fini+0x22/0x30 [nouveau]
> > [   18.568732]  nvkm_subdev_fini+0xb8/0xff [nouveau]
> > [   18.571775]  nvkm_device_fini+0x8b/0x178 [nouveau]
> > [   18.574834]  nvkm_udevice_fini+0x34/0x55 [nouveau]
> > [   18.577872]  nvkm_object_fini+0xeb/0x1d6 [nouveau]
> > [   18.580862]  nvkm_object_fini+0x8d/0x1d6 [nouveau]
> > [   18.584095]  nouveau_do_suspend+0x1fe/0x26f [nouveau]
> > [   18.587135]  nouveau_pmops_runtime_suspend+0x46/0x82 [nouveau]
> > [   18.590097]  pci_pm_runtime_suspend+0x5e/0x155
> > [   18.593013]  ? pci_pm_thaw_noirq+0x62/0x62
> > [   18.595914]  ? pci_pm_thaw_noirq+0x62/0x62
> > [   18.598802]  __rpm_callback+0x75/0xdb
> > [   18.601654]  ? pci_pm_thaw_noirq+0x62/0x62
> > [   18.604491]  rpm_callback+0x55/0x6b
> > [   18.607317]  rpm_suspend+0x2a6/0x4af
> > [   18.610117]  ? __raw_spin_unlock_irq+0x8/0x17
> > [   18.612901]  ? finish_task_switch.isra.0+0x136/0x214
> > [   18.615673]  pm_runtime_work+0x77/0x81
> > [   18.618428]  process_one_work+0x1ea/0x2e0
> > [   18.621156]  worker_thread+0x19c/0x240
> > [   18.624140]  ? rescuer_thread+0x294/0x294
> > [   18.626886]  kthread+0x10c/0x114
> > [   18.629567]  ? kthread_create_worker_on_cpu+0x65/0x65
> > [   18.632253]  ret_from_fork+0x1f/0x30
> > [   18.634949] ---[ end trace a858a74de695aa08 ]---
> > [   18.637620] nouveau 0000:01:00.0: acr: unload binary failed
> > [   18.913087] nouveau 0000:01:00.0: saving config space at offset 0x0 (reading 0x1eb610de)
> > [   18.913091] nouveau 0000:01:00.0: saving config space at offset 0x4 (reading 0x100407)
> > [   18.913093] nouveau 0000:01:00.0: saving config space at offset 0x8 (reading 0x30000a1)
> > [   18.913095] nouveau 0000:01:00.0: saving config space at offset 0xc (reading 0x800000)
> > [   18.913097] nouveau 0000:01:00.0: saving config space at offset 0x10 (reading 0xcd000000)
> > [   18.913099] nouveau 0000:01:00.0: saving config space at offset 0x14 (reading 0xa000000c)
> > [   18.913102] nouveau 0000:01:00.0: saving config space at offset 0x18 (reading 0x0)
> > [   18.913104] nouveau 0000:01:00.0: saving config space at offset 0x1c (reading 0xb000000c)
> > [   18.913106] nouveau 0000:01:00.0: saving config space at offset 0x20 (reading 0x0)
> > [   18.913108] nouveau 0000:01:00.0: saving config space at offset 0x24 (reading 0x2001)
> > [   18.913111] nouveau 0000:01:00.0: saving config space at offset 0x28 (reading 0x0)
> > [   18.913113] nouveau 0000:01:00.0: saving config space at offset 0x2c (reading 0x229b17aa)
> > [   18.913115] nouveau 0000:01:00.0: saving config space at offset 0x30 (reading 0xfff80000)
> > [   18.913117] nouveau 0000:01:00.0: saving config space at offset 0x34 (reading 0x60)
> > [   18.913119] nouveau 0000:01:00.0: saving config space at offset 0x38 (reading 0x0)
> > [   18.913122] nouveau 0000:01:00.0: saving config space at offset 0x3c (reading 0x1ff)
> > [   18.913179] nouveau 0000:01:00.0: power state changed by ACPI to D3cold
> > [   43.064748] nouveau 0000:01:00.0: power state changed by ACPI to D0
> > [   43.064836] nouveau 0000:01:00.0: restoring config space at offset 0x3c (was 0x100, writing 0x1ff)
> > [   43.064845] nouveau 0000:01:00.0: restoring config space at offset 0x30 (was 0x0, writing 0xfff80000)
> > [   43.064853] nouveau 0000:01:00.0: restoring config space at offset 0x24 (was 0x1, writing 0x2001)
> > [   43.064860] nouveau 0000:01:00.0: restoring config space at offset 0x1c (was 0xc, writing 0xb000000c)
> > [   43.064868] nouveau 0000:01:00.0: restoring config space at offset 0x14 (was 0xc, writing 0xa000000c)
> > [   43.064874] nouveau 0000:01:00.0: restoring config space at offset 0x10 (was 0x0, writing 0xcd000000)
> > [   43.064883] nouveau 0000:01:00.0: restoring config space at offset 0x4 (was 0x100000, writing 0x100407)
> > [   43.065008] nouveau 0000:01:00.0: disp: preinit running...
> > [   43.065038] nouveau 0000:01:00.0: disp: preinit completed in 0us
> > [   43.065200] nouveau 0000:01:00.0: disp: fini running...
> > [   43.065226] nouveau 0000:01:00.0: disp: fini completed in 2us
> > [   43.073510] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 00000000003b1000 engine c0 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 02 [PTE] on channel -1 [01ffedf000 unknown]
> > [   43.073579] nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 0000000000000000 engine 0e [sec2] client 16 [HUB/SEC] reason 00 [PDE] on channel -1 [01ffe5d000 unknown]
> > [   43.073616] nouveau 0000:01:00.0: fifo: runlist 3: scheduled for recovery
> > [   43.073636] nouveau 0000:01:00.0: fifo: engine 3: scheduled for recovery
> > [   43.173456] ------------[ cut here ]------------
> > [   43.173477] nouveau 0000:01:00.0: timeout
> > [   43.173533] WARNING: CPU: 9 PID: 1468 at drivers/gpu/drm/nouveau/nvkm/falcon/v1.c:247 nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> > [   43.173614] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops btusb videobuf2_v4l2 btrtl videobuf2_common btbcm btintel videodev bluetooth mc ecdh_generic ecc iwlmvm mac80211 libarc4 mei_hdcp x86_pkg_temp_thermal intel_powerclamp kvm_intel nls_utf8 snd_hda_codec_conexant snd_hda_codec_generic nls_cp437 kvm snd_hda_intel(+) vfat snd_intel_dspcfg iwlwifi fat irqbypass snd_hda_codec squashfs input_leds joydev rapl deflate serio_raw intel_cstate efi_pstore pcspkr snd_hda_core iTCO_wdt wmi_bmof intel_wmi_thunderbolt tpm_crb cfg80211 iTCO_vendor_support ee1004 8250_dw snd_hwdep processor_thermal_device processor_thermal_rfim ucsi_ccg(+) snd_pcm sg ucsi_acpi thinkpad_acpi nvidiafb typec_ucsi vgastate mei_me processor_thermal_mbox typec intel_pch_thermal fb_ddc tpm_tis intel_soc_dts_iosf snd_timer nvram roles tpm_tis_core platform_profile ledtrig_audio snd soundcore rfkill int3403_thermal ac int340x_thermal_zone evdev int3400_thermal acpi_thermal_rel acpi_pad loop configs coretemp
> > [   43.173670]  msr fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc nfs_ssc ip_tables x_tables autofs4 essiv authenc dm_crypt trusted tpm rng_core dm_mod raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx multipath sata_sil24 r8169 realtek mdio_devres libphy mii hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel xhci_pci rtsx_pci_sdmmc nouveau ghash_clmulni_intel xhci_hcd mmc_core e1000e i2c_designware_platform mxm_wmi i2c_designware_core hwmon ptp aesni_intel intel_lpss_pci drm_ttm_helper i2c_i801 crypto_simd intel_lpss i2c_smbus psmouse i915 cryptd pps_core thunderbolt rtsx_pci idma64 usbcore ttm i2c_nvidia_gpu thermal wmi battery
> > [   43.173970] CPU: 9 PID: 1468 Comm: kworker/9:3 Tainted: G     U  W         5.12.1-amd64-preempt-sysrq-20190817 #1
> > [   43.174001] Hardware name: LENOVO 20QRS00200/20QRS00200, BIOS N2NET40W (1.25 ) 08/26/2020
> > [   43.174022] Workqueue: pm pm_runtime_work
> > [   43.174038] RIP: 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau]
> > [   43.174296] Code: 8b 40 10 48 8b 78 10 4c 8b 67 50 4d 85 e4 75 03 4c 8b 27 e8 eb 5c 43 e2 4c 89 e2 48 c7 c7 ef 95 33 c1 48 89 c6 e8 c4 b2 6e e2 <0f> 0b 85 db b8 00 00 00 00 0f 4e c3 48 8b 4c 24 28 65 48 2b 0c 25
> > [   43.174336] RSP: 0018:ffffb51800eb39f0 EFLAGS: 00010286
> > [   43.174351] RAX: 0000000000000000 RBX: ffffffffffffff92 RCX: 0000000000000027
> > [   43.174370] RDX: 0000000000000027 RSI: 0000000000000001 RDI: ffff89f25c658590
> > [   43.174388] RBP: ffff89e351f09898 R08: 0000000000000003 R09: 0000000000000001
> > [   43.174407] R10: 0000000000aaaaaa R11: ffffb5182251c420 R12: ffff89e34291c5a0
> > [   43.178020] R13: 0000000000000000 R14: ffff89e355782e00 R15: ffff89e3524cb000
> > [   43.180638] FS:  0000000000000000(0000) GS:ffff89f25c640000(0000) knlGS:0000000000000000
> > [   43.183418] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   43.186150] CR2: 00007f2e4be0e1af CR3: 0000000109928001 CR4: 00000000003706e0
> > [   43.188876] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   43.191778] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   43.194511] Call Trace:
> > [   43.197753]  gm200_acr_hsfw_boot+0xc4/0x168 [nouveau]
> > [   43.203012]  nvkm_acr_hsf_boot+0xad/0x115 [nouveau]
> > [   43.205781]  tu102_acr_init+0x16/0x2d [nouveau]
> > [   43.208502]  nvkm_acr_load+0x62/0x135 [nouveau]
> > [   43.211256]  ? timekeeping_get_ns+0x1c/0x32
> > [   43.216266]  nvkm_subdev_init+0x100/0x175 [nouveau]
> > [   43.222767]  nvkm_device_init+0x150/0x203 [nouveau]
> > [   43.230884]  nvkm_udevice_init+0x31/0x4b [nouveau]
> > [   43.234889]  nvkm_object_init+0x75/0x15f [nouveau]
> > [   43.237646]  nvkm_object_init+0x9e/0x15f [nouveau]
> > [   43.240283]  nvkm_object_init+0x9e/0x15f [nouveau]
> > [   43.242977]  nouveau_do_resume+0x4b/0x170 [nouveau]
> > [   43.245737]  nouveau_pmops_runtime_resume+0x76/0x12d [nouveau]
> > [   43.248416]  pci_pm_runtime_resume+0x75/0x80
> > [   43.251095]  ? pci_pm_restore+0x7a/0x7a
> > [   43.253750]  ? pci_pm_restore+0x7a/0x7a
> > [   43.256355]  __rpm_callback+0x75/0xdb
> > [   43.259020]  ? pci_pm_restore+0x7a/0x7a
> > [   43.261687]  rpm_callback+0x55/0x6b
> > [   43.264269]  ? pci_pm_restore+0x7a/0x7a
> > [   43.267104]  rpm_resume+0x376/0x47d
> > [   43.269799]  ? __schedule+0x5de/0x632
> > [   43.272370]  __pm_runtime_resume+0x5a/0x76
> > [   43.277743]  ? pci_pm_restore+0x7a/0x7a
> > [   43.281006]  rpm_get_suppliers+0x39/0x70
> > [   43.283602]  ? pci_pm_restore+0x7a/0x7a
> > [   43.286254]  __rpm_callback+0x59/0xdb
> > [   43.288886]  ? pci_pm_restore+0x7a/0x7a
> > [   43.296391]  rpm_callback+0x55/0x6b
> > [   43.300273]  ? pci_pm_restore+0x7a/0x7a
> > [   43.302811]  rpm_resume+0x376/0x47d
> > [   43.305372]  ? try_to_wake_up+0x1e8/0x2df
> > [   43.307844]  pm_runtime_work+0x5f/0x81
> > [   43.310390]  process_one_work+0x1ea/0x2e0
> > [   43.312937]  worker_thread+0x19c/0x240
> > [   43.315389]  ? rescuer_thread+0x294/0x294
> > [   43.317920]  kthread+0x10c/0x114
> > [   43.320392]  ? kthread_create_worker_on_cpu+0x65/0x65
> > [   43.322938]  ret_from_fork+0x1f/0x30
> > [   43.325469] ---[ end trace a858a74de695aa09 ]---
> > [   43.327909] nouveau 0000:01:00.0: acr: AHESASC binary failed
> > [   43.330611] nouveau 0000:01:00.0: acr: init failed, -110
> > [   43.333198] nouveau 0000:01:00.0: disp: fini running...
> > [   43.335614] nouveau 0000:01:00.0: disp: fini completed in 23us
> > [   43.340415] nouveau 0000:01:00.0: disp: fini running...
> > [   43.344006] nouveau 0000:01:00.0: disp: fini completed in 1us
> > [   43.346565] nouveau 0000:01:00.0: init failed with -110
> > [   43.349003] nouveau: systemd-udevd[290]:00000000:00000080: init failed with -110
> > [   43.351417] nouveau: DRM-master:00000000:00000000: init failed with -110
> > [   43.354505] nouveau: DRM-master:00000000:00000000: init failed with -110
> > [   43.362121] nouveau 0000:01:00.0: DRM: Client resume failed with error: -110
> > [   43.368650] nouveau 0000:01:00.0: DRM: resume failed with: -110
> > [   43.374973] snd_hda_intel 0000:01:00.1: runtime IRQ mapping not provided by arch
> > [   43.375016] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
> > [   43.377906] snd_hda_intel 0000:01:00.1: Disabling MSI
> > [   43.380469] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
> > [   43.383361] snd_hda_intel 0000:01:00.1: VGA controller is disabled
> > [   43.386078] snd_hda_intel 0000:01:00.1: Delaying initialization
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> >
> > Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08
> _______________________________________________
> Nouveau mailing list
> Nouveau@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/nouveau

end of thread, other threads:[~2021-05-25  3:14 UTC | newest]

Thread overview: 77+ messages
2019-10-04 12:39 [PATCH v2 0/2] PCI: Add missing link delays Mika Westerberg
2019-10-04 12:39 ` [PATCH v2 1/2] PCI: Introduce pcie_wait_for_link_delay() Mika Westerberg
2020-08-08 20:22   ` Marc MERLIN
2020-08-08 20:23     ` Marc MERLIN
2020-08-09 16:31     ` Marc MERLIN
2020-09-06 18:18     ` pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-06 18:18       ` Marc MERLIN
2020-09-06 18:26       ` Matthias Andree
2020-09-07 19:14       ` [Nouveau] " Karol Herbst
2020-09-07 19:14         ` Karol Herbst
2020-09-07 20:58         ` [Nouveau] " Marc MERLIN
2020-09-07 20:58           ` Marc MERLIN
2020-09-07 23:51           ` [Nouveau] " Karol Herbst
2020-09-07 23:51             ` Karol Herbst
2020-09-08  0:29             ` [Nouveau] " Marc MERLIN
2020-05-29 18:03               ` 5.5 kernel: using nouveau or something else just long enough to turn off Quadro RTX 4000 Mobile for hybrid graphics? Marc MERLIN
     [not found]                 ` <20200529180315.GA18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-29 18:53                   ` Ilia Mirkin
     [not found]                     ` <CAKb7Uvhw2EYo1RR-=NGgLO3CU9QTRWchcAw1injffybZbJ-zOA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-29 19:46                       ` Marc MERLIN
     [not found]                         ` <20200529194605.GB18804-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-05-30 17:32                           ` Karol Herbst
     [not found]                       ` <CACO55tsvY0t_z986VVoYCvxuBASdZ+rQcDtZ_dAtQR60NLmQQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-05-31 18:31                         ` Marc MERLIN
2020-12-26 11:12                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-26 11:12                   ` Marc MERLIN
2020-12-27 18:28                   ` [Nouveau] " Ilia Mirkin
2020-12-27 18:28                     ` Ilia Mirkin
2021-01-27 21:33                   ` Bjorn Helgaas
2021-01-27 21:33                     ` Bjorn Helgaas
2021-01-28 20:59                     ` Bjorn Helgaas
2021-01-28 20:59                       ` [Nouveau] " Bjorn Helgaas
2021-01-29  0:56                     ` Marc MERLIN
2021-01-29  0:56                       ` [Nouveau] " Marc MERLIN
2021-01-29 21:20                       ` Bjorn Helgaas
2021-01-29 21:20                         ` [Nouveau] " Bjorn Helgaas
2021-01-30  2:04                         ` Marc MERLIN
2021-01-30  2:04                           ` [Nouveau] " Marc MERLIN
2021-05-05 21:42                           ` [Nouveau] 5.12.1 0010:nvkm_falcon_v1_wait_for_halt+0x8f/0xb9 [nouveau] Marc MERLIN
2021-05-06 14:50                             ` Bjorn Helgaas
2021-05-25  3:13                               ` Ben Skeggs
2020-12-29 15:51                 ` 5.9.11 still hanging 2mn at each boot and looping on nvidia-gpu 0000:01:00.3: PME# enabled (Quadro RTX 4000 Mobile) Marc MERLIN
2020-12-29 15:51                   ` Marc MERLIN
2020-12-29 16:33                   ` Ilia Mirkin
2020-12-29 16:33                     ` Ilia Mirkin
     [not found]                     ` <CAKb7UviFP_YVxC4PO7MDNnw6NDrD=3BCGF37umwAfaimjbX9Pw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2020-12-29 17:47                       ` Marc MERLIN
     [not found]                         ` <20201229174750.GI23389-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 11:49                           ` Marc MERLIN
     [not found]                             ` <20210104114955.GM32533-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2021-01-04 13:28                               ` Karol Herbst
     [not found]                                 ` <CACO55tsdG37YKv7FV2er4hRnXk9vmwMbPuPptA+=ZtziWXC2+g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2021-01-07 11:49                                   ` Marc MERLIN
2020-12-30 12:16                       ` ael
2020-09-13 20:15               ` [Nouveau] pcieport 0000:00:01.0: PME: Spurious native interrupt (nvidia with nouveau and thunderbolt on thinkpad P73) Marc MERLIN
2020-09-13 20:15                 ` Marc MERLIN
     [not found]                 ` <20200913201545.GL2622-xnduUnryOU1AfugRpC6u6w@public.gmane.org>
2020-09-19 23:18                   ` Marc MERLIN
2019-10-04 12:39 ` [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec Mika Westerberg
2019-10-26 14:19   ` Bjorn Helgaas
2019-10-28 11:28     ` Mika Westerberg
2019-10-28 13:42       ` Bjorn Helgaas
2019-10-28 18:06         ` Mika Westerberg
2019-10-28 20:16           ` Bjorn Helgaas
2019-10-29 11:15             ` Mika Westerberg
2019-10-29 20:27               ` Bjorn Helgaas
2019-10-30 11:15                 ` Mika Westerberg
2019-10-31 22:31                   ` Bjorn Helgaas
2019-11-01 11:19                     ` Mika Westerberg
2019-11-05  0:00                       ` Bjorn Helgaas
2019-11-05  9:54                         ` Mika Westerberg
2019-11-05 12:58                           ` Mika Westerberg
2019-11-05 20:01                             ` Bjorn Helgaas
2019-11-06 13:31                               ` Mika Westerberg
2019-11-05 15:00                           ` Bjorn Helgaas
2019-11-05 15:28                             ` Mika Westerberg
2019-11-05 16:10                               ` Bjorn Helgaas
2019-11-06 13:29                                 ` Mika Westerberg
2019-10-29 20:54   ` Bjorn Helgaas
2019-10-30 11:33     ` Mika Westerberg
2019-10-04 12:57 ` [PATCH v2 0/2] PCI: Add missing link delays Matthias Andree
2019-10-04 13:06   ` Mika Westerberg
2019-10-05  7:34     ` Matthias Andree
2019-10-07  9:32       ` Mika Westerberg
2019-10-07 15:15         ` Matthias Andree
2019-10-08  9:05           ` Mika Westerberg
