linux-pci.vger.kernel.org archive mirror
* [PATCH v2 0/4] Potential fix for runpm issues on various laptops
@ 2019-05-07 20:12 Karol Herbst
  2019-05-07 20:12 ` [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits Karol Herbst
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Karol Herbst @ 2019-05-07 20:12 UTC (permalink / raw)
  To: nouveau; +Cc: Lyude Paul, linux-pci, Bjorn Helgaas, Karol Herbst

CCing linux-pci and Bjorn Helgaas. Maybe we can get better insight into
what a reasonable fix would look like.

Anyway, to me this entire issue looks like something which has to be fixed
at the PCI level instead of inside a driver, so it makes sense to ask the
PCI folks whether they have any better suggestions.

Original cover letter:
While investigating the runpm issues on my GP107 I noticed that something
inside devinit makes runpm break. If Nouveau loads up to the point right
before doing devinit, runpm works without any issues; once devinit is run,
it no longer does.

Out of curiosity I even tried to "bisect" devinit by running it not through
the vbios-provided signed PMU image, but through the devinit parser we have
inside Nouveau.
Although this one isn't as feature-complete as the vbios one, I was able
to reproduce the runpm issues with it as well. From that point I limited
how many commands were run until I narrowed it down to some PCIe
initialization code inside devinit which triggers those runpm issues.

Devinit on my GPU was changing the PCIe link speed from 8.0 GT/s to
2.5 GT/s; reversing that on the fini path makes runpm work again.

There are a few other things going on, but with my limited knowledge of
PCIe in general, the change in link speed sounded like it could cause
issues on resume if the controller and the device disagree on the actual
link state.

Maybe this is just a bug within the PCI subsystem inside Linux instead,
and the controller has to be forced to do _something_?

Anyway, with this, runpm seems to work nicely on my machine. Secure booting
the gr (even with the workaround I need anyway applied) might still fail
after the GPU got runtime resumed, though...

Karol Herbst (4):
  drm: don't set the pci power state if the pci subsystem handles the
    ACPI bits
  pci: enable pcie link changes for pascal
  pci: add nvkm_pcie_get_speed
  pci: save the boot pcie link speed and restore it on fini

 drm/nouveau/include/nvkm/subdev/pci.h |  6 +++--
 drm/nouveau/nouveau_acpi.c            |  7 +++++-
 drm/nouveau/nouveau_acpi.h            |  2 ++
 drm/nouveau/nouveau_drm.c             | 14 +++++++++---
 drm/nouveau/nouveau_drv.h             |  2 ++
 drm/nouveau/nvkm/subdev/pci/base.c    |  9 ++++++--
 drm/nouveau/nvkm/subdev/pci/gk104.c   |  8 +++----
 drm/nouveau/nvkm/subdev/pci/gp100.c   | 10 +++++++++
 drm/nouveau/nvkm/subdev/pci/pcie.c    | 32 +++++++++++++++++++++++----
 drm/nouveau/nvkm/subdev/pci/priv.h    |  7 ++++++
 10 files changed, 81 insertions(+), 16 deletions(-)

-- 
2.21.0



* [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits
  2019-05-07 20:12 [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
@ 2019-05-07 20:12 ` Karol Herbst
  2019-05-08 19:10   ` Lyude Paul
  2019-05-07 20:12 ` [PATCH v2 2/4] pci: enable pcie link changes for pascal Karol Herbst
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-07 20:12 UTC (permalink / raw)
  To: nouveau; +Cc: Lyude Paul, linux-pci, Bjorn Helgaas, Karol Herbst

v2: rework the detection of whether Nouveau is calling a DSM method or not

Signed-off-by: Karol Herbst <kherbst@redhat.com>
---
 drm/nouveau/nouveau_acpi.c |  7 ++++++-
 drm/nouveau/nouveau_acpi.h |  2 ++
 drm/nouveau/nouveau_drm.c  | 14 +++++++++++---
 drm/nouveau/nouveau_drv.h  |  2 ++
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drm/nouveau/nouveau_acpi.c b/drm/nouveau/nouveau_acpi.c
index ffb19585..92dfc900 100644
--- a/drm/nouveau/nouveau_acpi.c
+++ b/drm/nouveau/nouveau_acpi.c
@@ -358,6 +358,12 @@ void nouveau_register_dsm_handler(void)
 	vga_switcheroo_register_handler(&nouveau_dsm_handler, 0);
 }
 
+bool nouveau_runpm_calls_dsm(void)
+{
+	return nouveau_dsm_priv.optimus_detected &&
+		!nouveau_dsm_priv.optimus_skip_dsm;
+}
+
 /* Must be called for Optimus models before the card can be turned off */
 void nouveau_switcheroo_optimus_dsm(void)
 {
@@ -371,7 +377,6 @@ void nouveau_switcheroo_optimus_dsm(void)
 
 	nouveau_optimus_dsm(nouveau_dsm_priv.dhandle, NOUVEAU_DSM_OPTIMUS_CAPS,
 		NOUVEAU_DSM_OPTIMUS_SET_POWERDOWN, &result);
-
 }
 
 void nouveau_unregister_dsm_handler(void)
diff --git a/drm/nouveau/nouveau_acpi.h b/drm/nouveau/nouveau_acpi.h
index b86294fc..0f5d7aa0 100644
--- a/drm/nouveau/nouveau_acpi.h
+++ b/drm/nouveau/nouveau_acpi.h
@@ -13,6 +13,7 @@ void nouveau_switcheroo_optimus_dsm(void);
 int nouveau_acpi_get_bios_chunk(uint8_t *bios, int offset, int len);
 bool nouveau_acpi_rom_supported(struct device *);
 void *nouveau_acpi_edid(struct drm_device *, struct drm_connector *);
+bool nouveau_runpm_calls_dsm(void);
 #else
 static inline bool nouveau_is_optimus(void) { return false; };
 static inline bool nouveau_is_v1_dsm(void) { return false; };
@@ -22,6 +23,7 @@ static inline void nouveau_switcheroo_optimus_dsm(void) {}
 static inline bool nouveau_acpi_rom_supported(struct device *dev) { return false; }
 static inline int nouveau_acpi_get_bios_chunk(uint8_t *bios, int offset, int len) { return -EINVAL; }
 static inline void *nouveau_acpi_edid(struct drm_device *dev, struct drm_connector *connector) { return NULL; }
+static inline bool nouveau_runpm_calls_dsm(void) { return false; }
 #endif
 
 #endif
diff --git a/drm/nouveau/nouveau_drm.c b/drm/nouveau/nouveau_drm.c
index 5020265b..09e68e61 100644
--- a/drm/nouveau/nouveau_drm.c
+++ b/drm/nouveau/nouveau_drm.c
@@ -556,6 +556,7 @@ nouveau_drm_device_init(struct drm_device *dev)
 	nouveau_fbcon_init(dev);
 	nouveau_led_init(dev);
 
+	drm->runpm_dsm = nouveau_runpm_calls_dsm();
 	if (nouveau_pmops_runtime()) {
 		pm_runtime_use_autosuspend(dev->dev);
 		pm_runtime_set_autosuspend_delay(dev->dev, 5000);
@@ -903,6 +904,7 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(drm_dev);
 	int ret;
 
 	if (!nouveau_pmops_runtime()) {
@@ -910,12 +912,15 @@ nouveau_pmops_runtime_suspend(struct device *dev)
 		return -EBUSY;
 	}
 
+	drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
 	nouveau_switcheroo_optimus_dsm();
 	ret = nouveau_do_suspend(drm_dev, true);
 	pci_save_state(pdev);
 	pci_disable_device(pdev);
 	pci_ignore_hotplug(pdev);
-	pci_set_power_state(pdev, PCI_D3cold);
+	if (drm->runpm_dsm)
+		pci_set_power_state(pdev, PCI_D3cold);
+
 	drm_dev->switch_power_state = DRM_SWITCH_POWER_DYNAMIC_OFF;
 	return ret;
 }
@@ -925,7 +930,8 @@ nouveau_pmops_runtime_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
-	struct nvif_device *device = &nouveau_drm(drm_dev)->client.device;
+	struct nouveau_drm *drm = nouveau_drm(drm_dev);
+	struct nvif_device *device = &drm->client.device;
 	int ret;
 
 	if (!nouveau_pmops_runtime()) {
@@ -933,7 +939,9 @@ nouveau_pmops_runtime_resume(struct device *dev)
 		return -EBUSY;
 	}
 
-	pci_set_power_state(pdev, PCI_D0);
+	drm_dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
+	if (drm->runpm_dsm)
+		pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
 	ret = pci_enable_device(pdev);
 	if (ret)
diff --git a/drm/nouveau/nouveau_drv.h b/drm/nouveau/nouveau_drv.h
index da847244..941600e9 100644
--- a/drm/nouveau/nouveau_drv.h
+++ b/drm/nouveau/nouveau_drv.h
@@ -214,6 +214,8 @@ struct nouveau_drm {
 	struct nouveau_svm *svm;
 
 	struct nouveau_dmem *dmem;
+
+	bool runpm_dsm;
 };
 
 static inline struct nouveau_drm *
-- 
2.21.0



* [PATCH v2 2/4] pci: enable pcie link changes for pascal
  2019-05-07 20:12 [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
  2019-05-07 20:12 ` [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits Karol Herbst
@ 2019-05-07 20:12 ` Karol Herbst
  2019-05-20 21:25   ` Bjorn Helgaas
  2019-05-07 20:12 ` [PATCH v2 3/4] pci: add nvkm_pcie_get_speed Karol Herbst
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-07 20:12 UTC (permalink / raw)
  To: nouveau; +Cc: Lyude Paul, linux-pci, Bjorn Helgaas, Karol Herbst

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
 drm/nouveau/nvkm/subdev/pci/gk104.c |  8 ++++----
 drm/nouveau/nvkm/subdev/pci/gp100.c | 10 ++++++++++
 drm/nouveau/nvkm/subdev/pci/priv.h  |  5 +++++
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drm/nouveau/nvkm/subdev/pci/gk104.c b/drm/nouveau/nvkm/subdev/pci/gk104.c
index e6803050..66489018 100644
--- a/drm/nouveau/nvkm/subdev/pci/gk104.c
+++ b/drm/nouveau/nvkm/subdev/pci/gk104.c
@@ -23,7 +23,7 @@
  */
 #include "priv.h"
 
-static int
+int
 gk104_pcie_version_supported(struct nvkm_pci *pci)
 {
 	return (nvkm_rd32(pci->subdev.device, 0x8c1c0) & 0x4) == 0x4 ? 2 : 1;
@@ -108,7 +108,7 @@ gk104_pcie_lnkctl_speed(struct nvkm_pci *pci)
 	return -1;
 }
 
-static enum nvkm_pcie_speed
+enum nvkm_pcie_speed
 gk104_pcie_max_speed(struct nvkm_pci *pci)
 {
 	u32 max_speed = nvkm_rd32(pci->subdev.device, 0x8c1c0) & 0x300000;
@@ -146,7 +146,7 @@ gk104_pcie_set_link_speed(struct nvkm_pci *pci, enum nvkm_pcie_speed speed)
 	nvkm_mask(device, 0x8c040, 0x1, 0x1);
 }
 
-static int
+int
 gk104_pcie_init(struct nvkm_pci * pci)
 {
 	enum nvkm_pcie_speed lnkctl_speed, max_speed, cap_speed;
@@ -178,7 +178,7 @@ gk104_pcie_init(struct nvkm_pci * pci)
 	return 0;
 }
 
-static int
+int
 gk104_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
 {
 	struct nvkm_subdev *subdev = &pci->subdev;
diff --git a/drm/nouveau/nvkm/subdev/pci/gp100.c b/drm/nouveau/nvkm/subdev/pci/gp100.c
index 82c5234a..eb19c7a4 100644
--- a/drm/nouveau/nvkm/subdev/pci/gp100.c
+++ b/drm/nouveau/nvkm/subdev/pci/gp100.c
@@ -35,6 +35,16 @@ gp100_pci_func = {
 	.wr08 = nv40_pci_wr08,
 	.wr32 = nv40_pci_wr32,
 	.msi_rearm = gp100_pci_msi_rearm,
+
+	.pcie.init = gk104_pcie_init,
+	.pcie.set_link = gk104_pcie_set_link,
+
+	.pcie.max_speed = gk104_pcie_max_speed,
+	.pcie.cur_speed = g84_pcie_cur_speed,
+
+	.pcie.set_version = gf100_pcie_set_version,
+	.pcie.version = gf100_pcie_version,
+	.pcie.version_supported = gk104_pcie_version_supported,
 };
 
 int
diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
index c17f6063..a0d4c007 100644
--- a/drm/nouveau/nvkm/subdev/pci/priv.h
+++ b/drm/nouveau/nvkm/subdev/pci/priv.h
@@ -54,6 +54,11 @@ int gf100_pcie_cap_speed(struct nvkm_pci *);
 int gf100_pcie_init(struct nvkm_pci *);
 int gf100_pcie_set_link(struct nvkm_pci *, enum nvkm_pcie_speed, u8);
 
+int gk104_pcie_init(struct nvkm_pci *);
+int gk104_pcie_set_link(struct nvkm_pci *, enum nvkm_pcie_speed, u8 width);
+enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
+int gk104_pcie_version_supported(struct nvkm_pci *);
+
 int nvkm_pcie_oneinit(struct nvkm_pci *);
 int nvkm_pcie_init(struct nvkm_pci *);
 #endif
-- 
2.21.0



* [PATCH v2 3/4] pci: add nvkm_pcie_get_speed
  2019-05-07 20:12 [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
  2019-05-07 20:12 ` [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits Karol Herbst
  2019-05-07 20:12 ` [PATCH v2 2/4] pci: enable pcie link changes for pascal Karol Herbst
@ 2019-05-07 20:12 ` Karol Herbst
  2019-05-07 20:12 ` [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini Karol Herbst
  2019-05-20 13:23 ` [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
  4 siblings, 0 replies; 23+ messages in thread
From: Karol Herbst @ 2019-05-07 20:12 UTC (permalink / raw)
  To: nouveau; +Cc: Lyude Paul, linux-pci, Bjorn Helgaas, Karol Herbst

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
 drm/nouveau/include/nvkm/subdev/pci.h | 1 +
 drm/nouveau/nvkm/subdev/pci/pcie.c    | 8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
index 23803cc8..1fdf3098 100644
--- a/drm/nouveau/include/nvkm/subdev/pci.h
+++ b/drm/nouveau/include/nvkm/subdev/pci.h
@@ -53,4 +53,5 @@ int gp100_pci_new(struct nvkm_device *, int, struct nvkm_pci **);
 
 /* pcie functions */
 int nvkm_pcie_set_link(struct nvkm_pci *, enum nvkm_pcie_speed, u8 width);
+enum nvkm_pcie_speed nvkm_pcie_get_speed(struct nvkm_pci *);
 #endif
diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
index d71e5db5..70ccbe0d 100644
--- a/drm/nouveau/nvkm/subdev/pci/pcie.c
+++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
@@ -163,3 +163,11 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
 
 	return ret;
 }
+
+enum nvkm_pcie_speed
+nvkm_pcie_get_speed(struct nvkm_pci *pci)
+{
+	if (!pci || !pci_is_pcie(pci->pdev) || !pci->pcie.cur_speed)
+		return -ENODEV;
+	return pci->func->pcie.cur_speed(pci);
+}
-- 
2.21.0



* [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-07 20:12 [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
                   ` (2 preceding siblings ...)
  2019-05-07 20:12 ` [PATCH v2 3/4] pci: add nvkm_pcie_get_speed Karol Herbst
@ 2019-05-07 20:12 ` Karol Herbst
  2019-05-20 21:19   ` Bjorn Helgaas
  2019-05-20 13:23 ` [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
  4 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-07 20:12 UTC (permalink / raw)
  To: nouveau; +Cc: Lyude Paul, linux-pci, Bjorn Helgaas, Karol Herbst

Apparently things go south if we suspend the device with a different PCIe
link speed set than it got booted with. Fixes runtime suspend on my GP107.

This all looks like some bug inside the PCI subsystem and I would prefer a
fix there instead of in Nouveau, but maybe there is no really nice way of
doing that outside of drivers?

v2: squashed together patch 4 and 5

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
---
 drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
 drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
 drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
 drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
 4 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
index 1fdf3098..b23793a2 100644
--- a/drm/nouveau/include/nvkm/subdev/pci.h
+++ b/drm/nouveau/include/nvkm/subdev/pci.h
@@ -26,8 +26,9 @@ struct nvkm_pci {
 	} agp;
 
 	struct {
-		enum nvkm_pcie_speed speed;
-		u8 width;
+		enum nvkm_pcie_speed cur_speed;
+		enum nvkm_pcie_speed def_speed;
+		u8 cur_width;
 	} pcie;
 
 	bool msi;
diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
index ee2431a7..d9fb5a83 100644
--- a/drm/nouveau/nvkm/subdev/pci/base.c
+++ b/drm/nouveau/nvkm/subdev/pci/base.c
@@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
 
 	if (pci->agp.bridge)
 		nvkm_agp_fini(pci);
+	else if (pci_is_pcie(pci->pdev))
+		nvkm_pcie_fini(pci);
 
 	return 0;
 }
@@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
 	struct nvkm_pci *pci = nvkm_pci(subdev);
 	if (pci->agp.bridge)
 		nvkm_agp_preinit(pci);
+	else if (pci_is_pcie(pci->pdev))
+		nvkm_pcie_preinit(pci);
 	return 0;
 }
 
@@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
 	pci->func = func;
 	pci->pdev = device->func->pci(device)->pdev;
 	pci->irq = -1;
-	pci->pcie.speed = -1;
-	pci->pcie.width = -1;
+	pci->pcie.cur_speed = -1;
+	pci->pcie.def_speed = -1;
+	pci->pcie.cur_width = -1;
 
 	if (device->type == NVKM_DEVICE_AGP)
 		nvkm_agp_ctor(pci);
diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
index 70ccbe0d..731dd30e 100644
--- a/drm/nouveau/nvkm/subdev/pci/pcie.c
+++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
@@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
 	return 0;
 }
 
+int
+nvkm_pcie_preinit(struct nvkm_pci *pci)
+{
+	pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
+	return 0;
+}
+
 int
 nvkm_pcie_init(struct nvkm_pci *pci)
 {
@@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
 	if (pci->func->pcie.init)
 		pci->func->pcie.init(pci);
 
-	if (pci->pcie.speed != -1)
-		nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
+	if (pci->pcie.cur_speed != -1)
+		nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
+				   pci->pcie.cur_width);
 
 	return 0;
 }
 
+int
+nvkm_pcie_fini(struct nvkm_pci *pci)
+{
+	if (!IS_ERR_VALUE(pci->pcie.def_speed))
+		return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
+	return 0;
+}
+
 int
 nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
 {
@@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
 		speed = max_speed;
 	}
 
-	pci->pcie.speed = speed;
-	pci->pcie.width = width;
+	pci->pcie.cur_speed = speed;
+	pci->pcie.cur_width = width;
 
 	if (speed == cur_speed) {
 		nvkm_debug(subdev, "requested matches current speed\n");
diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
index a0d4c007..e7744671 100644
--- a/drm/nouveau/nvkm/subdev/pci/priv.h
+++ b/drm/nouveau/nvkm/subdev/pci/priv.h
@@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
 int gk104_pcie_version_supported(struct nvkm_pci *);
 
 int nvkm_pcie_oneinit(struct nvkm_pci *);
+int nvkm_pcie_preinit(struct nvkm_pci *);
 int nvkm_pcie_init(struct nvkm_pci *);
+int nvkm_pcie_fini(struct nvkm_pci *);
 #endif
-- 
2.21.0



* Re: [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits
  2019-05-07 20:12 ` [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits Karol Herbst
@ 2019-05-08 19:10   ` Lyude Paul
  0 siblings, 0 replies; 23+ messages in thread
From: Lyude Paul @ 2019-05-08 19:10 UTC (permalink / raw)
  To: Karol Herbst, nouveau; +Cc: linux-pci, Bjorn Helgaas

Reviewed-by: Lyude Paul <lyude@redhat.com>

On Tue, 2019-05-07 at 22:12 +0200, Karol Herbst wrote:
> [snip]
-- 
Cheers,
	Lyude Paul



* Re: [PATCH v2 0/4] Potential fix for runpm issues on various laptops
  2019-05-07 20:12 [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
                   ` (3 preceding siblings ...)
  2019-05-07 20:12 ` [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini Karol Herbst
@ 2019-05-20 13:23 ` Karol Herbst
  4 siblings, 0 replies; 23+ messages in thread
From: Karol Herbst @ 2019-05-20 13:23 UTC (permalink / raw)
  To: nouveau; +Cc: Lyude Paul, linux-pci, Bjorn Helgaas

Ping to the PCI folks? I really would like to know what you make of it.

In fact, this kind of looks like a PCIe issue, but I just don't know
enough to really be able to tell. I am mainly wondering why putting the
device into D3cold with a 2.5 GT/s instead of an 8.0 GT/s link makes the
resume path break. Any ideas? A broken PCIe controller? A broken
implementation on the GPU?

On Tue, May 7, 2019 at 10:12 PM Karol Herbst <kherbst@redhat.com> wrote:
> [snip]


* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-07 20:12 ` [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini Karol Herbst
@ 2019-05-20 21:19   ` Bjorn Helgaas
  2019-05-20 22:30     ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2019-05-20 21:19 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, Lyude Paul, linux-pci

On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> Apparently things go south if we suspend the device with a different PCIe
> link speed set than it got booted with. Fixes runtime suspend on my GP107.
>
> This all looks like some bug inside the PCI subsystem and I would prefer a
> fix there instead of in Nouveau, but maybe there is no really nice way of
> doing that outside of drivers?

I agree it would be nice to fix this in the PCI core if that's
feasible.

It looks like this driver changes the PCIe link speed using some
device-specific mechanism.  When we suspend, we put the device in
D3cold, so it loses all its state.  When we resume, the link probably
comes up at the boot speed because nothing did that device-specific
magic to change it, so you probably end up with the link being slow
but the driver thinking it's configured to be fast, and maybe that
combination doesn't work.

If it requires something device-specific to change that link speed, I
don't know how to put that in the PCI core.  But maybe I'm missing
something?

Per the PCIe spec (r4.0, sec 1.2):

  Initialization – During hardware initialization, each PCI Express
  Link is set up following a negotiation of Lane widths and frequency
  of operation by the two agents at each end of the Link. No firmware
  or operating system software is involved.

I have been assuming that this means device-specific link speed
management is out of spec, but it seems pretty common that devices
don't come up by themselves at the fastest possible link speed.  So
maybe the spec just intends that devices can operate at *some* valid
speed.

> [snip]


* Re: [PATCH v2 2/4] pci: enable pcie link changes for pascal
  2019-05-07 20:12 ` [PATCH v2 2/4] pci: enable pcie link changes for pascal Karol Herbst
@ 2019-05-20 21:25   ` Bjorn Helgaas
  0 siblings, 0 replies; 23+ messages in thread
From: Bjorn Helgaas @ 2019-05-20 21:25 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, Lyude Paul, linux-pci

On Tue, May 07, 2019 at 10:12:43PM +0200, Karol Herbst wrote:
> Signed-off-by: Karol Herbst <kherbst@redhat.com>
> Reviewed-by: Lyude Paul <lyude@redhat.com>
> ---
>  drm/nouveau/nvkm/subdev/pci/gk104.c |  8 ++++----
>  drm/nouveau/nvkm/subdev/pci/gp100.c | 10 ++++++++++
>  drm/nouveau/nvkm/subdev/pci/priv.h  |  5 +++++
>  3 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drm/nouveau/nvkm/subdev/pci/gk104.c b/drm/nouveau/nvkm/subdev/pci/gk104.c
> index e6803050..66489018 100644
> --- a/drm/nouveau/nvkm/subdev/pci/gk104.c
> +++ b/drm/nouveau/nvkm/subdev/pci/gk104.c
> @@ -23,7 +23,7 @@
>   */
>  #include "priv.h"
>  
> -static int
> +int
>  gk104_pcie_version_supported(struct nvkm_pci *pci)
>  {
>  	return (nvkm_rd32(pci->subdev.device, 0x8c1c0) & 0x4) == 0x4 ? 2 : 1;
> @@ -108,7 +108,7 @@ gk104_pcie_lnkctl_speed(struct nvkm_pci *pci)
>  	return -1;
>  }
>  
> -static enum nvkm_pcie_speed
> +enum nvkm_pcie_speed
>  gk104_pcie_max_speed(struct nvkm_pci *pci)
>  {
>  	u32 max_speed = nvkm_rd32(pci->subdev.device, 0x8c1c0) & 0x300000;

Some of this looks pretty similar to the PCI core code that reads
PCI_EXP_LNKSTA to find the link speed (but admittedly there's not
really a good interface to do that unless bus->cur_bus_speed already
has what you need).

It doesn't look like this is directly reading the PCI_EXP_LNKSTA from
PCI config space; maybe it's reading a mirror of that via a
device-specific MMIO address, or maybe it's reading something else
completely.

But it makes me wonder if there's a way to make generic PCI core
interfaces for some of this stuff.
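
For reference, reading the negotiated speed the generic way would look
roughly like the sketch below (untested; example_pcie_cur_speed() is a
made-up name, but pcie_capability_read_word() and the PCI_EXP_LNKSTA_*
definitions are the real config-space interfaces):

#include <linux/pci.h>

/* Untested sketch: read Current Link Speed from the Link Status
 * register via an ordinary config-space read.
 */
static int example_pcie_cur_speed(struct pci_dev *pdev)
{
        u16 lnksta;

        if (pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta))
                return -EIO;

        switch (lnksta & PCI_EXP_LNKSTA_CLS) {
        case PCI_EXP_LNKSTA_CLS_2_5GB: return 1; /* 2.5 GT/s */
        case PCI_EXP_LNKSTA_CLS_5_0GB: return 2; /* 5.0 GT/s */
        case PCI_EXP_LNKSTA_CLS_8_0GB: return 3; /* 8.0 GT/s */
        default:                       return -EINVAL;
        }
}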

> [snip]


* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-20 21:19   ` Bjorn Helgaas
@ 2019-05-20 22:30     ` Karol Herbst
  2019-05-21 13:10       ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-20 22:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, linux-pci

On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > Apparently things go south if we suspend the device with a different PCIe
> > link speed set than it got booted with. Fixes runtime suspend on my GP107.
> >
> > This all looks like some bug inside the PCI subsystem and I would prefer a
> > fix there instead of in Nouveau, but maybe there is no really nice way of
> > doing that outside of drivers?
>
> I agree it would be nice to fix this in the PCI core if that's
> feasible.
>
> It looks like this driver changes the PCIe link speed using some
> device-specific mechanism.  When we suspend, we put the device in
> D3cold, so it loses all its state.  When we resume, the link probably
> comes up at the boot speed because nothing did that device-specific
> magic to change it, so you probably end up with the link being slow
> but the driver thinking it's configured to be fast, and maybe that
> combination doesn't work.
>
> If it requires something device-specific to change that link speed, I
> don't know how to put that in the PCI core.  But maybe I'm missing
> something?
>
> Per the PCIe spec (r4.0, sec 1.2):
>
>   Initialization – During hardware initialization, each PCI Express
>   Link is set up following a negotiation of Lane widths and frequency
>   of operation by the two agents at each end of the Link. No firmware
>   or operating system software is involved.
>
> I have been assuming that this means device-specific link speed
> management is out of spec, but it seems pretty common that devices
> don't come up by themselves at the fastest possible link speed.  So
> maybe the spec just intends that devices can operate at *some* valid
> speed.
>

I would expect that devices kind of have to figure out what they can
operate on and the operating system kind of just checks what the
current state is and doesn't try to "restore" the old state or
something?

We don't do anything in the driver after the device was suspended. And
the 0x88000 range is a mirror of the PCI config space, but we also got some
PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
essentially. I have no idea how much of this is part of the actual PCI
standard and how much is driver-specific. But the driver also wants to
have some control over the link speed as it's tied to performance
states on the GPU.
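
To illustrate what I mean by the mirror, an untested sketch
(nvkm_pcie_mirror_lnksta() is a made-up name, and it assumes the 0x88000
window really is a byte-for-byte copy of config space):

/* Untested sketch: read the LNKCTL/LNKSTA dword of the PCIe capability
 * through the BAR0 mirror instead of a config cycle.
 */
static u16 nvkm_pcie_mirror_lnksta(struct nvkm_pci *pci)
{
        int pos = pci_pcie_cap(pci->pdev); /* PCIe capability offset */
        u32 dw = nvkm_rd32(pci->subdev.device,
                           0x88000 + pos + PCI_EXP_LNKCTL);

        return dw >> 16; /* LNKSTA sits in the upper half of that dword */
}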

The big issue here is just that the GPU boots at 8.0 GT/s and some on-GPU
init mechanism decreases it to 2.5 GT/s. If we suspend, the GPU, or at least
the communication with the controller, is broken. But if we set it to
the boot speed, resuming the GPU just works. So my assumption was
that _something_ (be it the controller or the PCI subsystem)
tries to force operation at an invalid link speed, and because the
bridge controller is actually powered down as well (as all children
are in D3cold) I could imagine that something in the PCI subsystem
actually restores a state which makes the controller fail to
establish communication again?

Just something which kind of would make sense to me.

> > [snip]


* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-20 22:30     ` Karol Herbst
@ 2019-05-21 13:10       ` Bjorn Helgaas
  2019-05-21 13:28         ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2019-05-21 13:10 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, Lyude Paul, linux-pci

On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > Apparently things go south if we suspend the device with a different PCIe
> > > link speed set than it got booted with. Fixes runtime suspend on my GP107.
> > >
> > > This all looks like some bug inside the PCI subsystem and I would prefer a
> > > fix there instead of in Nouveau, but maybe there is no really nice way of
> > > doing that outside of drivers?
> >
> > I agree it would be nice to fix this in the PCI core if that's
> > feasible.
> >
> > It looks like this driver changes the PCIe link speed using some
> > device-specific mechanism.  When we suspend, we put the device in
> > D3cold, so it loses all its state.  When we resume, the link probably
> > comes up at the boot speed because nothing did that device-specific
> > magic to change it, so you probably end up with the link being slow
> > but the driver thinking it's configured to be fast, and maybe that
> > combination doesn't work.
> >
> > If it requires something device-specific to change that link speed, I
> > don't know how to put that in the PCI core.  But maybe I'm missing
> > something?
> >
> > Per the PCIe spec (r4.0, sec 1.2):
> >
> >   Initialization – During hardware initialization, each PCI Express
> >   Link is set up following a negotiation of Lane widths and frequency
> >   of operation by the two agents at each end of the Link. No firmware
> >   or operating system software is involved.
> >
> > I have been assuming that this means device-specific link speed
> > management is out of spec, but it seems pretty common that devices
> > don't come up by themselves at the fastest possible link speed.  So
> > maybe the spec just intends that devices can operate at *some* valid
> > speed.
> 
> I would expect that devices kind of have to figure out what they can
> operate on and the operating system kind of just checks what the
> current state is and doesn't try to "restore" the old state or
> something?

The devices at each end of the link negotiate the width and speed of
the link.  This is done directly by the hardware without any help from
the OS.

The OS can read the current link state (Current Link Speed and
Negotiated Link Width, both in the Link Status register).  The OS has
very little control over that state.  It can't directly restore the
state because the hardware has to negotiate a width & speed that
result in reliable operation.

> We don't do anything in the driver after the device was suspended. And
> the 0x88000 range is a mirror of the PCI config space, but we also got some
> PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> essentially. I have no idea how much of this is part of the actual PCI
> standard and how much is driver-specific. But the driver also wants to
> have some control over the link speed as it's tied to performance
> states on the GPU.

As far as I'm aware, there is no generic PCIe way for the OS to
influence the link width or speed.  If the GPU driver needs to do
that, it would be via some device-specific mechanism.

> The big issue here is just that the GPU boots at 8.0 GT/s and some on-GPU
> init mechanism decreases it to 2.5 GT/s. If we suspend, the GPU, or at least
> the communication with the controller, is broken. But if we set it to
> the boot speed, resuming the GPU just works. So my assumption was
> that _something_ (be it the controller or the PCI subsystem)
> tries to force operation at an invalid link speed, and because the
> bridge controller is actually powered down as well (as all children
> are in D3cold) I could imagine that something in the PCI subsystem
> actually restores a state which makes the controller fail to
> establish communication again?

  1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
     without OS/driver intervention.

  2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
     requires driver intervention or at least some ACPI method.

  3) Suspend puts GPU into D3cold (powered off).

  4) Resume restores GPU to D0, and the Port and GPU hardware again
     negotiate 8.0 GT/s without OS/driver intervention, just like at
     initial boot.

  5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
     8.0 GT/s.

Without knowing more about the transition to 2.5 GT/s, I can't guess
why the GPU wouldn't work after resume.  From a PCIe point of view,
the link is supposed to work and the device should be reachable
independent of the link speed.  But maybe there's some weird
dependency between the GPU and the driver here.

It sounds like if you return to 8.0 GT/s before suspend, things work.
That would make sense to me because then the driver's idea of the link
state after resume would match the actual state.
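
If nothing else, the driver could at least detect the mismatch on
resume, e.g. (untested sketch; nvkm_pcie_get_speed() and the cur_speed
field are from this series, the check itself is made up):

/* Untested sketch: after runtime resume, warn if the link came back
 * at a speed other than the one the driver last programmed.
 */
static void nvkm_pcie_check_link(struct nvkm_pci *pci)
{
        enum nvkm_pcie_speed cur = nvkm_pcie_get_speed(pci);

        if (cur != pci->pcie.cur_speed)
                nvkm_warn(&pci->subdev,
                          "link resumed at speed %d, expected %d\n",
                          cur, pci->pcie.cur_speed);
}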

But I don't see a way to deal with this in the PCI core.  The PCI core
does save and restore most of the architected config space around
suspend/resume, but since this appears to be a device-specific thing,
the PCI core would have no idea how to save/restore it.

> > > [snip]


* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 13:10       ` Bjorn Helgaas
@ 2019-05-21 13:28         ` Karol Herbst
  2019-05-21 13:50           ` [Nouveau] " Ilia Mirkin
  2019-05-21 14:13           ` Bjorn Helgaas
  0 siblings, 2 replies; 23+ messages in thread
From: Karol Herbst @ 2019-05-21 13:28 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, linux-pci

On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > Apparently things go south if we suspend the device with a different PCIe
> > > > link speed set than it got booted with. Fixes runtime suspend on my GP107.
> > > >
> > > > This all looks like some bug inside the PCI subsystem and I would prefer a
> > > > fix there instead of in Nouveau, but maybe there is no really nice way of
> > > > doing that outside of drivers?
> > >
> > > I agree it would be nice to fix this in the PCI core if that's
> > > feasible.
> > >
> > > It looks like this driver changes the PCIe link speed using some
> > > device-specific mechanism.  When we suspend, we put the device in
> > > D3cold, so it loses all its state.  When we resume, the link probably
> > > comes up at the boot speed because nothing did that device-specific
> > > magic to change it, so you probably end up with the link being slow
> > > but the driver thinking it's configured to be fast, and maybe that
> > > combination doesn't work.
> > >
> > > If it requires something device-specific to change that link speed, I
> > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > something?
> > >
> > > Per the PCIe spec (r4.0, sec 1.2):
> > >
> > >   Initialization – During hardware initialization, each PCI Express
> > >   Link is set up following a negotiation of Lane widths and frequency
> > >   of operation by the two agents at each end of the Link. No firmware
> > >   or operating system software is involved.
> > >
> > > I have been assuming that this means device-specific link speed
> > > management is out of spec, but it seems pretty common that devices
> > > don't come up by themselves at the fastest possible link speed.  So
> > > maybe the spec just intends that devices can operate at *some* valid
> > > speed.
> >
> > I would expect that devices kind of have to figure out what they can
> > operate on and the operating system kind of just checks what the
> > current state is and doesn't try to "restore" the old state or
> > something?
>
> The devices at each end of the link negotiate the width and speed of
> the link.  This is done directly by the hardware without any help from
> the OS.
>
> The OS can read the current link state (Current Link Speed and
> Negotiated Link Width, both in the Link Status register).  The OS has
> very little control over that state.  It can't directly restore the
> state because the hardware has to negotiate a width & speed that
> result in reliable operation.
>
> > We don't do anything in the driver after the device was suspended. And
> > the 0x88000 is a mirror of the PCI config space, but we also got some
> > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > essentially. I have no idea how much of this is part of the actual pci
> > standard and how much is driver specific. But the driver also wants to
> > have some control over the link speed as it's tied to performance
> > states on the GPU.
>
> As far as I'm aware, there is no generic PCIe way for the OS to
> influence the link width or speed.  If the GPU driver needs to do
> that, it would be via some device-specific mechanism.
>
> > The big issue here is just that the GPU boots with 8.0 and some
> > on-GPU init mechanism decreases it to 2.5. If we suspend, the GPU, or
> > at least the communication with the controller, is broken. But if we
> > set it to the boot speed, resuming the GPU just works. So my
> > assumption was that _something_ (be it the controller or the pci
> > subsystem) tries to force operation at an invalid link speed, and
> > because the bridge controller is actually powered down as well (as
> > all children are in D3cold) I could imagine that something in the pci
> > subsystem actually restores the state, which lets the controller fail
> > to establish communication again?
>
>   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
>      without OS/driver intervention.
>
>   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
>      requires driver intervention or at least some ACPI method.
>

there is no driver intervention and Nouveau doesn't care at all. It's
all done on the GPU: we just upload a script and some firmware onto
the GPU. The script then runs on the PMU inside the GPU, and it is
that script which changes the PCIe link settings. But from a Nouveau
point of view we don't care about the link before or after that script
was invoked. Also, there is no ACPI method involved.

But if there is something we should notify pci core about, maybe
that's something we have to do then?

>   3) Suspend puts GPU into D3cold (powered off).
>
>   4) Resume restores GPU to D0, and the Port and GPU hardware again
>      negotiate 8.0 GT/s without OS/driver intervention, just like at
>      initial boot.
>

No, that negotiation fails apparently as any attempt to read anything
from the device just fails inside pci core. Or something goes wrong
when resuming the bridge controller.

>   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
>      8.0 GT/s.
>

what is actually meant by "driver" here? The pci subsystem or Nouveau?

> Without knowing more about the transition to 2.5 GT/s, I can't guess
> why the GPU wouldn't work after resume.  From a PCIe point of view,
> the link is supposed to work and the device should be reachable
> independent of the link speed.  But maybe there's some weird
> dependency between the GPU and the driver here.
>

but the device isn't reachable at all, not even from the pci
subsystem. All reads fail/return a default error value (0xffffffff).
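
That's also trivial to see from code: reads of a device that dropped
off the bus come back as all-ones. A minimal sketch of the
config-space variant (gpu_reachable is a made-up helper, not Nouveau
code):

#include <linux/pci.h>

/* Sketch: does the device still answer config cycles? Reads from an
 * unreachable device return 0xffffffff. */
static bool gpu_reachable(struct pci_dev *pdev)
{
        u32 id;

        pci_read_config_dword(pdev, PCI_VENDOR_ID, &id);
        return id != 0xffffffff;
}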

> It sounds like things work if you return to 8.0 GT/s before suspend,
> things work.  That would make sense to me because then the driver's
> idea of the link state after resume would match the actual state.
>

depends on what is meant by the driver here. Inside Nouveau we don't
care one bit about the current link speed, so I assume you mean
something inside the pci core code?

> But I don't see a way to deal with this in the PCI core.  The PCI core
> does save and restore most of the architected config space around
> suspend/resume, but since this appears to be a device-specific thing,
> the PCI core would have no idea how to save/restore it.
>

if we assume that the negotiation on a device level works as intended,
then I would expect this to be a pci core issue, which might actually
not be fixable there. But if it's not, then we would have to put
something like that inside the runpm documentation to tell drivers
they have to do something about it.

But again, for me it just sounds like the negotiation on the device
level fails or something inside pci core messes it up.
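
I guess I could log the Link Status register you mentioned on both
ends of the link around suspend/resume to check that. Roughly like
this (an untested sketch; log_link_state is a made-up helper):

#include <linux/pci.h>

/* Sketch: log Current Link Speed and Negotiated Link Width from the
 * Link Status register of the given device. */
static void log_link_state(struct pci_dev *pdev)
{
        u16 lnksta;

        pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
        pci_info(pdev, "link: speed code %d, width x%d\n",
                 lnksta & PCI_EXP_LNKSTA_CLS,
                 (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
}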

> > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > ---
> > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > index 1fdf3098..b23793a2 100644
> > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > >       } agp;
> > > >
> > > >       struct {
> > > > -             enum nvkm_pcie_speed speed;
> > > > -             u8 width;
> > > > +             enum nvkm_pcie_speed cur_speed;
> > > > +             enum nvkm_pcie_speed def_speed;
> > > > +             u8 cur_width;
> > > >       } pcie;
> > > >
> > > >       bool msi;
> > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > index ee2431a7..d9fb5a83 100644
> > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > >
> > > >       if (pci->agp.bridge)
> > > >               nvkm_agp_fini(pci);
> > > > +     else if (pci_is_pcie(pci->pdev))
> > > > +             nvkm_pcie_fini(pci);
> > > >
> > > >       return 0;
> > > >  }
> > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > >       if (pci->agp.bridge)
> > > >               nvkm_agp_preinit(pci);
> > > > +     else if (pci_is_pcie(pci->pdev))
> > > > +             nvkm_pcie_preinit(pci);
> > > >       return 0;
> > > >  }
> > > >
> > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > >       pci->func = func;
> > > >       pci->pdev = device->func->pci(device)->pdev;
> > > >       pci->irq = -1;
> > > > -     pci->pcie.speed = -1;
> > > > -     pci->pcie.width = -1;
> > > > +     pci->pcie.cur_speed = -1;
> > > > +     pci->pcie.def_speed = -1;
> > > > +     pci->pcie.cur_width = -1;
> > > >
> > > >       if (device->type == NVKM_DEVICE_AGP)
> > > >               nvkm_agp_ctor(pci);
> > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > index 70ccbe0d..731dd30e 100644
> > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > >       return 0;
> > > >  }
> > > >
> > > > +int
> > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > +{
> > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  int
> > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > >  {
> > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > >       if (pci->func->pcie.init)
> > > >               pci->func->pcie.init(pci);
> > > >
> > > > -     if (pci->pcie.speed != -1)
> > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > +     if (pci->pcie.cur_speed != -1)
> > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > +                                pci->pcie.cur_width);
> > > >
> > > >       return 0;
> > > >  }
> > > >
> > > > +int
> > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > +{
> > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > +     return 0;
> > > > +}
> > > > +
> > > >  int
> > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > >  {
> > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > >               speed = max_speed;
> > > >       }
> > > >
> > > > -     pci->pcie.speed = speed;
> > > > -     pci->pcie.width = width;
> > > > +     pci->pcie.cur_speed = speed;
> > > > +     pci->pcie.cur_width = width;
> > > >
> > > >       if (speed == cur_speed) {
> > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > index a0d4c007..e7744671 100644
> > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > >
> > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > >  #endif
> > > > --
> > > > 2.21.0
> > > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 13:28         ` Karol Herbst
@ 2019-05-21 13:50           ` Ilia Mirkin
  2019-05-21 13:56             ` Karol Herbst
  2019-05-21 14:13           ` Bjorn Helgaas
  1 sibling, 1 reply; 23+ messages in thread
From: Ilia Mirkin @ 2019-05-21 13:50 UTC (permalink / raw)
  To: Karol Herbst; +Cc: Bjorn Helgaas, nouveau, Linux PCI

On Tue, May 21, 2019 at 9:29 AM Karol Herbst <kherbst@redhat.com> wrote:
>
> On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > link speed than the one it booted with. Fixes runtime suspend on my GP107.
> > > > >
> > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > fix there instead of nouveau, but maybe there is no really nice way of doing
> > > > > that outside of drivers?
> > > >
> > > > I agree it would be nice to fix this in the PCI core if that's
> > > > feasible.
> > > >
> > > > It looks like this driver changes the PCIe link speed using some
> > > > device-specific mechanism.  When we suspend, we put the device in
> > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > comes up at the boot speed because nothing did that device-specific
> > > > magic to change it, so you probably end up with the link being slow
> > > > but the driver thinking it's configured to be fast, and maybe that
> > > > combination doesn't work.
> > > >
> > > > If it requires something device-specific to change that link speed, I
> > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > something?
> > > >
> > > > Per the PCIe spec (r4.0, sec 1.2):
> > > >
> > > >   Initialization – During hardware initialization, each PCI Express
> > > >   Link is set up following a negotiation of Lane widths and frequency
> > > >   of operation by the two agents at each end of the Link. No firmware
> > > >   or operating system software is involved.
> > > >
> > > > I have been assuming that this means device-specific link speed
> > > > management is out of spec, but it seems pretty common that devices
> > > > don't come up by themselves at the fastest possible link speed.  So
> > > > maybe the spec just intends that devices can operate at *some* valid
> > > > speed.
> > >
> > > I would expect that devices kind of have to figure out what they can
> > > operate on and the operating system kind of just checks what the
> > > current state is and doesn't try to "restore" the old state or
> > > something?
> >
> > The devices at each end of the link negotiate the width and speed of
> > the link.  This is done directly by the hardware without any help from
> > the OS.
> >
> > The OS can read the current link state (Current Link Speed and
> > Negotiated Link Width, both in the Link Status register).  The OS has
> > very little control over that state.  It can't directly restore the
> > state because the hardware has to negotiate a width & speed that
> > result in reliable operation.
> >
> > > We don't do anything in the driver after the device was suspended. And
> > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > essentially. I have no idea how much of this is part of the actual pci
> > > standard and how much is driver specific. But the driver also wants to
> > > have some control over the link speed as it's tied to performance
> > > states on the GPU.
> >
> > As far as I'm aware, there is no generic PCIe way for the OS to
> > influence the link width or speed.  If the GPU driver needs to do
> > that, it would be via some device-specific mechanism.
> >
> > > The big issue here is just that the GPU boots with 8.0 and some
> > > on-GPU init mechanism decreases it to 2.5. If we suspend, the GPU, or
> > > at least the communication with the controller, is broken. But if we
> > > set it to the boot speed, resuming the GPU just works. So my
> > > assumption was that _something_ (be it the controller or the pci
> > > subsystem) tries to force operation at an invalid link speed, and
> > > because the bridge controller is actually powered down as well (as
> > > all children are in D3cold) I could imagine that something in the pci
> > > subsystem actually restores the state, which lets the controller fail
> > > to establish communication again?
> >
> >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> >      without OS/driver intervention.
> >
> >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> >      requires driver intervention or at least some ACPI method.
> >
>
> there is no driver intervention and Nouveau doesn't care at all. It's
> all done on the GPU: we just upload a script and some firmware onto
> the GPU. The script then runs on the PMU inside the GPU, and it is
> that script which changes the PCIe link settings. But from a Nouveau
> point of view we don't care about the link before or after that script
> was invoked. Also, there is no ACPI method involved.
>
> But if there is something we should notify pci core about, maybe
> that's something we have to do then?
>
> >   3) Suspend puts GPU into D3cold (powered off).
> >
> >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> >      initial boot.
> >
>
> No, that negotiation fails apparently as any attempt to read anything
> from the device just fails inside pci core. Or something goes wrong
> when resuming the bridge controller.
>
> >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> >      8.0 GT/s.
> >
>
> what is actually meant by "driver" here? The pci subsystem or Nouveau?
>
> > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > the link is supposed to work and the device should be reachable
> > independent of the link speed.  But maybe there's some weird
> > dependency between the GPU and the driver here.
> >
>
> but the device isn't reachable at all, not even from the pci
> subsystem. All reads fail/return a default error value (0xffffffff).
>
> > It sounds like things work if you return to 8.0 GT/s before suspend,
> > things work.  That would make sense to me because then the driver's
> > idea of the link state after resume would match the actual state.
> >
>
> depends on what is meant by the driver here. Inside Nouveau we don't
> care one bit about the current link speed, so I assume you mean
> something inside the pci core code?
>
> > But I don't see a way to deal with this in the PCI core.  The PCI core
> > does save and restore most of the architected config space around
> > suspend/resume, but since this appears to be a device-specific thing,
> > the PCI core would have no idea how to save/restore it.
> >
>
> if we assume that the negotiation on a device level works as intended,
> then I would expect this to be a pci core issue, which might actually
> not be fixable there. But if it's not, then we would have to put
> something like that inside the runpm documentation to tell drivers
> they have to do something about it.
>
> But again, for me it just sounds like the negotiation on the device
> level fails or something inside pci core messes it up.

Bjorn -- nouveau has a way of requesting that the GPU change PCIe
settings. It sets the PCIe version to the max version (esp older GPUs
tended to boot as PCIe 1.0, and had to be set to 2.0/3.0 "by hand"),
and then the link speed is adjusted based on the perf level settings
by writing to a PCI config-ish mmio space -- however on the GPUs that
Karol is talking about, we can't do the perf level adjustments, so
nouveau never touches the speed. (Does it touch the PCIe version? Not
100% sure ... Karol?) In this case, it sounds like it's firmware
running on the GPU which is doing this (probably using the exact same
mechanism nouveau would -- those internal engines also have access to
the mmio space).

Perhaps there's a way to capture PCI config space of both the GPU and
its link partner, to see if there's anything obviously wrong? (But
even if there is, doesn't sound like we have too much recourse...)
From the sounds of it, the two link partners disagree on settings
somehow and don't establish a proper link.
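
Something along these lines would capture it from the kernel side (a
rough sketch, untested; dump_cfg_pair is made up, and
pci_upstream_bridge() gives the link partner):

#include <linux/pci.h>

/* Sketch: dump the first 256 bytes of config space of the GPU and of
 * its upstream bridge so the two sides can be compared. */
static void dump_cfg_pair(struct pci_dev *pdev)
{
        struct pci_dev *bridge = pci_upstream_bridge(pdev);
        u32 val;
        int off;

        for (off = 0; off < 256; off += 4) {
                pci_read_config_dword(pdev, off, &val);
                pci_info(pdev, "%03x: %08x\n", off, val);
                if (bridge) {
                        pci_read_config_dword(bridge, off, &val);
                        pci_info(bridge, "%03x: %08x\n", off, val);
                }
        }
}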

  -ilia

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 13:50           ` [Nouveau] " Ilia Mirkin
@ 2019-05-21 13:56             ` Karol Herbst
  0 siblings, 0 replies; 23+ messages in thread
From: Karol Herbst @ 2019-05-21 13:56 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: Bjorn Helgaas, nouveau, Linux PCI

On Tue, May 21, 2019 at 3:51 PM Ilia Mirkin <imirkin@alum.mit.edu> wrote:
>
> On Tue, May 21, 2019 at 9:29 AM Karol Herbst <kherbst@redhat.com> wrote:
> >
> > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > link speed than the one it booted with. Fixes runtime suspend on my GP107.
> > > > > >
> > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > fix there instead of nouveau, but maybe there is no really nice way of doing
> > > > > > that outside of drivers?
> > > > >
> > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > feasible.
> > > > >
> > > > > It looks like this driver changes the PCIe link speed using some
> > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > comes up at the boot speed because nothing did that device-specific
> > > > > magic to change it, so you probably end up with the link being slow
> > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > combination doesn't work.
> > > > >
> > > > > If it requires something device-specific to change that link speed, I
> > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > something?
> > > > >
> > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > >
> > > > >   Initialization – During hardware initialization, each PCI Express
> > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > >   or operating system software is involved.
> > > > >
> > > > > I have been assuming that this means device-specific link speed
> > > > > management is out of spec, but it seems pretty common that devices
> > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > speed.
> > > >
> > > > I would expect that devices kind of have to figure out what they can
> > > > operate on and the operating system kind of just checks what the
> > > > current state is and doesn't try to "restore" the old state or
> > > > something?
> > >
> > > The devices at each end of the link negotiate the width and speed of
> > > the link.  This is done directly by the hardware without any help from
> > > the OS.
> > >
> > > The OS can read the current link state (Current Link Speed and
> > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > very little control over that state.  It can't directly restore the
> > > state because the hardware has to negotiate a width & speed that
> > > result in reliable operation.
> > >
> > > > We don't do anything in the driver after the device was suspended. And
> > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > essentially. I have no idea how much of this is part of the actual pci
> > > > standard and how much is driver specific. But the driver also wants to
> > > > have some control over the link speed as it's tied to performance
> > > > states on the GPU.
> > >
> > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > influence the link width or speed.  If the GPU driver needs to do
> > > that, it would be via some device-specific mechanism.
> > >
> > > > The big issue here is just that the GPU boots with 8.0 and some
> > > > on-GPU init mechanism decreases it to 2.5. If we suspend, the GPU, or
> > > > at least the communication with the controller, is broken. But if we
> > > > set it to the boot speed, resuming the GPU just works. So my
> > > > assumption was that _something_ (be it the controller or the pci
> > > > subsystem) tries to force operation at an invalid link speed, and
> > > > because the bridge controller is actually powered down as well (as
> > > > all children are in D3cold) I could imagine that something in the pci
> > > > subsystem actually restores the state, which lets the controller fail
> > > > to establish communication again?
> > >
> > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > >      without OS/driver intervention.
> > >
> > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > >      requires driver intervention or at least some ACPI method.
> > >
> >
> > there is no driver intervention and Nouveau doesn't care at all. It's
> > all done on the GPU: we just upload a script and some firmware onto
> > the GPU. The script then runs on the PMU inside the GPU, and it is
> > that script which changes the PCIe link settings. But from a Nouveau
> > point of view we don't care about the link before or after that script
> > was invoked. Also, there is no ACPI method involved.
> >
> > But if there is something we should notify pci core about, maybe
> > that's something we have to do then?
> >
> > >   3) Suspend puts GPU into D3cold (powered off).
> > >
> > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > >      initial boot.
> > >
> >
> > No, that negotiation fails apparently as any attempt to read anything
> > from the device just fails inside pci core. Or something goes wrong
> > when resuming the bridge controller.
> >
> > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > >      8.0 GT/s.
> > >
> >
> > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> >
> > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > the link is supposed to work and the device should be reachable
> > > independent of the link speed.  But maybe there's some weird
> > > dependency between the GPU and the driver here.
> > >
> >
> > but the device isn't reachable at all, not even from the pci
> > subsystem. All reads fail/return a default error value (0xffffffff).
> >
> > > It sounds like things work if you return to 8.0 GT/s before suspend,
> > > things work.  That would make sense to me because then the driver's
> > > idea of the link state after resume would match the actual state.
> > >
> >
> > depends on what is meant by the driver here. Inside Nouveau we don't
> > care one bit about the current link speed, so I assume you mean
> > something inside the pci core code?
> >
> > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > does save and restore most of the architected config space around
> > > suspend/resume, but since this appears to be a device-specific thing,
> > > the PCI core would have no idea how to save/restore it.
> > >
> >
> > if we assume that the negotiation on a device level works as intended,
> > then I would expect this to be a pci core issue, which might actually
> > not be fixable there. But if it's not, then we would have to put
> > something like that inside the runpm documentation to tell drivers
> > they have to do something about it.
> >
> > But again, for me it just sounds like the negotiation on the device
> > level fails or something inside pci core messes it up.
>
> Bjorn -- nouveau has a way of requesting that the GPU change PCIe
> settings. It sets the PCIe version to the max version (esp older GPUs
> tended to boot as PCIe 1.0, and had to be set to 2.0/3.0 "by hand"),
> and then the link speed is adjusted based on the perf level settings
> by writing to a PCI config-ish mmio space -- however on the GPUs that
> Karol is talking about, we can't do the perf level adjustments, so
> nouveau never touches the speed. (Does it touch the PCIe version? Not
> 100% sure ... Karol?)

I think we only do it if the GPU comes up as v1, but that was mainly
a Tesla thing; I saw it on Fermi a few times, but never on newer
chips. And we also only do it if the pci->func->pcie.version callback
is set (which we don't do on Pascal, and this is the gen where we
have the runpm issue).

> In this case, it sounds like it's firmware
> running on the GPU which is doing this (probably using the exact same
> mechanism nouveau would -- those internal engines also have access to
> the mmio space).
>
> Perhaps there's a way to capture PCI config space of both the GPU and
> its link partner, to see if there's anything obviously wrong? (But
> even if there is, doesn't sound like we have too much recourse...)
> From the sounds of it, the two link partners disagree on settings
> somehow and don't establish a proper link.
>
>   -ilia

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 13:28         ` Karol Herbst
  2019-05-21 13:50           ` [Nouveau] " Ilia Mirkin
@ 2019-05-21 14:13           ` Bjorn Helgaas
  2019-05-21 14:30             ` Karol Herbst
  1 sibling, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2019-05-21 14:13 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, Lyude Paul, linux-pci

On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > link speed than the one it booted with. Fixes runtime suspend on my GP107.
> > > > >
> > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > fix there instead of nouveau, but maybe there is no really nice way of doing
> > > > > that outside of drivers?
> > > >
> > > > I agree it would be nice to fix this in the PCI core if that's
> > > > feasible.
> > > >
> > > > It looks like this driver changes the PCIe link speed using some
> > > > device-specific mechanism.  When we suspend, we put the device in
> > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > comes up at the boot speed because nothing did that device-specific
> > > > magic to change it, so you probably end up with the link being slow
> > > > but the driver thinking it's configured to be fast, and maybe that
> > > > combination doesn't work.
> > > >
> > > > If it requires something device-specific to change that link speed, I
> > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > something?
> > > >
> > > > Per the PCIe spec (r4.0, sec 1.2):
> > > >
> > > >   Initialization – During hardware initialization, each PCI Express
> > > >   Link is set up following a negotiation of Lane widths and frequency
> > > >   of operation by the two agents at each end of the Link. No firmware
> > > >   or operating system software is involved.
> > > >
> > > > I have been assuming that this means device-specific link speed
> > > > management is out of spec, but it seems pretty common that devices
> > > > don't come up by themselves at the fastest possible link speed.  So
> > > > maybe the spec just intends that devices can operate at *some* valid
> > > > speed.
> > >
> > > I would expect that devices kind of have to figure out what they can
> > > operate on and the operating system kind of just checks what the
> > > current state is and doesn't try to "restore" the old state or
> > > something?
> >
> > The devices at each end of the link negotiate the width and speed of
> > the link.  This is done directly by the hardware without any help from
> > the OS.
> >
> > The OS can read the current link state (Current Link Speed and
> > Negotiated Link Width, both in the Link Status register).  The OS has
> > very little control over that state.  It can't directly restore the
> > state because the hardware has to negotiate a width & speed that
> > result in reliable operation.
> >
> > > We don't do anything in the driver after the device was suspended. And
> > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > essentially. I have no idea how much of this is part of the actual pci
> > > standard and how much is driver specific. But the driver also wants to
> > > have some control over the link speed as it's tied to performance
> > > states on the GPU.
> >
> > As far as I'm aware, there is no generic PCIe way for the OS to
> > influence the link width or speed.  If the GPU driver needs to do
> > that, it would be via some device-specific mechanism.
> >
> > > The big issue here is just that the GPU boots with 8.0 and some
> > > on-GPU init mechanism decreases it to 2.5. If we suspend, the GPU, or
> > > at least the communication with the controller, is broken. But if we
> > > set it to the boot speed, resuming the GPU just works. So my
> > > assumption was that _something_ (be it the controller or the pci
> > > subsystem) tries to force operation at an invalid link speed, and
> > > because the bridge controller is actually powered down as well (as
> > > all children are in D3cold) I could imagine that something in the pci
> > > subsystem actually restores the state, which lets the controller fail
> > > to establish communication again?
> >
> >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> >      without OS/driver intervention.
> >
> >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> >      requires driver intervention or at least some ACPI method.
> 
> there is no driver intervention and Nouveau doesn't care at all. It's
> all done on the GPU: we just upload a script and some firmware onto
> the GPU. The script then runs on the PMU inside the GPU, and it is
> that script which changes the PCIe link settings. But from a Nouveau
> point of view we don't care about the link before or after that script
> was invoked. Also, there is no ACPI method involved.
> 
> But if there is something we should notify pci core about, maybe
> that's something we have to do then?

I don't think there's anything the PCI core could do with that
information anyway.  The PCI core doesn't care at all about the link
speed, and it really can't influence it directly.

> >   3) Suspend puts GPU into D3cold (powered off).
> >
> >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> >      initial boot.
> 
> No, that negotiation fails apparently as any attempt to read anything
> from the device just fails inside pci core. Or something goes wrong
> when resuming the bridge controller.

I'm surprised the negotiation would fail even after a power cycle of
the device.  But if you can avoid the issue by running another script
on the PMU before suspend, that's probably what you'll have to do.

> >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> >      8.0 GT/s.
> 
> what is actually meant by "driver" here? The pci subsystem or Nouveau?

I was thinking Nouveau because the PCI core doesn't care about the
link speed.

> > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > the link is supposed to work and the device should be reachable
> > independent of the link speed.  But maybe there's some weird
> > dependency between the GPU and the driver here.
> 
> but the device isn't reachable at all, not even from the pci
> subsystem. All reads fail/return a default error value (0xffffffff).

Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
"lspci -vvxxxx" of the bridge and the GPU might have a clue about
whether a PCI error occurred.
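
The two cases are easy to tell apart from the driver side, e.g. (just
a sketch; check_reads is made up, and bar0 is assumed to be an
already ioremap()ed BAR of the GPU):

#include <linux/io.h>
#include <linux/pci.h>

/* Sketch: issue one config read and one MMIO read so we can see
 * which of the two paths fails. */
static void check_reads(struct pci_dev *pdev, void __iomem *bar0)
{
        u32 cfg, mmio;

        pci_read_config_dword(pdev, PCI_VENDOR_ID, &cfg);
        mmio = readl(bar0);
        pci_info(pdev, "config %08x, mmio %08x\n", cfg, mmio);
}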

> > It sounds like things work if you return to 8.0 GT/s before suspend,
> > things work.  That would make sense to me because then the driver's
> > idea of the link state after resume would match the actual state.
> 
> depends on what is meant by the driver here. Inside Nouveau we don't
> care one bit about the current link speed, so I assume you mean
> something inside the pci core code?
> 
> > But I don't see a way to deal with this in the PCI core.  The PCI core
> > does save and restore most of the architected config space around
> > suspend/resume, but since this appears to be a device-specific thing,
> > the PCI core would have no idea how to save/restore it.
> 
> if we assume that the negotiation on a device level works as intended,
> then I would expect this to be a pci core issue, which might actually
> not be fixable there. But if it's not, then we would have to put
> something like that inside the runpm documentation to tell drivers
> they have to do something about it.
> 
> But again, for me it just sounds like the negotiation on the device
> level fails or something inside pci core messes it up.

To me it sounds like the PMU script messed something up, and the PCI
core has no way to know what that was or how to fix it.

> > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > ---
> > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > index 1fdf3098..b23793a2 100644
> > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > >       } agp;
> > > > >
> > > > >       struct {
> > > > > -             enum nvkm_pcie_speed speed;
> > > > > -             u8 width;
> > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > +             u8 cur_width;
> > > > >       } pcie;
> > > > >
> > > > >       bool msi;
> > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > index ee2431a7..d9fb5a83 100644
> > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > >
> > > > >       if (pci->agp.bridge)
> > > > >               nvkm_agp_fini(pci);
> > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > +             nvkm_pcie_fini(pci);
> > > > >
> > > > >       return 0;
> > > > >  }
> > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > >       if (pci->agp.bridge)
> > > > >               nvkm_agp_preinit(pci);
> > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > +             nvkm_pcie_preinit(pci);
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > >       pci->func = func;
> > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > >       pci->irq = -1;
> > > > > -     pci->pcie.speed = -1;
> > > > > -     pci->pcie.width = -1;
> > > > > +     pci->pcie.cur_speed = -1;
> > > > > +     pci->pcie.def_speed = -1;
> > > > > +     pci->pcie.cur_width = -1;
> > > > >
> > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > >               nvkm_agp_ctor(pci);
> > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > index 70ccbe0d..731dd30e 100644
> > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > +int
> > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > +{
> > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > +     return 0;
> > > > > +}
> > > > > +
> > > > >  int
> > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > >  {
> > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > >       if (pci->func->pcie.init)
> > > > >               pci->func->pcie.init(pci);
> > > > >
> > > > > -     if (pci->pcie.speed != -1)
> > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > +                                pci->pcie.cur_width);
> > > > >
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > +int
> > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > +{
> > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > +     return 0;
> > > > > +}
> > > > > +
> > > > >  int
> > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > >  {
> > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > >               speed = max_speed;
> > > > >       }
> > > > >
> > > > > -     pci->pcie.speed = speed;
> > > > > -     pci->pcie.width = width;
> > > > > +     pci->pcie.cur_speed = speed;
> > > > > +     pci->pcie.cur_width = width;
> > > > >
> > > > >       if (speed == cur_speed) {
> > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > index a0d4c007..e7744671 100644
> > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > >
> > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > >  #endif
> > > > > --
> > > > > 2.21.0
> > > > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 14:13           ` Bjorn Helgaas
@ 2019-05-21 14:30             ` Karol Herbst
  2019-05-21 17:35               ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-21 14:30 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, linux-pci

On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > link speed than the one it booted with. Fixes runtime suspend on my GP107.
> > > > > >
> > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > fix there instead of nouveau, but maybe there is no really nice way of doing
> > > > > > that outside of drivers?
> > > > >
> > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > feasible.
> > > > >
> > > > > It looks like this driver changes the PCIe link speed using some
> > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > comes up at the boot speed because nothing did that device-specific
> > > > > magic to change it, so you probably end up with the link being slow
> > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > combination doesn't work.
> > > > >
> > > > > If it requires something device-specific to change that link speed, I
> > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > something?
> > > > >
> > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > >
> > > > >   Initialization – During hardware initialization, each PCI Express
> > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > >   or operating system software is involved.
> > > > >
> > > > > I have been assuming that this means device-specific link speed
> > > > > management is out of spec, but it seems pretty common that devices
> > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > speed.
> > > >
> > > > I would expect that devices kind of have to figure out what they can
> > > > operate on and the operating system kind of just checks what the
> > > > current state is and doesn't try to "restore" the old state or
> > > > something?
> > >
> > > The devices at each end of the link negotiate the width and speed of
> > > the link.  This is done directly by the hardware without any help from
> > > the OS.
> > >
> > > The OS can read the current link state (Current Link Speed and
> > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > very little control over that state.  It can't directly restore the
> > > state because the hardware has to negotiate a width & speed that
> > > result in reliable operation.
> > >
> > > > We don't do anything in the driver after the device was suspended. And
> > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > essentially. I have no idea how much of this is part of the actual pci
> > > > standard and how much is driver specific. But the driver also wants to
> > > > have some control over the link speed as it's tied to performance
> > > > states on the GPU.
> > >
> > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > influence the link width or speed.  If the GPU driver needs to do
> > > that, it would be via some device-specific mechanism.
> > >
> > > > The big issue here is just that the GPU boots with 8.0 and some
> > > > on-GPU init mechanism decreases it to 2.5. If we suspend, the GPU, or
> > > > at least the communication with the controller, is broken. But if we
> > > > set it to the boot speed, resuming the GPU just works. So my
> > > > assumption was that _something_ (be it the controller or the pci
> > > > subsystem) tries to force operation at an invalid link speed, and
> > > > because the bridge controller is actually powered down as well (as
> > > > all children are in D3cold) I could imagine that something in the pci
> > > > subsystem actually restores the state, which lets the controller fail
> > > > to establish communication again?
> > >
> > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > >      without OS/driver intervention.
> > >
> > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > >      requires driver intervention or at least some ACPI method.
> >
> > there is no driver intervention and Nouveau doesn't care at all. It's
> > all done on the GPU: we just upload a script and some firmware onto
> > the GPU. The script then runs on the PMU inside the GPU, and it is
> > that script which changes the PCIe link settings. But from a Nouveau
> > point of view we don't care about the link before or after that script
> > was invoked. Also, there is no ACPI method involved.
> >
> > But if there is something we should notify pci core about, maybe
> > that's something we have to do then?
>
> I don't think there's anything the PCI core could do with that
> information anyway.  The PCI core doesn't care at all about the link
> speed, and it really can't influence it directly.
>
> > >   3) Suspend puts GPU into D3cold (powered off).
> > >
> > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > >      initial boot.
> >
> > No, that negotiation fails apparently as any attempt to read anything
> > from the device just fails inside pci core. Or something goes wrong
> > when resuming the bridge controller.
>
> I'm surprised the negotiation would fail even after a power cycle of
> the device.  But if you can avoid the issue by running another script
> on the PMU before suspend, that's probably what you'll have to do.
>

there is none as far as we know, or at least nothing inside the vbios.
We still have to get signed PMU firmware images from Nvidia for full
support, but this would still be hacky as we would then depend on
those (and without having those in redistributable form, there
isn't much we can do about it except fixing it on the kernel side).

> > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > >      8.0 GT/s.
> >
> > what is actually meant by "driver" here? The pci subsystem or Nouveau?
>
> I was thinking Nouveau because the PCI core doesn't care about the
> link speed.
>
> > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > the link is supposed to work and the device should be reachable
> > > independent of the link speed.  But maybe there's some weird
> > > dependency between the GPU and the driver here.
> >
> > but the device isn't reachable at all, not even from the pci
> > subsystem. All reads fail/return a default error value (0xffffffff).
>
> Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> whether a PCI error occurred.
>

that's kind of problematic as it might just lock up my machine... but
let me try that.

> > > It sounds like things work if you return to 8.0 GT/s before suspend,
> > > things work.  That would make sense to me because then the driver's
> > > idea of the link state after resume would match the actual state.
> >
> > depends on what is meant by the driver here. Inside Nouveau we don't
> > care one bit about the current link speed, so I assume you mean
> > something inside the pci core code?
> >
> > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > does save and restore most of the architected config space around
> > > suspend/resume, but since this appears to be a device-specific thing,
> > > the PCI core would have no idea how to save/restore it.
> >
> > if we assume that the negotiation on a device level works as intended,
> > then I would expect this to be a pci core issue, which might actually
> > not be fixable there. But if it's not, then we would have to put
> > something like that inside the runpm documentation to tell drivers
> > they have to do something about it.
> >
> > But again, for me it just sounds like the negotiation on the device
> > level fails or something inside pci core messes it up.
>
> To me it sounds like the PMU script messed something up, and the PCI
> core has no way to know what that was or how to fix it.
>

sure, I am mainly wondering why it doesn't work after we power-cycled
the GPU and the host bridge controller, because no matter what the
state was before, we have to reprobe instead of relying on a known
state, no?

> > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > ---
> > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > index 1fdf3098..b23793a2 100644
> > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > >       } agp;
> > > > > >
> > > > > >       struct {
> > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > -             u8 width;
> > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > +             u8 cur_width;
> > > > > >       } pcie;
> > > > > >
> > > > > >       bool msi;
> > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > >
> > > > > >       if (pci->agp.bridge)
> > > > > >               nvkm_agp_fini(pci);
> > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > +             nvkm_pcie_fini(pci);
> > > > > >
> > > > > >       return 0;
> > > > > >  }
> > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > >       if (pci->agp.bridge)
> > > > > >               nvkm_agp_preinit(pci);
> > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > +             nvkm_pcie_preinit(pci);
> > > > > >       return 0;
> > > > > >  }
> > > > > >
> > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > >       pci->func = func;
> > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > >       pci->irq = -1;
> > > > > > -     pci->pcie.speed = -1;
> > > > > > -     pci->pcie.width = -1;
> > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > +     pci->pcie.def_speed = -1;
> > > > > > +     pci->pcie.cur_width = -1;
> > > > > >
> > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > >               nvkm_agp_ctor(pci);
> > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > >       return 0;
> > > > > >  }
> > > > > >
> > > > > > +int
> > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > +{
> > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > +     return 0;
> > > > > > +}
> > > > > > +
> > > > > >  int
> > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > >  {
> > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > >       if (pci->func->pcie.init)
> > > > > >               pci->func->pcie.init(pci);
> > > > > >
> > > > > > -     if (pci->pcie.speed != -1)
> > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > +                                pci->pcie.cur_width);
> > > > > >
> > > > > >       return 0;
> > > > > >  }
> > > > > >
> > > > > > +int
> > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > +{
> > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > +     return 0;
> > > > > > +}
> > > > > > +
> > > > > >  int
> > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > >  {
> > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > >               speed = max_speed;
> > > > > >       }
> > > > > >
> > > > > > -     pci->pcie.speed = speed;
> > > > > > -     pci->pcie.width = width;
> > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > +     pci->pcie.cur_width = width;
> > > > > >
> > > > > >       if (speed == cur_speed) {
> > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > index a0d4c007..e7744671 100644
> > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > >
> > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > >  #endif
> > > > > > --
> > > > > > 2.21.0
> > > > > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 14:30             ` Karol Herbst
@ 2019-05-21 17:35               ` Karol Herbst
  2019-05-21 17:48                 ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-21 17:35 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, Linux PCI

I was able to get the lspci prints via ssh. The machine rebooted
automatically each time, though.

relevant dmesg:
kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff

(the last one is a 64-bit MMIO read to get the on-GPU timer value)
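
For illustration, and as an assumption about the PCI core's behaviour
rather than something verified in this thread: the "Refused to change
power state" message is what the core prints when the PM control
register of a device that no longer responds is read back; the config
read returns all-ones, and the power-state field of that value decodes
as D3. A rough kernel-C sketch:

    u16 pmcsr;

    /* on a dead link this config read comes back as 0xffff... */
    pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr);
    /* ...so the power-state field decodes as D3: */
    /* (pmcsr & PCI_PM_CTRL_STATE_MASK) == 3, i.e. PCI_D3hot */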

# lspci -vvxxx -s 0:01.00
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
decode])
       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
       Latency: 0
       Interrupt: pin A routed to IRQ 16
       Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
       I/O behind bridge: 0000e000-0000efff [size=4K]
       Memory behind bridge: ec000000-ed0fffff [size=17M]
       Prefetchable memory behind bridge:
00000000c0000000-00000000d1ffffff [size=288M]
       Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
       BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
               PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
       Capabilities: [88] Subsystem: Dell Device 07be
       Capabilities: [80] Power Management version 3
               Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
               Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
       Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
               Address: 00000000  Data: 0000
       Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
               DevCap: MaxPayload 256 bytes, PhantFunc 0
                       ExtTag- RBE+
               DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                       MaxPayload 256 bytes, MaxReadReq 128 bytes
               DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
               LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <256ns, L1 <8us
                       ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
               LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
               LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
                       TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
               SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                       Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
               SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
HPIrq- LinkChg-
                       Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
               SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet- Interlock-
                       Changed: MRL- PresDet+ LinkState-
               RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
               RootCap: CRSVisible-
               RootSta: PME ReqID 0000, PMEStatus- PMEPending-
               DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                        AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
               DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
LTR+, OBFF Via WAKE# ARIFwd-
                        AtomicOpsCtl: ReqEn- EgressBlck-
               LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                        Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                        Compliance De-emphasis: -6dB
               LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
                        EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
       Capabilities: [100 v1] Virtual Channel
               Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
               Arb:    Fixed- WRR32- WRR64- WRR128-
               Ctrl:   ArbSelect=Fixed
               Status: InProgress-
               VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                       Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                       Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                       Status: NegoPending+ InProgress-
       Capabilities: [140 v1] Root Complex Link
               Desc:   PortNumber=02 ComponentID=01 EltType=Config
               Link0:  Desc:   TargetPort=00 TargetComponent=01
AssocRCRB- LinkType=MemMapped LinkValid+
                       Addr:   00000000fed19000
       Capabilities: [d94 v1] Secondary PCI Express <?>
       Kernel driver in use: pcieport
00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00

# lspci -vvxxx -s 1:00.00
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
Mobile] (rev ff) (prog-if ff)
       !!! Unknown header type 7f
       Kernel driver in use: nouveau
       Kernel modules: nouveau
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
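
The all-0xff dump above is the classic signature of config reads that
no longer reach the device. As a minimal sketch of how a driver could
test for this (the helper name is hypothetical; pci_read_config_dword()
and pci_device_is_present() are existing PCI core APIs):

    #include <linux/pci.h>

    /* Hypothetical helper: an unreachable device reads as all-ones in
     * config space; pci_device_is_present() reads the Vendor ID and
     * treats an all-ones result as "device gone". */
    static bool gpu_responds(struct pci_dev *pdev)
    {
            u32 id;

            if (pci_read_config_dword(pdev, PCI_VENDOR_ID, &id))
                    return false;

            return id != 0xffffffff && pci_device_is_present(pdev);
    }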

On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst@redhat.com> wrote:
>
> On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > > link speed set than the one it booted with. Fixes runtime suspend on my gp107.
> > > > > > >
> > > > > > > This all looks like some bug inside the pci subsystem, and I would prefer a
> > > > > > > fix there instead of in nouveau, but maybe there is no really nice way of
> > > > > > > doing that outside of drivers?
> > > > > >
> > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > feasible.
> > > > > >
> > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > combination doesn't work.
> > > > > >
> > > > > > If it requires something device-specific to change that link speed, I
> > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > something?
> > > > > >
> > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > >
> > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > >   or operating system software is involved.
> > > > > >
> > > > > > I have been assuming that this means device-specific link speed
> > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > speed.
> > > > >
> > > > > I would expect that devices kind of have to figure out what they can
> > > > > operate at, and that the operating system just checks what the
> > > > > current state is and doesn't try to "restore" the old state or
> > > > > something?
> > > >
> > > > The devices at each end of the link negotiate the width and speed of
> > > > the link.  This is done directly by the hardware without any help from
> > > > the OS.
> > > >
> > > > The OS can read the current link state (Current Link Speed and
> > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > very little control over that state.  It can't directly restore the
> > > > state because the hardware has to negotiate a width & speed that
> > > > result in reliable operation.
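
For illustration, the read described above looks roughly like this with
the kernel's PCIe capability accessors (a sketch, not part of the series):

    u16 lnksta, speed, width;

    pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
    /* Current Link Speed: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s */
    speed = lnksta & PCI_EXP_LNKSTA_CLS;
    /* Negotiated Link Width, in lanes */
    width = (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT;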
> > > >
> > > > > We don't do anything in the driver after the device was suspended. And
> > > > > 0x88000 is a mirror of the PCI config space, but we also have some
> > > > > PCIe stuff at 0x8c000, which is used by newer GPUs for gen3 stuff,
> > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > standard and how much is driver-specific. But the driver also wants to
> > > > > have some control over the link speed, as it's tied to performance
> > > > > states on the GPU.
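
As an aside: if 0x88000 mirrors the config space as described above, the
same registers should also be reachable through BAR0 MMIO. A one-line
illustrative sketch with nvkm's MMIO accessor (the offset is taken from
the description above, not independently verified):

    /* vendor/device ID dword via the BAR0 mirror of config space */
    u32 id = nvkm_rd32(device, 0x88000 + 0x000);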
> > > >
> > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > influence the link width or speed.  If the GPU driver needs to do
> > > > that, it would be via some device-specific mechanism.
> > > >
> > > > > The big issue here is just that the GPU boots at 8.0 and some on-GPU
> > > > > init mechanism decreases it to 2.5. If we suspend, the GPU, or at least
> > > > > the communication with the controller, is broken. But if we set it to
> > > > > the boot speed, resuming the GPU just works. So my assumption was
> > > > > that _something_ (be it the controller or the pci subsystem)
> > > > > tries to force operation at an invalid link speed, and because the
> > > > > bridge controller is actually powered down as well (as all children
> > > > > are in D3cold), I could imagine that something in the pci subsystem
> > > > > actually restores the state, which makes the controller fail to
> > > > > establish communication again?
> > > >
> > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > >      without OS/driver intervention.
> > > >
> > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > >      requires driver intervention or at least some ACPI method.
> > >
> > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > all done on the GPU. We just upload a script and some firmware onto
> > > the GPU. The script then runs on the PMU inside the GPU, and it also
> > > changes the PCIe link settings. But from a Nouveau point of view we
> > > don't care about the link before or after that script was invoked.
> > > Also, there is no ACPI method involved.
> > >
> > > But if there is something we should notify the pci core about, maybe
> > > that's something we have to do then?
> >
> > I don't think there's anything the PCI core could do with that
> > information anyway.  The PCI core doesn't care at all about the link
> > speed, and it really can't influence it directly.
> >
> > > >   3) Suspend puts GPU into D3cold (powered off).
> > > >
> > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > >      initial boot.
> > >
> > > No, that negotiation apparently fails, as any attempt to read anything
> > > from the device just fails inside the pci core. Or something goes wrong
> > > when resuming the bridge controller.
> >
> > I'm surprised the negotiation would fail even after a power cycle of
> > the device.  But if you can avoid the issue by running another script
> > on the PMU before suspend, that's probably what you'll have to do.
> >
>
> there is none as far as we know, or at least nothing inside the vbios.
> We still have to get signed PMU firmware images from Nvidia for full
> support, but this would still be hacky, as we would then depend on
> those (and without having those in redistributable form, there
> isn't much we can do about it except fixing it on the kernel side).
>
> > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > >      8.0 GT/s.
> > >
> > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> >
> > I was thinking Nouveau because the PCI core doesn't care about the
> > link speed.
> >
> > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > the link is supposed to work and the device should be reachable
> > > > independent of the link speed.  But maybe there's some weird
> > > > dependency between the GPU and the driver here.
> > >
> > > but the device isn't reachable at all, not even from the pci
> > > subsystem. All reads fail/return a default error value (0xffffffff).
> >
> > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > whether a PCI error occurred.
> >
>
> that's kind of problematic as it might just lock up my machine... but
> let me try that.
>
> > > > It sounds like things work if you return to 8.0 GT/s before
> > > > suspend.  That would make sense to me because then the driver's
> > > > idea of the link state after resume would match the actual state.
> > >
> > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > care one bit about the current link speed, so I assume you mean
> > > something inside the pci core code?
> > >
> > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > does save and restore most of the architected config space around
> > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > the PCI core would have no idea how to save/restore it.
> > >
> > > if we assume that the negotiation on the device level works as intended,
> > > then I would expect this to be a pci core issue, which might actually
> > > not be fixable there. But if it's not, then we would have to put
> > > something like that inside the runpm documentation to tell drivers
> > > they have to do something about it.
> > >
> > > But again, to me it just sounds like the negotiation on the device
> > > level fails or something inside the pci core messes it up.
> >
> > To me it sounds like the PMU script messed something up, and the PCI
> > core has no way to know what that was or how to fix it.
> >
>
> sure, I am mainly wondering why it doesn't work after we power-cycled
> the GPU and the host bridge controller, because no matter what the
> state was before, we have to reprobe instead of relying on a known
> state, no?
>
> > > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > > > [snip]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 17:35               ` Karol Herbst
@ 2019-05-21 17:48                 ` Karol Herbst
  2019-06-03 13:18                   ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-05-21 17:48 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, Linux PCI

Doing the same on the bridge controller, with my workarounds applied.

Please note some differences:
LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
SltSta: PresDet+ vs PresDet-
LnkSta2: the equalization status bits
Virtual channel: NegoPending- vs NegoPending+

Both times I executed lspci while the GPU was still suspended.
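
For reference, the "(downgraded)" annotation in LnkSta is derived by
comparing the current link speed against the port's supported maximum.
Roughly, in kernel C (an illustrative sketch of the comparison, not
lspci's exact code):

    u16 lnksta;
    u32 lnkcap;
    bool downgraded;

    pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
    pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &lnkcap);
    /* current speed below the supported maximum => "downgraded" */
    downgraded = (lnksta & PCI_EXP_LNKSTA_CLS) <
                 (lnkcap & PCI_EXP_LNKCAP_SLS);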

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 0000e000-0000efff [size=4K]
        Memory behind bridge: ec000000-ed0fffff [size=17M]
        Prefetchable memory behind bridge:
00000000c0000000-00000000d1ffffff [size=288M]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [88] Subsystem: Dell Device 07be
        Capabilities: [80] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
                Address: 00000000  Data: 0000
        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
AuxPwr- TransPend-
                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
Exit Latency L0s <256ns, L1 <8us
                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
HotPlug- Surprise-
                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown,
Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Not Supported,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
                DevCtl2: Completion Timeout: 50us to 50ms,
TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
                         AtomicOpsCtl: ReqEn- EgressBlck-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+,
LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [140 v1] Root Complex Link
                Desc:   PortNumber=02 ComponentID=01 EltType=Config
                Link0:  Desc:   TargetPort=00 TargetComponent=01
AssocRCRB- LinkType=MemMapped LinkValid+
                        Addr:   00000000fed19000
        Capabilities: [d94 v1] Secondary PCI Express <?>
        Kernel driver in use: pcieport
00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00

On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst@redhat.com> wrote:
> [snip]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-05-21 17:48                 ` Karol Herbst
@ 2019-06-03 13:18                   ` Karol Herbst
  2019-06-03 18:10                     ` Bjorn Helgaas
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-06-03 13:18 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, Linux PCI

@bjorn: any further ideas? Otherwise I'd like to just go ahead, fix
this issue inside Nouveau, and leave it there until we have a better
understanding of it or see non-Nouveau cases of this issue.

On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst@redhat.com> wrote:
> [snip]
>
> On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst@redhat.com> wrote:
> >
> > was able to get the lspci prints via ssh. Machine rebooted
> > automatically each time though.
> >
> > relevant dmesg:
> > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> >
> > (last one is a 64-bit MMIO read to get the on-GPU timer value)
> >
> > # lspci -vvxxx -s 0:01.00
> > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > decode])
> >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR- FastB2B- DisINTx-
> >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> >        Latency: 0
> >        Interrupt: pin A routed to IRQ 16
> >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> >        I/O behind bridge: 0000e000-0000efff [size=4K]
> >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> >        Prefetchable memory behind bridge:
> > 00000000c0000000-00000000d1ffffff [size=288M]
> >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort+ <SERR- <PERR-
> >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >        Capabilities: [88] Subsystem: Dell Device 07be
> >        Capabilities: [80] Power Management version 3
> >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> >                Address: 00000000  Data: 0000
> >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> >                        ExtTag- RBE+
> >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > AuxPwr- TransPend-
> >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > Exit Latency L0s <256ns, L1 <8us
> >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > HotPlug- Surprise-
> >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > HPIrq- LinkChg-
> >                        Control: AttnInd Unknown, PwrInd Unknown,
> > Power- Interlock-
> >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > PresDet- Interlock-
> >                        Changed: MRL- PresDet+ LinkState-
> >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > PMEIntEna- CRSVisible-
> >                RootCap: CRSVisible-
> >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                DevCap2: Completion Timeout: Not Supported,
> > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > LTR+, OBFF Via WAKE# ARIFwd-
> >                         AtomicOpsCtl: ReqEn- EgressBlck-
> >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> >                         Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >                         Compliance De-emphasis: -6dB
> >                LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete-, EqualizationPhase1-
> >                         EqualizationPhase2-, EqualizationPhase3-,
> > LinkEqualizationRequest-
> >        Capabilities: [100 v1] Virtual Channel
> >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> >                Arb:    Fixed- WRR32- WRR64- WRR128-
> >                Ctrl:   ArbSelect=Fixed
> >                Status: InProgress-
> >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> >                        Status: NegoPending+ InProgress-
> >        Capabilities: [140 v1] Root Complex Link
> >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > AssocRCRB- LinkType=MemMapped LinkValid+
> >                        Addr:   00000000fed19000
> >        Capabilities: [d94 v1] Secondary PCI Express <?>
> >        Kernel driver in use: pcieport
> > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> >
> > lspci -vvxxx -s 1:00.00
> > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> > Mobile] (rev ff) (prog-if ff)
> >        !!! Unknown header type 7f
> >        Kernel driver in use: nouveau
> >        Kernel modules: nouveau
> > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> >
> > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst@redhat.com> wrote:
> > >
> > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > > > > link speed than the one it booted with. This fixes runtime suspend on my
> > > > > > > > > gp107.
> > > > > > > > >
> > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > fix there instead of in nouveau, but maybe there is no really nice way of
> > > > > > > > > doing that outside of drivers?
> > > > > > > >
> > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > feasible.
> > > > > > > >
> > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > combination doesn't work.
> > > > > > > >
> > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > something?
> > > > > > > >
> > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > >
> > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > >   or operating system software is involved.
> > > > > > > >
> > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > speed.
> > > > > > >
> > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > operate at, and that the operating system just checks what the
> > > > > > > current state is and doesn't try to "restore" the old state?
> > > > > >
> > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > the OS.
> > > > > >
> > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > very little control over that state.  It can't directly restore the
> > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > result in reliable operation.
> > > > > >
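
For reference: both fields live in the architected Link Status
register, so the negotiated state can at least be observed with the
standard capability accessors. A minimal sketch, not part of this
series; the helper name is made up and a valid struct pci_dev *pdev is
assumed:

    #include <linux/pci.h>

    /* Read the Link Status register of the PCIe capability and decode
     * Current Link Speed and Negotiated Link Width.  Purely
     * observational: writing these fields does not retrain the link.
     */
    static void read_link_state(struct pci_dev *pdev)
    {
            u16 lnksta;

            pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);

            /* speed field: 1 = 2.5 GT/s, 2 = 5 GT/s, 3 = 8 GT/s */
            pr_info("link speed field %u, width x%u\n",
                    lnksta & PCI_EXP_LNKSTA_CLS,
                    (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
    }

These are the same values lspci renders as the "LnkSta:" lines in the
dumps in this thread.
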
> > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > the range at 0x88000 is a mirror of the PCI config space, but we also have
> > > > > > > some PCIe stuff at 0x8c000 which newer GPUs use, essentially for gen3
> > > > > > > support. I have no idea how much of this is part of the actual pci
> > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > have some control over the link speed as it's tied to performance
> > > > > > > states on the GPU.
> > > > > >
> > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > that, it would be via some device-specific mechanism.
> > > > > >
> > > > > > > The big issue here is just that the GPU boots at 8.0 and some on-GPU
> > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU, or at least
> > > > > > > the communication with the controller, is broken. But if we set it back
> > > > > > > to the boot speed, resuming the GPU just works. So my assumption was
> > > > > > > that _something_ (be it the controller or the pci subsystem) tries to
> > > > > > > force operation at an invalid link speed, and because the bridge
> > > > > > > controller is actually powered down as well (as all children are in
> > > > > > > D3cold) I could imagine that something in the pci subsystem actually
> > > > > > > restores a state which makes the controller fail to establish
> > > > > > > communication again?
> > > > > >
> > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > >      without OS/driver intervention.
> > > > > >
> > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > >      requires driver intervention or at least some ACPI method.
> > > > >
> > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > all done on the GPU. We just upload a script and some firmware onto
> > > > > the GPU. The script then runs on the PMU inside the GPU, and this
> > > > > script also changes the PCIe link settings. But from a Nouveau
> > > > > point of view we don't care about the link before or after that script
> > > > > was invoked. Also there is no ACPI method involved.
> > > > >
> > > > > But if there is something we should notify pci core about, maybe
> > > > > that's something we have to do then?
> > > >
> > > > I don't think there's anything the PCI core could do with that
> > > > information anyway.  The PCI core doesn't care at all about the link
> > > > speed, and it really can't influence it directly.
> > > >
> > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > >
> > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > >      initial boot.
> > > > >
> > > > > No, that negotiation apparently fails, as any attempt to read anything
> > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > when resuming the bridge controller.
> > > >
> > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > the device.  But if you can avoid the issue by running another script
> > > > on the PMU before suspend, that's probably what you'll have to do.
> > > >
> > >
> > > there is none as far as we know. Or at least nothing inside the vbios.
> > > We still have to get signed PMU firmware images from Nvidia for full
> > > support, but that would still be hacky as we would then depend on
> > > those (and without having those in redistributable form, there
> > > isn't much we can do about it except fixing it on the kernel side).
> > >
> > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > >      8.0 GT/s.
> > > > >
> > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > >
> > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > link speed.
> > > >
> > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > the link is supposed to work and the device should be reachable
> > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > dependency between the GPU and the driver here.
> > > > >
> > > > > but the device isn't reachable at all, not even from the pci
> > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > >
> > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > whether a PCI error occurred.
> > > >
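
Side note: 0xffffffff is what the host typically fabricates when a
config read gets no completion; it is essentially what the kernel's own
pci_device_is_present() keys off as well. A hedged sketch of that check
(a hypothetical debug helper, not something from these patches):

    #include <linux/pci.h>

    /* Check whether the device still answers config cycles.  A dead or
     * powered-off device reads as all ones, exactly like the GPU config
     * dump quoted earlier in this thread.
     */
    static bool device_answers_config(struct pci_dev *pdev)
    {
            u32 id;

            pci_read_config_dword(pdev, PCI_VENDOR_ID, &id);
            return id != 0xffffffff;
    }
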
> > >
> > > that's kind of problematic as it might just lock up my machine... but
> > > let me try that.
> > >
> > > > > > It sounds like if you return to 8.0 GT/s before suspend,
> > > > > > things work.  That would make sense to me because then the driver's
> > > > > > idea of the link state after resume would match the actual state.
> > > > >
> > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > care one bit about the current link speed, so I assume you mean
> > > > > something inside the pci core code?
> > > > >
> > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > does save and restore most of the architected config space around
> > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > the PCI core would have no idea how to save/restore it.
> > > > >
> > > > > if we assume that the negotiation on a device level works as intended,
> > > > > then I would expect this to be a pci core issue, which might actually
> > > > > not be fixable there. But if it's not, then we would have to put
> > > > > something like that into the runpm documentation to tell drivers
> > > > > they have to do something about it.
> > > > >
> > > > > But again, to me it just sounds like the negotiation on the device
> > > > > level fails or something inside pci core messes it up.
> > > >
> > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > core has no way to know what that was or how to fix it.
> > > >
> > >
> > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > the GPU and the host bridge controller, because no matter what the
> > > state was before, we have to reprobe instead of relying on a known
> > > state, no?
> > >
> > > > > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > > > > ---
> > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > >       } agp;
> > > > > > > > >
> > > > > > > > >       struct {
> > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > -             u8 width;
> > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > +             u8 cur_width;
> > > > > > > > >       } pcie;
> > > > > > > > >
> > > > > > > > >       bool msi;
> > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > >
> > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > >
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > >       pci->func = func;
> > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > >       pci->irq = -1;
> > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > >
> > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +int
> > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > +{
> > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > +     return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  int
> > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > >  {
> > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > >
> > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > >
> > > > > > > > >       return 0;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +int
> > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > +{
> > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > +     return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  int
> > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > >  {
> > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > >               speed = max_speed;
> > > > > > > > >       }
> > > > > > > > >
> > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > >
> > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > >
> > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > >  #endif
> > > > > > > > > --
> > > > > > > > > 2.21.0
> > > > > > > > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-06-03 13:18                   ` Karol Herbst
@ 2019-06-03 18:10                     ` Bjorn Helgaas
  2019-06-19 12:07                       ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Bjorn Helgaas @ 2019-06-03 18:10 UTC (permalink / raw)
  To: Karol Herbst; +Cc: nouveau, Lyude Paul, Linux PCI

On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> this issue inside Nouveau and leave it there until we have a better
> understanding or see non-Nouveau cases of this issue.

Nope, I have no more ideas.

> On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst@redhat.com> wrote:
> >
> > doing the same on the bridge controller with my workarounds applied:
> >
> > please note some differences:
> > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > SltSta: PresDet+ vs PresDet-
> > LnkSta2: Equalization stuff
> > Virtual channel: NegoPending- vs NegoPending+
> >
> > both times I executed lspci while the GPU was still suspended.
> >
> > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > decode])
> >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > ParErr- Stepping- SERR- FastB2B- DisINTx-
> >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort- >SERR- <PERR- INTx-
> >         Latency: 0
> >         Interrupt: pin A routed to IRQ 16
> >         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> >         I/O behind bridge: 0000e000-0000efff [size=4K]
> >         Memory behind bridge: ec000000-ed0fffff [size=17M]
> >         Prefetchable memory behind bridge:
> > 00000000c0000000-00000000d1ffffff [size=288M]
> >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > <TAbort- <MAbort+ <SERR- <PERR-
> >         BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> >         Capabilities: [88] Subsystem: Dell Device 07be
> >         Capabilities: [80] Power Management version 3
> >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> >         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> >                 Address: 00000000  Data: 0000
> >         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> >                 DevCap: MaxPayload 256 bytes, PhantFunc 0
> >                         ExtTag- RBE+
> >                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >                         MaxPayload 256 bytes, MaxReadReq 128 bytes
> >                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > AuxPwr- TransPend-
> >                 LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > Exit Latency L0s <256ns, L1 <8us
> >                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> >                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> >                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > HotPlug- Surprise-
> >                         Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> >                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > CmdCplt- HPIrq- LinkChg-
> >                         Control: AttnInd Unknown, PwrInd Unknown,
> > Power- Interlock-
> >                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > PresDet+ Interlock-
> >                         Changed: MRL- PresDet+ LinkState-
> >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > PMEIntEna- CRSVisible-
> >                 RootCap: CRSVisible-
> >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> >                 DevCap2: Completion Timeout: Not Supported,
> > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> >                          AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> >                 DevCtl2: Completion Timeout: 50us to 50ms,
> > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> >                          AtomicOpsCtl: ReqEn- EgressBlck-
> >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> >                          Transmit Margin: Normal Operating Range,
> > EnterModifiedCompliance- ComplianceSOS-
> >                          Compliance De-emphasis: -6dB
> >                 LnkSta2: Current De-emphasis Level: -6dB,
> > EqualizationComplete+, EqualizationPhase1+
> >                          EqualizationPhase2+, EqualizationPhase3+,
> > LinkEqualizationRequest-
> >         Capabilities: [100 v1] Virtual Channel
> >                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> >                 Arb:    Fixed- WRR32- WRR64- WRR128-
> >                 Ctrl:   ArbSelect=Fixed
> >                 Status: InProgress-
> >                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> >                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> >                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> >                         Status: NegoPending- InProgress-
> >         Capabilities: [140 v1] Root Complex Link
> >                 Desc:   PortNumber=02 ComponentID=01 EltType=Config
> >                 Link0:  Desc:   TargetPort=00 TargetComponent=01
> > AssocRCRB- LinkType=MemMapped LinkValid+
> >                         Addr:   00000000fed19000
> >         Capabilities: [d94 v1] Secondary PCI Express <?>
> >         Kernel driver in use: pcieport
> > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
> > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
> > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
> >
> > On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst@redhat.com> wrote:
> > >
> > > was able to get the lspci prints via ssh. Machine rebooted
> > > automatically each time though.
> > >
> > > relevant dmesg:
> > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> > >
> > > (last one is a 64-bit MMIO read to get the on-GPU timer value)
> > >
> > > # lspci -vvxxx -s 0:01.00
> > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > decode])
> > >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >        Latency: 0
> > >        Interrupt: pin A routed to IRQ 16
> > >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > >        I/O behind bridge: 0000e000-0000efff [size=4K]
> > >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> > >        Prefetchable memory behind bridge:
> > > 00000000c0000000-00000000d1ffffff [size=288M]
> > >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort+ <SERR- <PERR-
> > >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > >        Capabilities: [88] Subsystem: Dell Device 07be
> > >        Capabilities: [80] Power Management version 3
> > >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > >                Address: 00000000  Data: 0000
> > >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> > >                        ExtTag- RBE+
> > >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> > >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > AuxPwr- TransPend-
> > >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > Exit Latency L0s <256ns, L1 <8us
> > >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> > >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > HotPlug- Surprise-
> > >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > > HPIrq- LinkChg-
> > >                        Control: AttnInd Unknown, PwrInd Unknown,
> > > Power- Interlock-
> > >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > PresDet- Interlock-
> > >                        Changed: MRL- PresDet+ LinkState-
> > >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > PMEIntEna- CRSVisible-
> > >                RootCap: CRSVisible-
> > >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > >                DevCap2: Completion Timeout: Not Supported,
> > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > > LTR+, OBFF Via WAKE# ARIFwd-
> > >                         AtomicOpsCtl: ReqEn- EgressBlck-
> > >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > >                         Transmit Margin: Normal Operating Range,
> > > EnterModifiedCompliance- ComplianceSOS-
> > >                         Compliance De-emphasis: -6dB
> > >                LnkSta2: Current De-emphasis Level: -6dB,
> > > EqualizationComplete-, EqualizationPhase1-
> > >                         EqualizationPhase2-, EqualizationPhase3-,
> > > LinkEqualizationRequest-
> > >        Capabilities: [100 v1] Virtual Channel
> > >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > >                Arb:    Fixed- WRR32- WRR64- WRR128-
> > >                Ctrl:   ArbSelect=Fixed
> > >                Status: InProgress-
> > >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > >                        Status: NegoPending+ InProgress-
> > >        Capabilities: [140 v1] Root Complex Link
> > >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > AssocRCRB- LinkType=MemMapped LinkValid+
> > >                        Addr:   00000000fed19000
> > >        Capabilities: [d94 v1] Secondary PCI Express <?>
> > >        Kernel driver in use: pcieport
> > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> > >
> > > lspci -vvxxx -s 1:00.00
> > > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> > > Mobile] (rev ff) (prog-if ff)
> > >        !!! Unknown header type 7f
> > >        Kernel driver in use: nouveau
> > >        Kernel modules: nouveau
> > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > >
> > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > >
> > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >
> > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > > > > > link speed than the one it booted with. This fixes runtime suspend on my
> > > > > > > > > > gp107.
> > > > > > > > > >
> > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > > fix there instead of in nouveau, but maybe there is no really nice way of
> > > > > > > > > > doing that outside of drivers?
> > > > > > > > >
> > > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > > feasible.
> > > > > > > > >
> > > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > > combination doesn't work.
> > > > > > > > >
> > > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > > something?
> > > > > > > > >
> > > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > > >
> > > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > > >   or operating system software is involved.
> > > > > > > > >
> > > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > > speed.
> > > > > > > >
> > > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > > operate at, and that the operating system just checks what the
> > > > > > > > current state is and doesn't try to "restore" the old state?
> > > > > > >
> > > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > > the OS.
> > > > > > >
> > > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > > very little control over that state.  It can't directly restore the
> > > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > > result in reliable operation.
> > > > > > >
> > > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > > the range at 0x88000 is a mirror of the PCI config space, but we also have
> > > > > > > > some PCIe stuff at 0x8c000 which newer GPUs use, essentially for gen3
> > > > > > > > support. I have no idea how much of this is part of the actual pci
> > > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > > have some control over the link speed as it's tied to performance
> > > > > > > > states on the GPU.
> > > > > > >
> > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > > that, it would be via some device-specific mechanism.
> > > > > > >
> > > > > > > > The big issue here is just that the GPU boots at 8.0 and some on-GPU
> > > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU, or at least
> > > > > > > > the communication with the controller, is broken. But if we set it back
> > > > > > > > to the boot speed, resuming the GPU just works. So my assumption was
> > > > > > > > that _something_ (be it the controller or the pci subsystem) tries to
> > > > > > > > force operation at an invalid link speed, and because the bridge
> > > > > > > > controller is actually powered down as well (as all children are in
> > > > > > > > D3cold) I could imagine that something in the pci subsystem actually
> > > > > > > > restores a state which makes the controller fail to establish
> > > > > > > > communication again?
> > > > > > >
> > > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > > >      without OS/driver intervention.
> > > > > > >
> > > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > > >      requires driver intervention or at least some ACPI method.
> > > > > >
> > > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > > all done on the GPU. We just upload a script and some firmware onto
> > > > > > the GPU. The script then runs on the PMU inside the GPU, and this
> > > > > > script also changes the PCIe link settings. But from a Nouveau
> > > > > > point of view we don't care about the link before or after that script
> > > > > > was invoked. Also there is no ACPI method involved.
> > > > > >
> > > > > > But if there is something we should notify pci core about, maybe
> > > > > > that's something we have to do then?
> > > > >
> > > > > I don't think there's anything the PCI core could do with that
> > > > > information anyway.  The PCI core doesn't care at all about the link
> > > > > speed, and it really can't influence it directly.
> > > > >
> > > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > > >
> > > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > > >      initial boot.
> > > > > >
> > > > > > No, that negotiation apparently fails, as any attempt to read anything
> > > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > > when resuming the bridge controller.
> > > > >
> > > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > > the device.  But if you can avoid the issue by running another script
> > > > > on the PMU before suspend, that's probably what you'll have to do.
> > > > >
> > > >
> > > > there is none as far as we know. Or at least nothing inside the vbios.
> > > > We still have to get signed PMU firmware images from Nvidia for full
> > > > support, but that would still be hacky as we would then depend on
> > > > those (and without having those in redistributable form, there
> > > > isn't much we can do about it except fixing it on the kernel side).
> > > >
> > > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > > >      8.0 GT/s.
> > > > > >
> > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > > >
> > > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > > link speed.
> > > > >
> > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > > the link is supposed to work and the device should be reachable
> > > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > > dependency between the GPU and the driver here.
> > > > > >
> > > > > > but the device isn't reachable at all, not even from the pci
> > > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > > >
> > > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > > whether a PCI error occurred.
> > > > >
> > > >
> > > > that's kind of problematic as it might just lock up my machine... but
> > > > let me try that.
> > > >
> > > > > > > It sounds like if you return to 8.0 GT/s before suspend,
> > > > > > > things work.  That would make sense to me because then the driver's
> > > > > > > idea of the link state after resume would match the actual state.
> > > > > >
> > > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > > care one bit about the current link speed, so I assume you mean
> > > > > > something inside the pci core code?
> > > > > >
> > > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > > does save and restore most of the architected config space around
> > > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > > the PCI core would have no idea how to save/restore it.
> > > > > >
> > > > > > if we assume that the negotiation on a device level works as intended,
> > > > > > then I would expect this to be a pci core issue, which might actually
> > > > > > not be fixable there. But if it's not, then we would have to put
> > > > > > something like that into the runpm documentation to tell drivers
> > > > > > they have to do something about it.
> > > > > >
> > > > > > But again, to me it just sounds like the negotiation on the device
> > > > > > level fails or something inside pci core messes it up.
> > > > >
> > > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > > core has no way to know what that was or how to fix it.
> > > > >
> > > >
> > > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > > the GPU and the host bridge controller, because no matter what the
> > > > state was before, we have to reprobe instead of relying on a known
> > > > state, no?
> > > >
> > > > > > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > > > > > ---
> > > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > > >       } agp;
> > > > > > > > > >
> > > > > > > > > >       struct {
> > > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > > -             u8 width;
> > > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > > +             u8 cur_width;
> > > > > > > > > >       } pcie;
> > > > > > > > > >
> > > > > > > > > >       bool msi;
> > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > > >
> > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > > >
> > > > > > > > > >       return 0;
> > > > > > > > > >  }
> > > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > > >       return 0;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > > >       pci->func = func;
> > > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > > >       pci->irq = -1;
> > > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > > >
> > > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > > >       return 0;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +int
> > > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > > +{
> > > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > > +     return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  int
> > > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > >  {
> > > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > > >
> > > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > > >
> > > > > > > > > >       return 0;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +int
> > > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > > +{
> > > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > > +     return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  int
> > > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > >  {
> > > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > >               speed = max_speed;
> > > > > > > > > >       }
> > > > > > > > > >
> > > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > > >
> > > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > > >
> > > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > > >  #endif
> > > > > > > > > > --
> > > > > > > > > > 2.21.0
> > > > > > > > > >
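
As an aside: the patch quoted above does the save/restore through
nouveau's own device-specific path (nvkm_pcie_get_speed() at preinit,
nvkm_pcie_set_link() at fini). A rough sketch of the same idea in
generic PCIe terms, with made-up helper names and using only the
standard Link Status / Link Control 2 registers, might look like the
following; whether the generic mechanism would even help here is
exactly what is being debated in this thread:

    #include <linux/pci.h>

    /* remember the link speed the device booted with (cf. nvkm_pcie_preinit) */
    static u16 boot_speed_save(struct pci_dev *pdev)
    {
            u16 lnksta;

            pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
            return lnksta & PCI_EXP_LNKSTA_CLS; /* current link speed field */
    }

    /* ask the upstream port to retrain at that speed (cf. nvkm_pcie_fini) */
    static void boot_speed_restore(struct pci_dev *pdev, u16 speed)
    {
            struct pci_dev *bridge = pci_upstream_bridge(pdev);

            if (!bridge)
                    return;
            /* set Target Link Speed in Link Control 2, then retrain */
            pcie_capability_clear_and_set_word(bridge, PCI_EXP_LNKCTL2,
                                               PCI_EXP_LNKCTL2_TLS, speed);
            pcie_capability_set_word(bridge, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
    }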

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-06-03 18:10                     ` Bjorn Helgaas
@ 2019-06-19 12:07                       ` Karol Herbst
  2019-06-19 12:12                         ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-06-19 12:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, Linux PCI

Hi Bjorn,

I was playing around with some older information again (writing into
the PCI config space to put the card into a D3 state), and something
made me very curious:
If I manually put the card into any state other than D0 via the 0x64
PCI config register, the card just dies, and the PCI core doesn't seem
to expect that. pci_raw_set_power_state has this
"pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call
and reads the value back later, but if the card is already gone at
that point, maybe we can't do this for Nvidia GPUs?

No idea why I didn't play around more with that register earlier, but
if the card already dies there, then that kind of shows there is
indeed an issue on the PCI level, no?
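
For illustration, a minimal sketch of that PMCSR write/read-back (the
helper name and flow are mine, not actual kernel code; it assumes
pm_cap sits at 0x60, so pm_cap + PCI_PM_CTRL is the 0x64 register
mentioned above):

    #include <linux/delay.h>
    #include <linux/pci.h>

    /* hypothetical helper: force D3hot by hand and check whether the
     * card survives, mirroring the pci_raw_set_power_state() flow */
    static int try_set_d3hot(struct pci_dev *dev)
    {
            u16 pmcsr;

            pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
            pmcsr &= ~PCI_PM_CTRL_STATE_MASK;
            pmcsr |= PCI_D3hot;
            pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);

            msleep(10); /* PCI PM spec: up to 10 ms before the next access */

            /* read back: a dead card returns all 1s here */
            pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
            return pmcsr == (u16)~0 ? -ENODEV : 0;
    }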

On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > this issue inside Nouveau and leave it there until we have a better
> > understanding or non Nouveau cases of this issue.
>
> Nope, I have no more ideas.
>
> > On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst@redhat.com> wrote:
> > >
> > > doing the same on the bridge controller with my workarounds applied:
> > >
> > > please note some differences:
> > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > SltSta: PresDet+ vs PresDet-
> > > LnkSta2: Equalization stuff
> > > Virtual channel: NegoPending- vs NegoPending+
> > >
> > > both times I executed lspci while the GPU was still suspended.
> > >
> > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > decode])
> > >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > >         Latency: 0
> > >         Interrupt: pin A routed to IRQ 16
> > >         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > >         I/O behind bridge: 0000e000-0000efff [size=4K]
> > >         Memory behind bridge: ec000000-ed0fffff [size=17M]
> > >         Prefetchable memory behind bridge:
> > > 00000000c0000000-00000000d1ffffff [size=288M]
> > >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > <TAbort- <MAbort+ <SERR- <PERR-
> > >         BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > >         Capabilities: [88] Subsystem: Dell Device 07be
> > >         Capabilities: [80] Power Management version 3
> > >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > >         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > >                 Address: 00000000  Data: 0000
> > >         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > >                 DevCap: MaxPayload 256 bytes, PhantFunc 0
> > >                         ExtTag- RBE+
> > >                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > >                         MaxPayload 256 bytes, MaxReadReq 128 bytes
> > >                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > AuxPwr- TransPend-
> > >                 LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > Exit Latency L0s <256ns, L1 <8us
> > >                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > >                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > >                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > >                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > HotPlug- Surprise-
> > >                         Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > >                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > > CmdCplt- HPIrq- LinkChg-
> > >                         Control: AttnInd Unknown, PwrInd Unknown,
> > > Power- Interlock-
> > >                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > PresDet+ Interlock-
> > >                         Changed: MRL- PresDet+ LinkState-
> > >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > PMEIntEna- CRSVisible-
> > >                 RootCap: CRSVisible-
> > >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > >                 DevCap2: Completion Timeout: Not Supported,
> > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > >                          AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > >                 DevCtl2: Completion Timeout: 50us to 50ms,
> > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > >                          AtomicOpsCtl: ReqEn- EgressBlck-
> > >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > >                          Transmit Margin: Normal Operating Range,
> > > EnterModifiedCompliance- ComplianceSOS-
> > >                          Compliance De-emphasis: -6dB
> > >                 LnkSta2: Current De-emphasis Level: -6dB,
> > > EqualizationComplete+, EqualizationPhase1+
> > >                          EqualizationPhase2+, EqualizationPhase3+,
> > > LinkEqualizationRequest-
> > >         Capabilities: [100 v1] Virtual Channel
> > >                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > >                 Arb:    Fixed- WRR32- WRR64- WRR128-
> > >                 Ctrl:   ArbSelect=Fixed
> > >                 Status: InProgress-
> > >                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > >                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > >                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > >                         Status: NegoPending- InProgress-
> > >         Capabilities: [140 v1] Root Complex Link
> > >                 Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > >                 Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > AssocRCRB- LinkType=MemMapped LinkValid+
> > >                         Addr:   00000000fed19000
> > >         Capabilities: [d94 v1] Secondary PCI Express <?>
> > >         Kernel driver in use: pcieport
> > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
> > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
> > >
> > > On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > >
> > > > was able to get the lspci prints via ssh. Machine rebooted
> > > > automatically each time though.
> > > >
> > > > relevant dmesg:
> > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> > > >
> > > > (last one is a 64 bit mmio read to get the on GPU timer value)
> > > >
> > > > # lspci -vvxxx -s 0:01.00
> > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > decode])
> > > >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > >        Latency: 0
> > > >        Interrupt: pin A routed to IRQ 16
> > > >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > >        I/O behind bridge: 0000e000-0000efff [size=4K]
> > > >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > >        Prefetchable memory behind bridge:
> > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > >        Capabilities: [88] Subsystem: Dell Device 07be
> > > >        Capabilities: [80] Power Management version 3
> > > >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > >                Address: 00000000  Data: 0000
> > > >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > >                        ExtTag- RBE+
> > > >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > AuxPwr- TransPend-
> > > >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > Exit Latency L0s <256ns, L1 <8us
> > > >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> > > >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > HotPlug- Surprise-
> > > >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > > > HPIrq- LinkChg-
> > > >                        Control: AttnInd Unknown, PwrInd Unknown,
> > > > Power- Interlock-
> > > >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > PresDet- Interlock-
> > > >                        Changed: MRL- PresDet+ LinkState-
> > > >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > PMEIntEna- CRSVisible-
> > > >                RootCap: CRSVisible-
> > > >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > >                DevCap2: Completion Timeout: Not Supported,
> > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > > > LTR+, OBFF Via WAKE# ARIFwd-
> > > >                         AtomicOpsCtl: ReqEn- EgressBlck-
> > > >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > >                         Transmit Margin: Normal Operating Range,
> > > > EnterModifiedCompliance- ComplianceSOS-
> > > >                         Compliance De-emphasis: -6dB
> > > >                LnkSta2: Current De-emphasis Level: -6dB,
> > > > EqualizationComplete-, EqualizationPhase1-
> > > >                         EqualizationPhase2-, EqualizationPhase3-,
> > > > LinkEqualizationRequest-
> > > >        Capabilities: [100 v1] Virtual Channel
> > > >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > >                Arb:    Fixed- WRR32- WRR64- WRR128-
> > > >                Ctrl:   ArbSelect=Fixed
> > > >                Status: InProgress-
> > > >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > >                        Status: NegoPending+ InProgress-
> > > >        Capabilities: [140 v1] Root Complex Link
> > > >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > >                        Addr:   00000000fed19000
> > > >        Capabilities: [d94 v1] Secondary PCI Express <?>
> > > >        Kernel driver in use: pcieport
> > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> > > >
> > > > lspci -vvxxx -s 1:00.00
> > > > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> > > > Mobile] (rev ff) (prog-if ff)
> > > >        !!! Unknown header type 7f
> > > >        Kernel driver in use: nouveau
> > > >        Kernel modules: nouveau
> > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > >
> > > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > > >
> > > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > > > > > > link speed set than the one it booted with. Fixes runtime suspend on my gp107.
> > > > > > > > > > >
> > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > > > fix there instead of in nouveau, but maybe there is no really nice way of doing
> > > > > > > > > > > that outside of drivers?
> > > > > > > > > >
> > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > > > feasible.
> > > > > > > > > >
> > > > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > > > combination doesn't work.
> > > > > > > > > >
> > > > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > > > something?
> > > > > > > > > >
> > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > > > >
> > > > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > > > >   or operating system software is involved.
> > > > > > > > > >
> > > > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > > > speed.
> > > > > > > > >
> > > > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > > > operate on and the operating system kind of just checks what the
> > > > > > > > > current state is and doesn't try to "restore" the old state or
> > > > > > > > > something?
> > > > > > > >
> > > > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > > > the OS.
> > > > > > > >
> > > > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > > > very little control over that state.  It can't directly restore the
> > > > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > > > result in reliable operation.
> > > > > > > >
> > > > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > > > the 0x88000 range is a mirror of the PCI config space, but we also have
> > > > > > > > > some PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > > > have some control over the link speed as it's tied to performance
> > > > > > > > > states on the GPU.
> > > > > > > >
> > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > > > that, it would be via some device-specific mechanism.
> > > > > > > >
> > > > > > > > > The big issue here is just that the GPU boots with 8.0 and some on-gpu
> > > > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU, or at least
> > > > > > > > > the communication with the controller, breaks. But if we set it back to
> > > > > > > > > the boot speed, resuming the GPU just works. So my assumption was
> > > > > > > > > that _something_ (might it be the controller or the pci subsystem)
> > > > > > > > > tries to force operation at an invalid link speed, and because the
> > > > > > > > > bridge controller is actually powered down as well (as all children
> > > > > > > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > > > > > > actually restores the state, which lets the controller fail to
> > > > > > > > > establish communication again?
> > > > > > > >
> > > > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > > > >      without OS/driver intervention.
> > > > > > > >
> > > > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > > > >      requires driver intervention or at least some ACPI method.
> > > > > > >
> > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > > > > all done on the GPU. We just upload a script and some firmware onto
> > > > > > > > the GPU. The script then runs on the PMU inside the GPU, and it is
> > > > > > > > this script that changes the PCIe link settings. But from a Nouveau
> > > > > > > point of view we don't care about the link before or after that script
> > > > > > > was invoked. Also there is no ACPI method involved.
> > > > > > >
> > > > > > > But if there is something we should notify pci core about, maybe
> > > > > > > that's something we have to do then?
> > > > > >
> > > > > > I don't think there's anything the PCI core could do with that
> > > > > > information anyway.  The PCI core doesn't care at all about the link
> > > > > > speed, and it really can't influence it directly.
> > > > > >
> > > > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > > > >
> > > > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > > > >      initial boot.
> > > > > > >
> > > > > > > No, that negotiation fails apparently as any attempt to read anything
> > > > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > > > when resuming the bridge controller.
> > > > > >
> > > > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > > > the device.  But if you can avoid the issue by running another script
> > > > > > on the PMU before suspend, that's probably what you'll have to do.
> > > > > >
> > > > >
> > > > > there is none as far as we know. Or at least nothing inside the vbios.
> > > > > We still have to get signed PMU firmware images from Nvidia for full
> > > > > support, but this still would be a hacky issue as we would depend on
> > > > > those then (and without having those in redistributable form, there
> > > > > isn't much we can do about it except fixing it on the kernel side).
> > > > >
> > > > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > > > >      8.0 GT/s.
> > > > > > >
> > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > > > >
> > > > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > > > link speed.
> > > > > >
> > > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > > > the link is supposed to work and the device should be reachable
> > > > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > > > dependency between the GPU and the driver here.
> > > > > > >
> > > > > > > but the device isn't reachable at all, not even from the pci
> > > > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > > > >
> > > > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > > > whether a PCI error occurred.
> > > > > >
> > > > >
> > > > > that's kind of problematic as it might just lock up my machine... but
> > > > > let me try that.
> > > > >
> > > > > > > > It sounds like things work if you return to 8.0 GT/s before suspend,
> > > > > > > > things work.  That would make sense to me because then the driver's
> > > > > > > > idea of the link state after resume would match the actual state.
> > > > > > >
> > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > > > care one bit about the current link speed, so I assume you mean
> > > > > > > something inside the pci core code?
> > > > > > >
> > > > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > > > does save and restore most of the architected config space around
> > > > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > > > the PCI core would have no idea how to save/restore it.
> > > > > > >
> > > > > > > if we assume that the negotiation on a device level works as intended,
> > > > > > > then I would expect this to be a pci core issue, which might actually
> > > > > > > not be fixable there. But if it's not, then we would have to put
> > > > > > > something like that inside the runpm documentation to tell drivers
> > > > > > > they have to do something about it.
> > > > > > >
> > > > > > > But again, for me it just sounds like the negotiation on the device
> > > > > > > level fails or something inside pci core messes it up.
> > > > > >
> > > > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > > > core has no way to know what that was or how to fix it.
> > > > > >
> > > > >
> > > > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > > > the GPU and the host bridge controller, because no matter what the
> > > > > state was before, we have to reprobe instead of relying on a known
> > > > > state, no?
> > > > >
> > > > > > > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > > > >       } agp;
> > > > > > > > > > >
> > > > > > > > > > >       struct {
> > > > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > > > -             u8 width;
> > > > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > > > +             u8 cur_width;
> > > > > > > > > > >       } pcie;
> > > > > > > > > > >
> > > > > > > > > > >       bool msi;
> > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > > > >
> > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > > > >
> > > > > > > > > > >       return 0;
> > > > > > > > > > >  }
> > > > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > > > >       return 0;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > > > >       pci->func = func;
> > > > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > > > >       pci->irq = -1;
> > > > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > > > >
> > > > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > > > >       return 0;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > +int
> > > > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > > > +{
> > > > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > > > +     return 0;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > >  int
> > > > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > >  {
> > > > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > > > >
> > > > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > > > >
> > > > > > > > > > >       return 0;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > +int
> > > > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > > > +{
> > > > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > > > +     return 0;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > >  int
> > > > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > >  {
> > > > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > >               speed = max_speed;
> > > > > > > > > > >       }
> > > > > > > > > > >
> > > > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > > > >
> > > > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > > > >
> > > > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > > > >  #endif
> > > > > > > > > > > --
> > > > > > > > > > > 2.21.0
> > > > > > > > > > >
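
(As Bjorn notes above, the OS can at least observe Current Link Speed
and Negotiated Link Width in the Link Status register. A minimal
sketch of reading them from a driver, with a made-up helper name:)

    #include <linux/pci.h>

    static void log_link_state(struct pci_dev *pdev)
    {
            u16 lnksta;

            pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
            pci_info(pdev, "link: speed field %u, width x%u\n",
                     lnksta & PCI_EXP_LNKSTA_CLS,
                     (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
    }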

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-06-19 12:07                       ` Karol Herbst
@ 2019-06-19 12:12                         ` Karol Herbst
  2019-06-24 15:04                           ` Karol Herbst
  0 siblings, 1 reply; 23+ messages in thread
From: Karol Herbst @ 2019-06-19 12:12 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, Linux PCI

ohh nvm. It was a mistake on my end. Sorry for the noise

On Wed, Jun 19, 2019 at 2:07 PM Karol Herbst <kherbst@redhat.com> wrote:
>
> Hi Bjorn,
>
> I was playing around with some older information again (writing into
> the PCI config space to put the card into a D3 state), and something
> made me very curious:
> If I manually put the card into any state other than D0 via the 0x64
> PCI config register, the card just dies, and the PCI core doesn't seem
> to expect that. pci_raw_set_power_state has this
> "pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call
> and reads the value back later, but if the card is already gone at
> that point, maybe we can't do this for Nvidia GPUs?
>
> No idea why I didn't play around more with that register earlier, but
> if the card already dies there, then that kind of shows there is
> indeed an issue on the PCI level, no?
>
> On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > > this issue inside Nouveau and leave it there until we have a better
> > > understanding or non Nouveau cases of this issue.
> >
> > Nope, I have no more ideas.
> >
> > > On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > >
> > > > doing the same on the bridge controller with my workarounds applied:
> > > >
> > > > please note some differences:
> > > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > > SltSta: PresDet+ vs PresDet-
> > > > LnkSta2: Equalization stuff
> > > > Virtual channel: NegoPending- vs NegoPending+
> > > >
> > > > both times I executed lspci while the GPU was still suspended.
> > > >
> > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > decode])
> > > >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > >         Latency: 0
> > > >         Interrupt: pin A routed to IRQ 16
> > > >         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > >         I/O behind bridge: 0000e000-0000efff [size=4K]
> > > >         Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > >         Prefetchable memory behind bridge:
> > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > >         BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > >         Capabilities: [88] Subsystem: Dell Device 07be
> > > >         Capabilities: [80] Power Management version 3
> > > >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > >         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > >                 Address: 00000000  Data: 0000
> > > >         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > >                 DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > >                         ExtTag- RBE+
> > > >                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > >                         MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > >                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > AuxPwr- TransPend-
> > > >                 LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > Exit Latency L0s <256ns, L1 <8us
> > > >                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > >                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > > >                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > >                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > HotPlug- Surprise-
> > > >                         Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > >                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > > > CmdCplt- HPIrq- LinkChg-
> > > >                         Control: AttnInd Unknown, PwrInd Unknown,
> > > > Power- Interlock-
> > > >                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > PresDet+ Interlock-
> > > >                         Changed: MRL- PresDet+ LinkState-
> > > >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > PMEIntEna- CRSVisible-
> > > >                 RootCap: CRSVisible-
> > > >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > >                 DevCap2: Completion Timeout: Not Supported,
> > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > >                          AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > >                 DevCtl2: Completion Timeout: 50us to 50ms,
> > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > >                          AtomicOpsCtl: ReqEn- EgressBlck-
> > > >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > >                          Transmit Margin: Normal Operating Range,
> > > > EnterModifiedCompliance- ComplianceSOS-
> > > >                          Compliance De-emphasis: -6dB
> > > >                 LnkSta2: Current De-emphasis Level: -6dB,
> > > > EqualizationComplete+, EqualizationPhase1+
> > > >                          EqualizationPhase2+, EqualizationPhase3+,
> > > > LinkEqualizationRequest-
> > > >         Capabilities: [100 v1] Virtual Channel
> > > >                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > >                 Arb:    Fixed- WRR32- WRR64- WRR128-
> > > >                 Ctrl:   ArbSelect=Fixed
> > > >                 Status: InProgress-
> > > >                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > >                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > >                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > >                         Status: NegoPending- InProgress-
> > > >         Capabilities: [140 v1] Root Complex Link
> > > >                 Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > >                 Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > >                         Addr:   00000000fed19000
> > > >         Capabilities: [d94 v1] Secondary PCI Express <?>
> > > >         Kernel driver in use: pcieport
> > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
> > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
> > > >
> > > > On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > > >
> > > > > was able to get the lspci prints via ssh. Machine rebooted
> > > > > automatically each time though.
> > > > >
> > > > > relevant dmesg:
> > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> > > > >
> > > > > (last one is a 64 bit mmio read to get the on GPU timer value)
> > > > >
> > > > > # lspci -vvxxx -s 0:01.00
> > > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > > decode])
> > > > >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > >        Latency: 0
> > > > >        Interrupt: pin A routed to IRQ 16
> > > > >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > > >        I/O behind bridge: 0000e000-0000efff [size=4K]
> > > > >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > > >        Prefetchable memory behind bridge:
> > > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > > >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > > >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > > >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > > >        Capabilities: [88] Subsystem: Dell Device 07be
> > > > >        Capabilities: [80] Power Management version 3
> > > > >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > > >                Address: 00000000  Data: 0000
> > > > >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > > >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > >                        ExtTag- RBE+
> > > > >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > > >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > AuxPwr- TransPend-
> > > > >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > > Exit Latency L0s <256ns, L1 <8us
> > > > >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > > >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > > >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> > > > >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > > >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > > HotPlug- Surprise-
> > > > >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > > >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > > > > HPIrq- LinkChg-
> > > > >                        Control: AttnInd Unknown, PwrInd Unknown,
> > > > > Power- Interlock-
> > > > >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > > PresDet- Interlock-
> > > > >                        Changed: MRL- PresDet+ LinkState-
> > > > >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > > PMEIntEna- CRSVisible-
> > > > >                RootCap: CRSVisible-
> > > > >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > > >                DevCap2: Completion Timeout: Not Supported,
> > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > > >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > > > > LTR+, OBFF Via WAKE# ARIFwd-
> > > > >                         AtomicOpsCtl: ReqEn- EgressBlck-
> > > > >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > >                         Transmit Margin: Normal Operating Range,
> > > > > EnterModifiedCompliance- ComplianceSOS-
> > > > >                         Compliance De-emphasis: -6dB
> > > > >                LnkSta2: Current De-emphasis Level: -6dB,
> > > > > EqualizationComplete-, EqualizationPhase1-
> > > > >                         EqualizationPhase2-, EqualizationPhase3-,
> > > > > LinkEqualizationRequest-
> > > > >        Capabilities: [100 v1] Virtual Channel
> > > > >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > > >                Arb:    Fixed- WRR32- WRR64- WRR128-
> > > > >                Ctrl:   ArbSelect=Fixed
> > > > >                Status: InProgress-
> > > > >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > > >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > > >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > > >                        Status: NegoPending+ InProgress-
> > > > >        Capabilities: [140 v1] Root Complex Link
> > > > >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > > >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > > >                        Addr:   00000000fed19000
> > > > >        Capabilities: [d94 v1] Secondary PCI Express <?>
> > > > >        Kernel driver in use: pcieport
> > > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> > > > >
> > > > > lspci -vvxxx -s 1:00.00
> > > > > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> > > > > Mobile] (rev ff) (prog-if ff)
> > > > >        !!! Unknown header type 7f
> > > > >        Kernel driver in use: nouveau
> > > > >        Kernel modules: nouveau
> > > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > >
> > > > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > >
> > > > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > > > > > > > link speed set than the one it booted with. Fixes runtime suspend on my gp107.
> > > > > > > > > > > >
> > > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > > > > fix there instead of in nouveau, but maybe there is no really nice way of doing
> > > > > > > > > > > > that outside of drivers?
> > > > > > > > > > >
> > > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > > > > feasible.
> > > > > > > > > > >
> > > > > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > > > > combination doesn't work.
> > > > > > > > > > >
> > > > > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > > > > something?
> > > > > > > > > > >
> > > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > > > > >
> > > > > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > > > > >   or operating system software is involved.
> > > > > > > > > > >
> > > > > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > > > > speed.
> > > > > > > > > >
> > > > > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > > > > operate on and the operating system kind of just checks what the
> > > > > > > > > > current state is and doesn't try to "restore" the old state or
> > > > > > > > > > something?
> > > > > > > > >
> > > > > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > > > > the OS.
> > > > > > > > >
> > > > > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > > > > very little control over that state.  It can't directly restore the
> > > > > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > > > > result in reliable operation.
> > > > > > > > >
> > > > > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > > > > the 0x88000 range is a mirror of the PCI config space, but we also have
> > > > > > > > > > some PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > > > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > > > > have some control over the link speed as it's tied to performance
> > > > > > > > > > states on the GPU.
> > > > > > > > >
> > > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > > > > that, it would be via some device-specific mechanism.
> > > > > > > > >
> > > > > > > > > > The big issue here is just that the GPU boots with 8.0 and some on-gpu
> > > > > > > > > > init mechanism decreases it to 2.5. If we suspend, the GPU, or at least
> > > > > > > > > > the communication with the controller, breaks. But if we set it back to
> > > > > > > > > > the boot speed, resuming the GPU just works. So my assumption was
> > > > > > > > > > that _something_ (might it be the controller or the pci subsystem)
> > > > > > > > > > tries to force operation at an invalid link speed, and because the
> > > > > > > > > > bridge controller is actually powered down as well (as all children
> > > > > > > > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > > > > > > > actually restores the state, which lets the controller fail to
> > > > > > > > > > establish communication again?
> > > > > > > > >
> > > > > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > > > > >      without OS/driver intervention.
> > > > > > > > >
> > > > > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > > > > >      requires driver intervention or at least some ACPI method.
> > > > > > > >
> > > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > > > > all done on the GPU. We just upload a script and some firmware onto
> > > > > > > > the GPU. The script then runs on the PMU inside the GPU and also
> > > > > > > > changes the PCIe link settings. But from a Nouveau point of view we
> > > > > > > > don't care about the link before or after that script was invoked.
> > > > > > > > Also there is no ACPI method involved.
> > > > > > > >
> > > > > > > > But if there is something we should notify pci core about, maybe
> > > > > > > > that's something we have to do then?
> > > > > > >
> > > > > > > I don't think there's anything the PCI core could do with that
> > > > > > > information anyway.  The PCI core doesn't care at all about the link
> > > > > > > speed, and it really can't influence it directly.
> > > > > > >
> > > > > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > > > > >
> > > > > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > > > > >      initial boot.
> > > > > > > >
> > > > > > > > No, that negotiation fails apparently as any attempt to read anything
> > > > > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > > > > when resuming the bridge controller.
> > > > > > >
> > > > > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > > > > the device.  But if you can avoid the issue by running another script
> > > > > > > on the PMU before suspend, that's probably what you'll have to do.
> > > > > > >
> > > > > >
> > > > > > there is none as far as we know. Or at least nothing inside the vbios.
> > > > > > We still have to get signed PMU firmware images from Nvidia for full
> > > > > > support, but this would still be a hacky situation as we would depend
> > > > > > on those (and without having those in redistributable form, there
> > > > > > isn't much we can do about it except fixing it on the kernel side).
> > > > > >
> > > > > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > > > > >      8.0 GT/s.
> > > > > > > >
> > > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > > > > >
> > > > > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > > > > link speed.
> > > > > > >
> > > > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > > > > the link is supposed to work and the device should be reachable
> > > > > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > > > > dependency between the GPU and the driver here.
> > > > > > > >
> > > > > > > > but the device isn't reachable at all, not even from the pci
> > > > > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > > > > >
> > > > > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > > > > whether a PCI error occurred.
> > > > > > >
> > > > > >
> > > > > > that's kind of problematic as it might just lock up my machine... but
> > > > > > let me try that.
> > > > > >
> > > > > > > > > It sounds like if you return to 8.0 GT/s before suspend, things
> > > > > > > > > work.  That would make sense to me because then the driver's
> > > > > > > > > idea of the link state after resume would match the actual state.
> > > > > > > >
> > > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > > > > care one bit about the current link speed, so I assume you mean
> > > > > > > > something inside the pci core code?
> > > > > > > >
> > > > > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > > > > does save and restore most of the architected config space around
> > > > > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > > > > the PCI core would have no idea how to save/restore it.
> > > > > > > >
> > > > > > > > if we assume that the negotiation on a device level works as intended,
> > > > > > > > then I would expect this to be a pci core issue, which might actually
> > > > > > > > not be fixable there. But if it's not, then we would have to put
> > > > > > > > something like that inside the runpm documentation to tell drivers
> > > > > > > > they have to do something about it.
> > > > > > > > But again, for me it just sounds like the negotiation on the device
> > > > > > > > level fails or something inside pci core messes it up.
> > > > > > >
> > > > > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > > > > core has no way to know what that was or how to fix it.
> > > > > > >
> > > > > >
> > > > > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > > > > the GPU and the host bridge controller, because no matter what the
> > > > > > state was before, we have to reprobe instead of relying on a known
> > > > > > state, no?
> > > > > >
> > > > > > > > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > > > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > > > > >       } agp;
> > > > > > > > > > > >
> > > > > > > > > > > >       struct {
> > > > > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > > > > -             u8 width;
> > > > > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > > > > +             u8 cur_width;
> > > > > > > > > > > >       } pcie;
> > > > > > > > > > > >
> > > > > > > > > > > >       bool msi;
> > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > > > > >
> > > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > > > > >
> > > > > > > > > > > >       return 0;
> > > > > > > > > > > >  }
> > > > > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > > > > >       return 0;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > > > > >       pci->func = func;
> > > > > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > > > > >       pci->irq = -1;
> > > > > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > > > > >
> > > > > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > > > > >       return 0;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > +int
> > > > > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > > > > +     return 0;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > >  int
> > > > > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > > >  {
> > > > > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > > > > >
> > > > > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > > > > >
> > > > > > > > > > > >       return 0;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > +int
> > > > > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > > > > +     return 0;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > >  int
> > > > > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > > >  {
> > > > > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > > >               speed = max_speed;
> > > > > > > > > > > >       }
> > > > > > > > > > > >
> > > > > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > > > > >
> > > > > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > > > > >
> > > > > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > > > > >  #endif
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.21.0
> > > > > > > > > > > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini
  2019-06-19 12:12                         ` Karol Herbst
@ 2019-06-24 15:04                           ` Karol Herbst
  0 siblings, 0 replies; 23+ messages in thread
From: Karol Herbst @ 2019-06-24 15:04 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: nouveau, Lyude Paul, Linux PCI

Hi Bjorn,

I managed to trigger the issue from userspace without having to call
into any ACPI code. Wrote a little script[1] which essentially does
(and prints) this (I've added some comments):

PCI SCAN
# set PCIe link speed to 2.5. Bug doesn't trigger with 5.0 or 8.0
# NVA is a small lib we use to poke into MMIO registers of the GPU in
# order to reverse engineer them more easily.
NVA R 0x8c040 80005800
NVA W 0x8c040 80085800
NVA R 0x8c040 80085800
NVA W 0x8c040 80085801
NVA R 0x8c040 80085800
# put the GPU into D3hot via the PCI config
PCI R 0000:01:00.0 0x64 00000008
PCI W 0000:01:00.0 0x64 0000000b
PCI R 0000:01:00.0 0x64 0000000b
# slot ctrl/status
PCI R 0000:00:01.0 0xb8 00480000
# link ctrl/status 2
PCI R 0000:00:01.0 0xd0 001e0043
# bit 7 in 0x248 is called Q0L2 inside the ACPI code and is set by
# the ACPI platform method that runtime suspends the GPU
# (on my machine the parent method is \_SB.PCI0.PEG0.PG00._OFF)
PCI W 0000:00:01.0 0x248 00000080
PCI R 0000:00:01.0 0x248 00000000
# now things are super messed up <====!!
# slot ctrl/status:
# bit 6 got cleared, which indicates the bus doesn't detect any
# device connected to the slot
# the bit stays set if the PCIe link speed was set to 5.0 or 8.0
PCI R 0000:00:01.0 0xb8 00080000
# link ctrl/status 2
# apparently equalization wasn't executed, but maybe that makes sense
# if the bridge controller doesn't see any devices
PCI R 0000:00:01.0 0xd0 00000043
# reading from the GPU now fails
PCI R 0000:01:00.0 0x64 ffffffff
GPU PCI link in broken state. GPU might not come back on PCI bus rescan.
PCI REMOVE 0000:01:00.0
PCI SCAN
# the GPU wasn't able to be found after a rescan
setpci: Warning: No devices selected for "0x64.L".
PCI R 0000:01:00.0 0x64

I think Q0L2 should set the Link into L2 state, but I have no idea
what implications that has. Also the write to Q0L2 is guarded by some
conditions inside the ACPI code, but I don't know how much we can
trust vendors to have those guards and what they actually do. At this
point I am inclined to say that this might actually be a hardware or
firmware bug or something.

At this point I guess it makes sense to poke Intel folks, but I
wouldn't know who exactly to ask here. Any thoughts or ideas?

Sadly I don't know the PCIe spec well enough (to be honest, I just
read a bit of it yesterday) to say what this Q0L2 field actually
does. That register also doesn't seem to be documented by Intel
afaik [2].

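For anyone who wants to poke at this without pulling the gist: below
is a minimal C sketch of the same config-space sequence done through
sysfs, as I understand it. Treat it as illustrative only; the NVA
MMIO pokes that first downgrade the link to 2.5 are omitted, the
0x64/0x248 offsets are specific to my GPU/root port, and it needs to
run as root.

/*
 * Sketch: replay the config space part of the log above via sysfs.
 * 0x64 is the GPU's PMCSR, 0xb8 the root port's Slot Control/Status
 * dword, 0x248 the undocumented root port register with the "Q0L2"
 * bit.  Expect the GPU to stay dead afterwards, as described.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define GPU  "/sys/bus/pci/devices/0000:01:00.0/config"
#define PORT "/sys/bus/pci/devices/0000:00:01.0/config"

static uint32_t cfg_read(const char *path, off_t off)
{
        uint32_t val = 0xffffffff;
        int fd = open(path, O_RDONLY);

        if (fd >= 0) {
                pread(fd, &val, sizeof(val), off);
                close(fd);
        }
        return val;
}

static void cfg_write(const char *path, off_t off, uint32_t val)
{
        int fd = open(path, O_WRONLY);

        if (fd >= 0) {
                pwrite(fd, &val, sizeof(val), off);
                close(fd);
        }
}

int main(void)
{
        uint32_t v;

        /* put the GPU into D3hot: PowerState field (bits 1:0) = 3 */
        cfg_write(GPU, 0x64, cfg_read(GPU, 0x64) | 0x3);

        /* set the Q0L2 bit (bit 7) in the root port's 0x248 register */
        cfg_write(PORT, 0x248, 0x80);

        /*
         * Slot Status is the upper half of dword 0xb8 (PCIe cap at
         * 0xa0 + 0x18); bit 6 is Presence Detect State.  It reads
         * 0x0048 before and 0x0008 after, i.e. the port no longer
         * sees the GPU.
         */
        v = cfg_read(PORT, 0xb8);
        printf("SltCtl/SltSta: %08x (PresDet%c)\n", (unsigned int)v,
               (v & (1u << 22)) ? '+' : '-');

        /* config reads from the GPU itself now return all ones */
        printf("GPU PMCSR: %08x\n", (unsigned int)cfg_read(GPU, 0x64));
        return 0;
}
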
[1]: script: https://gist.githubusercontent.com/karolherbst/0d0c369a0c14dc092fcb0f5c854dd79c/raw/daab0c3a3d0aa94c270eba44b811271042dd2756/script.sh
[2]: intel KBL platform datasheet:
https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/7th-gen-core-family-mobile-h-processor-lines-datasheet-vol-2.pdf
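
As a general aside, since "current link speed" comes up throughout
this thread: on the kernel side the negotiated state can be observed
(though not restored) through the standard Link Status register.  A
minimal sketch, assuming a valid struct pci_dev pointer; the helper
name is made up for illustration and this is not the actual
nvkm_pcie_get_speed() from the patch:

#include <linux/pci.h>

/*
 * Sketch: log the negotiated PCIe link speed and width of a device.
 * Purely observational; as discussed above, the OS has no generic
 * PCIe mechanism to force the link back to a particular speed.
 */
static void log_pcie_link_state(struct pci_dev *pdev)
{
        u16 lnksta;

        if (pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta))
                return;

        /* Current Link Speed: 1 = 2.5, 2 = 5.0, 3 = 8.0 GT/s */
        pci_info(pdev, "link speed %u, width x%u\n",
                 lnksta & PCI_EXP_LNKSTA_CLS,
                 (lnksta & PCI_EXP_LNKSTA_NLW) >> PCI_EXP_LNKSTA_NLW_SHIFT);
}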

On Wed, Jun 19, 2019 at 2:12 PM Karol Herbst <kherbst@redhat.com> wrote:
>
> ohh nvm. It was a mistake on my end. Sorry for the noise
>
> On Wed, Jun 19, 2019 at 2:07 PM Karol Herbst <kherbst@redhat.com> wrote:
> >
> > Hi Bjorn,
> >
> > I was playing around with some older information again (write into the
> > PCI config to put the card into d3 state). And there is something
> > which made me very curious:
> > If I put the card manually into any other state besides D0 via the
> > 0x64 pci config register, the card just dies and pci core seems to
> > expect this to not happen. pci_raw_set_power_state has this
> > "pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);" call,
> > and reads the value back later, but if the card is already gone, maybe
> > we can't do this for nvidia GPUs?
> >
> > No idea why I didn't play around more with that register, but if the
> > card already dies there then this kind of shows there is indeed an
> > issue on a PCI level, no?
> >
> > On Mon, Jun 3, 2019 at 8:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Mon, Jun 03, 2019 at 03:18:56PM +0200, Karol Herbst wrote:
> > > > @bjorn: any further ideas? Otherwise I'd like to just go ahead and fix
> > > > this issue inside Nouveau and leave it there until we have a better
> > > > understanding or non-Nouveau cases of this issue.
> > >
> > > Nope, I have no more ideas.
> > >
> > > > On Tue, May 21, 2019 at 7:48 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > > >
> > > > > doing the same on the bridge controller with my workarounds applied:
> > > > >
> > > > > please note some differences:
> > > > > LnkSta: Speed 8GT/s (ok) vs Speed 2.5GT/s (downgraded)
> > > > > SltSta: PresDet+ vs PresDet-
> > > > > LnkSta2: Equalization stuff
> > > > > Virtual channel: NegoPending- vs NegoPending+
> > > > >
> > > > > both times I executed lspci while the GPU was still suspended.
> > > > >
> > > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > > decode])
> > > > >         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > >         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > >         Latency: 0
> > > > >         Interrupt: pin A routed to IRQ 16
> > > > >         Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > > >         I/O behind bridge: 0000e000-0000efff [size=4K]
> > > > >         Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > > >         Prefetchable memory behind bridge:
> > > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > > >         Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > > >         BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > > >                 PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > > >         Capabilities: [88] Subsystem: Dell Device 07be
> > > > >         Capabilities: [80] Power Management version 3
> > > > >                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > >                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > >         Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > > >                 Address: 00000000  Data: 0000
> > > > >         Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > > >                 DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > >                         ExtTag- RBE+
> > > > >                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > >                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > >                         MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > > >                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > AuxPwr- TransPend-
> > > > >                 LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > > Exit Latency L0s <256ns, L1 <8us
> > > > >                         ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > > >                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > > >                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > >                 LnkSta: Speed 8GT/s (ok), Width x16 (ok)
> > > > >                         TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > > >                 SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > > HotPlug- Surprise-
> > > > >                         Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > > >                 SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet-
> > > > > CmdCplt- HPIrq- LinkChg-
> > > > >                         Control: AttnInd Unknown, PwrInd Unknown,
> > > > > Power- Interlock-
> > > > >                 SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > > PresDet+ Interlock-
> > > > >                         Changed: MRL- PresDet+ LinkState-
> > > > >                 RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > > PMEIntEna- CRSVisible-
> > > > >                 RootCap: CRSVisible-
> > > > >                 RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > > >                 DevCap2: Completion Timeout: Not Supported,
> > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > >                          AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > > >                 DevCtl2: Completion Timeout: 50us to 50ms,
> > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > >                          AtomicOpsCtl: ReqEn- EgressBlck-
> > > > >                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > >                          Transmit Margin: Normal Operating Range,
> > > > > EnterModifiedCompliance- ComplianceSOS-
> > > > >                          Compliance De-emphasis: -6dB
> > > > >                 LnkSta2: Current De-emphasis Level: -6dB,
> > > > > EqualizationComplete+, EqualizationPhase1+
> > > > >                          EqualizationPhase2+, EqualizationPhase3+,
> > > > > LinkEqualizationRequest-
> > > > >         Capabilities: [100 v1] Virtual Channel
> > > > >                 Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > > >                 Arb:    Fixed- WRR32- WRR64- WRR128-
> > > > >                 Ctrl:   ArbSelect=Fixed
> > > > >                 Status: InProgress-
> > > > >                 VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > > >                         Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > > >                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > > >                         Status: NegoPending- InProgress-
> > > > >         Capabilities: [140 v1] Root Complex Link
> > > > >                 Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > > >                 Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > > >                         Addr:   00000000fed19000
> > > > >         Capabilities: [d94 v1] Secondary PCI Express <?>
> > > > >         Kernel driver in use: pcieport
> > > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > > b0: 40 00 03 d1 80 25 0c 00 00 00 48 00 00 00 00 00
> > > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > > d0: 43 00 1e 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > f0: 00 00 00 84 4e 01 01 20 00 00 00 00 e0 00 10 00
> > > > >
> > > > > On Tue, May 21, 2019 at 7:35 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > > > >
> > > > > > was able to get the lspci prints via ssh. Machine rebooted
> > > > > > automatically each time though.
> > > > > >
> > > > > > relevant dmesg:
> > > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > > kernel: nouveau 0000:01:00.0: Refused to change power state, currently in D3
> > > > > > kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
> > > > > >
> > > > > > (last one is a 64 bit mmio read to get the on GPU timer value)
> > > > > >
> > > > > > # lspci -vvxxx -s 0:01.00
> > > > > > 00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th
> > > > > > Gen Core Processor PCIe Controller (x16) (rev 05) (prog-if 00 [Normal
> > > > > > decode])
> > > > > >        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> > > > > > ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > > > >        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > > <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > > > >        Latency: 0
> > > > > >        Interrupt: pin A routed to IRQ 16
> > > > > >        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > > > >        I/O behind bridge: 0000e000-0000efff [size=4K]
> > > > > >        Memory behind bridge: ec000000-ed0fffff [size=17M]
> > > > > >        Prefetchable memory behind bridge:
> > > > > > 00000000c0000000-00000000d1ffffff [size=288M]
> > > > > >        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
> > > > > > <TAbort- <MAbort+ <SERR- <PERR-
> > > > > >        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
> > > > > >                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > > > >        Capabilities: [88] Subsystem: Dell Device 07be
> > > > > >        Capabilities: [80] Power Management version 3
> > > > > >                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
> > > > > > PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > > > >                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > > > >        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
> > > > > >                Address: 00000000  Data: 0000
> > > > > >        Capabilities: [a0] Express (v2) Root Port (Slot+), MSI 00
> > > > > >                DevCap: MaxPayload 256 bytes, PhantFunc 0
> > > > > >                        ExtTag- RBE+
> > > > > >                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > >                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > > > >                        MaxPayload 256 bytes, MaxReadReq 128 bytes
> > > > > >                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > > > > AuxPwr- TransPend-
> > > > > >                LnkCap: Port #2, Speed 8GT/s, Width x16, ASPM L0s L1,
> > > > > > Exit Latency L0s <256ns, L1 <8us
> > > > > >                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp+
> > > > > >                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > > > > >                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > > > > >                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
> > > > > >                        TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > > > > >                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd-
> > > > > > HotPlug- Surprise-
> > > > > >                        Slot #1, PowerLimit 75.000W; Interlock- NoCompl+
> > > > > >                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt-
> > > > > > HPIrq- LinkChg-
> > > > > >                        Control: AttnInd Unknown, PwrInd Unknown,
> > > > > > Power- Interlock-
> > > > > >                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt-
> > > > > > PresDet- Interlock-
> > > > > >                        Changed: MRL- PresDet+ LinkState-
> > > > > >                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
> > > > > > PMEIntEna- CRSVisible-
> > > > > >                RootCap: CRSVisible-
> > > > > >                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> > > > > >                DevCap2: Completion Timeout: Not Supported,
> > > > > > TimeoutDis-, LTR+, OBFF Via WAKE# ARIFwd-
> > > > > >                         AtomicOpsCap: Routing- 32bit+ 64bit+ 128bitCAS+
> > > > > >                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-,
> > > > > > LTR+, OBFF Via WAKE# ARIFwd-
> > > > > >                         AtomicOpsCtl: ReqEn- EgressBlck-
> > > > > >                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > > > > >                         Transmit Margin: Normal Operating Range,
> > > > > > EnterModifiedCompliance- ComplianceSOS-
> > > > > >                         Compliance De-emphasis: -6dB
> > > > > >                LnkSta2: Current De-emphasis Level: -6dB,
> > > > > > EqualizationComplete-, EqualizationPhase1-
> > > > > >                         EqualizationPhase2-, EqualizationPhase3-,
> > > > > > LinkEqualizationRequest-
> > > > > >        Capabilities: [100 v1] Virtual Channel
> > > > > >                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
> > > > > >                Arb:    Fixed- WRR32- WRR64- WRR128-
> > > > > >                Ctrl:   ArbSelect=Fixed
> > > > > >                Status: InProgress-
> > > > > >                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> > > > > >                        Arb:    Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> > > > > >                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
> > > > > >                        Status: NegoPending+ InProgress-
> > > > > >        Capabilities: [140 v1] Root Complex Link
> > > > > >                Desc:   PortNumber=02 ComponentID=01 EltType=Config
> > > > > >                Link0:  Desc:   TargetPort=00 TargetComponent=01
> > > > > > AssocRCRB- LinkType=MemMapped LinkValid+
> > > > > >                        Addr:   00000000fed19000
> > > > > >        Capabilities: [d94 v1] Secondary PCI Express <?>
> > > > > >        Kernel driver in use: pcieport
> > > > > > 00: 86 80 01 19 07 00 10 00 05 00 04 06 00 00 81 00
> > > > > > 10: 00 00 00 00 00 00 00 00 00 01 01 00 e0 e0 00 20
> > > > > > 20: 00 ec 00 ed 01 c0 f1 d1 00 00 00 00 00 00 00 00
> > > > > > 30: 00 00 00 00 88 00 00 00 00 00 00 00 ff 01 10 00
> > > > > > 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > 50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > 70: 00 00 00 00 00 00 00 00 00 62 17 00 00 00 00 0a
> > > > > > 80: 01 90 03 c8 08 00 00 00 0d 80 00 00 28 10 be 07
> > > > > > 90: 05 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > a0: 10 00 42 01 01 80 00 00 20 00 00 00 03 ad 61 02
> > > > > > b0: 40 00 01 d1 80 25 0c 00 00 00 08 00 00 00 00 00
> > > > > > c0: 00 00 00 00 80 0b 08 00 00 64 00 00 0e 00 00 00
> > > > > > d0: 43 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > > > > > f0: 00 40 01 00 4e 01 01 22 00 00 00 00 e0 00 10 00
> > > > > >
> > > > > > lspci -vvxxx -s 1:00.00
> > > > > > 01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050
> > > > > > Mobile] (rev ff) (prog-if ff)
> > > > > >        !!! Unknown header type 7f
> > > > > >        Kernel driver in use: nouveau
> > > > > >        Kernel modules: nouveau
> > > > > > 00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 70: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > 90: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > a0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > b0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > c0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > d0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > e0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > > f0: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> > > > > >
> > > > > > On Tue, May 21, 2019 at 4:30 PM Karol Herbst <kherbst@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, May 21, 2019 at 4:13 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Tue, May 21, 2019 at 03:28:48PM +0200, Karol Herbst wrote:
> > > > > > > > > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > > > > > > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > > > > > > > > Apparently things go south if we suspend the device with a different PCIe
> > > > > > > > > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > > > > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > > > > > > > > > that outside of drivers?
> > > > > > > > > > > >
> > > > > > > > > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > > > > > > > > feasible.
> > > > > > > > > > > >
> > > > > > > > > > > > It looks like this driver changes the PCIe link speed using some
> > > > > > > > > > > > device-specific mechanism.  When we suspend, we put the device in
> > > > > > > > > > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > > > > > > > > > comes up at the boot speed because nothing did that device-specific
> > > > > > > > > > > > magic to change it, so you probably end up with the link being slow
> > > > > > > > > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > > > > > > > > combination doesn't work.
> > > > > > > > > > > >
> > > > > > > > > > > > If it requires something device-specific to change that link speed, I
> > > > > > > > > > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > > > > > > > > > something?
> > > > > > > > > > > >
> > > > > > > > > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > > > > > > > > >
> > > > > > > > > > > >   Initialization – During hardware initialization, each PCI Express
> > > > > > > > > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > > > > > > > > >   of operation by the two agents at each end of the Link. No firmware
> > > > > > > > > > > >   or operating system software is involved.
> > > > > > > > > > > >
> > > > > > > > > > > > I have been assuming that this means device-specific link speed
> > > > > > > > > > > > management is out of spec, but it seems pretty common that devices
> > > > > > > > > > > > don't come up by themselves at the fastest possible link speed.  So
> > > > > > > > > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > > > > > > > > speed.
> > > > > > > > > > >
> > > > > > > > > > > I would expect that devices kind of have to figure out what they can
> > > > > > > > > > > operate on and the operating system kind of just checks what the
> > > > > > > > > > > current state is and doesn't try to "restore" the old state or
> > > > > > > > > > > something?
> > > > > > > > > >
> > > > > > > > > > The devices at each end of the link negotiate the width and speed of
> > > > > > > > > > the link.  This is done directly by the hardware without any help from
> > > > > > > > > > the OS.
> > > > > > > > > >
> > > > > > > > > > The OS can read the current link state (Current Link Speed and
> > > > > > > > > > Negotiated Link Width, both in the Link Status register).  The OS has
> > > > > > > > > > very little control over that state.  It can't directly restore the
> > > > > > > > > > state because the hardware has to negotiate a width & speed that
> > > > > > > > > > result in reliable operation.
> > > > > > > > > >
> > > > > > > > > > > We don't do anything in the driver after the device was suspended. And
> > > > > > > > > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > > > > > > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > > > > > > > > essentially. I have no idea how much of this is part of the actual pci
> > > > > > > > > > > standard and how much is driver specific. But the driver also wants to
> > > > > > > > > > > have some control over the link speed as it's tied to performance
> > > > > > > > > > > states on the GPU.
> > > > > > > > > >
> > > > > > > > > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > > > > > > > > influence the link width or speed.  If the GPU driver needs to do
> > > > > > > > > > that, it would be via some device-specific mechanism.
> > > > > > > > > >
> > > > > > > > > > > The big issue here is just that the GPU boots with 8.0 and some
> > > > > > > > > > > on-gpu init mechanism decreases it to 2.5. If we suspend, the GPU or
> > > > > > > > > > > at least the communication with the controller is broken. But if we
> > > > > > > > > > > set it to the boot speed, resuming the GPU just works. So my
> > > > > > > > > > > assumption was that _something_ (be it the controller or the pci
> > > > > > > > > > > subsystem) tries to force operation at an invalid link speed, and
> > > > > > > > > > > because the bridge controller is actually powered down as well (as
> > > > > > > > > > > all children are in D3cold) I could imagine that something in the pci
> > > > > > > > > > > subsystem actually restores the state, which lets the controller fail
> > > > > > > > > > > to establish communication again?
> > > > > > > > > >
> > > > > > > > > >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > > > > > > > >      without OS/driver intervention.
> > > > > > > > > >
> > > > > > > > > >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> > > > > > > > > >      requires driver intervention or at least some ACPI method.
> > > > > > > > >
> > > > > > > > > there is no driver intervention and Nouveau doesn't care at all. It's
> > > > > > > > > all done on the GPU. We just upload a script and some firmware onto
> > > > > > > > > the GPU. The script then runs on the PMU inside the GPU and also
> > > > > > > > > changes the PCIe link settings. But from a Nouveau point of view we
> > > > > > > > > don't care about the link before or after that script was invoked.
> > > > > > > > > Also there is no ACPI method involved.
> > > > > > > > >
> > > > > > > > > But if there is something we should notify pci core about, maybe
> > > > > > > > > that's something we have to do then?
> > > > > > > >
> > > > > > > > I don't think there's anything the PCI core could do with that
> > > > > > > > information anyway.  The PCI core doesn't care at all about the link
> > > > > > > > speed, and it really can't influence it directly.
> > > > > > > >
> > > > > > > > > >   3) Suspend puts GPU into D3cold (powered off).
> > > > > > > > > >
> > > > > > > > > >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > > > > > > > >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > > > > > > > >      initial boot.
> > > > > > > > >
> > > > > > > > > No, that negotiation fails apparently as any attempt to read anything
> > > > > > > > > from the device just fails inside pci core. Or something goes wrong
> > > > > > > > > when resuming the bridge controller.
> > > > > > > >
> > > > > > > > I'm surprised the negotiation would fail even after a power cycle of
> > > > > > > > the device.  But if you can avoid the issue by running another script
> > > > > > > > on the PMU before suspend, that's probably what you'll have to do.
> > > > > > > >
> > > > > > >
> > > > > > > there is none as far as we know. Or at least nothing inside the vbios.
> > > > > > > We still have to get signed PMU firmware images from Nvidia for full
> > > > > > > support, but this would still be a hacky situation as we would depend
> > > > > > > on those (and without having those in redistributable form, there
> > > > > > > isn't much we can do about it except fixing it on the kernel side).
> > > > > > >
> > > > > > > > > >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > > > > > > > >      8.0 GT/s.
> > > > > > > > >
> > > > > > > > > what is actually meant by "driver" here? The pci subsystem or Nouveau?
> > > > > > > >
> > > > > > > > I was thinking Nouveau because the PCI core doesn't care about the
> > > > > > > > link speed.
> > > > > > > >
> > > > > > > > > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > > > > > > > > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > > > > > > > > > the link is supposed to work and the device should be reachable
> > > > > > > > > > independent of the link speed.  But maybe there's some weird
> > > > > > > > > > dependency between the GPU and the driver here.
> > > > > > > > >
> > > > > > > > > but the device isn't reachable at all, not even from the pci
> > > > > > > > > subsystem. All reads fail/return a default error value (0xffffffff).
> > > > > > > >
> > > > > > > > Are these PCI config reads that return 0xffffffff?  Or MMIO reads?
> > > > > > > > "lspci -vvxxxx" of the bridge and the GPU might have a clue about
> > > > > > > > whether a PCI error occurred.
> > > > > > > >
> > > > > > >
> > > > > > > that's kind of problematic as it might just lock up my machine... but
> > > > > > > let me try that.
> > > > > > >
> > > > > > > > > > It sounds like if you return to 8.0 GT/s before suspend, things
> > > > > > > > > > work.  That would make sense to me because then the driver's
> > > > > > > > > > idea of the link state after resume would match the actual state.
> > > > > > > > >
> > > > > > > > > depends on what is meant by the driver here. Inside Nouveau we don't
> > > > > > > > > care one bit about the current link speed, so I assume you mean
> > > > > > > > > something inside the pci core code?
> > > > > > > > >
> > > > > > > > > > But I don't see a way to deal with this in the PCI core.  The PCI core
> > > > > > > > > > does save and restore most of the architected config space around
> > > > > > > > > > suspend/resume, but since this appears to be a device-specific thing,
> > > > > > > > > > the PCI core would have no idea how to save/restore it.
> > > > > > > > >
> > > > > > > > > if we assume that the negotiation on a device level works as intended,
> > > > > > > > > then I would expect this to be a pci core issue, which might actually
> > > > > > > > > not be fixable there. But if it's not, then we would have to put
> > > > > > > > > something like that inside the runpm documentation to tell drivers
> > > > > > > > > they have to do something about it.
> > > > > > > > > But again, for me it just sounds like the negotiation on the device
> > > > > > > > > level fails or something inside pci core messes it up.
> > > > > > > >
> > > > > > > > To me it sounds like the PMU script messed something up, and the PCI
> > > > > > > > core has no way to know what that was or how to fix it.
> > > > > > > >
> > > > > > >
> > > > > > > sure, I am mainly wondering why it doesn't work after we power cycled
> > > > > > > the GPU and the host bridge controller, because no matter what the
> > > > > > > state was before, we have to reprobe instead of relying on a known
> > > > > > > state, no?
> > > > > > >
> > > > > > > > > > > > > Signed-off-by: Karol Herbst <kherbst@redhat.com>
> > > > > > > > > > > > > Reviewed-by: Lyude Paul <lyude@redhat.com>
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drm/nouveau/include/nvkm/subdev/pci.h |  5 +++--
> > > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/base.c    |  9 +++++++--
> > > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/pcie.c    | 24 ++++++++++++++++++++----
> > > > > > > > > > > > >  drm/nouveau/nvkm/subdev/pci/priv.h    |  2 ++
> > > > > > > > > > > > >  4 files changed, 32 insertions(+), 8 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drm/nouveau/include/nvkm/subdev/pci.h b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > > index 1fdf3098..b23793a2 100644
> > > > > > > > > > > > > --- a/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > > +++ b/drm/nouveau/include/nvkm/subdev/pci.h
> > > > > > > > > > > > > @@ -26,8 +26,9 @@ struct nvkm_pci {
> > > > > > > > > > > > >       } agp;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       struct {
> > > > > > > > > > > > > -             enum nvkm_pcie_speed speed;
> > > > > > > > > > > > > -             u8 width;
> > > > > > > > > > > > > +             enum nvkm_pcie_speed cur_speed;
> > > > > > > > > > > > > +             enum nvkm_pcie_speed def_speed;
> > > > > > > > > > > > > +             u8 cur_width;
> > > > > > > > > > > > >       } pcie;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       bool msi;
> > > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/base.c b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > > index ee2431a7..d9fb5a83 100644
> > > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/base.c
> > > > > > > > > > > > > @@ -90,6 +90,8 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend)
> > > > > > > > > > > > >
> > > > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > > > >               nvkm_agp_fini(pci);
> > > > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > > > +             nvkm_pcie_fini(pci);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > > @@ -100,6 +102,8 @@ nvkm_pci_preinit(struct nvkm_subdev *subdev)
> > > > > > > > > > > > >       struct nvkm_pci *pci = nvkm_pci(subdev);
> > > > > > > > > > > > >       if (pci->agp.bridge)
> > > > > > > > > > > > >               nvkm_agp_preinit(pci);
> > > > > > > > > > > > > +     else if (pci_is_pcie(pci->pdev))
> > > > > > > > > > > > > +             nvkm_pcie_preinit(pci);
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -193,8 +197,9 @@ nvkm_pci_new_(const struct nvkm_pci_func *func, struct nvkm_device *device,
> > > > > > > > > > > > >       pci->func = func;
> > > > > > > > > > > > >       pci->pdev = device->func->pci(device)->pdev;
> > > > > > > > > > > > >       pci->irq = -1;
> > > > > > > > > > > > > -     pci->pcie.speed = -1;
> > > > > > > > > > > > > -     pci->pcie.width = -1;
> > > > > > > > > > > > > +     pci->pcie.cur_speed = -1;
> > > > > > > > > > > > > +     pci->pcie.def_speed = -1;
> > > > > > > > > > > > > +     pci->pcie.cur_width = -1;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       if (device->type == NVKM_DEVICE_AGP)
> > > > > > > > > > > > >               nvkm_agp_ctor(pci);
> > > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/pcie.c b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > > index 70ccbe0d..731dd30e 100644
> > > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/pcie.c
> > > > > > > > > > > > > @@ -85,6 +85,13 @@ nvkm_pcie_oneinit(struct nvkm_pci *pci)
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +int
> > > > > > > > > > > > > +nvkm_pcie_preinit(struct nvkm_pci *pci)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +     pci->pcie.def_speed = nvkm_pcie_get_speed(pci);
> > > > > > > > > > > > > +     return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  int
> > > > > > > > > > > > >  nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > @@ -105,12 +112,21 @@ nvkm_pcie_init(struct nvkm_pci *pci)
> > > > > > > > > > > > >       if (pci->func->pcie.init)
> > > > > > > > > > > > >               pci->func->pcie.init(pci);
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     if (pci->pcie.speed != -1)
> > > > > > > > > > > > > -             nvkm_pcie_set_link(pci, pci->pcie.speed, pci->pcie.width);
> > > > > > > > > > > > > +     if (pci->pcie.cur_speed != -1)
> > > > > > > > > > > > > +             nvkm_pcie_set_link(pci, pci->pcie.cur_speed,
> > > > > > > > > > > > > +                                pci->pcie.cur_width);
> > > > > > > > > > > > >
> > > > > > > > > > > > >       return 0;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +int
> > > > > > > > > > > > > +nvkm_pcie_fini(struct nvkm_pci *pci)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +     if (!IS_ERR_VALUE(pci->pcie.def_speed))
> > > > > > > > > > > > > +             return nvkm_pcie_set_link(pci, pci->pcie.def_speed, 16);
> > > > > > > > > > > > > +     return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  int
> > > > > > > > > > > > >  nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > > @@ -146,8 +162,8 @@ nvkm_pcie_set_link(struct nvkm_pci *pci, enum nvkm_pcie_speed speed, u8 width)
> > > > > > > > > > > > >               speed = max_speed;
> > > > > > > > > > > > >       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -     pci->pcie.speed = speed;
> > > > > > > > > > > > > -     pci->pcie.width = width;
> > > > > > > > > > > > > +     pci->pcie.cur_speed = speed;
> > > > > > > > > > > > > +     pci->pcie.cur_width = width;
> > > > > > > > > > > > >
> > > > > > > > > > > > >       if (speed == cur_speed) {
> > > > > > > > > > > > >               nvkm_debug(subdev, "requested matches current speed\n");
> > > > > > > > > > > > > diff --git a/drm/nouveau/nvkm/subdev/pci/priv.h b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > > index a0d4c007..e7744671 100644
> > > > > > > > > > > > > --- a/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > > +++ b/drm/nouveau/nvkm/subdev/pci/priv.h
> > > > > > > > > > > > > @@ -60,5 +60,7 @@ enum nvkm_pcie_speed gk104_pcie_max_speed(struct nvkm_pci *);
> > > > > > > > > > > > >  int gk104_pcie_version_supported(struct nvkm_pci *);
> > > > > > > > > > > > >
> > > > > > > > > > > > >  int nvkm_pcie_oneinit(struct nvkm_pci *);
> > > > > > > > > > > > > +int nvkm_pcie_preinit(struct nvkm_pci *);
> > > > > > > > > > > > >  int nvkm_pcie_init(struct nvkm_pci *);
> > > > > > > > > > > > > +int nvkm_pcie_fini(struct nvkm_pci *);
> > > > > > > > > > > > >  #endif
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.21.0
> > > > > > > > > > > > >

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-06-24 15:04 UTC | newest]

Thread overview: 23+ messages
2019-05-07 20:12 [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
2019-05-07 20:12 ` [PATCH v2 1/4] drm: don't set the pci power state if the pci subsystem handles the ACPI bits Karol Herbst
2019-05-08 19:10   ` Lyude Paul
2019-05-07 20:12 ` [PATCH v2 2/4] pci: enable pcie link changes for pascal Karol Herbst
2019-05-20 21:25   ` Bjorn Helgaas
2019-05-07 20:12 ` [PATCH v2 3/4] pci: add nvkm_pcie_get_speed Karol Herbst
2019-05-07 20:12 ` [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini Karol Herbst
2019-05-20 21:19   ` Bjorn Helgaas
2019-05-20 22:30     ` Karol Herbst
2019-05-21 13:10       ` Bjorn Helgaas
2019-05-21 13:28         ` Karol Herbst
2019-05-21 13:50           ` [Nouveau] " Ilia Mirkin
2019-05-21 13:56             ` Karol Herbst
2019-05-21 14:13           ` Bjorn Helgaas
2019-05-21 14:30             ` Karol Herbst
2019-05-21 17:35               ` Karol Herbst
2019-05-21 17:48                 ` Karol Herbst
2019-06-03 13:18                   ` Karol Herbst
2019-06-03 18:10                     ` Bjorn Helgaas
2019-06-19 12:07                       ` Karol Herbst
2019-06-19 12:12                         ` Karol Herbst
2019-06-24 15:04                           ` Karol Herbst
2019-05-20 13:23 ` [PATCH v2 0/4] Potential fix for runpm issues on various laptops Karol Herbst
