All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V5 0/4] PCI: Loongson-related pci quirks
@ 2021-06-29  8:55 Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown Huacai Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Huacai Chen @ 2021-06-29  8:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang, Huacai Chen,
	Tiezhu Yang, Jianmin Lv

This patchset resolve some Loongson-related pci problems (however, some
of them affect not only Loongon platform). The first patch has been sent
before, and nearly all reviewers' questions had been answered at that
time [1]. The second patch move some LS7A quirks to quirks.c, where can
be shared by multi architectures. The third patch improve the mrrs quirk
for LS7A chipset, and the fourth patch add a new quirk for LS7A chipset.

[1] http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@lemote.com/

V1 -> V2:
1, Rework the 4th patch;
2, Improve commit messages;
3, Remove the last patch since there is better solutions.

V2 -> V3:
1, Add more affected device ids for the 4th patch;
2, Improve commit messages to describe root causes.

V3 -> V4:
1, Rework the MRRS quirk patch;
2, Improve commit messages to describe root causes, again.

V4 -> V5:
1, Improve the MRRS quirk patch;
2, Change the order of 2nd and 3rd patch;
3, Improve commit messages to describe root causes, again.

Huacai Chen, Tiezhu Yang and Jianmin Lv(4):
 PCI/portdrv: Don't disable device during shutdown.
 PCI: Improve the MRRS quirk for LS7A.
 PCI: Move loongson pci quirks to quirks.c.
 PCI: Add quirk for multifunction devices of LS7A.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> 
---
 drivers/pci/controller/pci-loongson.c | 69 -----------------------------
 drivers/pci/pci.c                     |  6 +++
 drivers/pci/pcie/portdrv.h            |  2 +-
 drivers/pci/pcie/portdrv_core.c       |  6 ++-
 drivers/pci/pcie/portdrv_pci.c        | 15 ++++++-
 drivers/pci/quirks.c                  | 81 +++++++++++++++++++++++++++++++++++
 include/linux/pci.h                   |  1 +
 include/linux/pci_ids.h               | 11 +++++
 8 files changed, 117 insertions(+), 74 deletions(-)
--
2.27.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown
  2021-06-29  8:55 [PATCH V5 0/4] PCI: Loongson-related pci quirks Huacai Chen
@ 2021-06-29  8:55 ` Huacai Chen
  2021-06-29  9:19   ` Sinan Kaya
  2021-06-30  0:01   ` Bjorn Helgaas
  2021-06-29  8:55 ` [PATCH V5 2/4] PCI: Improve the MRRS quirk for LS7A Huacai Chen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Huacai Chen @ 2021-06-29  8:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang, Huacai Chen,
	Sinan Kaya, Tiezhu Yang

Use separate remove()/shutdown() callback, and don't disable PCI device
during shutdown. This can avoid some poweroff/reboot failures.

The poweroff/reboot failures could easily be reproduced on Loongson
platforms. I think this is not a Loongson-specific problem, instead, is
a problem related to some specific PCI hosts. On some x86 platforms,
radeon/amdgpu devices can cause the same problem [1][2], and commit
faefba95c9e8ca3a ("drm/amdgpu: just suspend the hw on pci shutdown")
can resolve it.

As Tiezhu said, this occasionally shutdown or reboot failure is due to
clear PCI_COMMAND_MASTER on the device in do_pci_disable_device() [3].

static void do_pci_disable_device(struct pci_dev *dev)
{
        u16 pci_command;

        pci_read_config_word(dev, PCI_COMMAND, &pci_command);
        if (pci_command & PCI_COMMAND_MASTER) {
                pci_command &= ~PCI_COMMAND_MASTER;
                pci_write_config_word(dev, PCI_COMMAND, pci_command);
        }

        pcibios_disable_device(dev);
}

When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
shutdown or reboot. The root cause on Loongson platform is that CPU is
still accessing PCIe devices while poweroff/reboot, and if we disable
the Bus Master Bit at this time, the PCIe controller doesn't forward
requests to downstream devices, and also doesn't send TIMEOUT to CPU,
which causes CPU wait forever (hardware deadlock). This behavior is a
PCIe protocol violation, and will be fixed in new revisions of hardware
(add timeout mechanism for CPU read request, whether or not Bus Master
bit is cleared).

Radeon driver is more difficult than amdgpu due to its confusing symbol
names, and I have maintained an out-of-tree patch for a long time [4].
Recently, we found more and more devices can cause the same problem, and
it is very difficult to modify all problematic drivers as radeon/amdgpu
does (the .shutdown callback should make sure there is no DMA activity).
So, I think modify the PCIe port driver is a simple and effective way.
Because there is no poweroff/reboot problems before cc27b735ad3a75574a6a
("PCI/portdrv: Turn off PCIe services during shutdown"). And as early
discussed, kexec can still work fine after this patch [5].

[1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
[2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
[3] https://lore.kernel.org/patchwork/patch/1305067/
[4] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0
[5] http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@lemote.com/

Cc: Sinan Kaya <okaya@kernel.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 drivers/pci/pcie/portdrv.h      |  2 +-
 drivers/pci/pcie/portdrv_core.c |  6 ++++--
 drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 2ff5724b8f13..358d7281f6e8 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -117,7 +117,7 @@ int pcie_port_device_resume(struct device *dev);
 int pcie_port_device_runtime_suspend(struct device *dev);
 int pcie_port_device_runtime_resume(struct device *dev);
 #endif
-void pcie_port_device_remove(struct pci_dev *dev);
+void pcie_port_device_remove(struct pci_dev *dev, bool disable);
 int __must_check pcie_port_bus_register(void);
 void pcie_port_bus_unregister(void);
 
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed6649c41..98c0a99a41d6 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -484,11 +484,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
  * Remove PCI Express port service devices associated with given port and
  * disable MSI-X or MSI for the port.
  */
-void pcie_port_device_remove(struct pci_dev *dev)
+void pcie_port_device_remove(struct pci_dev *dev, bool disable)
 {
 	device_for_each_child(&dev->dev, NULL, remove_iter);
 	pci_free_irq_vectors(dev);
-	pci_disable_device(dev);
+
+	if (disable)
+		pci_disable_device(dev);
 }
 
 /**
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..562fbf3c1ea9 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -147,7 +147,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
 		pm_runtime_dont_use_autosuspend(&dev->dev);
 	}
 
-	pcie_port_device_remove(dev);
+	pcie_port_device_remove(dev, true);
+}
+
+static void pcie_portdrv_shutdown(struct pci_dev *dev)
+{
+	if (pci_bridge_d3_possible(dev)) {
+		pm_runtime_forbid(&dev->dev);
+		pm_runtime_get_noresume(&dev->dev);
+		pm_runtime_dont_use_autosuspend(&dev->dev);
+	}
+
+	pcie_port_device_remove(dev, false);
 }
 
 static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
@@ -219,7 +230,7 @@ static struct pci_driver pcie_portdriver = {
 
 	.probe		= pcie_portdrv_probe,
 	.remove		= pcie_portdrv_remove,
-	.shutdown	= pcie_portdrv_remove,
+	.shutdown	= pcie_portdrv_shutdown,
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH V5 2/4] PCI: Improve the MRRS quirk for LS7A
  2021-06-29  8:55 [PATCH V5 0/4] PCI: Loongson-related pci quirks Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown Huacai Chen
@ 2021-06-29  8:55 ` Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 3/4] PCI: Move loongson pci quirks to quirks.c Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 4/4] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
  3 siblings, 0 replies; 10+ messages in thread
From: Huacai Chen @ 2021-06-29  8:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang, Huacai Chen

In new revision of LS7A, some PCIe ports support larger value than 256,
but their maximum supported MRRS values are not detectable. Moreover,
the current loongson_mrrs_quirk() cannot avoid devices increasing its
MRRS after pci_enable_device(), and some devices (e.g. Realtek 8169)
will actually set a big value in its driver. So the only possible way
is configure MRRS of all devices in BIOS, and add a pci host bridge bit
flag (i.e., no_inc_mrrs) to stop the increasing MRRS operations.

However, according to PCIe Spec, it is legal for an OS to program any
value for MRRS, and it is also legal for an endpoint to generate a Read
Request with any size up to its MRRS. As the hardware engineers say, the
root cause here is LS7A doesn't break up large read requests. In detail,
LS7A PCIe port reports CA (Completer Abort) if it receives a Memory Read
request with a size that's "too big" ("too big" means larger than the
PCIe ports can handle, which means 256 for some ports and 4096 for the
others, and of course this is a problem in the LS7A's hardware design).

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/controller/pci-loongson.c | 47 ++++++++++-----------------
 drivers/pci/pci.c                     |  6 ++++
 include/linux/pci.h                   |  1 +
 3 files changed, 25 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c
index 48169b1e3817..b02c98723f3b 100644
--- a/drivers/pci/controller/pci-loongson.c
+++ b/drivers/pci/controller/pci-loongson.c
@@ -60,37 +60,26 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 			DEV_LS7A_LPC, system_bus_quirk);
 
-static void loongson_mrrs_quirk(struct pci_dev *dev)
+static void loongson_mrrs_quirk(struct pci_dev *pdev)
 {
-	struct pci_bus *bus = dev->bus;
-	struct pci_dev *bridge;
-	static const struct pci_device_id bridge_devids[] = {
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
-		{ 0, },
-	};
-
-	/* look for the matching bridge */
-	while (!pci_is_root_bus(bus)) {
-		bridge = bus->self;
-		bus = bus->parent;
-		/*
-		 * Some Loongson PCIe ports have a h/w limitation of
-		 * 256 bytes maximum read request size. They can't handle
-		 * anything larger than this. So force this limit on
-		 * any devices attached under these ports.
-		 */
-		if (pci_match_id(bridge_devids, bridge)) {
-			if (pcie_get_readrq(dev) > 256) {
-				pci_info(dev, "limiting MRRS to 256\n");
-				pcie_set_readrq(dev, 256);
-			}
-			break;
-		}
-	}
+	/*
+	 * Some Loongson PCIe ports have h/w limitations of maximum read
+	 * request size. They can't handle anything larger than this. So
+	 * force this limit on any devices attached under these ports.
+	 */
+	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+
+	if (!bridge)
+		return;
+
+	bridge->no_inc_mrrs = 1;
 }
-DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, loongson_mrrs_quirk);
 
 static void __iomem *cfg1_map(struct loongson_pci *priv, int bus,
 				unsigned int devfn, int where)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8d4ebe095d0c..34986a153cd1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5794,6 +5794,7 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 {
 	u16 v;
 	int ret;
+	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
 
 	if (rq < 128 || rq > 4096 || !is_power_of_2(rq))
 		return -EINVAL;
@@ -5812,6 +5813,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 
 	v = (ffs(rq) - 8) << 12;
 
+	if (bridge->no_inc_mrrs) {
+		if (rq > pcie_get_readrq(dev))
+			return -EINVAL;
+	}
+
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 						  PCI_EXP_DEVCTL_READRQ, v);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 24306504226a..5cdf926000fc 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -541,6 +541,7 @@ struct pci_host_bridge {
 	void		*release_data;
 	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */
+	unsigned int	no_inc_mrrs:1;		/* No Increase MRRS */
 	unsigned int	native_aer:1;		/* OS may use PCIe AER */
 	unsigned int	native_pcie_hotplug:1;	/* OS may use PCIe hotplug */
 	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH V5 3/4] PCI: Move loongson pci quirks to quirks.c
  2021-06-29  8:55 [PATCH V5 0/4] PCI: Loongson-related pci quirks Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 2/4] PCI: Improve the MRRS quirk for LS7A Huacai Chen
@ 2021-06-29  8:55 ` Huacai Chen
  2021-06-29  8:55 ` [PATCH V5 4/4] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
  3 siblings, 0 replies; 10+ messages in thread
From: Huacai Chen @ 2021-06-29  8:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang, Huacai Chen

Loongson PCH (LS7A chipset) will be used by both MIPS-based and
LoongArch-based Loongson processors. MIPS-based Loongson uses FDT
but LoongArch-base Loongson uses ACPI, but the driver in drivers/
pci/controller/pci-loongson.c is FDT-only. So move the quirks to
quirks.c where can be shared by all architectures.

LoongArch is a new RISC ISA, mainline support will come soon, and
documentations are here (in translation):

https://github.com/loongson/LoongArch-Documentation

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/controller/pci-loongson.c | 58 ---------------------------
 drivers/pci/quirks.c                  | 58 +++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c
index b02c98723f3b..88066e9db69e 100644
--- a/drivers/pci/controller/pci-loongson.c
+++ b/drivers/pci/controller/pci-loongson.c
@@ -12,15 +12,6 @@
 
 #include "../pci.h"
 
-/* Device IDs */
-#define DEV_PCIE_PORT_0	0x7a09
-#define DEV_PCIE_PORT_1	0x7a19
-#define DEV_PCIE_PORT_2	0x7a29
-
-#define DEV_LS2K_APB	0x7a02
-#define DEV_LS7A_CONF	0x7a10
-#define DEV_LS7A_LPC	0x7a0c
-
 #define FLAG_CFG0	BIT(0)
 #define FLAG_CFG1	BIT(1)
 #define FLAG_DEV_FIX	BIT(2)
@@ -32,55 +23,6 @@ struct loongson_pci {
 	u32 flags;
 };
 
-/* Fixup wrong class code in PCIe bridges */
-static void bridge_class_quirk(struct pci_dev *dev)
-{
-	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
-}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_0, bridge_class_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_1, bridge_class_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_2, bridge_class_quirk);
-
-static void system_bus_quirk(struct pci_dev *pdev)
-{
-	/*
-	 * The address space consumed by these devices is outside the
-	 * resources of the host bridge.
-	 */
-	pdev->mmio_always_on = 1;
-	pdev->non_compliant_bars = 1;
-}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_LS2K_APB, system_bus_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_LS7A_CONF, system_bus_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_LS7A_LPC, system_bus_quirk);
-
-static void loongson_mrrs_quirk(struct pci_dev *pdev)
-{
-	/*
-	 * Some Loongson PCIe ports have h/w limitations of maximum read
-	 * request size. They can't handle anything larger than this. So
-	 * force this limit on any devices attached under these ports.
-	 */
-	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
-
-	if (!bridge)
-		return;
-
-	bridge->no_inc_mrrs = 1;
-}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_0, loongson_mrrs_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_1, loongson_mrrs_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_2, loongson_mrrs_quirk);
-
 static void __iomem *cfg1_map(struct loongson_pci *priv, int bus,
 				unsigned int devfn, int where)
 {
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 22b2bb1109c9..febc53943051 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -205,6 +205,64 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
 				PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
 
+/* Device IDs */
+#define DEV_PCIE_PORT_0	0x7a09
+#define DEV_PCIE_PORT_1	0x7a19
+#define DEV_PCIE_PORT_2	0x7a29
+
+#define DEV_LS2K_APB	0x7a02
+#define DEV_LS7A_CONF	0x7a10
+#define DEV_LS7A_LPC	0x7a0c
+
+/* Fixup wrong class code in PCIe bridges */
+static void bridge_class_quirk(struct pci_dev *dev)
+{
+	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, bridge_class_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, bridge_class_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, bridge_class_quirk);
+
+static void system_bus_quirk(struct pci_dev *pdev)
+{
+	/*
+	 * The address space consumed by these devices is outside the
+	 * resources of the host bridge.
+	 */
+	pdev->mmio_always_on = 1;
+	pdev->non_compliant_bars = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_LS2K_APB, system_bus_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_LS7A_CONF, system_bus_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_LS7A_LPC, system_bus_quirk);
+
+static void loongson_mrrs_quirk(struct pci_dev *pdev)
+{
+	/*
+	 * Some Loongson PCIe ports have h/w limitations of maximum read
+	 * request size. They can't handle anything larger than this. So
+	 * force this limit on any devices attached under these ports.
+	 */
+	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+
+	if (!bridge)
+		return;
+
+	bridge->no_inc_mrrs = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, loongson_mrrs_quirk);
+
 /*
  * The Mellanox Tavor device gives false positive parity errors.  Disable
  * parity error reporting.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH V5 4/4] PCI: Add quirk for multifunction devices of LS7A
  2021-06-29  8:55 [PATCH V5 0/4] PCI: Loongson-related pci quirks Huacai Chen
                   ` (2 preceding siblings ...)
  2021-06-29  8:55 ` [PATCH V5 3/4] PCI: Move loongson pci quirks to quirks.c Huacai Chen
@ 2021-06-29  8:55 ` Huacai Chen
  3 siblings, 0 replies; 10+ messages in thread
From: Huacai Chen @ 2021-06-29  8:55 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang, Jianmin Lv, Huacai Chen

From: Jianmin Lv <lvjianmin@loongson.cn>

In LS7A, multifunction device use same PCI PIN (because the PIN register
report the same INTx value to each function) but we need different IRQ
for different functions, so add a quirk to fix it for standard PCI PIN
usage.

This patch only affect ACPI based systems (and only needed by ACPI based
systems, too). For DT based systems, the irq mappings is defined in .dts
files and be handled by of_irq_parse_pci().

Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/quirks.c    | 23 +++++++++++++++++++++++
 include/linux/pci_ids.h | 11 +++++++++++
 2 files changed, 34 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index febc53943051..9dd6cf5358e0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -263,6 +263,29 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 			DEV_PCIE_PORT_2, loongson_mrrs_quirk);
 
+static void loongson_pci_pin_quirk(struct pci_dev *pdev)
+{
+	pdev->pin = 1 + (PCI_FUNC(pdev->devfn) & 3);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			PCI_DEVICE_ID_LOONGSON_AHCI, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			PCI_DEVICE_ID_LOONGSON_EHCI, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			PCI_DEVICE_ID_LOONGSON_OHCI, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			PCI_DEVICE_ID_LOONGSON_DC, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			PCI_DEVICE_ID_LOONGSON_GPU, loongson_pci_pin_quirk);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON,
+			PCI_DEVICE_ID_LOONGSON_GMAC, loongson_pci_pin_quirk);
+
 /*
  * The Mellanox Tavor device gives false positive parity errors.  Disable
  * parity error reporting.
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 4c3fa5293d76..f3d41ae761d8 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -151,6 +151,17 @@
 /* Vendors and devices.  Sort key: vendor first, device next. */
 
 #define PCI_VENDOR_ID_LOONGSON		0x0014
+#define PCI_DEVICE_ID_LOONGSON_HOST     0x7a00
+#define PCI_DEVICE_ID_LOONGSON_APB      0x7a02
+#define PCI_DEVICE_ID_LOONGSON_GMAC     0x7a03
+#define PCI_DEVICE_ID_LOONGSON_DC       0x7a06
+#define PCI_DEVICE_ID_LOONGSON_HDA      0x7a07
+#define PCI_DEVICE_ID_LOONGSON_GPU      0x7a15
+#define PCI_DEVICE_ID_LOONGSON_AHCI     0x7a08
+#define PCI_DEVICE_ID_LOONGSON_EHCI     0x7a14
+#define PCI_DEVICE_ID_LOONGSON_OHCI     0x7a24
+#define PCI_DEVICE_ID_LOONGSON_LPC      0x7a0c
+#define PCI_DEVICE_ID_LOONGSON_DMA      0x7a0f
 
 #define PCI_VENDOR_ID_TTTECH		0x0357
 #define PCI_DEVICE_ID_TTTECH_MC322	0x000a
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown
  2021-06-29  8:55 ` [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown Huacai Chen
@ 2021-06-29  9:19   ` Sinan Kaya
  2021-06-29 10:35     ` Huacai Chen
  2021-06-30  0:01   ` Bjorn Helgaas
  1 sibling, 1 reply; 10+ messages in thread
From: Sinan Kaya @ 2021-06-29  9:19 UTC (permalink / raw)
  To: Huacai Chen, Bjorn Helgaas
  Cc: linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang, Tiezhu Yang

On 6/29/2021 11:55 AM, Huacai Chen wrote:
> he root cause on Loongson platform is that CPU is
> still accessing PCIe devices while poweroff/reboot, and if we disable
> the Bus Master Bit at this time, the PCIe controller doesn't forward
> requests to downstream devices, and also doesn't send TIMEOUT to CPU,
> which causes CPU wait forever (hardware deadlock). This behavior is a
> PCIe protocol violation, and will be fixed in new revisions of hardware
> (add timeout mechanism for CPU read request, whether or not Bus Master
> bit is cleared).

Your word above says this is a quirk and it needs to be handled as such
in the code rather than impacting all platforms.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown
  2021-06-29  9:19   ` Sinan Kaya
@ 2021-06-29 10:35     ` Huacai Chen
  2021-06-29 11:26       ` Sinan Kaya
  0 siblings, 1 reply; 10+ messages in thread
From: Huacai Chen @ 2021-06-29 10:35 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Xuefeng Li, Jiaxun Yang,
	Tiezhu Yang

Hi, Sinan,

On Tue, Jun 29, 2021 at 5:19 PM Sinan Kaya <okaya@kernel.org> wrote:
>
> On 6/29/2021 11:55 AM, Huacai Chen wrote:
> > he root cause on Loongson platform is that CPU is
> > still accessing PCIe devices while poweroff/reboot, and if we disable
> > the Bus Master Bit at this time, the PCIe controller doesn't forward
> > requests to downstream devices, and also doesn't send TIMEOUT to CPU,
> > which causes CPU wait forever (hardware deadlock). This behavior is a
> > PCIe protocol violation, and will be fixed in new revisions of hardware
> > (add timeout mechanism for CPU read request, whether or not Bus Master
> > bit is cleared).
>
> Your word above says this is a quirk and it needs to be handled as such
> in the code rather than impacting all platforms.

Yes, this is more or less a quirk, and we have already found the root
cause in hardware. However, as I said before, there are other
platforms that also have similar problems.

Huacai
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown
  2021-06-29 10:35     ` Huacai Chen
@ 2021-06-29 11:26       ` Sinan Kaya
  0 siblings, 0 replies; 10+ messages in thread
From: Sinan Kaya @ 2021-06-29 11:26 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Xuefeng Li, Jiaxun Yang,
	Tiezhu Yang

On 6/29/2021 1:35 PM, Huacai Chen wrote:
> Yes, this is more or less a quirk, and we have already found the root
> cause in hardware. However, as I said before, there are other
> platforms that also have similar problems.

Proper way is to fix the driver's shutdown routine not workaround it.

Even if that's not possible, Is there a reason why we cannot quirk them
too if they are broken?

If you are aware of such other platforms, please file a bug.
We should get the owner's to look.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown
  2021-06-29  8:55 ` [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown Huacai Chen
  2021-06-29  9:19   ` Sinan Kaya
@ 2021-06-30  0:01   ` Bjorn Helgaas
  2021-06-30  5:14     ` Huacai Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2021-06-30  0:01 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Xuefeng Li, Huacai Chen, Jiaxun Yang,
	Sinan Kaya, Tiezhu Yang

On Tue, Jun 29, 2021 at 04:55:18PM +0800, Huacai Chen wrote:
> Use separate remove()/shutdown() callback, and don't disable PCI device
> during shutdown. This can avoid some poweroff/reboot failures.

To clarify, I think you mean "leave bus mastering enabled for PCIe
ports during poweroff/reboot."  Maybe even better would be "leave
DMA/MSI/MSI-X enabled for the PCIe port and downstream devices."

"Disable PCI device" is a little ambiguous because pci_enable_device()
enables memory and I/O decoding for the device BARs, but
pci_disable_device() turns off bus mastering and has nothing to do
with the BARs.  They are unfortunately not symmetric.

> The poweroff/reboot failures could easily be reproduced on Loongson
> platforms. I think this is not a Loongson-specific problem, instead, is
> a problem related to some specific PCI hosts. On some x86 platforms,
> radeon/amdgpu devices can cause the same problem [1][2], and commit
> faefba95c9e8ca3a ("drm/amdgpu: just suspend the hw on pci shutdown")
> can resolve it.

This faefba95c9e8ca3a information isn't really connected here because
we don't know exactly what the problem is, and the faefba95c9e8ca3a
commit log doesn't say anything about DMA/MSI/MSI-X or bus mastering.

> As Tiezhu said, this occasionally shutdown or reboot failure is due to
> clear PCI_COMMAND_MASTER on the device in do_pci_disable_device() [3].
> 
> static void do_pci_disable_device(struct pci_dev *dev)
> {
>         u16 pci_command;
> 
>         pci_read_config_word(dev, PCI_COMMAND, &pci_command);
>         if (pci_command & PCI_COMMAND_MASTER) {
>                 pci_command &= ~PCI_COMMAND_MASTER;
>                 pci_write_config_word(dev, PCI_COMMAND, pci_command);
>         }
> 
>         pcibios_disable_device(dev);
> }
> 
> When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
> shutdown or reboot. The root cause on Loongson platform is that CPU is
> still accessing PCIe devices while poweroff/reboot, and if we disable
> the Bus Master Bit at this time, the PCIe controller doesn't forward
> requests to downstream devices, and also doesn't send TIMEOUT to CPU,
> which causes CPU wait forever (hardware deadlock).

Sorry, I don't think this is a root cause.  Let me try to dissect this
because I don't follow this argument yet.  Per PCIe r5.0, sec
7.5.1.1.3:

  Bus Master Enable - Controls the ability of a Function to issue
  Memory and I/O Read/Write Requests, and the ability of a Port to
  forward Memory and I/O Read/Write Requests in the Upstream
  direction.

In both endpoints and bridges, Bus Master Enable controls DMA, MSI,
and MSI-X initiated by a PCI *endpoint*.  It has nothing to do with
CPU accesses to the device, so I don't understand the claim that on
Loongson, the CPU is still accessing PCIe devices during
poweroff/reboot.

You seem to say the CPU hangs because Bus Mastering is disabled.  I
don't see how that can happen because Bus Mastering isn't involved at
all in CPU MMIO transactions.  Or maybe this is more LS7A breakage,
e.g., the Root Ports don't forward Completions for CPU MMIO reads when
Bus Mastering is disabled?  Obviously that would be completely broken
per spec, which says "This bit does not affect forwarding of
Completions in either the Upstream or Downstream direction."

In the poweroff/reboot path, we call pci_device_shutdown(), which
calls the driver's .shutdown() method if it has one.  Most drivers do
*not* have one, so they have no idea that poweroff/reboot is happening
and have no reason to stop DMA.

The portdrv driver *does* have a .shutdown() method, which currently
disables Bus Mastering, which means DMA/MSI/MSI-X from downstream
devices no longer works, even if their drivers haven't done anything.

The point of *this* patch is to leave Bus Mastering *enabled* during
poweroff/reboot.  That raises the question of WHY poweroff/reboot
fails when DMA/MSI/MSI-X are disabled.  That's the justification I'd
like to see in the commit log.  Not just "this makes things work," but
"here is the problem, and this is how this patch solves it."

> This behavior is a PCIe protocol violation, and will be fixed in new
> revisions of hardware (add timeout mechanism for CPU read request,
> whether or not Bus Master bit is cleared).

There's some confusion here.  A CPU read request is an MMIO
transaction that should not be affected by Bus Master Enable.

> Radeon driver is more difficult than amdgpu due to its confusing symbol
> names, and I have maintained an out-of-tree patch for a long time [4].

I poked around a little but couldn't find any posting of this radeon
patch.  I assume you're pushing it upstream somewhere?  Is there
resistance?  Is there any conversation there that linux-pci should be
involved in?  Regardless, this doesn't seem relevant for this commit
log.

> Recently, we found more and more devices can cause the same problem, and
> it is very difficult to modify all problematic drivers as radeon/amdgpu
> does (the .shutdown callback should make sure there is no DMA activity).

Can you include some examples of these other drivers that cause
problems?  My guess is that these drivers lack .shutdown() methods, so
they don't know a reboot or poweroff is in progress?

> So, I think modify the PCIe port driver is a simple and effective way.
> Because there is no poweroff/reboot problems before cc27b735ad3a75574a6a
> ("PCI/portdrv: Turn off PCIe services during shutdown"). And as early
> discussed, kexec can still work fine after this patch [5].
>
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
> [2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
> [3] https://lore.kernel.org/patchwork/patch/1305067/
> [4] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0
> [5] http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@lemote.com/
> 
> Cc: Sinan Kaya <okaya@kernel.org>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  drivers/pci/pcie/portdrv.h      |  2 +-
>  drivers/pci/pcie/portdrv_core.c |  6 ++++--
>  drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 2ff5724b8f13..358d7281f6e8 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -117,7 +117,7 @@ int pcie_port_device_resume(struct device *dev);
>  int pcie_port_device_runtime_suspend(struct device *dev);
>  int pcie_port_device_runtime_resume(struct device *dev);
>  #endif
> -void pcie_port_device_remove(struct pci_dev *dev);
> +void pcie_port_device_remove(struct pci_dev *dev, bool disable);
>  int __must_check pcie_port_bus_register(void);
>  void pcie_port_bus_unregister(void);
>  
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..98c0a99a41d6 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -484,11 +484,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
>   * Remove PCI Express port service devices associated with given port and
>   * disable MSI-X or MSI for the port.
>   */
> -void pcie_port_device_remove(struct pci_dev *dev)
> +void pcie_port_device_remove(struct pci_dev *dev, bool disable)
>  {
>  	device_for_each_child(&dev->dev, NULL, remove_iter);
>  	pci_free_irq_vectors(dev);
> -	pci_disable_device(dev);
> +
> +	if (disable)
> +		pci_disable_device(dev);
>  }
>  
>  /**
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index c7ff1eea225a..562fbf3c1ea9 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -147,7 +147,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
>  		pm_runtime_dont_use_autosuspend(&dev->dev);
>  	}
>  
> -	pcie_port_device_remove(dev);
> +	pcie_port_device_remove(dev, true);
> +}
> +
> +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> +{
> +	if (pci_bridge_d3_possible(dev)) {
> +		pm_runtime_forbid(&dev->dev);
> +		pm_runtime_get_noresume(&dev->dev);
> +		pm_runtime_dont_use_autosuspend(&dev->dev);
> +	}
> +
> +	pcie_port_device_remove(dev, false);
>  }
>  
>  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> @@ -219,7 +230,7 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.probe		= pcie_portdrv_probe,
>  	.remove		= pcie_portdrv_remove,
> -	.shutdown	= pcie_portdrv_remove,
> +	.shutdown	= pcie_portdrv_shutdown,
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown
  2021-06-30  0:01   ` Bjorn Helgaas
@ 2021-06-30  5:14     ` Huacai Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Huacai Chen @ 2021-06-30  5:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Xuefeng Li, Jiaxun Yang,
	Sinan Kaya, Tiezhu Yang

Hi, Bjorn and Sinan,

On Wed, Jun 30, 2021 at 8:01 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, Jun 29, 2021 at 04:55:18PM +0800, Huacai Chen wrote:
> > Use separate remove()/shutdown() callback, and don't disable PCI device
> > during shutdown. This can avoid some poweroff/reboot failures.
>
> To clarify, I think you mean "leave bus mastering enabled for PCIe
> ports during poweroff/reboot."  Maybe even better would be "leave
> DMA/MSI/MSI-X enabled for the PCIe port and downstream devices."
>
> "Disable PCI device" is a little ambiguous because pci_enable_device()
> enables memory and I/O decoding for the device BARs, but
> pci_disable_device() turns off bus mastering and has nothing to do
> with the BARs.  They are unfortunately not symmetric.
>
> > The poweroff/reboot failures could easily be reproduced on Loongson
> > platforms. I think this is not a Loongson-specific problem, instead, is
> > a problem related to some specific PCI hosts. On some x86 platforms,
> > radeon/amdgpu devices can cause the same problem [1][2], and commit
> > faefba95c9e8ca3a ("drm/amdgpu: just suspend the hw on pci shutdown")
> > can resolve it.
>
> This faefba95c9e8ca3a information isn't really connected here because
> we don't know exactly what the problem is, and the faefba95c9e8ca3a
> commit log doesn't say anything about DMA/MSI/MSI-X or bus mastering.
>
> > As Tiezhu said, this occasionally shutdown or reboot failure is due to
> > clear PCI_COMMAND_MASTER on the device in do_pci_disable_device() [3].
> >
> > static void do_pci_disable_device(struct pci_dev *dev)
> > {
> >         u16 pci_command;
> >
> >         pci_read_config_word(dev, PCI_COMMAND, &pci_command);
> >         if (pci_command & PCI_COMMAND_MASTER) {
> >                 pci_command &= ~PCI_COMMAND_MASTER;
> >                 pci_write_config_word(dev, PCI_COMMAND, pci_command);
> >         }
> >
> >         pcibios_disable_device(dev);
> > }
> >
> > When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
> > shutdown or reboot. The root cause on Loongson platform is that CPU is
> > still accessing PCIe devices while poweroff/reboot, and if we disable
> > the Bus Master Bit at this time, the PCIe controller doesn't forward
> > requests to downstream devices, and also doesn't send TIMEOUT to CPU,
> > which causes CPU wait forever (hardware deadlock).
>
> Sorry, I don't think this is a root cause.  Let me try to dissect this
> because I don't follow this argument yet.  Per PCIe r5.0, sec
> 7.5.1.1.3:
>
>   Bus Master Enable - Controls the ability of a Function to issue
>   Memory and I/O Read/Write Requests, and the ability of a Port to
>   forward Memory and I/O Read/Write Requests in the Upstream
>   direction.
>
> In both endpoints and bridges, Bus Master Enable controls DMA, MSI,
> and MSI-X initiated by a PCI *endpoint*.  It has nothing to do with
> CPU accesses to the device, so I don't understand the claim that on
> Loongson, the CPU is still accessing PCIe devices during
> poweroff/reboot.
>
> You seem to say the CPU hangs because Bus Mastering is disabled.  I
> don't see how that can happen because Bus Mastering isn't involved at
> all in CPU MMIO transactions.  Or maybe this is more LS7A breakage,
> e.g., the Root Ports don't forward Completions for CPU MMIO reads when
> Bus Mastering is disabled?  Obviously that would be completely broken
> per spec, which says "This bit does not affect forwarding of
> Completions in either the Upstream or Downstream direction."
Obviously, LS7A's PCIe controller misused the Bus Master bit, because
as you say, Bus Master shouldn't be involved in CPU MMIO transactions.
LS7A's behavior is due to the bad hardware design.

There are other problems similar to LS7A, such as Radeon/AMDGPU which
I referred to in the commit messages. But yes, as you say, maybe they
are just *similar* problems, no one can say they are exactly the same
problem. Once before I wanted to make a patch to solve "all of these
problems" together, but it seems unreasonable. So, my next version
will be just a quirk for LS7A, and leave other platforms as is.

Thanks,
Huacai

>
> In the poweroff/reboot path, we call pci_device_shutdown(), which
> calls the driver's .shutdown() method if it has one.  Most drivers do
> *not* have one, so they have no idea that poweroff/reboot is happening
> and have no reason to stop DMA.
>
> The portdrv driver *does* have a .shutdown() method, which currently
> disables Bus Mastering, which means DMA/MSI/MSI-X from downstream
> devices no longer works, even if their drivers haven't done anything.
>
> The point of *this* patch is to leave Bus Mastering *enabled* during
> poweroff/reboot.  That raises the question of WHY poweroff/reboot
> fails when DMA/MSI/MSI-X are disabled.  That's the justification I'd
> like to see in the commit log.  Not just "this makes things work," but
> "here is the problem, and this is how this patch solves it."
>
> > This behavior is a PCIe protocol violation, and will be fixed in new
> > revisions of hardware (add timeout mechanism for CPU read request,
> > whether or not Bus Master bit is cleared).
>
> There's some confusion here.  A CPU read request is an MMIO
> transaction that should not be affected by Bus Master Enable.
>
> > Radeon driver is more difficult than amdgpu due to its confusing symbol
> > names, and I have maintained an out-of-tree patch for a long time [4].
>
> I poked around a little but couldn't find any posting of this radeon
> patch.  I assume you're pushing it upstream somewhere?  Is there
> resistance?  Is there any conversation there that linux-pci should be
> involved in?  Regardless, this doesn't seem relevant for this commit
> log.
>
> > Recently, we found more and more devices can cause the same problem, and
> > it is very difficult to modify all problematic drivers as radeon/amdgpu
> > does (the .shutdown callback should make sure there is no DMA activity).
>
> Can you include some examples of these other drivers that cause
> problems?  My guess is that these drivers lack .shutdown() methods, so
> they don't know a reboot or poweroff is in progress?
>
> > So, I think modify the PCIe port driver is a simple and effective way.
> > Because there is no poweroff/reboot problems before cc27b735ad3a75574a6a
> > ("PCI/portdrv: Turn off PCIe services during shutdown"). And as early
> > discussed, kexec can still work fine after this patch [5].
> >
> > [1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
> > [2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
> > [3] https://lore.kernel.org/patchwork/patch/1305067/
> > [4] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0
> > [5] http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@lemote.com/
> >
> > Cc: Sinan Kaya <okaya@kernel.org>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> > ---
> >  drivers/pci/pcie/portdrv.h      |  2 +-
> >  drivers/pci/pcie/portdrv_core.c |  6 ++++--
> >  drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
> >  3 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> > index 2ff5724b8f13..358d7281f6e8 100644
> > --- a/drivers/pci/pcie/portdrv.h
> > +++ b/drivers/pci/pcie/portdrv.h
> > @@ -117,7 +117,7 @@ int pcie_port_device_resume(struct device *dev);
> >  int pcie_port_device_runtime_suspend(struct device *dev);
> >  int pcie_port_device_runtime_resume(struct device *dev);
> >  #endif
> > -void pcie_port_device_remove(struct pci_dev *dev);
> > +void pcie_port_device_remove(struct pci_dev *dev, bool disable);
> >  int __must_check pcie_port_bus_register(void);
> >  void pcie_port_bus_unregister(void);
> >
> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index e1fed6649c41..98c0a99a41d6 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -484,11 +484,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
> >   * Remove PCI Express port service devices associated with given port and
> >   * disable MSI-X or MSI for the port.
> >   */
> > -void pcie_port_device_remove(struct pci_dev *dev)
> > +void pcie_port_device_remove(struct pci_dev *dev, bool disable)
> >  {
> >       device_for_each_child(&dev->dev, NULL, remove_iter);
> >       pci_free_irq_vectors(dev);
> > -     pci_disable_device(dev);
> > +
> > +     if (disable)
> > +             pci_disable_device(dev);
> >  }
> >
> >  /**
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index c7ff1eea225a..562fbf3c1ea9 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -147,7 +147,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
> >               pm_runtime_dont_use_autosuspend(&dev->dev);
> >       }
> >
> > -     pcie_port_device_remove(dev);
> > +     pcie_port_device_remove(dev, true);
> > +}
> > +
> > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > +{
> > +     if (pci_bridge_d3_possible(dev)) {
> > +             pm_runtime_forbid(&dev->dev);
> > +             pm_runtime_get_noresume(&dev->dev);
> > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > +     }
> > +
> > +     pcie_port_device_remove(dev, false);
> >  }
> >
> >  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> > @@ -219,7 +230,7 @@ static struct pci_driver pcie_portdriver = {
> >
> >       .probe          = pcie_portdrv_probe,
> >       .remove         = pcie_portdrv_remove,
> > -     .shutdown       = pcie_portdrv_remove,
> > +     .shutdown       = pcie_portdrv_shutdown,
> >
> >       .err_handler    = &pcie_portdrv_err_handler,
> >
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-06-30  5:15 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-29  8:55 [PATCH V5 0/4] PCI: Loongson-related pci quirks Huacai Chen
2021-06-29  8:55 ` [PATCH V5 1/4] PCI/portdrv: Don't disable device during shutdown Huacai Chen
2021-06-29  9:19   ` Sinan Kaya
2021-06-29 10:35     ` Huacai Chen
2021-06-29 11:26       ` Sinan Kaya
2021-06-30  0:01   ` Bjorn Helgaas
2021-06-30  5:14     ` Huacai Chen
2021-06-29  8:55 ` [PATCH V5 2/4] PCI: Improve the MRRS quirk for LS7A Huacai Chen
2021-06-29  8:55 ` [PATCH V5 3/4] PCI: Move loongson pci quirks to quirks.c Huacai Chen
2021-06-29  8:55 ` [PATCH V5 4/4] PCI: Add quirk for multifunction devices of LS7A Huacai Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.