linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] PCI: Loongson-related pci quirks
@ 2021-05-14  8:00 Huacai Chen
  2021-05-14  8:00 ` [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown Huacai Chen
                   ` (4 more replies)
  0 siblings, 5 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-14  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Huacai Chen, Jiaxun Yang, Huacai Chen, Tiezhu Yang,
	Jianmin Lv, Jingfeng Sui

This patchset resolve some Loongson-related pci problems (however, some
of them affect not only Loongon platform). The first patch has been sent
before, and nearly all reviewers' questions had been answered at that
time [1]. The second patch move some LS7A quirks to quirks.c, where can
be shared by multi architectures. The third patch improve the mrrs quirk
for LS7A chipset, and the fourth patch add a new quirk for LS7A chipset.
The last patch fix a problem of ASpeed AST2500, which is used as BMC by
some Loongson-based server boards.

[1] http://patchwork.ozlabs.org/project/linux-pci/patch/1600680138-10949-1-git-send-email-chenhc@lemote.com/

Huacai Chen, Tiezhu Yang, Jianmin Lv and Jingfeng Sui(5):
 PCI/portdrv: Don't disable pci device during shutdown.
 PCI: Move loongson pci quirks to quirks.c.
 PCI: Improve the mrrs quirk for LS7A.
 PCI: Add quirk for multifunction devices of LS7A.
 PCI: Support ASpeed VGA cards behind a misbehaving bridge.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> 
Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn> 
---
 drivers/pci/controller/pci-loongson.c |  69 -----------------
 drivers/pci/pci.c                     |   5 ++
 drivers/pci/pcie/portdrv.h            |   2 +-
 drivers/pci/pcie/portdrv_core.c       |   6 +-
 drivers/pci/pcie/portdrv_pci.c        |  15 +++-
 drivers/pci/quirks.c                  | 139 ++++++++++++++++++++++++++++++++++
 include/linux/pci.h                   |   2 +
 7 files changed, 164 insertions(+), 74 deletions(-)
--
2.27.0


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown
  2021-05-14  8:00 [PATCH 0/5] PCI: Loongson-related pci quirks Huacai Chen
@ 2021-05-14  8:00 ` Huacai Chen
  2021-05-14 14:20   ` Krzysztof Wilczyński
  2021-05-14 16:09   ` Bjorn Helgaas
  2021-05-14  8:00 ` [PATCH 2/5] PCI: Move loongson pci quirks to quirks.c Huacai Chen
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-14  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Huacai Chen, Jiaxun Yang, Huacai Chen, Tiezhu Yang

Use separate remove()/shutdown() callback, and don't disable pci device
during shutdown. This can avoid some poweroff/reboot failures.

The poweroff/reboot failures can easily reproduce on Loongson platforms.
I think this is not a Loongson-specific problem, instead, is a problem
related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.

As Tiezhu said, this occasionally shutdown or reboot failure is due to
clear PCI_COMMAND_MASTER on the device in do_pci_disable_device().

drivers/pci/pci.c
static void do_pci_disable_device(struct pci_dev *dev)
{
        u16 pci_command;

        pci_read_config_word(dev, PCI_COMMAND, &pci_command);
        if (pci_command & PCI_COMMAND_MASTER) {
                pci_command &= ~PCI_COMMAND_MASTER;
                pci_write_config_word(dev, PCI_COMMAND, pci_command);
        }

        pcibios_disable_device(dev);
}

When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
shutdown or reboot. This may implies that there are DMA activities on the
device while shutdown.

Radeon driver is more difficult than amdgpu due to its confusing symbol
names, and I have maintained an out-of-tree patch for a long time [1].
Recently, we found more and more devices can cause the same problem, and
it is very difficult to modify all problematic drivers as radeon/amdgpu
does (the .shutdown callback should make sure there is no DMA activity).
So, I think modify the PCIe port driver is a simple and effective way.
And as early discussed, kexec can still work after this patch.

[1] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 drivers/pci/pcie/portdrv.h      |  2 +-
 drivers/pci/pcie/portdrv_core.c |  6 ++++--
 drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 2ff5724b8f13..358d7281f6e8 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -117,7 +117,7 @@ int pcie_port_device_resume(struct device *dev);
 int pcie_port_device_runtime_suspend(struct device *dev);
 int pcie_port_device_runtime_resume(struct device *dev);
 #endif
-void pcie_port_device_remove(struct pci_dev *dev);
+void pcie_port_device_remove(struct pci_dev *dev, bool disable);
 int __must_check pcie_port_bus_register(void);
 void pcie_port_bus_unregister(void);
 
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed6649c41..98c0a99a41d6 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -484,11 +484,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
  * Remove PCI Express port service devices associated with given port and
  * disable MSI-X or MSI for the port.
  */
-void pcie_port_device_remove(struct pci_dev *dev)
+void pcie_port_device_remove(struct pci_dev *dev, bool disable)
 {
 	device_for_each_child(&dev->dev, NULL, remove_iter);
 	pci_free_irq_vectors(dev);
-	pci_disable_device(dev);
+
+	if (disable)
+		pci_disable_device(dev);
 }
 
 /**
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..562fbf3c1ea9 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -147,7 +147,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
 		pm_runtime_dont_use_autosuspend(&dev->dev);
 	}
 
-	pcie_port_device_remove(dev);
+	pcie_port_device_remove(dev, true);
+}
+
+static void pcie_portdrv_shutdown(struct pci_dev *dev)
+{
+	if (pci_bridge_d3_possible(dev)) {
+		pm_runtime_forbid(&dev->dev);
+		pm_runtime_get_noresume(&dev->dev);
+		pm_runtime_dont_use_autosuspend(&dev->dev);
+	}
+
+	pcie_port_device_remove(dev, false);
 }
 
 static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
@@ -219,7 +230,7 @@ static struct pci_driver pcie_portdriver = {
 
 	.probe		= pcie_portdrv_probe,
 	.remove		= pcie_portdrv_remove,
-	.shutdown	= pcie_portdrv_remove,
+	.shutdown	= pcie_portdrv_shutdown,
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/5] PCI: Move loongson pci quirks to quirks.c
  2021-05-14  8:00 [PATCH 0/5] PCI: Loongson-related pci quirks Huacai Chen
  2021-05-14  8:00 ` [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown Huacai Chen
@ 2021-05-14  8:00 ` Huacai Chen
  2021-05-14  8:00 ` [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A Huacai Chen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-14  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Huacai Chen, Jiaxun Yang, Huacai Chen

Loongson PCH (LS7A chipset) will be used by both MIPS-based and
LoongArch-based Loongson processors. MIPS-based Loongson uses FDT
but LoongArch-base Loongson uses ACPI, but the driver in drivers/
pci/controller/pci-loongson.c is FDT-only. So move the quirks to
quirks.c where can be shared by all architectures.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/controller/pci-loongson.c | 69 ---------------------------
 drivers/pci/quirks.c                  | 69 +++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 69 deletions(-)

diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c
index 48169b1e3817..88066e9db69e 100644
--- a/drivers/pci/controller/pci-loongson.c
+++ b/drivers/pci/controller/pci-loongson.c
@@ -12,15 +12,6 @@
 
 #include "../pci.h"
 
-/* Device IDs */
-#define DEV_PCIE_PORT_0	0x7a09
-#define DEV_PCIE_PORT_1	0x7a19
-#define DEV_PCIE_PORT_2	0x7a29
-
-#define DEV_LS2K_APB	0x7a02
-#define DEV_LS7A_CONF	0x7a10
-#define DEV_LS7A_LPC	0x7a0c
-
 #define FLAG_CFG0	BIT(0)
 #define FLAG_CFG1	BIT(1)
 #define FLAG_DEV_FIX	BIT(2)
@@ -32,66 +23,6 @@ struct loongson_pci {
 	u32 flags;
 };
 
-/* Fixup wrong class code in PCIe bridges */
-static void bridge_class_quirk(struct pci_dev *dev)
-{
-	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
-}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_0, bridge_class_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_1, bridge_class_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_PCIE_PORT_2, bridge_class_quirk);
-
-static void system_bus_quirk(struct pci_dev *pdev)
-{
-	/*
-	 * The address space consumed by these devices is outside the
-	 * resources of the host bridge.
-	 */
-	pdev->mmio_always_on = 1;
-	pdev->non_compliant_bars = 1;
-}
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_LS2K_APB, system_bus_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_LS7A_CONF, system_bus_quirk);
-DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
-			DEV_LS7A_LPC, system_bus_quirk);
-
-static void loongson_mrrs_quirk(struct pci_dev *dev)
-{
-	struct pci_bus *bus = dev->bus;
-	struct pci_dev *bridge;
-	static const struct pci_device_id bridge_devids[] = {
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
-		{ 0, },
-	};
-
-	/* look for the matching bridge */
-	while (!pci_is_root_bus(bus)) {
-		bridge = bus->self;
-		bus = bus->parent;
-		/*
-		 * Some Loongson PCIe ports have a h/w limitation of
-		 * 256 bytes maximum read request size. They can't handle
-		 * anything larger than this. So force this limit on
-		 * any devices attached under these ports.
-		 */
-		if (pci_match_id(bridge_devids, bridge)) {
-			if (pcie_get_readrq(dev) > 256) {
-				pci_info(dev, "limiting MRRS to 256\n");
-				pcie_set_readrq(dev, 256);
-			}
-			break;
-		}
-	}
-}
-DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
-
 static void __iomem *cfg1_map(struct loongson_pci *priv, int bus,
 				unsigned int devfn, int where)
 {
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index dcb229de1acb..66e4bea69431 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -205,6 +205,75 @@ static void quirk_mmio_always_on(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_ANY_ID, PCI_ANY_ID,
 				PCI_CLASS_BRIDGE_HOST, 8, quirk_mmio_always_on);
 
+/* Loongson-related quirks */
+#define DEV_PCIE_PORT_0	0x7a09
+#define DEV_PCIE_PORT_1	0x7a19
+#define DEV_PCIE_PORT_2	0x7a29
+
+#define DEV_LS2K_APB	0x7a02
+#define DEV_LS7A_CONF	0x7a10
+#define DEV_LS7A_LPC	0x7a0c
+
+/* Fixup wrong class code in PCIe bridges */
+static void loongson_bridge_class_quirk(struct pci_dev *dev)
+{
+	dev->class = PCI_CLASS_BRIDGE_PCI << 8;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, loongson_bridge_class_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, loongson_bridge_class_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, loongson_bridge_class_quirk);
+
+static void loongson_system_bus_quirk(struct pci_dev *pdev)
+{
+	/*
+	 * The address space consumed by these devices is outside the
+	 * resources of the host bridge.
+	 */
+	pdev->mmio_always_on = 1;
+	pdev->non_compliant_bars = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_LS2K_APB, loongson_system_bus_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_LS7A_CONF, loongson_system_bus_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_LS7A_LPC, loongson_system_bus_quirk);
+
+static void loongson_mrrs_quirk(struct pci_dev *dev)
+{
+	struct pci_bus *bus = dev->bus;
+	struct pci_dev *bridge;
+	static const struct pci_device_id bridge_devids[] = {
+		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
+		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
+		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
+		{ 0, },
+	};
+
+	/* look for the matching bridge */
+	while (!pci_is_root_bus(bus)) {
+		bridge = bus->self;
+		bus = bus->parent;
+		/*
+		 * Some Loongson PCIe ports have a h/w limitation of
+		 * 256 bytes maximum read request size. They can't handle
+		 * anything larger than this. So force this limit on
+		 * any devices attached under these ports.
+		 */
+		if (pci_match_id(bridge_devids, bridge)) {
+			if (pcie_get_readrq(dev) > 256) {
+				pci_info(dev, "limiting MRRS to 256\n");
+				pcie_set_readrq(dev, 256);
+			}
+			break;
+		}
+	}
+}
+DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
+
 /*
  * The Mellanox Tavor device gives false positive parity errors.  Disable
  * parity error reporting.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A
  2021-05-14  8:00 [PATCH 0/5] PCI: Loongson-related pci quirks Huacai Chen
  2021-05-14  8:00 ` [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown Huacai Chen
  2021-05-14  8:00 ` [PATCH 2/5] PCI: Move loongson pci quirks to quirks.c Huacai Chen
@ 2021-05-14  8:00 ` Huacai Chen
  2021-05-14 14:03   ` Krzysztof Wilczyński
  2021-05-14 15:39   ` Bjorn Helgaas
  2021-05-14  8:00 ` [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
  2021-05-14  8:00 ` [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge Huacai Chen
  4 siblings, 2 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-14  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Huacai Chen, Jiaxun Yang, Huacai Chen

In new revision of LS7A, some pcie ports support larger value than 256,
but their mrrs values are not dectectable. And moreover, the current
loongson_mrrs_quirk() cannot avoid devices increasing its mrrs after
pci_enable_device(). So the only possible way is configure mrrs of all
devices in BIOS, and add a pci dev flag (PCI_DEV_FLAGS_NO_INCREASE_MRRS)
to stop the increasing mrrs operations.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/pci.c    | 5 +++++
 drivers/pci/quirks.c | 6 ++++++
 include/linux/pci.h  | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b717680377a9..6f0d2f5b6f30 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 
 	v = (ffs(rq) - 8) << 12;
 
+	if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
+		if (rq > pcie_get_readrq(dev))
+			return -EINVAL;
+	}
+
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 						  PCI_EXP_DEVCTL_READRQ, v);
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 66e4bea69431..10b3b2057940 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -264,6 +264,12 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
 		 * any devices attached under these ports.
 		 */
 		if (pci_match_id(bridge_devids, bridge)) {
+			dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
+
+			if (pcie_bus_config == PCIE_BUS_DEFAULT ||
+			    pcie_bus_config == PCIE_BUS_TUNE_OFF)
+				break;
+
 			if (pcie_get_readrq(dev) > 256) {
 				pci_info(dev, "limiting MRRS to 256\n");
 				pcie_set_readrq(dev, 256);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c20211e59a57..7fb2072a83b8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -227,6 +227,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
 	/* Don't use Relaxed Ordering for TLPs directed at this device */
 	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
+	/* Don't increase BIOS's MRRS configuration */
+	PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-14  8:00 [PATCH 0/5] PCI: Loongson-related pci quirks Huacai Chen
                   ` (2 preceding siblings ...)
  2021-05-14  8:00 ` [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A Huacai Chen
@ 2021-05-14  8:00 ` Huacai Chen
  2021-05-14 13:22   ` Krzysztof Wilczyński
                     ` (2 more replies)
  2021-05-14  8:00 ` [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge Huacai Chen
  4 siblings, 3 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-14  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Huacai Chen, Jiaxun Yang, Jianmin Lv, Huacai Chen

From: Jianmin Lv <lvjianmin@loongson.cn>

In LS7A, multifunction device use same pci PIN and different
irq for different function, so fix it for standard pci PIN
usage.

Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/quirks.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 10b3b2057940..6ab4b3bba36b 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -242,6 +242,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 			DEV_LS7A_LPC, loongson_system_bus_quirk);
 
+static void loongson_pci_pin_quirk(struct pci_dev *dev)
+{
+	static const struct pci_device_id devids[] = {
+		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
+		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
+		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
+		{ 0, },
+	};
+
+	if (pci_match_id(devids, dev)) {
+		u8 fun = dev->devfn & 7;
+
+		dev->pin = 1 + (fun & 3);
+	}
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, PCI_ANY_ID, loongson_pci_pin_quirk);
+
 static void loongson_mrrs_quirk(struct pci_dev *dev)
 {
 	struct pci_bus *bus = dev->bus;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-14  8:00 [PATCH 0/5] PCI: Loongson-related pci quirks Huacai Chen
                   ` (3 preceding siblings ...)
  2021-05-14  8:00 ` [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
@ 2021-05-14  8:00 ` Huacai Chen
  2021-05-14 13:56   ` Krzysztof Wilczyński
  2021-05-14 15:10   ` Bjorn Helgaas
  4 siblings, 2 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-14  8:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Huacai Chen, Jiaxun Yang, Huacai Chen, Jingfeng Sui

According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
VGA Enable bit which modifies the response to VGA compatible addresses.

If the VGA Enable bit is set, the bridge will decode and forward the
following accesses on the primary interface to the secondary interface.

The ASpeed AST2500 hardward does not set the VGA Enable bit on its
bridge control register, which causes vgaarb subsystem don't think the
VGA card behind the bridge as a valid boot vga device.

So we provide a quirk to fix Xorg auto-detection.

See similar bug:

https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
---
 drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6ab4b3bba36b..adf5490706ad 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -28,6 +28,7 @@
 #include <linux/platform_data/x86/apple.h>
 #include <linux/pm_runtime.h>
 #include <linux/switchtec.h>
+#include <linux/vgaarb.h>
 #include <asm/dma.h>	/* isa_dma_bridge_buggy */
 #include "pci.h"
 
@@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
 }
 DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
 
+
+static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
+{
+	struct pci_dev *bridge;
+	struct pci_bus *bus;
+	struct pci_dev *vdevp = NULL;
+	u16 config;
+
+	bus = pdev->bus;
+	bridge = bus->self;
+
+	/* Is VGA routed to us? */
+	if (bridge && (pci_is_bridge(bridge))) {
+		pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
+
+		/* Yes, this bridge is PCI bridge-to-bridge spec compliant,
+		 *  just return!
+		 */
+		if (config & PCI_BRIDGE_CTL_VGA)
+			return;
+
+		dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
+	}
+
+	/* Just return if the system already have a default device */
+	if (vga_default_device())
+		return;
+
+	/* No default vga device */
+	while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
+		if (vdevp->vendor != 0x1a03) {
+			/* Have other vga devcie in the system, do nothing */
+			dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",
+				vdevp->vendor, vdevp->device);
+			return;
+		}
+	}
+
+	vga_set_default_device(pdev);
+
+	dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
+			pdev->vendor, pdev->device);
+}
+DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
+
+
 /*
  * The Mellanox Tavor device gives false positive parity errors.  Disable
  * parity error reporting.
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-14  8:00 ` [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
@ 2021-05-14 13:22   ` Krzysztof Wilczyński
  2021-05-14 14:52   ` Jiaxun Yang
  2021-05-14 15:51   ` Bjorn Helgaas
  2 siblings, 0 replies; 39+ messages in thread
From: Krzysztof Wilczyński @ 2021-05-14 13:22 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang, Jianmin Lv

Hi Jianmin,

[...]
> In LS7A, multifunction device use same pci PIN and different
> irq for different function, so fix it for standard pci PIN
> usage.

It would be "PCI" instead of "pci" and "IRQ" instead of "irq" in the
commit message.  This would also be true in the other patches in this
series.

[...]
> +	if (pci_match_id(devids, dev)) {
> +		u8 fun = dev->devfn & 7;

You could use the PCI_FUNC() macro here.

Krzysztof

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-14  8:00 ` [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge Huacai Chen
@ 2021-05-14 13:56   ` Krzysztof Wilczyński
  2021-05-14 15:10   ` Bjorn Helgaas
  1 sibling, 0 replies; 39+ messages in thread
From: Krzysztof Wilczyński @ 2021-05-14 13:56 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang, Jingfeng Sui

Hi Huacai,

[...]
> The ASpeed AST2500 hardward does not set the VGA Enable bit on its

It would be "hardware" in the above sentence.

> bridge control register, which causes vgaarb subsystem don't think the

Probably better to say "don't consider the" rather than use "think".

> VGA card behind the bridge as a valid boot vga device.

It would be "VGA" in the above sentence, to be consistent.

> So we provide a quirk to fix Xorg auto-detection.

A nit pick.  Technically it's "X.org", but I see that the canon in the
commit messages is to use "Xorg", so either would be valid.

> See similar bug:
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/

A small nit pick.  Linking to https://lore.kernel.org/linux-pci/ is
often preferred.

[...]
> +	struct pci_dev *bridge;
> +	struct pci_bus *bus;
> +	struct pci_dev *vdevp = NULL;
> +	u16 config;
> +
> +	bus = pdev->bus;
> +	bridge = bus->self;

Preferred style would be:

  struct pci_bus *bus = pdev->bug;
  struct pci_dev *bridge = bus->self;

> +	/* Is VGA routed to us? */
> +	if (bridge && (pci_is_bridge(bridge))) {
> +		pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
> +
> +		/* Yes, this bridge is PCI bridge-to-bridge spec compliant,
> +		 *  just return!
> +		 */

Second line of the comment has some errand space.  Also, wording of this
comment could be improved a bit.

> +		if (config & PCI_BRIDGE_CTL_VGA)
> +			return;
> +
> +		dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
> +	}
> +
> +	/* Just return if the system already have a default device */

Missing period at the end of the sentence in the comment.  I am also not
sure this comment is useful, as the if-statement immediately below is
quite self-explanatory.

> +	if (vga_default_device())
> +		return;
> +
> +	/* No default vga device */

It would be "VGA" here, plus missing period at the end.

> +	while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
> +		if (vdevp->vendor != 0x1a03) {
> +			/* Have other vga devcie in the system, do nothing */

It would be "VGA" and "device" in the comment above, missing period at
the end.  I am also not sure if this comment is useful, given the log
line just below it.

> +			dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",

It would be "VGA" in the message above.  Also, using "Another" is a bit
vague.  What makes a VGA device "another" in this case?

> +	dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
> +			pdev->vendor, pdev->device);

You need to align the second line in the above with the open bracket,
like in the log line above.

> +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);

You might need to wrap this line above, see:

  https://lore.kernel.org/linux-pci/20171026223701.GA25649@bhelgaas-glaptop.roam.corp.google.com/

Krzysztof

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A
  2021-05-14  8:00 ` [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A Huacai Chen
@ 2021-05-14 14:03   ` Krzysztof Wilczyński
  2021-05-14 15:39   ` Bjorn Helgaas
  1 sibling, 0 replies; 39+ messages in thread
From: Krzysztof Wilczyński @ 2021-05-14 14:03 UTC (permalink / raw)
  To: Huacai Chen; +Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang

Hi Huacai,

It would be "MRRS" in the subject line.

> In new revision of LS7A, some pcie ports support larger value than 256,

Here, it would be "PCIe" instead of "pcie".

> but their mrrs values are not dectectable. And moreover, the current

It would be "MRRS", and "detectable".

> loongson_mrrs_quirk() cannot avoid devices increasing its mrrs after
> pci_enable_device(). So the only possible way is configure mrrs of all
> devices in BIOS, and add a pci dev flag (PCI_DEV_FLAGS_NO_INCREASE_MRRS)
> to stop the increasing mrrs operations.
[...]

Again, it would be "MRRS" and "PCI" throughout.  Possibly also "device
flag" rather than "dev flag", but this is more of a nit pick.

Krzysztof

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown
  2021-05-14  8:00 ` [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown Huacai Chen
@ 2021-05-14 14:20   ` Krzysztof Wilczyński
  2021-05-14 16:09   ` Bjorn Helgaas
  1 sibling, 0 replies; 39+ messages in thread
From: Krzysztof Wilczyński @ 2021-05-14 14:20 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang, Tiezhu Yang

Hi Huacai,

> Use separate remove()/shutdown() callback, and don't disable pci device

It would be "PCI" here in the above sentence and in the subject line.

> during shutdown. This can avoid some poweroff/reboot failures.

> The poweroff/reboot failures can easily reproduce on Loongson platforms.

Could be better as "can easily be reproduced" in the above.

> I think this is not a Loongson-specific problem, instead, is a problem
> related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
> devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
> ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.

You might want to change the language to be more imperative in here, as
at the moment I am not sure if you actually have a solution to the
problem here, or you think you have one. :)
 
> As Tiezhu said, this occasionally shutdown or reboot failure is due to
> clear PCI_COMMAND_MASTER on the device in do_pci_disable_device().
> 
> drivers/pci/pci.c
> static void do_pci_disable_device(struct pci_dev *dev)
> {
>         u16 pci_command;
> 
>         pci_read_config_word(dev, PCI_COMMAND, &pci_command);
>         if (pci_command & PCI_COMMAND_MASTER) {
>                 pci_command &= ~PCI_COMMAND_MASTER;
>                 pci_write_config_word(dev, PCI_COMMAND, pci_command);
>         }
> 
>         pcibios_disable_device(dev);
> }
> 
> When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
> shutdown or reboot. This may implies that there are DMA activities on the
> device while shutdown.
> 
> Radeon driver is more difficult than amdgpu due to its confusing symbol
> names, and I have maintained an out-of-tree patch for a long time [1].
> Recently, we found more and more devices can cause the same problem, and
> it is very difficult to modify all problematic drivers as radeon/amdgpu
> does (the .shutdown callback should make sure there is no DMA activity).
> So, I think modify the PCIe port driver is a simple and effective way.
> And as early discussed, kexec can still work after this patch.
> 
> [1] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0
[...]

The above explanation and entire backstory is very helpful, but it might
be better to include it in the cover letter, and keep the commit message
here concise and only focused on what is being done here and why.

Krzysztof

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-14  8:00 ` [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
  2021-05-14 13:22   ` Krzysztof Wilczyński
@ 2021-05-14 14:52   ` Jiaxun Yang
  2021-05-15  3:52     ` Huacai Chen
  2021-05-14 15:51   ` Bjorn Helgaas
  2 siblings, 1 reply; 39+ messages in thread
From: Jiaxun Yang @ 2021-05-14 14:52 UTC (permalink / raw)
  To: Huacai Chen, Bjorn Helgaas; +Cc: linux-pci, Huacai Chen, Jianmin Lv



在 2021/5/14 16:00, Huacai Chen 写道:
> From: Jianmin Lv <lvjianmin@loongson.cn>
>
> In LS7A, multifunction device use same pci PIN and different
> irq for different function, so fix it for standard pci PIN
> usage.

Hmm, I'm unsure about this change.
The PCIe port, or PCI-to-PCI bridge on LS7A only have a single
upstream interrupt specified in DeviceTree, how can this quirk
work?

Thanks.

- Jiaxun

>
> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-14  8:00 ` [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge Huacai Chen
  2021-05-14 13:56   ` Krzysztof Wilczyński
@ 2021-05-14 15:10   ` Bjorn Helgaas
  2021-05-15  9:09     ` Huacai Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-14 15:10 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang, Jingfeng Sui

On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> VGA Enable bit which modifies the response to VGA compatible addresses.

The bridge spec is pretty old, and most of the content has been
incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
7.5.1.3.13" here instead.

> If the VGA Enable bit is set, the bridge will decode and forward the
> following accesses on the primary interface to the secondary interface.

*Which* following accesses?  The structure of English requires that if
you say "the following accesses," you must continue by *listing* the
accesses.

> The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> bridge control register, which causes vgaarb subsystem don't think the
> VGA card behind the bridge as a valid boot vga device.

s/hardward/bridge/
s/vga/VGA/ (also in code comments and dmesg strings below)

From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
since it apparently has a VGA class code.  But here you say the
AST2500 has a Bridge Control register, which suggests that it's a
bridge.  If AST2500 is some sort of combination that includes both a
bridge and a VGA device, please outline that topology.

But the hardware defect is that some bridges forward VGA accesses even
though their VGA Enable bit is not set?  The quirk should be attached
to broken *bridges*, not to VGA devices.

If a bridge forwards VGA accesses regardless of how its VGA Enable bit
is set, that means VGA arbitration (in vgaarb.c) cannot work
correctly, so merely setting the default VGA device once in a quirk is
not sufficient.  You would have to somehow disable any future attempts
to use other VGA devices.  Only the VGA device below this defective
bridge is usable.  Any other VGA devices in the system would be
useless.

> So we provide a quirk to fix Xorg auto-detection.
> 
> See similar bug:
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/

This patch was never merged.  If we merged a revised version, please
cite the SHA1 instead.

> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
> ---
>  drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6ab4b3bba36b..adf5490706ad 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -28,6 +28,7 @@
>  #include <linux/platform_data/x86/apple.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/switchtec.h>
> +#include <linux/vgaarb.h>
>  #include <asm/dma.h>	/* isa_dma_bridge_buggy */
>  #include "pci.h"
>  
> @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
>  }
>  DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
>  
> +
> +static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
> +{
> +	struct pci_dev *bridge;
> +	struct pci_bus *bus;
> +	struct pci_dev *vdevp = NULL;
> +	u16 config;
> +
> +	bus = pdev->bus;
> +	bridge = bus->self;
> +
> +	/* Is VGA routed to us? */
> +	if (bridge && (pci_is_bridge(bridge))) {
> +		pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
> +
> +		/* Yes, this bridge is PCI bridge-to-bridge spec compliant,
> +		 *  just return!
> +		 */
> +		if (config & PCI_BRIDGE_CTL_VGA)
> +			return;
> +
> +		dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
> +	}

You cannot assume that a bridge is defective just because
PCI_BRIDGE_CTL_VGA is not set.

> +	/* Just return if the system already have a default device */
> +	if (vga_default_device())
> +		return;
> +
> +	/* No default vga device */
> +	while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
> +		if (vdevp->vendor != 0x1a03) {
> +			/* Have other vga devcie in the system, do nothing */
> +			dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",
> +				vdevp->vendor, vdevp->device);
> +			return;
> +		}
> +	}
> +
> +	vga_set_default_device(pdev);
> +
> +	dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
> +			pdev->vendor, pdev->device);
> +}
> +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
> +
> +
>  /*
>   * The Mellanox Tavor device gives false positive parity errors.  Disable
>   * parity error reporting.
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A
  2021-05-14  8:00 ` [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A Huacai Chen
  2021-05-14 14:03   ` Krzysztof Wilczyński
@ 2021-05-14 15:39   ` Bjorn Helgaas
  2021-05-15  3:49     ` Huacai Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-14 15:39 UTC (permalink / raw)
  To: Huacai Chen; +Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang

On Fri, May 14, 2021 at 04:00:23PM +0800, Huacai Chen wrote:
> In new revision of LS7A, some pcie ports support larger value than 256,
> but their mrrs values are not dectectable. And moreover, the current
> loongson_mrrs_quirk() cannot avoid devices increasing its mrrs after
> pci_enable_device(). So the only possible way is configure mrrs of all
> devices in BIOS, and add a pci dev flag (PCI_DEV_FLAGS_NO_INCREASE_MRRS)
> to stop the increasing mrrs operations.

s/mrrs/MRRS/
s/dectectable/detectable/

This doesn't make sense to me.  MRRS "sets the maximum Read Request
size for the Function as a Requester" (PCIe r5.0, sec 7.5.3.4).

The MRRS in the Device Control register is a 3-bit RW field (or a RO
field with value 000b).  If it's RW, software is allowed to write any
3-bit value to it.  There is no "maximum allowed value" for software
to detect.

The value software writes is only a *limit* on the Read Request size.
The hardware is never obligated to generate Read Requests of that max
size.  If software writes 101b (4096 byte max size), and the hardware
only supports generating 128-byte Read Requests, there's no issue.
It's perfectly fine for the hardware to generate 128-byte requests.

Apparently something bad happens if software writes something "too
large" to MRRS?  What actually happens?

If the problem is that the device generates a large Read Request and
in response, it receives a data TLP that is larger than it can handle,
that sounds like an MPS issue, not an MRRS issue.

Based on the existing loongson_mrrs_quirk(), it looks like this is a
long-standing issue.  I'm sorry I missed this when reviewing the
driver in the first place.  This all needs a much better explanation
of what the real problem is.  The "h/w limitation of 256 bytes maximum
read request size" comment just doesn't make sense from the spec point
of view.

I do know that Linux uses MRRS and MPS in ... highly unusual ways, and
maybe we're tripping over that somehow.  If so, we need to figure out
exactly how so we can make Linux's use of MPS and MRRS better overall.

> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  drivers/pci/pci.c    | 5 +++++
>  drivers/pci/quirks.c | 6 ++++++
>  include/linux/pci.h  | 2 ++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b717680377a9..6f0d2f5b6f30 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
>  
>  	v = (ffs(rq) - 8) << 12;
>  
> +	if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
> +		if (rq > pcie_get_readrq(dev))
> +			return -EINVAL;
> +	}
> +
>  	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
>  						  PCI_EXP_DEVCTL_READRQ, v);
>  
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 66e4bea69431..10b3b2057940 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -264,6 +264,12 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
>  		 * any devices attached under these ports.
>  		 */
>  		if (pci_match_id(bridge_devids, bridge)) {
> +			dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
> +
> +			if (pcie_bus_config == PCIE_BUS_DEFAULT ||
> +			    pcie_bus_config == PCIE_BUS_TUNE_OFF)
> +				break;
> +
>  			if (pcie_get_readrq(dev) > 256) {
>  				pci_info(dev, "limiting MRRS to 256\n");
>  				pcie_set_readrq(dev, 256);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index c20211e59a57..7fb2072a83b8 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -227,6 +227,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
>  	/* Don't use Relaxed Ordering for TLPs directed at this device */
>  	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
> +	/* Don't increase BIOS's MRRS configuration */
> +	PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
>  };
>  
>  enum pci_irq_reroute_variant {
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-14  8:00 ` [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
  2021-05-14 13:22   ` Krzysztof Wilczyński
  2021-05-14 14:52   ` Jiaxun Yang
@ 2021-05-14 15:51   ` Bjorn Helgaas
  2021-05-15  3:56     ` Huacai Chen
  2 siblings, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-14 15:51 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang, Jianmin Lv

On Fri, May 14, 2021 at 04:00:24PM +0800, Huacai Chen wrote:
> From: Jianmin Lv <lvjianmin@loongson.cn>
> 
> In LS7A, multifunction device use same pci PIN and different
> irq for different function, so fix it for standard pci PIN
> usage.

Apparently the defect here is that the Interrupt Pin register reports
the wrong INTx values?

Will this be fixed in future hardware so we don't have to add more
devices to the quirk?

> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  drivers/pci/quirks.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 10b3b2057940..6ab4b3bba36b 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -242,6 +242,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
>  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
>  			DEV_LS7A_LPC, loongson_system_bus_quirk);
>  
> +static void loongson_pci_pin_quirk(struct pci_dev *dev)
> +{
> +	static const struct pci_device_id devids[] = {
> +		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
> +		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
> +		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
> +		{ 0, },
> +	};
> +
> +	if (pci_match_id(devids, dev)) {
> +		u8 fun = dev->devfn & 7;

Use PCI_FUNC().

> +		dev->pin = 1 + (fun & 3);
> +	}
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, PCI_ANY_ID, loongson_pci_pin_quirk);

The usual pattern is to list each device here instead of using
pci_match_id() in the quirk, e.g.,

  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, 0x7a09, loongson_pci_pin_quirk);
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, 0x7a19, loongson_pci_pin_quirk);
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, 0x7a29, loongson_pci_pin_quirk);

>  static void loongson_mrrs_quirk(struct pci_dev *dev)
>  {
>  	struct pci_bus *bus = dev->bus;
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown
  2021-05-14  8:00 ` [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown Huacai Chen
  2021-05-14 14:20   ` Krzysztof Wilczyński
@ 2021-05-14 16:09   ` Bjorn Helgaas
  2021-05-15  3:38     ` Huacai Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-14 16:09 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, linux-pci, Huacai Chen, Jiaxun Yang, Tiezhu Yang

In subject, s/pci device/device/.  We already know this is PCI.

On Fri, May 14, 2021 at 04:00:21PM +0800, Huacai Chen wrote:
> Use separate remove()/shutdown() callback, and don't disable pci device
> during shutdown. This can avoid some poweroff/reboot failures.

s/pci/PCI/

> The poweroff/reboot failures can easily reproduce on Loongson platforms.
> I think this is not a Loongson-specific problem, instead, is a problem
> related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
> devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
> ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.

Please explain exactly what these failures are and include URLs for
relevant reports, bugzillas, etc.

Conventional citation format is

  faefba95c9e8 ("drm/amdgpu: just suspend the hw on pci shutdown")

> As Tiezhu said, this occasionally shutdown or reboot failure is due to
> clear PCI_COMMAND_MASTER on the device in do_pci_disable_device().

Where did Tiezhu say this?  Please link to this conversation.

> drivers/pci/pci.c

Unnecessary; we can use cscope/tags/grep/etc to find this.

> static void do_pci_disable_device(struct pci_dev *dev)
> {
>         u16 pci_command;
> 
>         pci_read_config_word(dev, PCI_COMMAND, &pci_command);
>         if (pci_command & PCI_COMMAND_MASTER) {
>                 pci_command &= ~PCI_COMMAND_MASTER;
>                 pci_write_config_word(dev, PCI_COMMAND, pci_command);
>         }
> 
>         pcibios_disable_device(dev);
> }
> 
> When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
> shutdown or reboot. This may implies that there are DMA activities on the
> device while shutdown.
> 
> Radeon driver is more difficult than amdgpu due to its confusing symbol
> names, and I have maintained an out-of-tree patch for a long time [1].
> Recently, we found more and more devices can cause the same problem, and
> it is very difficult to modify all problematic drivers as radeon/amdgpu
> does (the .shutdown callback should make sure there is no DMA activity).
> So, I think modify the PCIe port driver is a simple and effective way.
> And as early discussed, kexec can still work after this patch.

Link to this discussion as well?

This commit log does not contain a clear description of the problem
and how the patch fixes it.

> [1] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0
> 
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> ---
>  drivers/pci/pcie/portdrv.h      |  2 +-
>  drivers/pci/pcie/portdrv_core.c |  6 ++++--
>  drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 2ff5724b8f13..358d7281f6e8 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -117,7 +117,7 @@ int pcie_port_device_resume(struct device *dev);
>  int pcie_port_device_runtime_suspend(struct device *dev);
>  int pcie_port_device_runtime_resume(struct device *dev);
>  #endif
> -void pcie_port_device_remove(struct pci_dev *dev);
> +void pcie_port_device_remove(struct pci_dev *dev, bool disable);
>  int __must_check pcie_port_bus_register(void);
>  void pcie_port_bus_unregister(void);
>  
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..98c0a99a41d6 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -484,11 +484,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
>   * Remove PCI Express port service devices associated with given port and
>   * disable MSI-X or MSI for the port.
>   */
> -void pcie_port_device_remove(struct pci_dev *dev)
> +void pcie_port_device_remove(struct pci_dev *dev, bool disable)
>  {
>  	device_for_each_child(&dev->dev, NULL, remove_iter);
>  	pci_free_irq_vectors(dev);
> -	pci_disable_device(dev);
> +
> +	if (disable)
> +		pci_disable_device(dev);
>  }
>  
>  /**
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index c7ff1eea225a..562fbf3c1ea9 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -147,7 +147,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
>  		pm_runtime_dont_use_autosuspend(&dev->dev);
>  	}
>  
> -	pcie_port_device_remove(dev);
> +	pcie_port_device_remove(dev, true);
> +}
> +
> +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> +{
> +	if (pci_bridge_d3_possible(dev)) {
> +		pm_runtime_forbid(&dev->dev);
> +		pm_runtime_get_noresume(&dev->dev);
> +		pm_runtime_dont_use_autosuspend(&dev->dev);
> +	}
> +
> +	pcie_port_device_remove(dev, false);
>  }
>  
>  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> @@ -219,7 +230,7 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.probe		= pcie_portdrv_probe,
>  	.remove		= pcie_portdrv_remove,
> -	.shutdown	= pcie_portdrv_remove,
> +	.shutdown	= pcie_portdrv_shutdown,
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown
  2021-05-14 16:09   ` Bjorn Helgaas
@ 2021-05-15  3:38     ` Huacai Chen
  0 siblings, 0 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-15  3:38 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Tiezhu Yang

Hi, Krzysztof and Bjorn

On Sat, May 15, 2021 at 12:09 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> In subject, s/pci device/device/.  We already know this is PCI.
>
> On Fri, May 14, 2021 at 04:00:21PM +0800, Huacai Chen wrote:
> > Use separate remove()/shutdown() callback, and don't disable pci device
> > during shutdown. This can avoid some poweroff/reboot failures.
>
> s/pci/PCI/
>
> > The poweroff/reboot failures can easily reproduce on Loongson platforms.
> > I think this is not a Loongson-specific problem, instead, is a problem
> > related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
> > devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
> > ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.
>
> Please explain exactly what these failures are and include URLs for
> relevant reports, bugzillas, etc.
>
> Conventional citation format is
>
>   faefba95c9e8 ("drm/amdgpu: just suspend the hw on pci shutdown")
Thank you very much, I will send a new version with more information attached.

>
> > As Tiezhu said, this occasionally shutdown or reboot failure is due to
> > clear PCI_COMMAND_MASTER on the device in do_pci_disable_device().
>
> Where did Tiezhu say this?  Please link to this conversation.
>
> > drivers/pci/pci.c
>
> Unnecessary; we can use cscope/tags/grep/etc to find this.
>
> > static void do_pci_disable_device(struct pci_dev *dev)
> > {
> >         u16 pci_command;
> >
> >         pci_read_config_word(dev, PCI_COMMAND, &pci_command);
> >         if (pci_command & PCI_COMMAND_MASTER) {
> >                 pci_command &= ~PCI_COMMAND_MASTER;
> >                 pci_write_config_word(dev, PCI_COMMAND, pci_command);
> >         }
> >
> >         pcibios_disable_device(dev);
> > }
> >
> > When remove "pci_command &= ~PCI_COMMAND_MASTER;", it can work well when
> > shutdown or reboot. This may implies that there are DMA activities on the
> > device while shutdown.
> >
> > Radeon driver is more difficult than amdgpu due to its confusing symbol
> > names, and I have maintained an out-of-tree patch for a long time [1].
> > Recently, we found more and more devices can cause the same problem, and
> > it is very difficult to modify all problematic drivers as radeon/amdgpu
> > does (the .shutdown callback should make sure there is no DMA activity).
> > So, I think modify the PCIe port driver is a simple and effective way.
> > And as early discussed, kexec can still work after this patch.
>
> Link to this discussion as well?
>
> This commit log does not contain a clear description of the problem
> and how the patch fixes it.
>
> > [1] https://github.com/chenhuacai/linux/commit/8da06f9b669831829416a3e9f4d1c57f217a42f0
> >
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
> > ---
> >  drivers/pci/pcie/portdrv.h      |  2 +-
> >  drivers/pci/pcie/portdrv_core.c |  6 ++++--
> >  drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
> >  3 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> > index 2ff5724b8f13..358d7281f6e8 100644
> > --- a/drivers/pci/pcie/portdrv.h
> > +++ b/drivers/pci/pcie/portdrv.h
> > @@ -117,7 +117,7 @@ int pcie_port_device_resume(struct device *dev);
> >  int pcie_port_device_runtime_suspend(struct device *dev);
> >  int pcie_port_device_runtime_resume(struct device *dev);
> >  #endif
> > -void pcie_port_device_remove(struct pci_dev *dev);
> > +void pcie_port_device_remove(struct pci_dev *dev, bool disable);
> >  int __must_check pcie_port_bus_register(void);
> >  void pcie_port_bus_unregister(void);
> >
> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index e1fed6649c41..98c0a99a41d6 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -484,11 +484,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
> >   * Remove PCI Express port service devices associated with given port and
> >   * disable MSI-X or MSI for the port.
> >   */
> > -void pcie_port_device_remove(struct pci_dev *dev)
> > +void pcie_port_device_remove(struct pci_dev *dev, bool disable)
> >  {
> >       device_for_each_child(&dev->dev, NULL, remove_iter);
> >       pci_free_irq_vectors(dev);
> > -     pci_disable_device(dev);
> > +
> > +     if (disable)
> > +             pci_disable_device(dev);
> >  }
> >
> >  /**
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index c7ff1eea225a..562fbf3c1ea9 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -147,7 +147,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
> >               pm_runtime_dont_use_autosuspend(&dev->dev);
> >       }
> >
> > -     pcie_port_device_remove(dev);
> > +     pcie_port_device_remove(dev, true);
> > +}
> > +
> > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > +{
> > +     if (pci_bridge_d3_possible(dev)) {
> > +             pm_runtime_forbid(&dev->dev);
> > +             pm_runtime_get_noresume(&dev->dev);
> > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > +     }
> > +
> > +     pcie_port_device_remove(dev, false);
> >  }
> >
> >  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> > @@ -219,7 +230,7 @@ static struct pci_driver pcie_portdriver = {
> >
> >       .probe          = pcie_portdrv_probe,
> >       .remove         = pcie_portdrv_remove,
> > -     .shutdown       = pcie_portdrv_remove,
> > +     .shutdown       = pcie_portdrv_shutdown,
> >
> >       .err_handler    = &pcie_portdrv_err_handler,
> >
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A
  2021-05-14 15:39   ` Bjorn Helgaas
@ 2021-05-15  3:49     ` Huacai Chen
  2021-05-15  6:22       ` Jiaxun Yang
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-15  3:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, huangshuai

Hi, Krzysztof and Bjorn

I will improve my spelling, and others see below.

On Fri, May 14, 2021 at 11:40 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, May 14, 2021 at 04:00:23PM +0800, Huacai Chen wrote:
> > In new revision of LS7A, some pcie ports support larger value than 256,
> > but their mrrs values are not dectectable. And moreover, the current
> > loongson_mrrs_quirk() cannot avoid devices increasing its mrrs after
> > pci_enable_device(). So the only possible way is configure mrrs of all
> > devices in BIOS, and add a pci dev flag (PCI_DEV_FLAGS_NO_INCREASE_MRRS)
> > to stop the increasing mrrs operations.
>
> s/mrrs/MRRS/
> s/dectectable/detectable/
>
> This doesn't make sense to me.  MRRS "sets the maximum Read Request
> size for the Function as a Requester" (PCIe r5.0, sec 7.5.3.4).
>
> The MRRS in the Device Control register is a 3-bit RW field (or a RO
> field with value 000b).  If it's RW, software is allowed to write any
> 3-bit value to it.  There is no "maximum allowed value" for software
> to detect.
>
> The value software writes is only a *limit* on the Read Request size.
> The hardware is never obligated to generate Read Requests of that max
> size.  If software writes 101b (4096 byte max size), and the hardware
> only supports generating 128-byte Read Requests, there's no issue.
> It's perfectly fine for the hardware to generate 128-byte requests.
>
> Apparently something bad happens if software writes something "too
> large" to MRRS?  What actually happens?
>
> If the problem is that the device generates a large Read Request and
> in response, it receives a data TLP that is larger than it can handle,
> that sounds like an MPS issue, not an MRRS issue.
>
> Based on the existing loongson_mrrs_quirk(), it looks like this is a
> long-standing issue.  I'm sorry I missed this when reviewing the
> driver in the first place.  This all needs a much better explanation
> of what the real problem is.  The "h/w limitation of 256 bytes maximum
> read request size" comment just doesn't make sense from the spec point
> of view.
>
> I do know that Linux uses MRRS and MPS in ... highly unusual ways, and
> maybe we're tripping over that somehow.  If so, we need to figure out
> exactly how so we can make Linux's use of MPS and MRRS better overall.
I have discussed with Shuai Huang (the main designer of LS7A), he said
that some devices (such as Realtek 8169) usually write a large value
to MRRS in its driver. And that usually larger than LS7A bridge can
handle, the quirk in this patch is avoid device driver to increase
MRRS (and BIOS initialize a reasonable value at power on stage).

Huacai
>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  drivers/pci/pci.c    | 5 +++++
> >  drivers/pci/quirks.c | 6 ++++++
> >  include/linux/pci.h  | 2 ++
> >  3 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b717680377a9..6f0d2f5b6f30 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5802,6 +5802,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
> >
> >       v = (ffs(rq) - 8) << 12;
> >
> > +     if (dev->dev_flags & PCI_DEV_FLAGS_NO_INCREASE_MRRS) {
> > +             if (rq > pcie_get_readrq(dev))
> > +                     return -EINVAL;
> > +     }
> > +
> >       ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> >                                                 PCI_EXP_DEVCTL_READRQ, v);
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 66e4bea69431..10b3b2057940 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -264,6 +264,12 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> >                * any devices attached under these ports.
> >                */
> >               if (pci_match_id(bridge_devids, bridge)) {
> > +                     dev->dev_flags |= PCI_DEV_FLAGS_NO_INCREASE_MRRS;
> > +
> > +                     if (pcie_bus_config == PCIE_BUS_DEFAULT ||
> > +                         pcie_bus_config == PCIE_BUS_TUNE_OFF)
> > +                             break;
> > +
> >                       if (pcie_get_readrq(dev) > 256) {
> >                               pci_info(dev, "limiting MRRS to 256\n");
> >                               pcie_set_readrq(dev, 256);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index c20211e59a57..7fb2072a83b8 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -227,6 +227,8 @@ enum pci_dev_flags {
> >       PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
> >       /* Don't use Relaxed Ordering for TLPs directed at this device */
> >       PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
> > +     /* Don't increase BIOS's MRRS configuration */
> > +     PCI_DEV_FLAGS_NO_INCREASE_MRRS = (__force pci_dev_flags_t) (1 << 12),
> >  };
> >
> >  enum pci_irq_reroute_variant {
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-14 14:52   ` Jiaxun Yang
@ 2021-05-15  3:52     ` Huacai Chen
  2021-05-18 13:59       ` Bjorn Helgaas
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-15  3:52 UTC (permalink / raw)
  To: Jiaxun Yang; +Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jianmin Lv

Hi, Jiaxun,

On Fri, May 14, 2021 at 10:52 PM Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
>
>
>
> 在 2021/5/14 16:00, Huacai Chen 写道:
> > From: Jianmin Lv <lvjianmin@loongson.cn>
> >
> > In LS7A, multifunction device use same pci PIN and different
> > irq for different function, so fix it for standard pci PIN
> > usage.
>
> Hmm, I'm unsure about this change.
> The PCIe port, or PCI-to-PCI bridge on LS7A only have a single
> upstream interrupt specified in DeviceTree, how can this quirk
> work?
LS7A will be shared by MIPS-based Loongson and LoongArch-based
Loongson, LoongArch use ACPI rather than FDT, so this quirk is needed.

Huacai
>
> Thanks.
>
> - Jiaxun
>
> >
> > Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-14 15:51   ` Bjorn Helgaas
@ 2021-05-15  3:56     ` Huacai Chen
  0 siblings, 0 replies; 39+ messages in thread
From: Huacai Chen @ 2021-05-15  3:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jianmin Lv

Hi, Krzysztof and Bjorn

I will improve my spelling, others see below please.

On Fri, May 14, 2021 at 11:51 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, May 14, 2021 at 04:00:24PM +0800, Huacai Chen wrote:
> > From: Jianmin Lv <lvjianmin@loongson.cn>
> >
> > In LS7A, multifunction device use same pci PIN and different
> > irq for different function, so fix it for standard pci PIN
> > usage.
>
> Apparently the defect here is that the Interrupt Pin register reports
> the wrong INTx values?
In LS7A, the Pin register is hardcoded to the same value for all
functions in multi-function devices, so we need this.
>
> Will this be fixed in future hardware so we don't have to add more
> devices to the quirk?
This will be fixed in future hardware.

Huacai
>
> > Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  drivers/pci/quirks.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 10b3b2057940..6ab4b3bba36b 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -242,6 +242,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
> >  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
> >                       DEV_LS7A_LPC, loongson_system_bus_quirk);
> >
> > +static void loongson_pci_pin_quirk(struct pci_dev *dev)
> > +{
> > +     static const struct pci_device_id devids[] = {
> > +             { PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
> > +             { PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
> > +             { PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
> > +             { 0, },
> > +     };
> > +
> > +     if (pci_match_id(devids, dev)) {
> > +             u8 fun = dev->devfn & 7;
>
> Use PCI_FUNC().
>
> > +             dev->pin = 1 + (fun & 3);
> > +     }
> > +}
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, PCI_ANY_ID, loongson_pci_pin_quirk);
>
> The usual pattern is to list each device here instead of using
> pci_match_id() in the quirk, e.g.,
>
>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, 0x7a09, loongson_pci_pin_quirk);
>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, 0x7a19, loongson_pci_pin_quirk);
>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LOONGSON, 0x7a29, loongson_pci_pin_quirk);
>
> >  static void loongson_mrrs_quirk(struct pci_dev *dev)
> >  {
> >       struct pci_bus *bus = dev->bus;
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A
  2021-05-15  3:49     ` Huacai Chen
@ 2021-05-15  6:22       ` Jiaxun Yang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiaxun Yang @ 2021-05-15  6:22 UTC (permalink / raw)
  To: Huacai Chen, Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, huangshuai



在 2021/5/15 11:49, Huacai Chen 写道:
> Hi, Krzysztof and Bjorn
>
> I will improve my spelling, and others see below.
>
> On Fri, May 14, 2021 at 11:40 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> On Fri, May 14, 2021 at 04:00:23PM +0800, Huacai Chen wrote:
>>> In new revision of LS7A, some pcie ports support larger value than 256,
>>> but their mrrs values are not dectectable. And moreover, the current
>>> loongson_mrrs_quirk() cannot avoid devices increasing its mrrs after
>>> pci_enable_device(). So the only possible way is configure mrrs of all
>>> devices in BIOS, and add a pci dev flag (PCI_DEV_FLAGS_NO_INCREASE_MRRS)
>>> to stop the increasing mrrs operations.
>> s/mrrs/MRRS/
>> s/dectectable/detectable/
>>
>> This doesn't make sense to me.  MRRS "sets the maximum Read Request
>> size for the Function as a Requester" (PCIe r5.0, sec 7.5.3.4).
>>
>> The MRRS in the Device Control register is a 3-bit RW field (or a RO
>> field with value 000b).  If it's RW, software is allowed to write any
>> 3-bit value to it.  There is no "maximum allowed value" for software
>> to detect.
>>
>> The value software writes is only a *limit* on the Read Request size.
>> The hardware is never obligated to generate Read Requests of that max
>> size.  If software writes 101b (4096 byte max size), and the hardware
>> only supports generating 128-byte Read Requests, there's no issue.
>> It's perfectly fine for the hardware to generate 128-byte requests.
>>
>> Apparently something bad happens if software writes something "too
>> large" to MRRS?  What actually happens?
>>
>> If the problem is that the device generates a large Read Request and
>> in response, it receives a data TLP that is larger than it can handle,
>> that sounds like an MPS issue, not an MRRS issue.
>>
>> Based on the existing loongson_mrrs_quirk(), it looks like this is a
>> long-standing issue.  I'm sorry I missed this when reviewing the
>> driver in the first place.  This all needs a much better explanation
>> of what the real problem is.  The "h/w limitation of 256 bytes maximum
>> read request size" comment just doesn't make sense from the spec point
>> of view.
>>
>> I do know that Linux uses MRRS and MPS in ... highly unusual ways, and
>> maybe we're tripping over that somehow.  If so, we need to figure out
>> exactly how so we can make Linux's use of MPS and MRRS better overall.
> I have discussed with Shuai Huang (the main designer of LS7A), he said
> that some devices (such as Realtek 8169) usually write a large value
> to MRRS in its driver. And that usually larger than LS7A bridge can
> handle, the quirk in this patch is avoid device driver to increase
> MRRS (and BIOS initialize a reasonable value at power on stage).

Based on my experiments on LS2K which have a similar issue, I guess the
problem is Loongson's AXI bus failed to accept reading burst larger then
certain size.

The larger MRRS is legal for PCIe controller but not for upstream bus.
So when you write the value to MRRS register the controller will still
generate oversized TLP and then send illegal response to AXI bus. Thus
we need to limit MRRS in software to avoid such situation.

I'm not a loongson employee so it might be wrong.

Thanks.

- Jiaxun

>
> Huacai
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-14 15:10   ` Bjorn Helgaas
@ 2021-05-15  9:09     ` Huacai Chen
  2021-05-17 12:53       ` Huacai Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-15  9:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

Hi, Bjorn,

On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > VGA Enable bit which modifies the response to VGA compatible addresses.
>
> The bridge spec is pretty old, and most of the content has been
> incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> 7.5.1.3.13" here instead.
>
> > If the VGA Enable bit is set, the bridge will decode and forward the
> > following accesses on the primary interface to the secondary interface.
>
> *Which* following accesses?  The structure of English requires that if
> you say "the following accesses," you must continue by *listing* the
> accesses.
>
> > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > bridge control register, which causes vgaarb subsystem don't think the
> > VGA card behind the bridge as a valid boot vga device.
>
> s/hardward/bridge/
> s/vga/VGA/ (also in code comments and dmesg strings below)
>
> From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> since it apparently has a VGA class code.  But here you say the
> AST2500 has a Bridge Control register, which suggests that it's a
> bridge.  If AST2500 is some sort of combination that includes both a
> bridge and a VGA device, please outline that topology.
>
> But the hardware defect is that some bridges forward VGA accesses even
> though their VGA Enable bit is not set?  The quirk should be attached
> to broken *bridges*, not to VGA devices.
>
> If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> is set, that means VGA arbitration (in vgaarb.c) cannot work
> correctly, so merely setting the default VGA device once in a quirk is
> not sufficient.  You would have to somehow disable any future attempts
> to use other VGA devices.  Only the VGA device below this defective
> bridge is usable.  Any other VGA devices in the system would be
> useless.
>
> > So we provide a quirk to fix Xorg auto-detection.
> >
> > See similar bug:
> >
> > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
>
> This patch was never merged.  If we merged a revised version, please
> cite the SHA1 instead.
This patch has never merged, and I found that it is unnecessary after
commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
even if there's no legacy VGA"). Maybe this ASpeed patch is also
unnecessary. If it is still needed, I'll investigate the root cause.

Huacai
>
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
> > ---
> >  drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 47 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6ab4b3bba36b..adf5490706ad 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -28,6 +28,7 @@
> >  #include <linux/platform_data/x86/apple.h>
> >  #include <linux/pm_runtime.h>
> >  #include <linux/switchtec.h>
> > +#include <linux/vgaarb.h>
> >  #include <asm/dma.h> /* isa_dma_bridge_buggy */
> >  #include "pci.h"
> >
> > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> >  }
> >  DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
> >
> > +
> > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
> > +{
> > +     struct pci_dev *bridge;
> > +     struct pci_bus *bus;
> > +     struct pci_dev *vdevp = NULL;
> > +     u16 config;
> > +
> > +     bus = pdev->bus;
> > +     bridge = bus->self;
> > +
> > +     /* Is VGA routed to us? */
> > +     if (bridge && (pci_is_bridge(bridge))) {
> > +             pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
> > +
> > +             /* Yes, this bridge is PCI bridge-to-bridge spec compliant,
> > +              *  just return!
> > +              */
> > +             if (config & PCI_BRIDGE_CTL_VGA)
> > +                     return;
> > +
> > +             dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
> > +     }
>
> You cannot assume that a bridge is defective just because
> PCI_BRIDGE_CTL_VGA is not set.
>
> > +     /* Just return if the system already have a default device */
> > +     if (vga_default_device())
> > +             return;
> > +
> > +     /* No default vga device */
> > +     while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
> > +             if (vdevp->vendor != 0x1a03) {
> > +                     /* Have other vga devcie in the system, do nothing */
> > +                     dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",
> > +                             vdevp->vendor, vdevp->device);
> > +                     return;
> > +             }
> > +     }
> > +
> > +     vga_set_default_device(pdev);
> > +
> > +     dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
> > +                     pdev->vendor, pdev->device);
> > +}
> > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
> > +
> > +
> >  /*
> >   * The Mellanox Tavor device gives false positive parity errors.  Disable
> >   * parity error reporting.
> > --
> > 2.27.0
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-15  9:09     ` Huacai Chen
@ 2021-05-17 12:53       ` Huacai Chen
  2021-05-17 18:28         ` Bjorn Helgaas
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-17 12:53 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

Hi, Bjorn,

On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
>
> Hi, Bjorn,
>
> On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > VGA Enable bit which modifies the response to VGA compatible addresses.
> >
> > The bridge spec is pretty old, and most of the content has been
> > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > 7.5.1.3.13" here instead.
> >
> > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > following accesses on the primary interface to the secondary interface.
> >
> > *Which* following accesses?  The structure of English requires that if
> > you say "the following accesses," you must continue by *listing* the
> > accesses.
> >
> > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > bridge control register, which causes vgaarb subsystem don't think the
> > > VGA card behind the bridge as a valid boot vga device.
> >
> > s/hardward/bridge/
> > s/vga/VGA/ (also in code comments and dmesg strings below)
> >
> > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > since it apparently has a VGA class code.  But here you say the
> > AST2500 has a Bridge Control register, which suggests that it's a
> > bridge.  If AST2500 is some sort of combination that includes both a
> > bridge and a VGA device, please outline that topology.
> >
> > But the hardware defect is that some bridges forward VGA accesses even
> > though their VGA Enable bit is not set?  The quirk should be attached
> > to broken *bridges*, not to VGA devices.
> >
> > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > correctly, so merely setting the default VGA device once in a quirk is
> > not sufficient.  You would have to somehow disable any future attempts
> > to use other VGA devices.  Only the VGA device below this defective
> > bridge is usable.  Any other VGA devices in the system would be
> > useless.
> >
> > > So we provide a quirk to fix Xorg auto-detection.
> > >
> > > See similar bug:
> > >
> > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> >
> > This patch was never merged.  If we merged a revised version, please
> > cite the SHA1 instead.
> This patch has never merged, and I found that it is unnecessary after
> commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> even if there's no legacy VGA"). Maybe this ASpeed patch is also
> unnecessary. If it is still needed, I'll investigate the root cause.
I found that vga_arb_device_init() and pcibios_init() are both wrapped
by subsys_initcall(), which means their sequence is unpredictable. And
unfortunately, in our platform vga_arb_device_init() is called before
pcibios_init(), which makes vga_arb_device_init() fail to set a
default vga device. This is the root cause why we thought that we
still need a quirk for AST2500.

I think the best solution is make vga_arb_device_init() be wrapped by
subsys_initcall_sync(), do you think so?

Huacai
>
> Huacai
> >
> > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
> > > ---
> > >  drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 47 insertions(+)
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 6ab4b3bba36b..adf5490706ad 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -28,6 +28,7 @@
> > >  #include <linux/platform_data/x86/apple.h>
> > >  #include <linux/pm_runtime.h>
> > >  #include <linux/switchtec.h>
> > > +#include <linux/vgaarb.h>
> > >  #include <asm/dma.h> /* isa_dma_bridge_buggy */
> > >  #include "pci.h"
> > >
> > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> > >  }
> > >  DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
> > >
> > > +
> > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
> > > +{
> > > +     struct pci_dev *bridge;
> > > +     struct pci_bus *bus;
> > > +     struct pci_dev *vdevp = NULL;
> > > +     u16 config;
> > > +
> > > +     bus = pdev->bus;
> > > +     bridge = bus->self;
> > > +
> > > +     /* Is VGA routed to us? */
> > > +     if (bridge && (pci_is_bridge(bridge))) {
> > > +             pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
> > > +
> > > +             /* Yes, this bridge is PCI bridge-to-bridge spec compliant,
> > > +              *  just return!
> > > +              */
> > > +             if (config & PCI_BRIDGE_CTL_VGA)
> > > +                     return;
> > > +
> > > +             dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
> > > +     }
> >
> > You cannot assume that a bridge is defective just because
> > PCI_BRIDGE_CTL_VGA is not set.
> >
> > > +     /* Just return if the system already have a default device */
> > > +     if (vga_default_device())
> > > +             return;
> > > +
> > > +     /* No default vga device */
> > > +     while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
> > > +             if (vdevp->vendor != 0x1a03) {
> > > +                     /* Have other vga devcie in the system, do nothing */
> > > +                     dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",
> > > +                             vdevp->vendor, vdevp->device);
> > > +                     return;
> > > +             }
> > > +     }
> > > +
> > > +     vga_set_default_device(pdev);
> > > +
> > > +     dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
> > > +                     pdev->vendor, pdev->device);
> > > +}
> > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
> > > +
> > > +
> > >  /*
> > >   * The Mellanox Tavor device gives false positive parity errors.  Disable
> > >   * parity error reporting.
> > > --
> > > 2.27.0
> > >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-17 12:53       ` Huacai Chen
@ 2021-05-17 18:28         ` Bjorn Helgaas
  2021-05-18  2:31           ` 隋景峰
  2021-05-18  7:13           ` Huacai Chen
  0 siblings, 2 replies; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-17 18:28 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > >
> > > The bridge spec is pretty old, and most of the content has been
> > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > 7.5.1.3.13" here instead.
> > >
> > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > following accesses on the primary interface to the secondary interface.
> > >
> > > *Which* following accesses?  The structure of English requires that if
> > > you say "the following accesses," you must continue by *listing* the
> > > accesses.
> > >
> > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > VGA card behind the bridge as a valid boot vga device.
> > >
> > > s/hardward/bridge/
> > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > >
> > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > since it apparently has a VGA class code.  But here you say the
> > > AST2500 has a Bridge Control register, which suggests that it's a
> > > bridge.  If AST2500 is some sort of combination that includes both a
> > > bridge and a VGA device, please outline that topology.
> > >
> > > But the hardware defect is that some bridges forward VGA accesses even
> > > though their VGA Enable bit is not set?  The quirk should be attached
> > > to broken *bridges*, not to VGA devices.
> > >
> > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > correctly, so merely setting the default VGA device once in a quirk is
> > > not sufficient.  You would have to somehow disable any future attempts
> > > to use other VGA devices.  Only the VGA device below this defective
> > > bridge is usable.  Any other VGA devices in the system would be
> > > useless.
> > >
> > > > So we provide a quirk to fix Xorg auto-detection.
> > > >
> > > > See similar bug:
> > > >
> > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > >
> > > This patch was never merged.  If we merged a revised version, please
> > > cite the SHA1 instead.
> >
> > This patch has never merged, and I found that it is unnecessary after
> > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > unnecessary. If it is still needed, I'll investigate the root cause.
>
> I found that vga_arb_device_init() and pcibios_init() are both wrapped
> by subsys_initcall(), which means their sequence is unpredictable. And
> unfortunately, in our platform vga_arb_device_init() is called before
> pcibios_init(), which makes vga_arb_device_init() fail to set a
> default vga device. This is the root cause why we thought that we
> still need a quirk for AST2500.

Does this mean there is no hardware defect here?  The VGA Enable bit
works correctly?

> I think the best solution is make vga_arb_device_init() be wrapped by
> subsys_initcall_sync(), do you think so?

Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
documented, so I'm not sure exactly *why* such a change would work and
whether we could rely on it to continue working.

pcibios_init() isn't very consistent across arches.  On some,
including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
basically nothing.  That makes life a little difficult.

vga_arb_device_init() is a subsys_initcall() and wants to look through
all the PCI devices.  That's a little problematic for arches where
pcibios_init() is also a subsys_initcall() and enumerates PCI devices.

Sorry, that's no answer for you.  Just more questions :)

> > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
> > > > ---
> > > >  drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 47 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 6ab4b3bba36b..adf5490706ad 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -28,6 +28,7 @@
> > > >  #include <linux/platform_data/x86/apple.h>
> > > >  #include <linux/pm_runtime.h>
> > > >  #include <linux/switchtec.h>
> > > > +#include <linux/vgaarb.h>
> > > >  #include <asm/dma.h> /* isa_dma_bridge_buggy */
> > > >  #include "pci.h"
> > > >
> > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> > > >  }
> > > >  DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
> > > >
> > > > +
> > > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
> > > > +{
> > > > +     struct pci_dev *bridge;
> > > > +     struct pci_bus *bus;
> > > > +     struct pci_dev *vdevp = NULL;
> > > > +     u16 config;
> > > > +
> > > > +     bus = pdev->bus;
> > > > +     bridge = bus->self;
> > > > +
> > > > +     /* Is VGA routed to us? */
> > > > +     if (bridge && (pci_is_bridge(bridge))) {
> > > > +             pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
> > > > +
> > > > +             /* Yes, this bridge is PCI bridge-to-bridge spec compliant,
> > > > +              *  just return!
> > > > +              */
> > > > +             if (config & PCI_BRIDGE_CTL_VGA)
> > > > +                     return;
> > > > +
> > > > +             dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
> > > > +     }
> > >
> > > You cannot assume that a bridge is defective just because
> > > PCI_BRIDGE_CTL_VGA is not set.
> > >
> > > > +     /* Just return if the system already have a default device */
> > > > +     if (vga_default_device())
> > > > +             return;
> > > > +
> > > > +     /* No default vga device */
> > > > +     while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
> > > > +             if (vdevp->vendor != 0x1a03) {
> > > > +                     /* Have other vga devcie in the system, do nothing */
> > > > +                     dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",
> > > > +                             vdevp->vendor, vdevp->device);
> > > > +                     return;
> > > > +             }
> > > > +     }
> > > > +
> > > > +     vga_set_default_device(pdev);
> > > > +
> > > > +     dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
> > > > +                     pdev->vendor, pdev->device);
> > > > +}
> > > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
> > > > +
> > > > +
> > > >  /*
> > > >   * The Mellanox Tavor device gives false positive parity errors.  Disable
> > > >   * parity error reporting.
> > > > --
> > > > 2.27.0
> > > >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-17 18:28         ` Bjorn Helgaas
@ 2021-05-18  2:31           ` 隋景峰
  2021-05-18  3:09             ` Bjorn Helgaas
  2021-05-18  7:13           ` Huacai Chen
  1 sibling, 1 reply; 39+ messages in thread
From: 隋景峰 @ 2021-05-18  2:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang




&gt; -----Original Messages-----
&gt; From: "Bjorn Helgaas" <helgaas@kernel.org>
&gt; Sent Time: 2021-05-18 02:28:10 (Tuesday)
&gt; To: "Huacai Chen" <chenhuacai@gmail.com>
&gt; Cc: "Huacai Chen" <chenhuacai@loongson.cn>, "Bjorn Helgaas" <bhelgaas@google.com>, linux-pci <linux-pci@vger.kernel.org>, "Jiaxun Yang" <jiaxun.yang@flygoat.com>, "Jingfeng Sui" <suijingfeng@loongson.cn>
&gt; Subject: Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
&gt; 
&gt; On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
&gt; &gt; On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
&gt; &gt; &gt; On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
&gt; &gt; &gt; &gt; On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
&gt; &gt; &gt; &gt; &gt; According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
&gt; &gt; &gt; &gt; &gt; VGA Enable bit which modifies the response to VGA compatible addresses.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; The bridge spec is pretty old, and most of the content has been
&gt; &gt; &gt; &gt; incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
&gt; &gt; &gt; &gt; 7.5.1.3.13" here instead.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; If the VGA Enable bit is set, the bridge will decode and forward the
&gt; &gt; &gt; &gt; &gt; following accesses on the primary interface to the secondary interface.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; *Which* following accesses?  The structure of English requires that if
&gt; &gt; &gt; &gt; you say "the following accesses," you must continue by *listing* the
&gt; &gt; &gt; &gt; accesses.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; The ASpeed AST2500 hardward does not set the VGA Enable bit on its
&gt; &gt; &gt; &gt; &gt; bridge control register, which causes vgaarb subsystem don't think the
&gt; &gt; &gt; &gt; &gt; VGA card behind the bridge as a valid boot vga device.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; s/hardward/bridge/
&gt; &gt; &gt; &gt; s/vga/VGA/ (also in code comments and dmesg strings below)
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
&gt; &gt; &gt; &gt; since it apparently has a VGA class code.  But here you say the
&gt; &gt; &gt; &gt; AST2500 has a Bridge Control register, which suggests that it's a
&gt; &gt; &gt; &gt; bridge.  If AST2500 is some sort of combination that includes both a
&gt; &gt; &gt; &gt; bridge and a VGA device, please outline that topology.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; But the hardware defect is that some bridges forward VGA accesses even
&gt; &gt; &gt; &gt; though their VGA Enable bit is not set?  The quirk should be attached
&gt; &gt; &gt; &gt; to broken *bridges*, not to VGA devices.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; If a bridge forwards VGA accesses regardless of how its VGA Enable bit
&gt; &gt; &gt; &gt; is set, that means VGA arbitration (in vgaarb.c) cannot work
&gt; &gt; &gt; &gt; correctly, so merely setting the default VGA device once in a quirk is
&gt; &gt; &gt; &gt; not sufficient.  You would have to somehow disable any future attempts
&gt; &gt; &gt; &gt; to use other VGA devices.  Only the VGA device below this defective
&gt; &gt; &gt; &gt; bridge is usable.  Any other VGA devices in the system would be
&gt; &gt; &gt; &gt; useless.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; So we provide a quirk to fix Xorg auto-detection.
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; See similar bug:
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; This patch was never merged.  If we merged a revised version, please
&gt; &gt; &gt; &gt; cite the SHA1 instead.
&gt; &gt; &gt;
&gt; &gt; &gt; This patch has never merged, and I found that it is unnecessary after
&gt; &gt; &gt; commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
&gt; &gt; &gt; even if there's no legacy VGA"). Maybe this ASpeed patch is also
&gt; &gt; &gt; unnecessary. If it is still needed, I'll investigate the root cause.
&gt; &gt;
&gt; &gt; I found that vga_arb_device_init() and pcibios_init() are both wrapped
&gt; &gt; by subsys_initcall(), which means their sequence is unpredictable. And
&gt; &gt; unfortunately, in our platform vga_arb_device_init() is called before
&gt; &gt; pcibios_init(), which makes vga_arb_device_init() fail to set a
&gt; &gt; default vga device. This is the root cause why we thought that we
&gt; &gt; still need a quirk for AST2500.
&gt; 
&gt; Does this mean there is no hardware defect here?  The VGA Enable bit
&gt; works correctly?


The AST2500 BMC card we are using consist of a PCI-to-PCI Bridge (1a03:1150)
and a PCI VGA device (1a03:2000). The value of the Bridge Control register
in its PCI-to-PCI Bridge Configuration Registers is 0x0000. Thus, the VGA
Enable bit in the Bridge Control register do not set, and the VGA
Enable bit is not writable. 

The topology of this BMC card is illustrated as following:

     /sys/devices/pci0000:00
     |-- 0000:00:0c.0
     |   |-- class (0x060400)
     |   |-- vendor (0x0014)
     |   |-- device (0x7a09)
     |   |-- ...
     |   |-- 0000:04:00.0
     |   |   | -- class (0x060400)
     |   |   | -- device (0x1150)
     |   |   | -- vendor (0x1a03)
     |   |   | -- revision (0x04)
     |   |   | -- ...
     |   |   | -- 0000:05:00.0 
     |   |   |    | -- class  (0x030000)
     |   |   |    | -- device (0x2000)
     |   |   |    | -- vendor (0x1a03)
     |   |   |    | -- irq (51)
     |   |   |    | -- i2c-6
     |   |   |    | -- drm
     |   |   |    | -- graphics 
     |   |   |    | -- ...
     |   `-- uevent
     `-- ...

The following information is getted from lspci -vvxx:


04:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &gt;TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" bus:="" primary="04," secondary="05," subordinate="05," sec-latency="0" i="" o="" behind="" bridge:="" 00004000-00004fff="" memory="" 41000000-427fffff="" status:="" 66mhz+="" fastb2b-="" parerr-="" devsel="medium">TAbort- <tabort- <mabort-="" <serr-="" <perr-="" bridgectl:="" parity-="" serr-="" noisa-="" vga-="" mabort-="">Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: <access denied="">
00: 03 1a 50 11 07 00 10 00 04 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 04 05 05 00 41 41 20 02
20: 00 41 70 42 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 63 01 00 00

05:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41) (prog-if 00 [VGA controller])
	Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium &gt;TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" region="" 0:="" memory="" at="" 41000000="" (32-bit,="" non-prefetchable)="" [size="16M]" 1:="" 42000000="" 2:="" i="" o="" ports="" 4000="" capabilities:="" <access="" denied="">
	Kernel driver in use: ast
00: 03 1a 00 20 27 00 10 02 41 00 00 03 00 00 00 00
10: 00 00 00 41 00 00 00 42 01 40 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 03 1a 00 20
30: 00 00 00 00 40 00 00 00 00 00 00 00 63 01 00 00


&gt; 
&gt; &gt; I think the best solution is make vga_arb_device_init() be wrapped by
&gt; &gt; subsys_initcall_sync(), do you think so?
&gt; 
&gt; Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
&gt; documented, so I'm not sure exactly *why* such a change would work and
&gt; whether we could rely on it to continue working.
&gt; 
&gt; pcibios_init() isn't very consistent across arches.  On some,
&gt; including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
&gt; enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
&gt; basically nothing.  That makes life a little difficult.
&gt; 
&gt; vga_arb_device_init() is a subsys_initcall() and wants to look through
&gt; all the PCI devices.  That's a little problematic for arches where
&gt; pcibios_init() is also a subsys_initcall() and enumerates PCI devices.
&gt; 
&gt; Sorry, that's no answer for you.  Just more questions :)
&gt; 
&gt; &gt; &gt; &gt; &gt; Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
&gt; &gt; &gt; &gt; &gt; Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
&gt; &gt; &gt; &gt; &gt; ---
&gt; &gt; &gt; &gt; &gt;  drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
&gt; &gt; &gt; &gt; &gt;  1 file changed, 47 insertions(+)
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
&gt; &gt; &gt; &gt; &gt; index 6ab4b3bba36b..adf5490706ad 100644
&gt; &gt; &gt; &gt; &gt; --- a/drivers/pci/quirks.c
&gt; &gt; &gt; &gt; &gt; +++ b/drivers/pci/quirks.c
&gt; &gt; &gt; &gt; &gt; @@ -28,6 +28,7 @@
&gt; &gt; &gt; &gt; &gt;  #include <linux platform_data="" x86="" apple.h="">
&gt; &gt; &gt; &gt; &gt;  #include <linux pm_runtime.h="">
&gt; &gt; &gt; &gt; &gt;  #include <linux switchtec.h="">
&gt; &gt; &gt; &gt; &gt; +#include <linux vgaarb.h="">
&gt; &gt; &gt; &gt; &gt;  #include <asm dma.h=""> /* isa_dma_bridge_buggy */
&gt; &gt; &gt; &gt; &gt;  #include "pci.h"
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
&gt; &gt; &gt; &gt; &gt;  }
&gt; &gt; &gt; &gt; &gt;  DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
&gt; &gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
&gt; &gt; &gt; &gt; &gt; +{
&gt; &gt; &gt; &gt; &gt; +     struct pci_dev *bridge;
&gt; &gt; &gt; &gt; &gt; +     struct pci_bus *bus;
&gt; &gt; &gt; &gt; &gt; +     struct pci_dev *vdevp = NULL;
&gt; &gt; &gt; &gt; &gt; +     u16 config;
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +     bus = pdev-&gt;bus;
&gt; &gt; &gt; &gt; &gt; +     bridge = bus-&gt;self;
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +     /* Is VGA routed to us? */
&gt; &gt; &gt; &gt; &gt; +     if (bridge &amp;&amp; (pci_is_bridge(bridge))) {
&gt; &gt; &gt; &gt; &gt; +             pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &amp;config);
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +             /* Yes, this bridge is PCI bridge-to-bridge spec compliant,
&gt; &gt; &gt; &gt; &gt; +              *  just return!
&gt; &gt; &gt; &gt; &gt; +              */
&gt; &gt; &gt; &gt; &gt; +             if (config &amp; PCI_BRIDGE_CTL_VGA)
&gt; &gt; &gt; &gt; &gt; +                     return;
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +             dev_warn(&amp;pdev-&gt;dev, "VGA bridge control is not enabled\n");
&gt; &gt; &gt; &gt; &gt; +     }
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; You cannot assume that a bridge is defective just because
&gt; &gt; &gt; &gt; PCI_BRIDGE_CTL_VGA is not set.
&gt; &gt; &gt; &gt;
&gt; &gt; &gt; &gt; &gt; +     /* Just return if the system already have a default device */
&gt; &gt; &gt; &gt; &gt; +     if (vga_default_device())
&gt; &gt; &gt; &gt; &gt; +             return;
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +     /* No default vga device */
&gt; &gt; &gt; &gt; &gt; +     while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA &lt;&lt; 8, vdevp))) {
&gt; &gt; &gt; &gt; &gt; +             if (vdevp-&gt;vendor != 0x1a03) {
&gt; &gt; &gt; &gt; &gt; +                     /* Have other vga devcie in the system, do nothing */
&gt; &gt; &gt; &gt; &gt; +                     dev_info(&amp;pdev-&gt;dev, "Another boot vga device: 0x%x:0x%x\n",
&gt; &gt; &gt; &gt; &gt; +                             vdevp-&gt;vendor, vdevp-&gt;device);
&gt; &gt; &gt; &gt; &gt; +                     return;
&gt; &gt; &gt; &gt; &gt; +             }
&gt; &gt; &gt; &gt; &gt; +     }
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +     vga_set_default_device(pdev);
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +     dev_info(&amp;pdev-&gt;dev, "Boot vga device set as 0x%x:0x%x\n",
&gt; &gt; &gt; &gt; &gt; +                     pdev-&gt;vendor, pdev-&gt;device);
&gt; &gt; &gt; &gt; &gt; +}
&gt; &gt; &gt; &gt; &gt; +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt; +
&gt; &gt; &gt; &gt; &gt;  /*
&gt; &gt; &gt; &gt; &gt;   * The Mellanox Tavor device gives false positive parity errors.  Disable
&gt; &gt; &gt; &gt; &gt;   * parity error reporting.
&gt; &gt; &gt; &gt; &gt; --
&gt; &gt; &gt; &gt; &gt; 2.27.0
&gt; &gt; &gt; &gt; &gt;
</asm></linux></linux></linux></linux></suijingfeng@loongson.cn></chenhuacai@loongson.cn></perr-></tabort-></access></tabort-></perr-></tabort-></helgaas@kernel.org></chenhuacai@gmail.com></suijingfeng@loongson.cn></jiaxun.yang@flygoat.com></linux-pci@vger.kernel.org></bhelgaas@google.com></chenhuacai@loongson.cn></chenhuacai@gmail.com></helgaas@kernel.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-18  2:31           ` 隋景峰
@ 2021-05-18  3:09             ` Bjorn Helgaas
  2021-05-18  9:30               ` suijingfeng
  0 siblings, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-18  3:09 UTC (permalink / raw)
  To: 隋景峰
  Cc: Huacai Chen, Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang

On Tue, May 18, 2021 at 10:31:56AM +0800, 隋景峰 wrote:
> &gt; -----Original Messages-----

Wow, this is ugly (the "&gt;" instead of ">").  Can you figure out
how to respond in the usual plain-text way?

> &gt; From: "Bjorn Helgaas" <helgaas@kernel.org>
> &gt; Sent Time: 2021-05-18 02:28:10 (Tuesday)
> &gt; To: "Huacai Chen" <chenhuacai@gmail.com>
> &gt; Cc: "Huacai Chen" <chenhuacai@loongson.cn>, "Bjorn Helgaas" <bhelgaas@google.com>, linux-pci <linux-pci@vger.kernel.org>, "Jiaxun Yang" <jiaxun.yang@flygoat.com>, "Jingfeng Sui" <suijingfeng@loongson.cn>
> &gt; Subject: Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
> &gt; 
> &gt; On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> &gt; &gt; On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> &gt; &gt; &gt; On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> &gt; &gt; &gt; &gt; On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> &gt; &gt; &gt; &gt; &gt; According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> &gt; &gt; &gt; &gt; &gt; VGA Enable bit which modifies the response to VGA compatible addresses.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; The bridge spec is pretty old, and most of the content has been
> &gt; &gt; &gt; &gt; incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> &gt; &gt; &gt; &gt; 7.5.1.3.13" here instead.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; If the VGA Enable bit is set, the bridge will decode and forward the
> &gt; &gt; &gt; &gt; &gt; following accesses on the primary interface to the secondary interface.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; *Which* following accesses?  The structure of English requires that if
> &gt; &gt; &gt; &gt; you say "the following accesses," you must continue by *listing* the
> &gt; &gt; &gt; &gt; accesses.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> &gt; &gt; &gt; &gt; &gt; bridge control register, which causes vgaarb subsystem don't think the
> &gt; &gt; &gt; &gt; &gt; VGA card behind the bridge as a valid boot vga device.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; s/hardward/bridge/
> &gt; &gt; &gt; &gt; s/vga/VGA/ (also in code comments and dmesg strings below)
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> &gt; &gt; &gt; &gt; since it apparently has a VGA class code.  But here you say the
> &gt; &gt; &gt; &gt; AST2500 has a Bridge Control register, which suggests that it's a
> &gt; &gt; &gt; &gt; bridge.  If AST2500 is some sort of combination that includes both a
> &gt; &gt; &gt; &gt; bridge and a VGA device, please outline that topology.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; But the hardware defect is that some bridges forward VGA accesses even
> &gt; &gt; &gt; &gt; though their VGA Enable bit is not set?  The quirk should be attached
> &gt; &gt; &gt; &gt; to broken *bridges*, not to VGA devices.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> &gt; &gt; &gt; &gt; is set, that means VGA arbitration (in vgaarb.c) cannot work
> &gt; &gt; &gt; &gt; correctly, so merely setting the default VGA device once in a quirk is
> &gt; &gt; &gt; &gt; not sufficient.  You would have to somehow disable any future attempts
> &gt; &gt; &gt; &gt; to use other VGA devices.  Only the VGA device below this defective
> &gt; &gt; &gt; &gt; bridge is usable.  Any other VGA devices in the system would be
> &gt; &gt; &gt; &gt; useless.
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; So we provide a quirk to fix Xorg auto-detection.
> &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; See similar bug:
> &gt; &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; &gt; https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> &gt; &gt; &gt; &gt;
> &gt; &gt; &gt; &gt; This patch was never merged.  If we merged a revised version, please
> &gt; &gt; &gt; &gt; cite the SHA1 instead.
> &gt; &gt; &gt;
> &gt; &gt; &gt; This patch has never merged, and I found that it is unnecessary after
> &gt; &gt; &gt; commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> &gt; &gt; &gt; even if there's no legacy VGA"). Maybe this ASpeed patch is also
> &gt; &gt; &gt; unnecessary. If it is still needed, I'll investigate the root cause.
> &gt; &gt;
> &gt; &gt; I found that vga_arb_device_init() and pcibios_init() are both wrapped
> &gt; &gt; by subsys_initcall(), which means their sequence is unpredictable. And
> &gt; &gt; unfortunately, in our platform vga_arb_device_init() is called before
> &gt; &gt; pcibios_init(), which makes vga_arb_device_init() fail to set a
> &gt; &gt; default vga device. This is the root cause why we thought that we
> &gt; &gt; still need a quirk for AST2500.
> &gt; 
> &gt; Does this mean there is no hardware defect here?  The VGA Enable bit
> &gt; works correctly?
> 
> 
> The AST2500 BMC card we are using consist of a PCI-to-PCI Bridge (1a03:1150)
> and a PCI VGA device (1a03:2000). The value of the Bridge Control register
> in its PCI-to-PCI Bridge Configuration Registers is 0x0000. Thus, the VGA
> Enable bit in the Bridge Control register do not set, and the VGA
> Enable bit is not writable. 

If VGA Enable is 0 and cannot be set to 1, the bridge should *never*
forward VGA accesses to its secondary bus.  The generic VGA driver
that uses the legacy [mem 0xa0000-0xbffff] range should not work with
the VGA device at 05:00.0, and that device cannot participate in the
VGA arbitration scheme, which relies on the VGA Enable bit.

If you have a driver for 05:00.0 that doesn't need the legacy memory
range, it's possible that it may work.  But VGA arbitration will be
broken, and if 05:00.0 needs to be initialized by an option ROM, that
probably won't work either.

If the 04:00.0 bridge *always* forwards VGA accesses, even though its
VGA Enable bit is always zero, then the bridge is broken.  In that
case, the generic VGA driver should work with the 05:00.0 device, but
VGA arbitration will be limited.  I'm not sure, but the arbiter
*might* be able to use the VGA Enable bit in the 00:0c.0 bridge to
control VGA access to 05:00.0.  You wouldn't be able to have more than
one VGA device below 00:0c.0, and you may not be able have more than
one in the entire system.

> The topology of this BMC card is illustrated as following:
> 
>      /sys/devices/pci0000:00
>      |-- 0000:00:0c.0
>      |   |-- class (0x060400)
>      |   |-- vendor (0x0014)
>      |   |-- device (0x7a09)
>      |   |-- ...
>      |   |-- 0000:04:00.0
>      |   |   | -- class (0x060400)
>      |   |   | -- device (0x1150)
>      |   |   | -- vendor (0x1a03)
>      |   |   | -- revision (0x04)
>      |   |   | -- ...
>      |   |   | -- 0000:05:00.0 
>      |   |   |    | -- class  (0x030000)
>      |   |   |    | -- device (0x2000)
>      |   |   |    | -- vendor (0x1a03)
>      |   |   |    | -- irq (51)
>      |   |   |    | -- i2c-6
>      |   |   |    | -- drm
>      |   |   |    | -- graphics 
>      |   |   |    | -- ...
>      |   `-- uevent
>      `-- ...
> 
> The following information is getted from lspci -vvxx:

Generally it's better to use "sudo lspci -vvxx" so you decode all the
capabilities, too.  But in this case, all we care about is the Bridge
Control register ("bridgectl"), which *is* included (and apparently is
set to 0, since it's decoded as "vga-").

> 04:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04) (prog-if 00 [Normal decode])
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast &gt;TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" bus:="" primary="04," secondary="05," subordinate="05," sec-latency="0" i="" o="" behind="" bridge:="" 00004000-00004fff="" memory="" 41000000-427fffff="" status:="" 66mhz+="" fastb2b-="" parerr-="" devsel="medium">TAbort- <tabort- <mabort-="" <serr-="" <perr-="" bridgectl:="" parity-="" serr-="" noisa-="" vga-="" mabort-="">Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: <access denied="">
> 00: 03 1a 50 11 07 00 10 00 04 00 04 06 00 00 01 00
> 10: 00 00 00 00 00 00 00 00 04 05 05 00 41 41 20 02
> 20: 00 41 70 42 f1 ff 01 00 00 00 00 00 00 00 00 00
> 30: 00 00 00 00 50 00 00 00 00 00 00 00 63 01 00 00
> 
> 05:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41) (prog-if 00 [VGA controller])
> 	Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+ ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium &gt;TAbort- <tabort- <mabort-="">SERR- <perr- intx-="" latency:="" 0="" interrupt:="" pin="" a="" routed="" to="" irq="" 51="" numa="" node:="" region="" 0:="" memory="" at="" 41000000="" (32-bit,="" non-prefetchable)="" [size="16M]" 1:="" 42000000="" 2:="" i="" o="" ports="" 4000="" capabilities:="" <access="" denied="">
> 	Kernel driver in use: ast
> 00: 03 1a 00 20 27 00 10 02 41 00 00 03 00 00 00 00
> 10: 00 00 00 41 00 00 00 42 01 40 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 03 1a 00 20
> 30: 00 00 00 00 40 00 00 00 00 00 00 00 63 01 00 00

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-17 18:28         ` Bjorn Helgaas
  2021-05-18  2:31           ` 隋景峰
@ 2021-05-18  7:13           ` Huacai Chen
  2021-05-18 19:35             ` Bjorn Helgaas
  1 sibling, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-18  7:13 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

Hi, Bjorn,

On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > >
> > > > The bridge spec is pretty old, and most of the content has been
> > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > 7.5.1.3.13" here instead.
> > > >
> > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > following accesses on the primary interface to the secondary interface.
> > > >
> > > > *Which* following accesses?  The structure of English requires that if
> > > > you say "the following accesses," you must continue by *listing* the
> > > > accesses.
> > > >
> > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > VGA card behind the bridge as a valid boot vga device.
> > > >
> > > > s/hardward/bridge/
> > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > >
> > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > since it apparently has a VGA class code.  But here you say the
> > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > bridge and a VGA device, please outline that topology.
> > > >
> > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > to broken *bridges*, not to VGA devices.
> > > >
> > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > not sufficient.  You would have to somehow disable any future attempts
> > > > to use other VGA devices.  Only the VGA device below this defective
> > > > bridge is usable.  Any other VGA devices in the system would be
> > > > useless.
> > > >
> > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > >
> > > > > See similar bug:
> > > > >
> > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > >
> > > > This patch was never merged.  If we merged a revised version, please
> > > > cite the SHA1 instead.
> > >
> > > This patch has never merged, and I found that it is unnecessary after
> > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > unnecessary. If it is still needed, I'll investigate the root cause.
> >
> > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > by subsys_initcall(), which means their sequence is unpredictable. And
> > unfortunately, in our platform vga_arb_device_init() is called before
> > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > default vga device. This is the root cause why we thought that we
> > still need a quirk for AST2500.
>
> Does this mean there is no hardware defect here?  The VGA Enable bit
> works correctly?
>
No, VGA Enable bit still doesn't set, but with commit
a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
there's no legacy VGA") we no longer depend on VGA Enable.

> > I think the best solution is make vga_arb_device_init() be wrapped by
> > subsys_initcall_sync(), do you think so?
>
> Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> documented, so I'm not sure exactly *why* such a change would work and
> whether we could rely on it to continue working.
>
> pcibios_init() isn't very consistent across arches.  On some,
> including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> basically nothing.  That makes life a little difficult.
subsys_initcall_sync() is ensured after all subsys_initcall()
functions, so at least it can solve the problem on platforms which use
pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
platforms are also OK, because they use acpi_init()
-->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).

Huacai
>
> vga_arb_device_init() is a subsys_initcall() and wants to look through
> all the PCI devices.  That's a little problematic for arches where
> pcibios_init() is also a subsys_initcall() and enumerates PCI devices.
>
> Sorry, that's no answer for you.  Just more questions :)
>
> > > > > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > > > > Signed-off-by: Jingfeng Sui <suijingfeng@loongson.cn>
> > > > > ---
> > > > >  drivers/pci/quirks.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 47 insertions(+)
> > > > >
> > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > > index 6ab4b3bba36b..adf5490706ad 100644
> > > > > --- a/drivers/pci/quirks.c
> > > > > +++ b/drivers/pci/quirks.c
> > > > > @@ -28,6 +28,7 @@
> > > > >  #include <linux/platform_data/x86/apple.h>
> > > > >  #include <linux/pm_runtime.h>
> > > > >  #include <linux/switchtec.h>
> > > > > +#include <linux/vgaarb.h>
> > > > >  #include <asm/dma.h> /* isa_dma_bridge_buggy */
> > > > >  #include "pci.h"
> > > > >
> > > > > @@ -297,6 +298,52 @@ static void loongson_mrrs_quirk(struct pci_dev *dev)
> > > > >  }
> > > > >  DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
> > > > >
> > > > > +
> > > > > +static void aspeed_fixup_vgaarb(struct pci_dev *pdev)
> > > > > +{
> > > > > +     struct pci_dev *bridge;
> > > > > +     struct pci_bus *bus;
> > > > > +     struct pci_dev *vdevp = NULL;
> > > > > +     u16 config;
> > > > > +
> > > > > +     bus = pdev->bus;
> > > > > +     bridge = bus->self;
> > > > > +
> > > > > +     /* Is VGA routed to us? */
> > > > > +     if (bridge && (pci_is_bridge(bridge))) {
> > > > > +             pci_read_config_word(bridge, PCI_BRIDGE_CONTROL, &config);
> > > > > +
> > > > > +             /* Yes, this bridge is PCI bridge-to-bridge spec compliant,
> > > > > +              *  just return!
> > > > > +              */
> > > > > +             if (config & PCI_BRIDGE_CTL_VGA)
> > > > > +                     return;
> > > > > +
> > > > > +             dev_warn(&pdev->dev, "VGA bridge control is not enabled\n");
> > > > > +     }
> > > >
> > > > You cannot assume that a bridge is defective just because
> > > > PCI_BRIDGE_CTL_VGA is not set.
> > > >
> > > > > +     /* Just return if the system already have a default device */
> > > > > +     if (vga_default_device())
> > > > > +             return;
> > > > > +
> > > > > +     /* No default vga device */
> > > > > +     while ((vdevp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, vdevp))) {
> > > > > +             if (vdevp->vendor != 0x1a03) {
> > > > > +                     /* Have other vga devcie in the system, do nothing */
> > > > > +                     dev_info(&pdev->dev, "Another boot vga device: 0x%x:0x%x\n",
> > > > > +                             vdevp->vendor, vdevp->device);
> > > > > +                     return;
> > > > > +             }
> > > > > +     }
> > > > > +
> > > > > +     vga_set_default_device(pdev);
> > > > > +
> > > > > +     dev_info(&pdev->dev, "Boot vga device set as 0x%x:0x%x\n",
> > > > > +                     pdev->vendor, pdev->device);
> > > > > +}
> > > > > +DECLARE_PCI_FIXUP_CLASS_FINAL(0x1a03, 0x2000, PCI_CLASS_DISPLAY_VGA, 8, aspeed_fixup_vgaarb);
> > > > > +
> > > > > +
> > > > >  /*
> > > > >   * The Mellanox Tavor device gives false positive parity errors.  Disable
> > > > >   * parity error reporting.
> > > > > --
> > > > > 2.27.0
> > > > >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-18  3:09             ` Bjorn Helgaas
@ 2021-05-18  9:30               ` suijingfeng
  2021-05-18 19:31                 ` Bjorn Helgaas
  0 siblings, 1 reply; 39+ messages in thread
From: suijingfeng @ 2021-05-18  9:30 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang



On 2021/5/18 上午11:09, Bjorn Helgaas wrote:

> If VGA Enable is 0 and cannot be set to 1, the bridge should *never*
> forward VGA accesses to its secondary bus.  The generic VGA driver
> that uses the legacy [mem 0xa0000-0xbffff] range should not work with
> the VGA device at 05:00.0, and that device cannot participate in the
> VGA arbitration scheme, which relies on the VGA Enable bit.
> 
> If you have a driver for 05:00.0 that doesn't need the legacy memory
> range, it's possible that it may work.  But VGA arbitration will be
> broken, and if 05:00.0 needs to be initialized by an option ROM, that
> probably won't work either.

We are not using a "generic VGA driver", in user space, we are using the 
modesetting driver come with X server, and it seems work normally. The 
real problems is VGA arbitration will not set 05:00.0 as the default VGA 
which means that when X server read 
/sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga 
will get a "0". This break Xorg auto-detection. We want the boot_vga 
sysfs file be "1".

> If the 04:00.0 bridge *always* forwards VGA accesses, even though its
> VGA Enable bit is always zero, then the bridge is broken.  In that
> case, the generic VGA driver should work with the 05:00.0 device, but
> VGA arbitration will be limited.  I'm not sure, but the arbiter
> *might* be able to use the VGA Enable bit in the 00:0c.0 bridge to
> control VGA access to 05:00.0.  You wouldn't be able to have more than
> one VGA device below 00:0c.0, and you may not be able have more than
> one in the entire system.

We have only one VGA device(05:00.0) below 00:0c.0, but we do able to 
have more than one in the entire system. We could even mount a AMDGPU
on this server. But in reality, there is a render only GPU and a 
self-designed display controller integrated in LS7A1000 bridge. Both the 
render only GPU and the display controller is PCI device, they are 
located at PCI root bus directly without a PCI-to-PCI bridge in the 
middle. The display controller is blocked by the firmware if ASPEED BMC 
card is present, it can't be accessed under linux kernel. Let me show 
you a updated version of the PCI topology of our server(machine):

        /sys/devices/pci0000:00
        |-- 0000:00:06.0
        |   | -- class (0x040000)
        |   | -- vendor (0x0014)
        |   | -- device (0x7a15)
        |   | -- drm
        |   | -- ...
        |-- 0000:00:0c.0
        |   |-- class (0x060400)
        |   |-- vendor (0x0014)
        |   |-- device (0x7a09)
        |   |-- ...
        |   |-- 0000:04:00.0
        |   |   | -- class (0x060400)
        |   |   | -- device (0x1150)
        |   |   | -- vendor (0x1a03)
        |   |   | -- revision (0x04)
        |   |   | -- ...
        |   |   | -- 0000:05:00.0
        |   |   |    | -- class  (0x030000)
        |   |   |    | -- device (0x2000)
        |   |   |    | -- vendor (0x1a03)
        |   |   |    | -- boot_vga
        |   |   |    | -- i2c-6
        |   |   |    | -- drm
        |   |   |    | -- graphics
        |   |   |    | -- ...
        |   `-- uevent
        `-- ...

Even through the render only GPU(00:06.0) is not a VGA device, it still 
can disturb X server choose a primary device to use. But the root cause 
is the kernel side does not set 05:00.0 as default VGA. In this case X 
server will fallback to the first device found to use. and 00:06.0 is 
always found before 05:00.0. If kernel side set 05:00.0 as default VGA,
all other problems is secondary.


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-15  3:52     ` Huacai Chen
@ 2021-05-18 13:59       ` Bjorn Helgaas
  2021-05-19  2:26         ` Huacai Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-18 13:59 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Jiaxun Yang, Huacai Chen, Bjorn Helgaas, linux-pci, Jianmin Lv,
	Rob Herring

[+cc Rob, beginning of thread
https://lore.kernel.org/r/20210514080025.1828197-5-chenhuacai@loongson.cn]

On Sat, May 15, 2021 at 11:52:53AM +0800, Huacai Chen wrote:
> On Fri, May 14, 2021 at 10:52 PM Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
> > 在 2021/5/14 16:00, Huacai Chen 写道:
> > > From: Jianmin Lv <lvjianmin@loongson.cn>
> > >
> > > In LS7A, multifunction device use same pci PIN and different
> > > irq for different function, so fix it for standard pci PIN
> > > usage.
> >
> > Hmm, I'm unsure about this change.
> > The PCIe port, or PCI-to-PCI bridge on LS7A only have a single
> > upstream interrupt specified in DeviceTree, how can this quirk
> > work?
>
> LS7A will be shared by MIPS-based Loongson and LoongArch-based
> Loongson, LoongArch use ACPI rather than FDT, so this quirk is needed.

Can you expand on this a little bit?

Which DT binding are you referring to?  Is it in the Linux source
tree?

I think Linux reads Interrupt Pin for both FDT and ACPI systems, and
apparently that register contains the same value for all functions of
this multi-function device.

The quirk will be applied for both FDT and ACPI systems, but it sounds
like you're saying this is needed *because* LoongArch uses ACPI.

Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-18  9:30               ` suijingfeng
@ 2021-05-18 19:31                 ` Bjorn Helgaas
  0 siblings, 0 replies; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-18 19:31 UTC (permalink / raw)
  To: suijingfeng
  Cc: Huacai Chen, Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang

On Tue, May 18, 2021 at 05:30:38PM +0800, suijingfeng wrote:
> On 2021/5/18 上午11:09, Bjorn Helgaas wrote:
> 
> > If VGA Enable is 0 and cannot be set to 1, the bridge should *never*
> > forward VGA accesses to its secondary bus.  The generic VGA driver
> > that uses the legacy [mem 0xa0000-0xbffff] range should not work with
> > the VGA device at 05:00.0, and that device cannot participate in the
> > VGA arbitration scheme, which relies on the VGA Enable bit.
> > 
> > If you have a driver for 05:00.0 that doesn't need the legacy memory
> > range, it's possible that it may work.  But VGA arbitration will be
> > broken, and if 05:00.0 needs to be initialized by an option ROM, that
> > probably won't work either.
> 
> We are not using a "generic VGA driver", in user space, we are using the
> modesetting driver come with X server, and it seems work normally. The real
> problems is VGA arbitration will not set 05:00.0 as the default VGA which
> means that when X server read
> /sys/devices/pci0000:00/0000:00:0c.0/0000:04:00.0/0000:05:00.0/boot_vga will
> get a "0". This break Xorg auto-detection. We want the boot_vga sysfs file
> be "1".

I don't think it's true; I think VGA arbitration *will* set 05:00.0 as
the default VGA, as long as 05:00.0 has been enumerated before
vga_arb_device_init().

As far as I can tell, the problem is not that the bridge is broken or
that vga_arb_device_init() is broken.  The problem is that on your
system, vga_arb_device_init() runs before 05:00.0 has been enumerated,
so it *can't* set 05:00.0 as the default VGA because it doesn't know
about 05:00.0 at all.

> > If the 04:00.0 bridge *always* forwards VGA accesses, even though its
> > VGA Enable bit is always zero, then the bridge is broken.  In that
> > case, the generic VGA driver should work with the 05:00.0 device, but
> > VGA arbitration will be limited.  I'm not sure, but the arbiter
> > *might* be able to use the VGA Enable bit in the 00:0c.0 bridge to
> > control VGA access to 05:00.0.  You wouldn't be able to have more than
> > one VGA device below 00:0c.0, and you may not be able have more than
> > one in the entire system.
> 
> We have only one VGA device(05:00.0) below 00:0c.0, but we do able to have
> more than one in the entire system. We could even mount a AMDGPU
> on this server. But in reality, there is a render only GPU and a
> self-designed display controller integrated in LS7A1000 bridge. Both the
> render only GPU and the display controller is PCI device, they are located
> at PCI root bus directly without a PCI-to-PCI bridge in the middle. The
> display controller is blocked by the firmware if ASPEED BMC card is present,
> it can't be accessed under linux kernel. Let me show you a updated version
> of the PCI topology of our server(machine):
> 
>        /sys/devices/pci0000:00
>        |-- 0000:00:06.0
>        |   | -- class (0x040000)
>        |   | -- vendor (0x0014)
>        |   | -- device (0x7a15)
>        |   | -- drm
>        |   | -- ...
>        |-- 0000:00:0c.0
>        |   |-- class (0x060400)
>        |   |-- vendor (0x0014)
>        |   |-- device (0x7a09)
>        |   |-- ...
>        |   |-- 0000:04:00.0
>        |   |   | -- class (0x060400)
>        |   |   | -- device (0x1150)
>        |   |   | -- vendor (0x1a03)
>        |   |   | -- revision (0x04)
>        |   |   | -- ...
>        |   |   | -- 0000:05:00.0
>        |   |   |    | -- class  (0x030000)
>        |   |   |    | -- device (0x2000)
>        |   |   |    | -- vendor (0x1a03)
>        |   |   |    | -- boot_vga
>        |   |   |    | -- i2c-6
>        |   |   |    | -- drm
>        |   |   |    | -- graphics
>        |   |   |    | -- ...
>        |   `-- uevent
>        `-- ...
> 
> Even through the render only GPU(00:06.0) is not a VGA device, it still can
> disturb X server choose a primary device to use. But the root cause is the
> kernel side does not set 05:00.0 as default VGA. In this case X server will
> fallback to the first device found to use. and 00:06.0 is always found
> before 05:00.0. If kernel side set 05:00.0 as default VGA,
> all other problems is secondary.

The fact that X selects 00:06.0 is a user-level thing and I don't know
what's involved.  Its class code (0x0400) looks like
PCI_CLASS_MULTIMEDIA_VIDEO, so vga_arb_device_init() should completely
ignore it.

vga_arb_device_init() should set 05:00.0 as the *kernel's* default
device as long as 05:00.0 has already been enumerated.

Even if vga_arb_device_init() runs before 05:00.0 has been enumerated,
it looks like vga_arbiter_add_pci_device() should notice when 05:00.0
is enumerated, and you should see the "VGA device added:" message for
it.

vga_arbiter_add_pci_device() looks like it *would* set 05:00.0 as the
default device in that case, except for the fact that 04:00.0 doesn't
support PCI_BRIDGE_CTL_VGA.  That's probably a bug in a37c0f48950b
("vgaarb: Select a default VGA device even if there's no legacy VGA").
I think the logic added by a37c0f48950b probably should be shared
between the vga_arb_device_init() initcall path and the
vga_arbiter_add_pci_device() device-add path.

Can you collect your dmesg output, so we can see the enumeration order
and what vgaarb does with it?

Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-18  7:13           ` Huacai Chen
@ 2021-05-18 19:35             ` Bjorn Helgaas
  2021-05-19  2:17               ` Huacai Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-18 19:35 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote:
> On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > > >
> > > > > The bridge spec is pretty old, and most of the content has been
> > > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > > 7.5.1.3.13" here instead.
> > > > >
> > > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > > following accesses on the primary interface to the secondary interface.
> > > > >
> > > > > *Which* following accesses?  The structure of English requires that if
> > > > > you say "the following accesses," you must continue by *listing* the
> > > > > accesses.
> > > > >
> > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > > VGA card behind the bridge as a valid boot vga device.
> > > > >
> > > > > s/hardward/bridge/
> > > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > > >
> > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > > since it apparently has a VGA class code.  But here you say the
> > > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > > bridge and a VGA device, please outline that topology.
> > > > >
> > > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > > to broken *bridges*, not to VGA devices.
> > > > >
> > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > > not sufficient.  You would have to somehow disable any future attempts
> > > > > to use other VGA devices.  Only the VGA device below this defective
> > > > > bridge is usable.  Any other VGA devices in the system would be
> > > > > useless.
> > > > >
> > > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > > >
> > > > > > See similar bug:
> > > > > >
> > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > > >
> > > > > This patch was never merged.  If we merged a revised version, please
> > > > > cite the SHA1 instead.
> > > >
> > > > This patch has never merged, and I found that it is unnecessary after
> > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > > unnecessary. If it is still needed, I'll investigate the root cause.
> > >
> > > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > > by subsys_initcall(), which means their sequence is unpredictable. And
> > > unfortunately, in our platform vga_arb_device_init() is called before
> > > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > > default vga device. This is the root cause why we thought that we
> > > still need a quirk for AST2500.
> >
> > Does this mean there is no hardware defect here?  The VGA Enable bit
> > works correctly?
> >
> No, VGA Enable bit still doesn't set, but with commit
> a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
> there's no legacy VGA") we no longer depend on VGA Enable.

Correct me if I'm wrong:

  - On the AST2500 bridge [1a03:1150], the VGA Enable bit is
    read-only 0.

  - The AST2500 bridge never forwards VGA accesses ([mem
    0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its
    secondary bus.

The VGA Enable bit is optional, and if both the above are true, the
bridge is working correctly per spec, and the quirk below is not the
right solution, and whatever solution we come up with should not
claim that the bridge is misbehaving.

> > > I think the best solution is make vga_arb_device_init() be wrapped by
> > > subsys_initcall_sync(), do you think so?
> >
> > Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> > documented, so I'm not sure exactly *why* such a change would work and
> > whether we could rely on it to continue working.
> >
> > pcibios_init() isn't very consistent across arches.  On some,
> > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> > enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> > basically nothing.  That makes life a little difficult.
>
> subsys_initcall_sync() is ensured after all subsys_initcall()
> functions, so at least it can solve the problem on platforms which use
> pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
> platforms are also OK, because they use acpi_init()
> -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).

More details in my response to suijingfeng:
https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520

I'd rather not fiddle with the initcall ordering.  That mechanism is
fragile and I'd prefer something more robust.

I'm wondering whether it's practical to do something in the normal PCI
enumeration path, e.g., in pci_init_capabilities().  Maybe we can
detect the default VGA device as we enumerate it.  Then we wouldn't
have this weird process of "find all PCI devices first, then scan for
the default VGA device, and oh, by the way, also check for VGA devices
hot-added later."

Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-18 19:35             ` Bjorn Helgaas
@ 2021-05-19  2:17               ` Huacai Chen
  2021-05-19 19:33                 ` Bjorn Helgaas
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-19  2:17 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

Hi, Bjorn,

On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote:
> > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > > > >
> > > > > > The bridge spec is pretty old, and most of the content has been
> > > > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > > > 7.5.1.3.13" here instead.
> > > > > >
> > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > > > following accesses on the primary interface to the secondary interface.
> > > > > >
> > > > > > *Which* following accesses?  The structure of English requires that if
> > > > > > you say "the following accesses," you must continue by *listing* the
> > > > > > accesses.
> > > > > >
> > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > > > VGA card behind the bridge as a valid boot vga device.
> > > > > >
> > > > > > s/hardward/bridge/
> > > > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > > > >
> > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > > > since it apparently has a VGA class code.  But here you say the
> > > > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > > > bridge and a VGA device, please outline that topology.
> > > > > >
> > > > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > > > to broken *bridges*, not to VGA devices.
> > > > > >
> > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > > > not sufficient.  You would have to somehow disable any future attempts
> > > > > > to use other VGA devices.  Only the VGA device below this defective
> > > > > > bridge is usable.  Any other VGA devices in the system would be
> > > > > > useless.
> > > > > >
> > > > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > > > >
> > > > > > > See similar bug:
> > > > > > >
> > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > > > >
> > > > > > This patch was never merged.  If we merged a revised version, please
> > > > > > cite the SHA1 instead.
> > > > >
> > > > > This patch has never merged, and I found that it is unnecessary after
> > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > > > unnecessary. If it is still needed, I'll investigate the root cause.
> > > >
> > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > > > by subsys_initcall(), which means their sequence is unpredictable. And
> > > > unfortunately, in our platform vga_arb_device_init() is called before
> > > > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > > > default vga device. This is the root cause why we thought that we
> > > > still need a quirk for AST2500.
> > >
> > > Does this mean there is no hardware defect here?  The VGA Enable bit
> > > works correctly?
> > >
> > No, VGA Enable bit still doesn't set, but with commit
> > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
> > there's no legacy VGA") we no longer depend on VGA Enable.
>
> Correct me if I'm wrong:
>
>   - On the AST2500 bridge [1a03:1150], the VGA Enable bit is
>     read-only 0.
>
>   - The AST2500 bridge never forwards VGA accesses ([mem
>     0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its
>     secondary bus.
>
> The VGA Enable bit is optional, and if both the above are true, the
> bridge is working correctly per spec, and the quirk below is not the
> right solution, and whatever solution we come up with should not
> claim that the bridge is misbehaving.
Yes, you are right, the bridge is working correctly, which is similar
to HiSilicon D05.


>
> > > > I think the best solution is make vga_arb_device_init() be wrapped by
> > > > subsys_initcall_sync(), do you think so?
> > >
> > > Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> > > documented, so I'm not sure exactly *why* such a change would work and
> > > whether we could rely on it to continue working.
> > >
> > > pcibios_init() isn't very consistent across arches.  On some,
> > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> > > enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> > > basically nothing.  That makes life a little difficult.
> >
> > subsys_initcall_sync() is ensured after all subsys_initcall()
> > functions, so at least it can solve the problem on platforms which use
> > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
> > platforms are also OK, because they use acpi_init()
> > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).
>
> More details in my response to suijingfeng:
> https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520
>
> I'd rather not fiddle with the initcall ordering.  That mechanism is
> fragile and I'd prefer something more robust.
>
> I'm wondering whether it's practical to do something in the normal PCI
> enumeration path, e.g., in pci_init_capabilities().  Maybe we can
> detect the default VGA device as we enumerate it.  Then we wouldn't
> have this weird process of "find all PCI devices first, then scan for
> the default VGA device, and oh, by the way, also check for VGA devices
> hot-added later."
If we don't want to rely on initcall order, and want to solve the
hot-added case, then can we add vga_arb_select_default_device() in
pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE &&
!vga_default_device())?

Huacai
>
> Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-18 13:59       ` Bjorn Helgaas
@ 2021-05-19  2:26         ` Huacai Chen
  2021-05-19  3:05           ` Jiaxun Yang
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-19  2:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jiaxun Yang, Huacai Chen, Bjorn Helgaas, linux-pci, Jianmin Lv,
	Rob Herring

Hi, Bjorn,

On Tue, May 18, 2021 at 9:59 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Rob, beginning of thread
> https://lore.kernel.org/r/20210514080025.1828197-5-chenhuacai@loongson.cn]
>
> On Sat, May 15, 2021 at 11:52:53AM +0800, Huacai Chen wrote:
> > On Fri, May 14, 2021 at 10:52 PM Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
> > > 在 2021/5/14 16:00, Huacai Chen 写道:
> > > > From: Jianmin Lv <lvjianmin@loongson.cn>
> > > >
> > > > In LS7A, multifunction device use same pci PIN and different
> > > > irq for different function, so fix it for standard pci PIN
> > > > usage.
> > >
> > > Hmm, I'm unsure about this change.
> > > The PCIe port, or PCI-to-PCI bridge on LS7A only have a single
> > > upstream interrupt specified in DeviceTree, how can this quirk
> > > work?
> >
> > LS7A will be shared by MIPS-based Loongson and LoongArch-based
> > Loongson, LoongArch use ACPI rather than FDT, so this quirk is needed.
>
> Can you expand on this a little bit?
>
> Which DT binding are you referring to?  Is it in the Linux source
> tree?
I referring to arch/mips/boot/dts/loongson/ls7a-pch.dtsi, it is
already in Linux source tree.

>
> I think Linux reads Interrupt Pin for both FDT and ACPI systems, and
> apparently that register contains the same value for all functions of
> this multi-function device.
>
> The quirk will be applied for both FDT and ACPI systems, but it sounds
> like you're saying this is needed *because* LoongArch uses ACPI.
Jiaxun said that FDT system doesn't need this quirk, maybe he know more.

Huacai
>
> Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A
  2021-05-19  2:26         ` Huacai Chen
@ 2021-05-19  3:05           ` Jiaxun Yang
  0 siblings, 0 replies; 39+ messages in thread
From: Jiaxun Yang @ 2021-05-19  3:05 UTC (permalink / raw)
  To: Huacai Chen, Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jianmin Lv, Rob Herring



在 2021/5/19 10:26, Huacai Chen 写道:
> Hi, Bjorn,
>
> On Tue, May 18, 2021 at 9:59 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> [+cc Rob, beginning of thread
>> https://lore.kernel.org/r/20210514080025.1828197-5-chenhuacai@loongson.cn]
>>
>> On Sat, May 15, 2021 at 11:52:53AM +0800, Huacai Chen wrote:
>>> On Fri, May 14, 2021 at 10:52 PM Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
>>>> 在 2021/5/14 16:00, Huacai Chen 写道:
>>>>> From: Jianmin Lv <lvjianmin@loongson.cn>
>>>>>
>>>>> In LS7A, multifunction device use same pci PIN and different
>>>>> irq for different function, so fix it for standard pci PIN
>>>>> usage.
>>>> Hmm, I'm unsure about this change.
>>>> The PCIe port, or PCI-to-PCI bridge on LS7A only have a single
>>>> upstream interrupt specified in DeviceTree, how can this quirk
>>>> work?
>>> LS7A will be shared by MIPS-based Loongson and LoongArch-based
>>> Loongson, LoongArch use ACPI rather than FDT, so this quirk is needed.
>> Can you expand on this a little bit?
>>
>> Which DT binding are you referring to?  Is it in the Linux source
>> tree?
> I referring to arch/mips/boot/dts/loongson/ls7a-pch.dtsi, it is
> already in Linux source tree.
>
>> I think Linux reads Interrupt Pin for both FDT and ACPI systems, and
>> apparently that register contains the same value for all functions of
>> this multi-function device.
>>
>> The quirk will be applied for both FDT and ACPI systems, but it sounds
>> like you're saying this is needed *because* LoongArch uses ACPI.
> Jiaxun said that FDT system doesn't need this quirk, maybe he know more.

For FDT system, the IRQ routine of PCI ports looks like:

                 interrupt-map-mask = <0 0 0 0>;
                 interrupt-map = <0 0 0 0 &pic 41 IRQ_TYPE_LEVEL_HIGH>;

It means they are all going to the same parent IRQ no matter what we read
from pin register. And it's the actual hardware behavior AFAIK.

Thanks.

- Jiaxun

>
> Huacai
>> Bjorn


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-19  2:17               ` Huacai Chen
@ 2021-05-19 19:33                 ` Bjorn Helgaas
  2021-05-25 11:03                   ` Huacai Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-19 19:33 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote:
> On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote:
> > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > > > > >
> > > > > > > The bridge spec is pretty old, and most of the content has been
> > > > > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > > > > 7.5.1.3.13" here instead.
> > > > > > >
> > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > > > > following accesses on the primary interface to the secondary interface.
> > > > > > >
> > > > > > > *Which* following accesses?  The structure of English requires that if
> > > > > > > you say "the following accesses," you must continue by *listing* the
> > > > > > > accesses.
> > > > > > >
> > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > > > > VGA card behind the bridge as a valid boot vga device.
> > > > > > >
> > > > > > > s/hardward/bridge/
> > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > > > > >
> > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > > > > since it apparently has a VGA class code.  But here you say the
> > > > > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > > > > bridge and a VGA device, please outline that topology.
> > > > > > >
> > > > > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > > > > to broken *bridges*, not to VGA devices.
> > > > > > >
> > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > > > > not sufficient.  You would have to somehow disable any future attempts
> > > > > > > to use other VGA devices.  Only the VGA device below this defective
> > > > > > > bridge is usable.  Any other VGA devices in the system would be
> > > > > > > useless.
> > > > > > >
> > > > > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > > > > >
> > > > > > > > See similar bug:
> > > > > > > >
> > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > > > > >
> > > > > > > This patch was never merged.  If we merged a revised version, please
> > > > > > > cite the SHA1 instead.
> > > > > >
> > > > > > This patch has never merged, and I found that it is unnecessary after
> > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > > > > unnecessary. If it is still needed, I'll investigate the root cause.
> > > > >
> > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > > > > by subsys_initcall(), which means their sequence is unpredictable. And
> > > > > unfortunately, in our platform vga_arb_device_init() is called before
> > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > > > > default vga device. This is the root cause why we thought that we
> > > > > still need a quirk for AST2500.
> > > >
> > > > Does this mean there is no hardware defect here?  The VGA Enable bit
> > > > works correctly?
> > > >
> > > No, VGA Enable bit still doesn't set, but with commit
> > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
> > > there's no legacy VGA") we no longer depend on VGA Enable.
> >
> > Correct me if I'm wrong:
> >
> >   - On the AST2500 bridge [1a03:1150], the VGA Enable bit is
> >     read-only 0.
> >
> >   - The AST2500 bridge never forwards VGA accesses ([mem
> >     0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its
> >     secondary bus.
> >
> > The VGA Enable bit is optional, and if both the above are true, the
> > bridge is working correctly per spec, and the quirk below is not the
> > right solution, and whatever solution we come up with should not
> > claim that the bridge is misbehaving.
> Yes, you are right, the bridge is working correctly, which is similar
> to HiSilicon D05.
> 
> 
> >
> > > > > I think the best solution is make vga_arb_device_init() be wrapped by
> > > > > subsys_initcall_sync(), do you think so?
> > > >
> > > > Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> > > > documented, so I'm not sure exactly *why* such a change would work and
> > > > whether we could rely on it to continue working.
> > > >
> > > > pcibios_init() isn't very consistent across arches.  On some,
> > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> > > > enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> > > > basically nothing.  That makes life a little difficult.
> > >
> > > subsys_initcall_sync() is ensured after all subsys_initcall()
> > > functions, so at least it can solve the problem on platforms which use
> > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
> > > platforms are also OK, because they use acpi_init()
> > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).
> >
> > More details in my response to suijingfeng:
> > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520
> >
> > I'd rather not fiddle with the initcall ordering.  That mechanism is
> > fragile and I'd prefer something more robust.
> >
> > I'm wondering whether it's practical to do something in the normal PCI
> > enumeration path, e.g., in pci_init_capabilities().  Maybe we can
> > detect the default VGA device as we enumerate it.  Then we wouldn't
> > have this weird process of "find all PCI devices first, then scan for
> > the default VGA device, and oh, by the way, also check for VGA devices
> > hot-added later."
>
> If we don't want to rely on initcall order, and want to solve the
> hot-added case, then can we add vga_arb_select_default_device() in
> pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE &&
> !vga_default_device())?

I think I would see if it's possible to call
vga_arb_select_default_device() from vga_arbiter_add_pci_device()
instead of from vga_arb_device_init().

I would also (as a separate patch) try to get rid of this loop in
vga_arb_device_init():

        list_for_each_entry(vgadev, &vga_list, list) {
                struct device *dev = &vgadev->pdev->dev;

                if (vgadev->bridge_has_one_vga)
                        vgaarb_info(dev, "bridge control possible\n");
                else
                        vgaarb_info(dev, "no bridge control possible\n");
        }

and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where
the loop would not be needed.

Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-19 19:33                 ` Bjorn Helgaas
@ 2021-05-25 11:03                   ` Huacai Chen
  2021-05-25 13:55                     ` Bjorn Helgaas
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-25 11:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

Hi, Bjorn,

On Thu, May 20, 2021 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote:
> > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote:
> > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > > > > > >
> > > > > > > > The bridge spec is pretty old, and most of the content has been
> > > > > > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > > > > > 7.5.1.3.13" here instead.
> > > > > > > >
> > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > > > > > following accesses on the primary interface to the secondary interface.
> > > > > > > >
> > > > > > > > *Which* following accesses?  The structure of English requires that if
> > > > > > > > you say "the following accesses," you must continue by *listing* the
> > > > > > > > accesses.
> > > > > > > >
> > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > > > > > VGA card behind the bridge as a valid boot vga device.
> > > > > > > >
> > > > > > > > s/hardward/bridge/
> > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > > > > > >
> > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > > > > > since it apparently has a VGA class code.  But here you say the
> > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > > > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > > > > > bridge and a VGA device, please outline that topology.
> > > > > > > >
> > > > > > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > > > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > > > > > to broken *bridges*, not to VGA devices.
> > > > > > > >
> > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > > > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > > > > > not sufficient.  You would have to somehow disable any future attempts
> > > > > > > > to use other VGA devices.  Only the VGA device below this defective
> > > > > > > > bridge is usable.  Any other VGA devices in the system would be
> > > > > > > > useless.
> > > > > > > >
> > > > > > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > > > > > >
> > > > > > > > > See similar bug:
> > > > > > > > >
> > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > > > > > >
> > > > > > > > This patch was never merged.  If we merged a revised version, please
> > > > > > > > cite the SHA1 instead.
> > > > > > >
> > > > > > > This patch has never merged, and I found that it is unnecessary after
> > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > > > > > unnecessary. If it is still needed, I'll investigate the root cause.
> > > > > >
> > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > > > > > by subsys_initcall(), which means their sequence is unpredictable. And
> > > > > > unfortunately, in our platform vga_arb_device_init() is called before
> > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > > > > > default vga device. This is the root cause why we thought that we
> > > > > > still need a quirk for AST2500.
> > > > >
> > > > > Does this mean there is no hardware defect here?  The VGA Enable bit
> > > > > works correctly?
> > > > >
> > > > No, VGA Enable bit still doesn't set, but with commit
> > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
> > > > there's no legacy VGA") we no longer depend on VGA Enable.
> > >
> > > Correct me if I'm wrong:
> > >
> > >   - On the AST2500 bridge [1a03:1150], the VGA Enable bit is
> > >     read-only 0.
> > >
> > >   - The AST2500 bridge never forwards VGA accesses ([mem
> > >     0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its
> > >     secondary bus.
> > >
> > > The VGA Enable bit is optional, and if both the above are true, the
> > > bridge is working correctly per spec, and the quirk below is not the
> > > right solution, and whatever solution we come up with should not
> > > claim that the bridge is misbehaving.
> > Yes, you are right, the bridge is working correctly, which is similar
> > to HiSilicon D05.
> >
> >
> > >
> > > > > > I think the best solution is make vga_arb_device_init() be wrapped by
> > > > > > subsys_initcall_sync(), do you think so?
> > > > >
> > > > > Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> > > > > documented, so I'm not sure exactly *why* such a change would work and
> > > > > whether we could rely on it to continue working.
> > > > >
> > > > > pcibios_init() isn't very consistent across arches.  On some,
> > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> > > > > enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> > > > > basically nothing.  That makes life a little difficult.
> > > >
> > > > subsys_initcall_sync() is ensured after all subsys_initcall()
> > > > functions, so at least it can solve the problem on platforms which use
> > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
> > > > platforms are also OK, because they use acpi_init()
> > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).
> > >
> > > More details in my response to suijingfeng:
> > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520
> > >
> > > I'd rather not fiddle with the initcall ordering.  That mechanism is
> > > fragile and I'd prefer something more robust.
> > >
> > > I'm wondering whether it's practical to do something in the normal PCI
> > > enumeration path, e.g., in pci_init_capabilities().  Maybe we can
> > > detect the default VGA device as we enumerate it.  Then we wouldn't
> > > have this weird process of "find all PCI devices first, then scan for
> > > the default VGA device, and oh, by the way, also check for VGA devices
> > > hot-added later."
> >
> > If we don't want to rely on initcall order, and want to solve the
> > hot-added case, then can we add vga_arb_select_default_device() in
> > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE &&
> > !vga_default_device())?
>
> I think I would see if it's possible to call
> vga_arb_select_default_device() from vga_arbiter_add_pci_device()
> instead of from vga_arb_device_init().
>
> I would also (as a separate patch) try to get rid of this loop in
> vga_arb_device_init():
>
>         list_for_each_entry(vgadev, &vga_list, list) {
>                 struct device *dev = &vgadev->pdev->dev;
>
>                 if (vgadev->bridge_has_one_vga)
>                         vgaarb_info(dev, "bridge control possible\n");
>                 else
>                         vgaarb_info(dev, "no bridge control possible\n");
>         }
>
> and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where
> the loop would not be needed.
Any updates?

Huacai
>
> Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-25 11:03                   ` Huacai Chen
@ 2021-05-25 13:55                     ` Bjorn Helgaas
  2021-05-26  2:33                       ` Huacai Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-25 13:55 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

On Tue, May 25, 2021 at 07:03:05PM +0800, Huacai Chen wrote:
> On Thu, May 20, 2021 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote:
> > > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote:
> > > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > > > > > > >
> > > > > > > > > The bridge spec is pretty old, and most of the content has been
> > > > > > > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > > > > > > 7.5.1.3.13" here instead.
> > > > > > > > >
> > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > > > > > > following accesses on the primary interface to the secondary interface.
> > > > > > > > >
> > > > > > > > > *Which* following accesses?  The structure of English requires that if
> > > > > > > > > you say "the following accesses," you must continue by *listing* the
> > > > > > > > > accesses.
> > > > > > > > >
> > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > > > > > > VGA card behind the bridge as a valid boot vga device.
> > > > > > > > >
> > > > > > > > > s/hardward/bridge/
> > > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > > > > > > >
> > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > > > > > > since it apparently has a VGA class code.  But here you say the
> > > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > > > > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > > > > > > bridge and a VGA device, please outline that topology.
> > > > > > > > >
> > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > > > > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > > > > > > to broken *bridges*, not to VGA devices.
> > > > > > > > >
> > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > > > > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > > > > > > not sufficient.  You would have to somehow disable any future attempts
> > > > > > > > > to use other VGA devices.  Only the VGA device below this defective
> > > > > > > > > bridge is usable.  Any other VGA devices in the system would be
> > > > > > > > > useless.
> > > > > > > > >
> > > > > > > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > > > > > > >
> > > > > > > > > > See similar bug:
> > > > > > > > > >
> > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > > > > > > >
> > > > > > > > > This patch was never merged.  If we merged a revised version, please
> > > > > > > > > cite the SHA1 instead.
> > > > > > > >
> > > > > > > > This patch has never merged, and I found that it is unnecessary after
> > > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > > > > > > unnecessary. If it is still needed, I'll investigate the root cause.
> > > > > > >
> > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > > > > > > by subsys_initcall(), which means their sequence is unpredictable. And
> > > > > > > unfortunately, in our platform vga_arb_device_init() is called before
> > > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > > > > > > default vga device. This is the root cause why we thought that we
> > > > > > > still need a quirk for AST2500.
> > > > > >
> > > > > > Does this mean there is no hardware defect here?  The VGA Enable bit
> > > > > > works correctly?
> > > > > >
> > > > > No, VGA Enable bit still doesn't set, but with commit
> > > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
> > > > > there's no legacy VGA") we no longer depend on VGA Enable.
> > > >
> > > > Correct me if I'm wrong:
> > > >
> > > >   - On the AST2500 bridge [1a03:1150], the VGA Enable bit is
> > > >     read-only 0.
> > > >
> > > >   - The AST2500 bridge never forwards VGA accesses ([mem
> > > >     0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its
> > > >     secondary bus.
> > > >
> > > > The VGA Enable bit is optional, and if both the above are true, the
> > > > bridge is working correctly per spec, and the quirk below is not the
> > > > right solution, and whatever solution we come up with should not
> > > > claim that the bridge is misbehaving.
> > > Yes, you are right, the bridge is working correctly, which is similar
> > > to HiSilicon D05.
> > >
> > >
> > > >
> > > > > > > I think the best solution is make vga_arb_device_init() be wrapped by
> > > > > > > subsys_initcall_sync(), do you think so?
> > > > > >
> > > > > > Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> > > > > > documented, so I'm not sure exactly *why* such a change would work and
> > > > > > whether we could rely on it to continue working.
> > > > > >
> > > > > > pcibios_init() isn't very consistent across arches.  On some,
> > > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> > > > > > enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> > > > > > basically nothing.  That makes life a little difficult.
> > > > >
> > > > > subsys_initcall_sync() is ensured after all subsys_initcall()
> > > > > functions, so at least it can solve the problem on platforms which use
> > > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
> > > > > platforms are also OK, because they use acpi_init()
> > > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).
> > > >
> > > > More details in my response to suijingfeng:
> > > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520
> > > >
> > > > I'd rather not fiddle with the initcall ordering.  That mechanism is
> > > > fragile and I'd prefer something more robust.
> > > >
> > > > I'm wondering whether it's practical to do something in the normal PCI
> > > > enumeration path, e.g., in pci_init_capabilities().  Maybe we can
> > > > detect the default VGA device as we enumerate it.  Then we wouldn't
> > > > have this weird process of "find all PCI devices first, then scan for
> > > > the default VGA device, and oh, by the way, also check for VGA devices
> > > > hot-added later."
> > >
> > > If we don't want to rely on initcall order, and want to solve the
> > > hot-added case, then can we add vga_arb_select_default_device() in
> > > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE &&
> > > !vga_default_device())?
> >
> > I think I would see if it's possible to call
> > vga_arb_select_default_device() from vga_arbiter_add_pci_device()
> > instead of from vga_arb_device_init().
> >
> > I would also (as a separate patch) try to get rid of this loop in
> > vga_arb_device_init():
> >
> >         list_for_each_entry(vgadev, &vga_list, list) {
> >                 struct device *dev = &vgadev->pdev->dev;
> >
> >                 if (vgadev->bridge_has_one_vga)
> >                         vgaarb_info(dev, "bridge control possible\n");
> >                 else
> >                         vgaarb_info(dev, "no bridge control possible\n");
> >         }
> >
> > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where
> > the loop would not be needed.
>
> Any updates?

Are you waiting for me to do something else?

I suggested an approach above, but I don't have time to actually do
the work for you.  

Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-25 13:55                     ` Bjorn Helgaas
@ 2021-05-26  2:33                       ` Huacai Chen
  2021-05-26  3:00                         ` Dave Airlie
  0 siblings, 1 reply; 39+ messages in thread
From: Huacai Chen @ 2021-05-26  2:33 UTC (permalink / raw)
  To: Bjorn Helgaas, David Airlie, Daniel Vetter, Maling list - DRI developers
  Cc: Huacai Chen, Bjorn Helgaas, linux-pci, Jiaxun Yang, Jingfeng Sui

Hi, Bjorn,

On Tue, May 25, 2021 at 9:55 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Tue, May 25, 2021 at 07:03:05PM +0800, Huacai Chen wrote:
> > On Thu, May 20, 2021 at 3:33 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, May 19, 2021 at 10:17:14AM +0800, Huacai Chen wrote:
> > > > On Wed, May 19, 2021 at 3:35 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Tue, May 18, 2021 at 03:13:43PM +0800, Huacai Chen wrote:
> > > > > > On Tue, May 18, 2021 at 2:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > On Mon, May 17, 2021 at 08:53:43PM +0800, Huacai Chen wrote:
> > > > > > > > On Sat, May 15, 2021 at 5:09 PM Huacai Chen <chenhuacai@gmail.com> wrote:
> > > > > > > > > On Fri, May 14, 2021 at 11:10 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > > On Fri, May 14, 2021 at 04:00:25PM +0800, Huacai Chen wrote:
> > > > > > > > > > > According to PCI-to-PCI bridge spec, bit 3 of Bridge Control Register is
> > > > > > > > > > > VGA Enable bit which modifies the response to VGA compatible addresses.
> > > > > > > > > >
> > > > > > > > > > The bridge spec is pretty old, and most of the content has been
> > > > > > > > > > incorporated into the PCIe spec.  I think you can cite "PCIe r5.0, sec
> > > > > > > > > > 7.5.1.3.13" here instead.
> > > > > > > > > >
> > > > > > > > > > > If the VGA Enable bit is set, the bridge will decode and forward the
> > > > > > > > > > > following accesses on the primary interface to the secondary interface.
> > > > > > > > > >
> > > > > > > > > > *Which* following accesses?  The structure of English requires that if
> > > > > > > > > > you say "the following accesses," you must continue by *listing* the
> > > > > > > > > > accesses.
> > > > > > > > > >
> > > > > > > > > > > The ASpeed AST2500 hardward does not set the VGA Enable bit on its
> > > > > > > > > > > bridge control register, which causes vgaarb subsystem don't think the
> > > > > > > > > > > VGA card behind the bridge as a valid boot vga device.
> > > > > > > > > >
> > > > > > > > > > s/hardward/bridge/
> > > > > > > > > > s/vga/VGA/ (also in code comments and dmesg strings below)
> > > > > > > > > >
> > > > > > > > > > From the code, it looks like AST2500 ([1a03:2000]) is a VGA device,
> > > > > > > > > > since it apparently has a VGA class code.  But here you say the
> > > > > > > > > > AST2500 has a Bridge Control register, which suggests that it's a
> > > > > > > > > > bridge.  If AST2500 is some sort of combination that includes both a
> > > > > > > > > > bridge and a VGA device, please outline that topology.
> > > > > > > > > >
> > > > > > > > > > But the hardware defect is that some bridges forward VGA accesses even
> > > > > > > > > > though their VGA Enable bit is not set?  The quirk should be attached
> > > > > > > > > > to broken *bridges*, not to VGA devices.
> > > > > > > > > >
> > > > > > > > > > If a bridge forwards VGA accesses regardless of how its VGA Enable bit
> > > > > > > > > > is set, that means VGA arbitration (in vgaarb.c) cannot work
> > > > > > > > > > correctly, so merely setting the default VGA device once in a quirk is
> > > > > > > > > > not sufficient.  You would have to somehow disable any future attempts
> > > > > > > > > > to use other VGA devices.  Only the VGA device below this defective
> > > > > > > > > > bridge is usable.  Any other VGA devices in the system would be
> > > > > > > > > > useless.
> > > > > > > > > >
> > > > > > > > > > > So we provide a quirk to fix Xorg auto-detection.
> > > > > > > > > > >
> > > > > > > > > > > See similar bug:
> > > > > > > > > > >
> > > > > > > > > > > https://patchwork.kernel.org/project/linux-pci/patch/20170619023528.11532-1-dja@axtens.net/
> > > > > > > > > >
> > > > > > > > > > This patch was never merged.  If we merged a revised version, please
> > > > > > > > > > cite the SHA1 instead.
> > > > > > > > >
> > > > > > > > > This patch has never merged, and I found that it is unnecessary after
> > > > > > > > > commit a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device
> > > > > > > > > even if there's no legacy VGA"). Maybe this ASpeed patch is also
> > > > > > > > > unnecessary. If it is still needed, I'll investigate the root cause.
> > > > > > > >
> > > > > > > > I found that vga_arb_device_init() and pcibios_init() are both wrapped
> > > > > > > > by subsys_initcall(), which means their sequence is unpredictable. And
> > > > > > > > unfortunately, in our platform vga_arb_device_init() is called before
> > > > > > > > pcibios_init(), which makes vga_arb_device_init() fail to set a
> > > > > > > > default vga device. This is the root cause why we thought that we
> > > > > > > > still need a quirk for AST2500.
> > > > > > >
> > > > > > > Does this mean there is no hardware defect here?  The VGA Enable bit
> > > > > > > works correctly?
> > > > > > >
> > > > > > No, VGA Enable bit still doesn't set, but with commit
> > > > > > a37c0f48950b56f6ef2ee637 ("vgaarb: Select a default VGA device even if
> > > > > > there's no legacy VGA") we no longer depend on VGA Enable.
> > > > >
> > > > > Correct me if I'm wrong:
> > > > >
> > > > >   - On the AST2500 bridge [1a03:1150], the VGA Enable bit is
> > > > >     read-only 0.
> > > > >
> > > > >   - The AST2500 bridge never forwards VGA accesses ([mem
> > > > >     0xa0000-0xbffff], [io 0x3b0-0x3bb], [io 0x3c0-0x3df]) to its
> > > > >     secondary bus.
> > > > >
> > > > > The VGA Enable bit is optional, and if both the above are true, the
> > > > > bridge is working correctly per spec, and the quirk below is not the
> > > > > right solution, and whatever solution we come up with should not
> > > > > claim that the bridge is misbehaving.
> > > > Yes, you are right, the bridge is working correctly, which is similar
> > > > to HiSilicon D05.
> > > >
> > > >
> > > > >
> > > > > > > > I think the best solution is make vga_arb_device_init() be wrapped by
> > > > > > > > subsys_initcall_sync(), do you think so?
> > > > > > >
> > > > > > > Hmm.  Unfortunately the semantics of subsys_initcall_sync() are not
> > > > > > > documented, so I'm not sure exactly *why* such a change would work and
> > > > > > > whether we could rely on it to continue working.
> > > > > > >
> > > > > > > pcibios_init() isn't very consistent across arches.  On some,
> > > > > > > including alpha, microblaze, some MIPS platforms, powerpc, and sh, it
> > > > > > > enumerates PCI devices.  On others (ia64, parisc, sparc, x86), it does
> > > > > > > basically nothing.  That makes life a little difficult.
> > > > > >
> > > > > > subsys_initcall_sync() is ensured after all subsys_initcall()
> > > > > > functions, so at least it can solve the problem on platforms which use
> > > > > > pcibios_init() to enumerate PCI devices (x86 and other ACPI-based
> > > > > > platforms are also OK, because they use acpi_init()
> > > > > > -->acpi_scan_init() -->pci_acpi_scan_root() to enumerate devices).
> > > > >
> > > > > More details in my response to suijingfeng:
> > > > > https://lore.kernel.org/r/20210518193100.GA148462@bjorn-Precision-5520
> > > > >
> > > > > I'd rather not fiddle with the initcall ordering.  That mechanism is
> > > > > fragile and I'd prefer something more robust.
> > > > >
> > > > > I'm wondering whether it's practical to do something in the normal PCI
> > > > > enumeration path, e.g., in pci_init_capabilities().  Maybe we can
> > > > > detect the default VGA device as we enumerate it.  Then we wouldn't
> > > > > have this weird process of "find all PCI devices first, then scan for
> > > > > the default VGA device, and oh, by the way, also check for VGA devices
> > > > > hot-added later."
> > > >
> > > > If we don't want to rely on initcall order, and want to solve the
> > > > hot-added case, then can we add vga_arb_select_default_device() in
> > > > pci_notify() when (action == BUS_NOTIFY_ADD_DEVICE &&
> > > > !vga_default_device())?
> > >
> > > I think I would see if it's possible to call
> > > vga_arb_select_default_device() from vga_arbiter_add_pci_device()
> > > instead of from vga_arb_device_init().
> > >
> > > I would also (as a separate patch) try to get rid of this loop in
> > > vga_arb_device_init():
> > >
> > >         list_for_each_entry(vgadev, &vga_list, list) {
> > >                 struct device *dev = &vgadev->pdev->dev;
> > >
> > >                 if (vgadev->bridge_has_one_vga)
> > >                         vgaarb_info(dev, "bridge control possible\n");
> > >                 else
> > >                         vgaarb_info(dev, "no bridge control possible\n");
> > >         }
> > >
> > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where
> > > the loop would not be needed.
> >
> > Any updates?
>
> Are you waiting for me to do something else?
>
> I suggested an approach above, but I don't have time to actually do
> the work for you.
Yes, I am really waiting... but I am also investigating history and thinking.

If I haven't missed something (correct me if I'm wrong). For the
original HiSilicon problem, the first attempt is to modify
vga_arbiter_add_pci_device() and remove the VGA_RSRC_LEGACY_MASK
check. But vga_arbiter_add_pci_device() is called for each PCI device,
so removing that check will cause the first VGA device to be the
default VGA device. This breaks some x86 platforms, so after that you
don't touch vga_arbiter_add_pci_device(), but add
vga_arb_select_default_device() in vga_arb_device_init().

If the above history is correct, then we cannot add
vga_arb_select_default_device() in vga_arbiter_add_pci_device()
directly. So it seems we can only add vga_arb_select_default_device()
in pci_notify(). And if we don't care about hotplug, we can simply use
subsys_initcall_sync() to wrap vga_arb_device_init().

And DRM developers, please let me know what do you think about?

Huacai

>
> Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-26  2:33                       ` Huacai Chen
@ 2021-05-26  3:00                         ` Dave Airlie
  2021-05-26 18:29                           ` Bjorn Helgaas
  0 siblings, 1 reply; 39+ messages in thread
From: Dave Airlie @ 2021-05-26  3:00 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, David Airlie, Daniel Vetter,
	Maling list - DRI developers, Bjorn Helgaas, linux-pci,
	Jingfeng Sui, Jiaxun Yang, Huacai Chen

> > > > I think I would see if it's possible to call
> > > > vga_arb_select_default_device() from vga_arbiter_add_pci_device()
> > > > instead of from vga_arb_device_init().
> > > >
> > > > I would also (as a separate patch) try to get rid of this loop in
> > > > vga_arb_device_init():
> > > >
> > > >         list_for_each_entry(vgadev, &vga_list, list) {
> > > >                 struct device *dev = &vgadev->pdev->dev;
> > > >
> > > >                 if (vgadev->bridge_has_one_vga)
> > > >                         vgaarb_info(dev, "bridge control possible\n");
> > > >                 else
> > > >                         vgaarb_info(dev, "no bridge control possible\n");
> > > >         }
> > > >
> > > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where
> > > > the loop would not be needed.
> > >
> > > Any updates?
> >
> > Are you waiting for me to do something else?
> >
> > I suggested an approach above, but I don't have time to actually do
> > the work for you.
> Yes, I am really waiting... but I am also investigating history and thinking.
>
> If I haven't missed something (correct me if I'm wrong). For the
> original HiSilicon problem, the first attempt is to modify
> vga_arbiter_add_pci_device() and remove the VGA_RSRC_LEGACY_MASK
> check. But vga_arbiter_add_pci_device() is called for each PCI device,
> so removing that check will cause the first VGA device to be the
> default VGA device. This breaks some x86 platforms, so after that you
> don't touch vga_arbiter_add_pci_device(), but add
> vga_arb_select_default_device() in vga_arb_device_init().
>
> If the above history is correct, then we cannot add
> vga_arb_select_default_device() in vga_arbiter_add_pci_device()
> directly. So it seems we can only add vga_arb_select_default_device()
> in pci_notify(). And if we don't care about hotplug, we can simply use
> subsys_initcall_sync() to wrap vga_arb_device_init().
>
> And DRM developers, please let me know what do you think about?

I'm not 100% following what is going on here.

Do you need call vga_arb_select_default_device after hotplug for some
reason, or it this just a race with subsys_init?

I think just adding subsys_initcall_sync should be fine

I don't see why you'd want to care about making a hotplug VGA device
the default at this point.

Dave.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge
  2021-05-26  3:00                         ` Dave Airlie
@ 2021-05-26 18:29                           ` Bjorn Helgaas
  0 siblings, 0 replies; 39+ messages in thread
From: Bjorn Helgaas @ 2021-05-26 18:29 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Huacai Chen, David Airlie, Daniel Vetter,
	Maling list - DRI developers, Bjorn Helgaas, linux-pci,
	Jingfeng Sui, Jiaxun Yang, Huacai Chen

On Wed, May 26, 2021 at 01:00:33PM +1000, Dave Airlie wrote:
> > > > > I think I would see if it's possible to call
> > > > > vga_arb_select_default_device() from vga_arbiter_add_pci_device()
> > > > > instead of from vga_arb_device_init().
> > > > >
> > > > > I would also (as a separate patch) try to get rid of this loop in
> > > > > vga_arb_device_init():
> > > > >
> > > > >         list_for_each_entry(vgadev, &vga_list, list) {
> > > > >                 struct device *dev = &vgadev->pdev->dev;
> > > > >
> > > > >                 if (vgadev->bridge_has_one_vga)
> > > > >                         vgaarb_info(dev, "bridge control possible\n");
> > > > >                 else
> > > > >                         vgaarb_info(dev, "no bridge control possible\n");
> > > > >         }
> > > > >
> > > > > and do the vgaarb_info() in vga_arbiter_check_bridge_sharing(), where
> > > > > the loop would not be needed.
> > > >
> > > > Any updates?
> > >
> > > Are you waiting for me to do something else?
> > >
> > > I suggested an approach above, but I don't have time to actually do
> > > the work for you.
> >
> > Yes, I am really waiting... but I am also investigating history
> > and thinking.

Well, don't wait for me because this work is not on my to-do list :)

> > If I haven't missed something (correct me if I'm wrong). For the
> > original HiSilicon problem, the first attempt is to modify
> > vga_arbiter_add_pci_device() and remove the VGA_RSRC_LEGACY_MASK
> > check. But vga_arbiter_add_pci_device() is called for each PCI device,
> > so removing that check will cause the first VGA device to be the
> > default VGA device. This breaks some x86 platforms, so after that you
> > don't touch vga_arbiter_add_pci_device(), but add
> > vga_arb_select_default_device() in vga_arb_device_init().
> >
> > If the above history is correct, then we cannot add
> > vga_arb_select_default_device() in vga_arbiter_add_pci_device()
> > directly. So it seems we can only add vga_arb_select_default_device()
> > in pci_notify(). And if we don't care about hotplug, we can simply use
> > subsys_initcall_sync() to wrap vga_arb_device_init().
> >
> > And DRM developers, please let me know what do you think about?
> 
> I'm not 100% following what is going on here.
> 
> Do you need call vga_arb_select_default_device after hotplug for some
> reason, or it this just a race with subsys_init?
> 
> I think just adding subsys_initcall_sync should be fine

Doing subsys_initcall_sync(vga_arb_device_init) is probably "OK".  I
don't think it's *great* because initcalls don't make dependencies
explicit so it won't be obvious *why* it's subsys_initcall_sync, and
it feels a little like a band-aid.

> I don't see why you'd want to care about making a hotplug VGA device
> the default at this point.

I don't think hotplug per se is relevant for this ASpeed case.

But I think the current design is slightly broken in that we set up
the machinery to call vga_arbiter_add_pci_device() for hot-added
devices, but a hot-added device can never be set as the default VGA
device.

Imagine a system with a single VGA device.  If that device is plugged
in before boot, it becomes the default VGA device.  If it is hot-added
after boot, it does not.  That inconsistency feels wrong to me.

If it were possible to set the default VGA device in
vga_arbiter_add_pci_device(), it would fix that inconsistency and
solve the ASpeed case.  But maybe that's not practical.

Bjorn

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2021-05-26 18:29 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-14  8:00 [PATCH 0/5] PCI: Loongson-related pci quirks Huacai Chen
2021-05-14  8:00 ` [PATCH 1/5] PCI/portdrv: Don't disable pci device during shutdown Huacai Chen
2021-05-14 14:20   ` Krzysztof Wilczyński
2021-05-14 16:09   ` Bjorn Helgaas
2021-05-15  3:38     ` Huacai Chen
2021-05-14  8:00 ` [PATCH 2/5] PCI: Move loongson pci quirks to quirks.c Huacai Chen
2021-05-14  8:00 ` [PATCH 3/5] PCI: Improve the mrrs quirk for LS7A Huacai Chen
2021-05-14 14:03   ` Krzysztof Wilczyński
2021-05-14 15:39   ` Bjorn Helgaas
2021-05-15  3:49     ` Huacai Chen
2021-05-15  6:22       ` Jiaxun Yang
2021-05-14  8:00 ` [PATCH 4/5] PCI: Add quirk for multifunction devices of LS7A Huacai Chen
2021-05-14 13:22   ` Krzysztof Wilczyński
2021-05-14 14:52   ` Jiaxun Yang
2021-05-15  3:52     ` Huacai Chen
2021-05-18 13:59       ` Bjorn Helgaas
2021-05-19  2:26         ` Huacai Chen
2021-05-19  3:05           ` Jiaxun Yang
2021-05-14 15:51   ` Bjorn Helgaas
2021-05-15  3:56     ` Huacai Chen
2021-05-14  8:00 ` [PATCH 5/5] PCI: Support ASpeed VGA cards behind a misbehaving bridge Huacai Chen
2021-05-14 13:56   ` Krzysztof Wilczyński
2021-05-14 15:10   ` Bjorn Helgaas
2021-05-15  9:09     ` Huacai Chen
2021-05-17 12:53       ` Huacai Chen
2021-05-17 18:28         ` Bjorn Helgaas
2021-05-18  2:31           ` 隋景峰
2021-05-18  3:09             ` Bjorn Helgaas
2021-05-18  9:30               ` suijingfeng
2021-05-18 19:31                 ` Bjorn Helgaas
2021-05-18  7:13           ` Huacai Chen
2021-05-18 19:35             ` Bjorn Helgaas
2021-05-19  2:17               ` Huacai Chen
2021-05-19 19:33                 ` Bjorn Helgaas
2021-05-25 11:03                   ` Huacai Chen
2021-05-25 13:55                     ` Bjorn Helgaas
2021-05-26  2:33                       ` Huacai Chen
2021-05-26  3:00                         ` Dave Airlie
2021-05-26 18:29                           ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).