All of lore.kernel.org
 help / color / mirror / Atom feed
From: Huacai Chen <chenhuacai@loongson.cn>
To: "Bjorn Helgaas" <bhelgaas@google.com>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	"Rob Herring" <robh@kernel.org>,
	"Krzysztof Wilczyński" <kw@linux.com>
Cc: linux-pci@vger.kernel.org, Jianmin Lv <lvjianmin@loongson.cn>,
	Xuefeng Li <lixuefeng@loongson.cn>,
	Huacai Chen <chenhuacai@gmail.com>,
	Jiaxun Yang <jiaxun.yang@flygoat.com>,
	Huacai Chen <chenhuacai@loongson.cn>
Subject: [PATCH V2 2/2] PCI: Add quirk for LS7A to avoid reboot failure
Date: Fri,  6 Jan 2023 17:51:43 +0800	[thread overview]
Message-ID: <20230106095143.3158998-3-chenhuacai@loongson.cn> (raw)
In-Reply-To: <20230106095143.3158998-1-chenhuacai@loongson.cn>

After cc27b735ad3a7557 ("PCI/portdrv: Turn off PCIe services during
shutdown") we observe poweroff/reboot failures on systems with LS7A
chipset.

We found that if we remove "pci_command &= ~PCI_COMMAND_MASTER" in
do_pci_disable_device(), it can work well. The hardware engineer says
that the root cause is that CPU is still accessing PCIe devices while
poweroff/reboot, and if we disable the Bus Master Bit at this time, the
PCIe controller doesn't forward requests to downstream devices, and also
does not send TIMEOUT to CPU, which causes CPU wait forever (hardware
deadlock).

To be clear, the sequence is like this:

  - CPU issues MMIO read to device below Root Port

  - LS7A Root Port fails to forward transaction to secondary bus
    because of LS7A Bus Master defect

  - CPU hangs waiting for response to MMIO read

Then how is userspace able to use a device after the device is removed?

To give more details, let's take the graphics driver (e.g. amdgpu) as
an example. The userspace programs call printf() to display "shutting
down xxx service" during shutdown/reboot, or the kernel calls printk()
to display something during shutdown/reboot. These can happen at any
time, even after we call pcie_port_device_remove() to disable the pcie
port on the graphic card.

The call stack is: printk() --> call_console_drivers() --> con->write()
--> vt_console_print() --> fbcon_putcs()

This scenario happens because userspace programs (or the kernel itself)
don't know whether a device is 'usable', they just use it, at any time.

This hardware behavior is a PCIe protocol violation (Bus Master should
not be involved in CPU MMIO transactions), and it will be fixed in new
revisions of hardware (add timeout mechanism for CPU read request,
whether or not Bus Master bit is cleared).

On some x86 platforms, radeon/amdgpu devices can cause similar problems
[1][2]. Once before I wanted to make a single patch to solve "all of
these problems" together, but it seems unreasonable because maybe they
are not exactly the same problem. So, this patch add a new function
pcie_portdrv_shutdown(), a slight modified copy of pcie_portdrv_remove()
dedicated for the shutdown path, and then add a quirk just for LS7A to
avoid clearing Bus Master bit in pcie_portdrv_shutdown(). Leave other
platforms behave as before.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
[2] https://bugs.freedesktop.org/show_bug.cgi?id=98638

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/controller/pci-loongson.c | 17 +++++++++++++++++
 drivers/pci/pcie/portdrv.c            | 21 +++++++++++++++++++--
 include/linux/pci.h                   |  1 +
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c
index 759ec211c17b..641308ba4126 100644
--- a/drivers/pci/controller/pci-loongson.c
+++ b/drivers/pci/controller/pci-loongson.c
@@ -93,6 +93,24 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 			DEV_PCIE_PORT_2, loongson_mrrs_quirk);
 
+static void loongson_bmaster_quirk(struct pci_dev *pdev)
+{
+	/*
+	 * Some Loongson PCIe ports will cause CPU deadlock if there is
+	 * MMIO access to a downstream device when the root port disable
+	 * the Bus Master bit during poweroff/reboot.
+	 */
+	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+
+	bridge->no_dis_bmaster = 1;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, loongson_bmaster_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, loongson_bmaster_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, loongson_bmaster_quirk);
+
 static void loongson_pci_pin_quirk(struct pci_dev *pdev)
 {
 	pdev->pin = 1 + (PCI_FUNC(pdev->devfn) & 3);
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 2cc2e60bcb39..96f45c444422 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -501,7 +501,6 @@ static void pcie_port_device_remove(struct pci_dev *dev)
 {
 	device_for_each_child(&dev->dev, NULL, remove_iter);
 	pci_free_irq_vectors(dev);
-	pci_disable_device(dev);
 }
 
 /**
@@ -727,6 +726,24 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
 	}
 
 	pcie_port_device_remove(dev);
+
+	pci_disable_device(dev);
+}
+
+static void pcie_portdrv_shutdown(struct pci_dev *dev)
+{
+	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+
+	if (pci_bridge_d3_possible(dev)) {
+		pm_runtime_forbid(&dev->dev);
+		pm_runtime_get_noresume(&dev->dev);
+		pm_runtime_dont_use_autosuspend(&dev->dev);
+	}
+
+	pcie_port_device_remove(dev);
+
+	if (!bridge->no_dis_bmaster)
+		pci_disable_device(dev);
 }
 
 static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
@@ -777,7 +794,7 @@ static struct pci_driver pcie_portdriver = {
 
 	.probe		= pcie_portdrv_probe,
 	.remove		= pcie_portdrv_remove,
-	.shutdown	= pcie_portdrv_remove,
+	.shutdown	= pcie_portdrv_shutdown,
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3df2049ec4a8..a64dbcb89231 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -573,6 +573,7 @@ struct pci_host_bridge {
 	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */
 	unsigned int	no_inc_mrrs:1;		/* No Increase MRRS */
+	unsigned int	no_dis_bmaster:1;	/* No Disable Bus Master */
 	unsigned int	native_aer:1;		/* OS may use PCIe AER */
 	unsigned int	native_pcie_hotplug:1;	/* OS may use PCIe hotplug */
 	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
-- 
2.31.1


  parent reply	other threads:[~2023-01-06  9:53 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06  9:51 [PATCH V2 0/2] PCI: Add two Loongson's LS7A quirks Huacai Chen
2023-01-06  9:51 ` [PATCH V2 1/2] PCI: loongson: Improve the MRRS quirk for LS7A Huacai Chen
2023-01-31  0:16   ` Bjorn Helgaas
2023-01-31 11:54     ` Huacai Chen
2023-01-06  9:51 ` Huacai Chen [this message]
2023-01-06 15:38   ` [PATCH V2 2/2] PCI: Add quirk for LS7A to avoid reboot failure Bjorn Helgaas
2023-01-07  2:25     ` Huacai Chen
2023-01-19 12:25       ` Huacai Chen
2023-01-19 12:50         ` Bjorn Helgaas
2023-01-20 13:31           ` Huacai Chen
2023-01-20 15:36             ` Bjorn Helgaas
2023-01-21 15:10               ` Huacai Chen
2023-01-30 12:35                 ` Thorsten Leemhuis
2023-02-01 22:10                   ` Bjorn Helgaas
2023-01-31  0:01                 ` Bjorn Helgaas
2023-01-31 12:02                   ` Huacai Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230106095143.3158998-3-chenhuacai@loongson.cn \
    --to=chenhuacai@loongson.cn \
    --cc=bhelgaas@google.com \
    --cc=chenhuacai@gmail.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=kw@linux.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=lixuefeng@loongson.cn \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=lvjianmin@loongson.cn \
    --cc=robh@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.