* [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems
@ 2023-02-01  4:30 Huacai Chen
  2023-02-01  4:30 ` [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown() Huacai Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Huacai Chen @ 2023-02-01  4:30 UTC (permalink / raw)
  To: Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński
  Cc: linux-pci, Jianmin Lv, Xuefeng Li, Huacai Chen, Jiaxun Yang,
	Huacai Chen, Tiezhu Yang

This patchset attempts to resolve Loongson's LS7A PCI problems: the
first patch omits pci_disable_device() in pcie_portdrv_shutdown() to
avoid poweroff/reboot failures; the second patch improves the MRRS quirk
for the LS7A chipset.

V1 -> V2:
1, Update commit messages and comments.

V2 -> V3:
1, Simply remove pci_disable_device() in pcie_port_device_remove() to
   solve poweroff/reboot failure.
2, Update commit messages and comments.

V3 -> V4:
1, Just remove pci_disable_device() in pcie_portdrv_shutdown() and keep
   the logic of pcie_portdrv_remove() the same as before.

Huacai Chen, Tiezhu Yang and Jianmin Lv(2):
 PCI: Omit pci_disable_device() in .shutdown().
 PCI: loongson: Improve the MRRS quirk for LS7A.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> 
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
---
 drivers/pci/controller/pci-loongson.c | 44 ++++++++++++-----------------------
 drivers/pci/pci.c                     |  6 +++++
 drivers/pci/pcie/portdrv.c            | 16 +++++++++++--
 include/linux/pci.h                   |  1 +
 4 files changed, 36 insertions(+), 31 deletions(-)
--
2.27.0



* [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown()
  2023-02-01  4:30 [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems Huacai Chen
@ 2023-02-01  4:30 ` Huacai Chen
  2023-02-01 18:17   ` Bjorn Helgaas
  2023-02-01  4:30 ` [PATCH V4 2/2] PCI: loongson: Improve the MRRS quirk for LS7A Huacai Chen
  2023-02-01 18:57 ` [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems Bjorn Helgaas
  2 siblings, 1 reply; 10+ messages in thread
From: Huacai Chen @ 2023-02-01  4:30 UTC (permalink / raw)
  To: Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński
  Cc: linux-pci, Jianmin Lv, Xuefeng Li, Huacai Chen, Jiaxun Yang, Huacai Chen

This patch has a long history.

After cc27b735ad3a7557 ("PCI/portdrv: Turn off PCIe services during
shutdown") we observe poweroff/reboot failures on systems with the LS7A
chipset.

We found that removing "pci_command &= ~PCI_COMMAND_MASTER" from
do_pci_disable_device() makes poweroff/reboot work. The hardware
engineers say the root cause is that the CPU is still accessing PCIe
devices during poweroff/reboot; if we clear the Bus Master bit at this
time, the PCIe controller neither forwards requests to downstream
devices nor reports a timeout to the CPU, so the CPU waits forever
(hardware deadlock).

To be clear, the sequence is like this:

  - CPU issues MMIO read to device below Root Port

  - LS7A Root Port fails to forward transaction to secondary bus
    because of LS7A Bus Master defect

  - CPU hangs waiting for response to MMIO read
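
The sequence above can be modelled in user space. This is only a sketch:
enum mmio_result, clear_bus_master() and cpu_mmio_read() are hypothetical
names invented for illustration, and the "defect" flag merely encodes the
behavior described by the hardware engineers. The only value taken from
the kernel is PCI_COMMAND_MASTER (0x4).

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_COMMAND_MASTER 0x4	/* Bus Master enable bit in PCI_COMMAND */

/* Hypothetical outcome of a CPU MMIO read routed through a Root Port. */
enum mmio_result { MMIO_COMPLETED, MMIO_HANG };

/* Clear Bus Master, the step do_pci_disable_device() performs. */
uint16_t clear_bus_master(uint16_t pci_command)
{
	return pci_command & (uint16_t)~PCI_COMMAND_MASTER;
}

/*
 * Assumed model: a compliant port forwards CPU reads regardless of
 * Bus Master; a port with the LS7A defect drops them (and never
 * times out) once Bus Master is cleared.
 */
enum mmio_result cpu_mmio_read(uint16_t port_command, bool ls7a_defect)
{
	bool bus_master = (port_command & PCI_COMMAND_MASTER) != 0;

	if (ls7a_defect && !bus_master)
		return MMIO_HANG;
	return MMIO_COMPLETED;
}
```

In this model the hang appears only when both conditions hold, which is
why leaving Bus Master set during shutdown avoids it.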

Then how is userspace able to use a device after the device is removed?

Take a graphics driver (e.g. amdgpu) as an example. Userspace programs
call printf() to display "shutting down xxx service" during
shutdown/reboot, or the kernel calls printk() to display something
during shutdown/reboot. These can happen at any time, even after we
call pcie_port_device_remove() to disable the PCIe port on the graphics
card.

The call stack is: printk() --> call_console_drivers() --> con->write()
--> vt_console_print() --> fbcon_putcs()

This scenario happens because userspace programs (or the kernel itself)
don't know whether a device is 'usable'; they just use it at any time.

This hardware behavior is a PCIe protocol violation (Bus Master should
not be involved in CPU MMIO transactions), and it will be fixed in new
revisions of the hardware (by adding a timeout mechanism for CPU read
requests, whether or not the Bus Master bit is cleared).

On some x86 platforms, radeon/amdgpu devices can cause similar problems
[1][2].

I once added a quirk to solve the LS7A problem, but it looked ugly.
After long discussions, Bjorn Helgaas suggested simply removing
pci_disable_device() from pcie_portdrv_shutdown(), and this patch does
exactly that.
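
The resulting split between .remove() and .shutdown() can be sketched in
user space as follows. struct fake_port and the three functions are
hypothetical stand-ins, not kernel code, and the runtime-PM calls made by
the real callbacks are left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev; tracks only what matters here. */
struct fake_port {
	bool services_registered;	/* port services (AER, PME, ...) bound */
	bool irq_vectors_allocated;
	bool enabled;			/* false once the device is disabled */
};

/* Common teardown, mirroring pcie_port_device_remove() after the patch. */
void port_device_remove(struct fake_port *p)
{
	p->services_registered = false;
	p->irq_vectors_allocated = false;
}

/* .remove(): tear down services, then disable the device. */
void portdrv_remove(struct fake_port *p)
{
	port_device_remove(p);
	p->enabled = false;	/* stands in for pci_disable_device() */
}

/* .shutdown(): tear down services but leave the device enabled. */
void portdrv_shutdown(struct fake_port *p)
{
	port_device_remove(p);
}
```

Both paths still stop the port services; only the shutdown path leaves
the port enabled, so Bus Master stays set.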

[1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
[2] https://bugs.freedesktop.org/show_bug.cgi?id=98638

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/pcie/portdrv.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 2cc2e60bcb39..46fad0d813b2 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -501,7 +501,6 @@ static void pcie_port_device_remove(struct pci_dev *dev)
 {
 	device_for_each_child(&dev->dev, NULL, remove_iter);
 	pci_free_irq_vectors(dev);
-	pci_disable_device(dev);
 }
 
 /**
@@ -727,6 +726,19 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
 	}
 
 	pcie_port_device_remove(dev);
+
+	pci_disable_device(dev);
+}
+
+static void pcie_portdrv_shutdown(struct pci_dev *dev)
+{
+	if (pci_bridge_d3_possible(dev)) {
+		pm_runtime_forbid(&dev->dev);
+		pm_runtime_get_noresume(&dev->dev);
+		pm_runtime_dont_use_autosuspend(&dev->dev);
+	}
+
+	pcie_port_device_remove(dev);
 }
 
 static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
@@ -777,7 +789,7 @@ static struct pci_driver pcie_portdriver = {
 
 	.probe		= pcie_portdrv_probe,
 	.remove		= pcie_portdrv_remove,
-	.shutdown	= pcie_portdrv_remove,
+	.shutdown	= pcie_portdrv_shutdown,
 
 	.err_handler	= &pcie_portdrv_err_handler,
 
-- 
2.39.0



* [PATCH V4 2/2] PCI: loongson: Improve the MRRS quirk for LS7A
  2023-02-01  4:30 [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems Huacai Chen
  2023-02-01  4:30 ` [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown() Huacai Chen
@ 2023-02-01  4:30 ` Huacai Chen
  2023-02-01 18:57 ` [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems Bjorn Helgaas
  2 siblings, 0 replies; 10+ messages in thread
From: Huacai Chen @ 2023-02-01  4:30 UTC (permalink / raw)
  To: Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński
  Cc: linux-pci, Jianmin Lv, Xuefeng Li, Huacai Chen, Jiaxun Yang, Huacai Chen

In new revisions of LS7A, some PCIe ports support MRRS values larger
than 256, but their maximum supported MRRS values are not detectable.
Moreover, the current loongson_mrrs_quirk() cannot prevent devices from
increasing their MRRS after pci_enable_device(), and some devices (e.g.
Realtek 8169) do set a large value in their drivers. So the only
possible way is to configure the MRRS of all devices in the BIOS and add
a PCI host bridge bit flag (i.e., no_inc_mrrs) to stop later MRRS
increases.
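
The effect of the flag can be sketched in user space as below. struct
fake_dev and set_readrq() are hypothetical names; the range and
power-of-two checks follow pcie_set_readrq(), while the other adjustments
the kernel function makes are omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; the kernel keeps the flag on the host bridge. */
struct fake_dev {
	int mrrs;		/* current MRRS in bytes, as set by the BIOS */
	bool no_inc_mrrs;	/* requests to raise MRRS are refused */
};

/* Returns 0 on success or -EINVAL if the request is rejected. */
int set_readrq(struct fake_dev *dev, int rq)
{
	/* MRRS must be a power of two between 128 and 4096 bytes. */
	if (rq < 128 || rq > 4096 || (rq & (rq - 1)) != 0)
		return -EINVAL;

	/* With the flag set, lowering is allowed but raising is not. */
	if (dev->no_inc_mrrs && rq > dev->mrrs)
		return -EINVAL;

	dev->mrrs = rq;
	return 0;
}
```

A driver that asks for 4096 on a device the BIOS left at 256 gets
-EINVAL and the BIOS value stays in place.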

However, according to the PCIe spec, it is legal for an OS to program
any value for MRRS, and it is also legal for an endpoint to generate a
Read Request with any size up to its MRRS. As the hardware engineers
say, the root cause here is that LS7A doesn't break up large read
requests. In detail, an LS7A PCIe port reports CA (Completer Abort) if
it receives a Memory Read request with a size that's "too big" ("too
big" means larger than the PCIe port can handle, which is 256 bytes for
some ports and 4096 bytes for the others; this is of course a problem in
the LS7A hardware design).
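
For reference, the byte sizes mentioned above map onto the 3-bit
Max_Read_Request_Size field (bits 14:12 of Device Control) as sketched
below. mrrs_to_field() and field_to_mrrs() are hypothetical helper
names; for valid sizes the encoding matches the kernel's
"(ffs(rq) - 8) << 12".

```c
#include <stdint.h>

#define PCI_EXP_DEVCTL_READRQ 0x7000	/* Max_Read_Request_Size, bits 14:12 */

/* Encode a valid MRRS (power of two, 128..4096 bytes) into the field. */
uint16_t mrrs_to_field(int rq)
{
	unsigned int code = 0;

	/* code = log2(rq) - 7, so 128 -> 0, 256 -> 1, ..., 4096 -> 5 */
	while ((128 << code) < rq)
		code++;
	return (uint16_t)(code << 12);
}

/* Decode the field from a Device Control value back into bytes. */
int field_to_mrrs(uint16_t devctl)
{
	return 128 << ((devctl & PCI_EXP_DEVCTL_READRQ) >> 12);
}
```

So the two port limits in question, 256 and 4096 bytes, correspond to
field values 1 and 5.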

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216884
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/pci/controller/pci-loongson.c | 44 +++++++++------------------
 drivers/pci/pci.c                     |  6 ++++
 include/linux/pci.h                   |  1 +
 3 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c
index 05c50408f13b..759ec211c17b 100644
--- a/drivers/pci/controller/pci-loongson.c
+++ b/drivers/pci/controller/pci-loongson.c
@@ -75,37 +75,23 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
 			DEV_LS7A_LPC, system_bus_quirk);
 
-static void loongson_mrrs_quirk(struct pci_dev *dev)
+static void loongson_mrrs_quirk(struct pci_dev *pdev)
 {
-	struct pci_bus *bus = dev->bus;
-	struct pci_dev *bridge;
-	static const struct pci_device_id bridge_devids[] = {
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_0) },
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_1) },
-		{ PCI_VDEVICE(LOONGSON, DEV_PCIE_PORT_2) },
-		{ 0, },
-	};
-
-	/* look for the matching bridge */
-	while (!pci_is_root_bus(bus)) {
-		bridge = bus->self;
-		bus = bus->parent;
-		/*
-		 * Some Loongson PCIe ports have a h/w limitation of
-		 * 256 bytes maximum read request size. They can't handle
-		 * anything larger than this. So force this limit on
-		 * any devices attached under these ports.
-		 */
-		if (pci_match_id(bridge_devids, bridge)) {
-			if (pcie_get_readrq(dev) > 256) {
-				pci_info(dev, "limiting MRRS to 256\n");
-				pcie_set_readrq(dev, 256);
-			}
-			break;
-		}
-	}
+	/*
+	 * Some Loongson PCIe ports have h/w limitations of maximum read
+	 * request size. They can't handle anything larger than this. So
+	 * force this limit on any devices attached under these ports.
+	 */
+	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
+
+	bridge->no_inc_mrrs = 1;
 }
-DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_0, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_1, loongson_mrrs_quirk);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
+			DEV_PCIE_PORT_2, loongson_mrrs_quirk);
 
 static void loongson_pci_pin_quirk(struct pci_dev *pdev)
 {
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fba95486caaf..ae88210a12c7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6033,6 +6033,7 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 {
 	u16 v;
 	int ret;
+	struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
 
 	if (rq < 128 || rq > 4096 || !is_power_of_2(rq))
 		return -EINVAL;
@@ -6051,6 +6052,11 @@ int pcie_set_readrq(struct pci_dev *dev, int rq)
 
 	v = (ffs(rq) - 8) << 12;
 
+	if (bridge->no_inc_mrrs) {
+		if (rq > pcie_get_readrq(dev))
+			return -EINVAL;
+	}
+
 	ret = pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 						  PCI_EXP_DEVCTL_READRQ, v);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index adffd65e84b4..3df2049ec4a8 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -572,6 +572,7 @@ struct pci_host_bridge {
 	void		*release_data;
 	unsigned int	ignore_reset_delay:1;	/* For entire hierarchy */
 	unsigned int	no_ext_tags:1;		/* No Extended Tags */
+	unsigned int	no_inc_mrrs:1;		/* No Increase MRRS */
 	unsigned int	native_aer:1;		/* OS may use PCIe AER */
 	unsigned int	native_pcie_hotplug:1;	/* OS may use PCIe hotplug */
 	unsigned int	native_shpc_hotplug:1;	/* OS may use SHPC hotplug */
-- 
2.39.0



* Re: [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown()
  2023-02-01  4:30 ` [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown() Huacai Chen
@ 2023-02-01 18:17   ` Bjorn Helgaas
  2023-02-02 13:27     ` Huacai Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2023-02-01 18:17 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Huacai Chen, Jiaxun Yang

On Wed, Feb 01, 2023 at 12:30:17PM +0800, Huacai Chen wrote:
> This patch has a long story.
> 
> After cc27b735ad3a7557 ("PCI/portdrv: Turn off PCIe services during
> shutdown") we observe poweroff/reboot failures on systems with LS7A
> chipset.
> 
> We found that if we remove "pci_command &= ~PCI_COMMAND_MASTER" in
> do_pci_disable_device(), it can work well. The hardware engineer says
> that the root cause is that CPU is still accessing PCIe devices while
> poweroff/reboot, and if we disable the Bus Master Bit at this time, the
> PCIe controller doesn't forward requests to downstream devices, and also
> does not send TIMEOUT to CPU, which causes CPU wait forever (hardware
> deadlock).
> 
> To be clear, the sequence is like this:
> 
>   - CPU issues MMIO read to device below Root Port
> 
>   - LS7A Root Port fails to forward transaction to secondary bus
>     because of LS7A Bus Master defect
> 
>   - CPU hangs waiting for response to MMIO read
> 
> Then how is userspace able to use a device after the device is removed?
> 
> To give more details, let's take the graphics driver (e.g. amdgpu) as
> an example. The userspace programs call printf() to display "shutting
> down xxx service" during shutdown/reboot, or the kernel calls printk()
> to display something during shutdown/reboot. These can happen at any
> time, even after we call pcie_port_device_remove() to disable the pcie
> port on the graphic card.
> 
> The call stack is: printk() --> call_console_drivers() --> con->write()
> --> vt_console_print() --> fbcon_putcs()
> 
> This scenario happens because userspace programs (or the kernel itself)
> don't know whether a device is 'usable', they just use it, at any time.
> 
> This hardware behavior is a PCIe protocol violation (Bus Master should
> not be involved in CPU MMIO transactions), and it will be fixed in new
> revisions of hardware (add timeout mechanism for CPU read request,
> whether or not Bus Master bit is cleared).
> 
> On some x86 platforms, radeon/amdgpu devices can cause similar problems
> [1][2].
> 
> Once before I add a quirk to solve the LS7A problem but looks ugly.
> After long time discussions, Bjorn Helgaas suggest simply remove the
> pci_disable_device() in pcie_portdrv_shutdown() and this patch do it
> exactly.
> 
> [1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
> [2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
> 
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>  drivers/pci/pcie/portdrv.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 2cc2e60bcb39..46fad0d813b2 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -501,7 +501,6 @@ static void pcie_port_device_remove(struct pci_dev *dev)
>  {
>  	device_for_each_child(&dev->dev, NULL, remove_iter);
>  	pci_free_irq_vectors(dev);
> -	pci_disable_device(dev);
>  }
>  
>  /**
> @@ -727,6 +726,19 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
>  	}
>  
>  	pcie_port_device_remove(dev);
> +
> +	pci_disable_device(dev);
> +}
> +
> +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> +{
> +	if (pci_bridge_d3_possible(dev)) {
> +		pm_runtime_forbid(&dev->dev);
> +		pm_runtime_get_noresume(&dev->dev);
> +		pm_runtime_dont_use_autosuspend(&dev->dev);
> +	}
> +
> +	pcie_port_device_remove(dev);

Thanks!  I guess you verified that this actually *does* call all the
port service .remove() methods, right?  aer_remove(), dpc_remove(),
etc?

I *assume* that happens via the device_unregister() done in
remove_iter(), but there's a LOT of code in the middle.

>  }
>  
>  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> @@ -777,7 +789,7 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.probe		= pcie_portdrv_probe,
>  	.remove		= pcie_portdrv_remove,
> -	.shutdown	= pcie_portdrv_remove,
> +	.shutdown	= pcie_portdrv_shutdown,
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> -- 
> 2.39.0
> 


* Re: [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems
  2023-02-01  4:30 [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems Huacai Chen
  2023-02-01  4:30 ` [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown() Huacai Chen
  2023-02-01  4:30 ` [PATCH V4 2/2] PCI: loongson: Improve the MRRS quirk for LS7A Huacai Chen
@ 2023-02-01 18:57 ` Bjorn Helgaas
  2023-02-02 13:28   ` Huacai Chen
  2 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2023-02-01 18:57 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Huacai Chen, Jiaxun Yang, Tiezhu Yang

On Wed, Feb 01, 2023 at 12:30:16PM +0800, Huacai Chen wrote:
> This patchset attempt to resolves Loongson's LS7A PCI problems: the
> first patch remove pci_disable_device() in pcie_portdrv_remove() to
> avoid poweroff/reboot failure; the second patch improves the mrrs quirk
> for LS7A chipset; 
> 
> V1 -> V2:
> 1, Update commit messages and comments.
> 
> V2 -> V3:
> 1, Simply remove pci_disable_device() in pcie_port_device_remove() to
>    solve poweroff/reboot failure.
> 2, Update commit messages and comments.
> 
> V3 -> V4:
> 1, Just remove pci_disable_device() in pcie_portdrv_shutdown() and keep
>    pcie_portdrv_remove() be the same logic as before.
> 
> Huacai Chen, Tiezhu Yang and Jianmin Lv(2):
>  PCI: Omit pci_disable_device() in .shutdown().
>  PCI: loongson: Improve the MRRS quirk for LS7A.
> 
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> 
> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>

I applied both to pci/enumeration for v6.3, thanks!

I added a diagnostic message in pcie_set_readrq() and reworked the
commit logs, so take a look and make sure I didn't mess it up:

https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/log/?h=pci/enumeration&id=8b3517f88ff2


* Re: [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown()
  2023-02-01 18:17   ` Bjorn Helgaas
@ 2023-02-02 13:27     ` Huacai Chen
  2023-02-02 20:30       ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Huacai Chen @ 2023-02-02 13:27 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Jiaxun Yang

On Thu, Feb 2, 2023 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Feb 01, 2023 at 12:30:17PM +0800, Huacai Chen wrote:
> > This patch has a long story.
> >
> > After cc27b735ad3a7557 ("PCI/portdrv: Turn off PCIe services during
> > shutdown") we observe poweroff/reboot failures on systems with LS7A
> > chipset.
> >
> > We found that if we remove "pci_command &= ~PCI_COMMAND_MASTER" in
> > do_pci_disable_device(), it can work well. The hardware engineer says
> > that the root cause is that CPU is still accessing PCIe devices while
> > poweroff/reboot, and if we disable the Bus Master Bit at this time, the
> > PCIe controller doesn't forward requests to downstream devices, and also
> > does not send TIMEOUT to CPU, which causes CPU wait forever (hardware
> > deadlock).
> >
> > To be clear, the sequence is like this:
> >
> >   - CPU issues MMIO read to device below Root Port
> >
> >   - LS7A Root Port fails to forward transaction to secondary bus
> >     because of LS7A Bus Master defect
> >
> >   - CPU hangs waiting for response to MMIO read
> >
> > Then how is userspace able to use a device after the device is removed?
> >
> > To give more details, let's take the graphics driver (e.g. amdgpu) as
> > an example. The userspace programs call printf() to display "shutting
> > down xxx service" during shutdown/reboot, or the kernel calls printk()
> > to display something during shutdown/reboot. These can happen at any
> > time, even after we call pcie_port_device_remove() to disable the pcie
> > port on the graphic card.
> >
> > The call stack is: printk() --> call_console_drivers() --> con->write()
> > --> vt_console_print() --> fbcon_putcs()
> >
> > This scenario happens because userspace programs (or the kernel itself)
> > don't know whether a device is 'usable', they just use it, at any time.
> >
> > This hardware behavior is a PCIe protocol violation (Bus Master should
> > not be involved in CPU MMIO transactions), and it will be fixed in new
> > revisions of hardware (add timeout mechanism for CPU read request,
> > whether or not Bus Master bit is cleared).
> >
> > On some x86 platforms, radeon/amdgpu devices can cause similar problems
> > [1][2].
> >
> > Once before I add a quirk to solve the LS7A problem but looks ugly.
> > After long time discussions, Bjorn Helgaas suggest simply remove the
> > pci_disable_device() in pcie_portdrv_shutdown() and this patch do it
> > exactly.
> >
> > [1] https://bugs.freedesktop.org/show_bug.cgi?id=97980
> > [2] https://bugs.freedesktop.org/show_bug.cgi?id=98638
> >
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >  drivers/pci/pcie/portdrv.c | 16 ++++++++++++++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> > index 2cc2e60bcb39..46fad0d813b2 100644
> > --- a/drivers/pci/pcie/portdrv.c
> > +++ b/drivers/pci/pcie/portdrv.c
> > @@ -501,7 +501,6 @@ static void pcie_port_device_remove(struct pci_dev *dev)
> >  {
> >       device_for_each_child(&dev->dev, NULL, remove_iter);
> >       pci_free_irq_vectors(dev);
> > -     pci_disable_device(dev);
> >  }
> >
> >  /**
> > @@ -727,6 +726,19 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
> >       }
> >
> >       pcie_port_device_remove(dev);
> > +
> > +     pci_disable_device(dev);
> > +}
> > +
> > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > +{
> > +     if (pci_bridge_d3_possible(dev)) {
> > +             pm_runtime_forbid(&dev->dev);
> > +             pm_runtime_get_noresume(&dev->dev);
> > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > +     }
> > +
> > +     pcie_port_device_remove(dev);
>
> Thanks!  I guess you verified that this actually *does* call all the
> port service .remove() methods, right?  aer_remove(), dpc_remove(),
> etc?
I have tested, but aer_probe() and dpc_probe() don't get called at
boot, so aer_remove() and dpc_remove() aren't called at poweroff either.
I haven't found the root cause yet, but I will continue to investigate.

Huacai
>
> I *assume* that happens via the device_unregister() done in
> remove_iter(), but there's a LOT of code in the middle.
>
> >  }
> >
> >  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> > @@ -777,7 +789,7 @@ static struct pci_driver pcie_portdriver = {
> >
> >       .probe          = pcie_portdrv_probe,
> >       .remove         = pcie_portdrv_remove,
> > -     .shutdown       = pcie_portdrv_remove,
> > +     .shutdown       = pcie_portdrv_shutdown,
> >
> >       .err_handler    = &pcie_portdrv_err_handler,
> >
> > --
> > 2.39.0
> >


* Re: [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems
  2023-02-01 18:57 ` [PATCH V4 0/2] PCI: Resolve Loongson's LS7A PCI problems Bjorn Helgaas
@ 2023-02-02 13:28   ` Huacai Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Huacai Chen @ 2023-02-02 13:28 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Jiaxun Yang, Tiezhu Yang

On Thu, Feb 2, 2023 at 2:58 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Feb 01, 2023 at 12:30:16PM +0800, Huacai Chen wrote:
> > This patchset attempt to resolves Loongson's LS7A PCI problems: the
> > first patch remove pci_disable_device() in pcie_portdrv_remove() to
> > avoid poweroff/reboot failure; the second patch improves the mrrs quirk
> > for LS7A chipset;
> >
> > V1 -> V2:
> > 1, Update commit messages and comments.
> >
> > V2 -> V3:
> > 1, Simply remove pci_disable_device() in pcie_port_device_remove() to
> >    solve poweroff/reboot failure.
> > 2, Update commit messages and comments.
> >
> > V3 -> V4:
> > 1, Just remove pci_disable_device() in pcie_portdrv_shutdown() and keep
> >    pcie_portdrv_remove() be the same logic as before.
> >
> > Huacai Chen, Tiezhu Yang and Jianmin Lv(2):
> >  PCI: Omit pci_disable_device() in .shutdown().
> >  PCI: loongson: Improve the MRRS quirk for LS7A.
> >
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
> > Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
>
> I applied both to pci/enumeration for v6.3, thanks!
>
> I added a diagnostic message in pcie_set_readq() and reworked the
> commit logs, so take a look and make sure I didn't mess it up:
>
> https://git.kernel.org/cgit/linux/kernel/git/pci/pci.git/log/?h=pci/enumeration&id=8b3517f88ff2
OK, thanks.

Huacai


* Re: [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown()
  2023-02-02 13:27     ` Huacai Chen
@ 2023-02-02 20:30       ` Bjorn Helgaas
  2023-02-03  4:00         ` Huacai Chen
  0 siblings, 1 reply; 10+ messages in thread
From: Bjorn Helgaas @ 2023-02-02 20:30 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Jiaxun Yang

On Thu, Feb 02, 2023 at 09:27:03PM +0800, Huacai Chen wrote:
> On Thu, Feb 2, 2023 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Feb 01, 2023 at 12:30:17PM +0800, Huacai Chen wrote:

> > > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > > +{
> > > +     if (pci_bridge_d3_possible(dev)) {
> > > +             pm_runtime_forbid(&dev->dev);
> > > +             pm_runtime_get_noresume(&dev->dev);
> > > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > > +     }
> > > +
> > > +     pcie_port_device_remove(dev);
> >
> > Thanks!  I guess you verified that this actually *does* call all the
> > port service .remove() methods, right?  aer_remove(), dpc_remove(),
> > etc?
>
> I have tested, but aer_probe(), dpc_probe() doesn't get called at
> boot, so does aer_remove(), dpc_remove() when poweroff. I haven't got
> the root cause but I will continue to investigate.

We'll only call aer_probe() and dpc_probe() if the port supports those
services and the platform has granted us control of them.  I don't
know if your platform does.  It may support PCIe native hotplug
(pcie_hp_init()) or PME (pcie_pme_init()).

Bjorn


* Re: [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown()
  2023-02-02 20:30       ` Bjorn Helgaas
@ 2023-02-03  4:00         ` Huacai Chen
  2023-02-03 18:03           ` Bjorn Helgaas
  0 siblings, 1 reply; 10+ messages in thread
From: Huacai Chen @ 2023-02-03  4:00 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Huacai Chen, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Jiaxun Yang

Hi, Bjorn,

On Fri, Feb 3, 2023 at 4:30 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Thu, Feb 02, 2023 at 09:27:03PM +0800, Huacai Chen wrote:
> > On Thu, Feb 2, 2023 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Feb 01, 2023 at 12:30:17PM +0800, Huacai Chen wrote:
>
> > > > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > > > +{
> > > > +     if (pci_bridge_d3_possible(dev)) {
> > > > +             pm_runtime_forbid(&dev->dev);
> > > > +             pm_runtime_get_noresume(&dev->dev);
> > > > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > > > +     }
> > > > +
> > > > +     pcie_port_device_remove(dev);
> > >
> > > Thanks!  I guess you verified that this actually *does* call all the
> > > port service .remove() methods, right?  aer_remove(), dpc_remove(),
> > > etc?
> >
> > I have tested, but aer_probe(), dpc_probe() doesn't get called at
> > boot, so does aer_remove(), dpc_remove() when poweroff. I haven't got
> > the root cause but I will continue to investigate.
>
> We'll only call aer_probe() and dpc_probe() if the port supports those
> services and the platform has granted us control of them.  I don't
> know if your platform does.  It may support PCIe native hotplug
> (pcie_hp_init()) or PME (pcie_pme_init()).
When I boot the kernel with pcie_ports=native, I verified that
aer_remove() and pcie_pme_remove() are both called, while DPC and
hotplug are both unsupported.
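
A rough user-space model of the rule quoted above (a service is probed
only if the port supports it and the OS has control of it) might look
like this. The SVC_* bits and port_services() are hypothetical names for
this sketch, not kernel definitions, and the real selection logic has
more conditions than shown.

```c
#include <stdbool.h>

/* Hypothetical service bits for this sketch; not taken from kernel headers. */
enum {
	SVC_PME = 1 << 0,
	SVC_AER = 1 << 1,
	SVC_HP  = 1 << 2,
	SVC_DPC = 1 << 3,
	SVC_ALL = SVC_PME | SVC_AER | SVC_HP | SVC_DPC,
};

/*
 * Services that get a probe() call (and later a remove() call):
 * supported by the port AND controlled by the OS. Booting with
 * pcie_ports=native is modelled as the OS taking control of all.
 */
int port_services(int supported, int granted, bool ports_native)
{
	if (ports_native)
		granted = SVC_ALL;
	return supported & granted;
}
```

Assuming a port that supports only PME and AER and a platform that
grants nothing, this yields no services by default and PME plus AER with
pcie_ports=native, which would be consistent with the test result above.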

Huacai
>
> Bjorn


* Re: [PATCH V4 1/2] PCI: Omit pci_disable_device() in .shutdown()
  2023-02-03  4:00         ` Huacai Chen
@ 2023-02-03 18:03           ` Bjorn Helgaas
  0 siblings, 0 replies; 10+ messages in thread
From: Bjorn Helgaas @ 2023-02-03 18:03 UTC (permalink / raw)
  To: Huacai Chen
  Cc: Huacai Chen, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, linux-pci, Jianmin Lv, Xuefeng Li,
	Jiaxun Yang

On Fri, Feb 03, 2023 at 12:00:37PM +0800, Huacai Chen wrote:
> Hi, Bjorn,
> 
> On Fri, Feb 3, 2023 at 4:30 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Thu, Feb 02, 2023 at 09:27:03PM +0800, Huacai Chen wrote:
> > > On Thu, Feb 2, 2023 at 2:17 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Wed, Feb 01, 2023 at 12:30:17PM +0800, Huacai Chen wrote:
> >
> > > > > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > > > > +{
> > > > > +     if (pci_bridge_d3_possible(dev)) {
> > > > > +             pm_runtime_forbid(&dev->dev);
> > > > > +             pm_runtime_get_noresume(&dev->dev);
> > > > > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > > > > +     }
> > > > > +
> > > > > +     pcie_port_device_remove(dev);
> > > >
> > > > Thanks!  I guess you verified that this actually *does* call all the
> > > > port service .remove() methods, right?  aer_remove(), dpc_remove(),
> > > > etc?
> > >
> > > I have tested, but aer_probe(), dpc_probe() doesn't get called at
> > > boot, so does aer_remove(), dpc_remove() when poweroff. I haven't got
> > > the root cause but I will continue to investigate.
> >
> > We'll only call aer_probe() and dpc_probe() if the port supports those
> > services and the platform has granted us control of them.  I don't
> > know if your platform does.  It may support PCIe native hotplug
> > (pcie_hp_init()) or PME (pcie_pme_init()).
>
> When I use pcie_ports=native to boot kernel, I verified that
> aer_remove() and pcie_pme_remove() are both called, while DPC and
> HOTPLUG are both not supported.

Great, thank you!
