All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-28 22:44 ` Jim Quinlan
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-28 22:44 UTC (permalink / raw)
  To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	Cyril Brulebois, bcm-kernel-feedback-list, jim2101024,
	james.quinlan
  Cc: Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	Lorenzo Pieralisi, Rob Herring

v2 -- Replace using the condition "bus->number == 1" and instead
      use "pci_is_root_bus(bus->parent)".  Although I initially
      planned to get/enable regulators under and port device,
      this became more complicated for this regression fix.
   -- Corrected the commit message in v1 to say "Root Port DT
      node" rather than "Endpoint DT node".
   -- brcm_pcie_add_bus() now returns 0 if there is an error in
      error in calling pci_subdev_regulators_add_bus().  Instead,
      we dev_err() and turn on our refusal mode instead.
   -- The pci_subdev_regulators_remove_bus() function now calls
      regulator_bulk_free() in addtion to regulator_bulk_disable().
      I noticed that this call was missing after Bjorn had me detail
      the call graph for removing the bus when pcie link-up failed.
   -- Rewrote and added some comments.

v1 -- Original

Jim Quinlan (1):
  PCI: brcmstb: Fix regression regarding missing PCIe linkup

 drivers/pci/controller/pcie-brcmstb.c | 43 +++++++++++++++++++--------
 1 file changed, 30 insertions(+), 13 deletions(-)


base-commit: ef1302160bfb19f804451d0e919266703501c875
prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e
prerequisite-patch-id: 0905430e81a95900a1366916fe2940b848317a7c
prerequisite-patch-id: 710896210c50354d87f6025fe0bd1b89981138eb
prerequisite-patch-id: 97d3886cb911cb12ef3d514fdfff2a0ab11e8570
prerequisite-patch-id: 241f1e1878fc177d941f4982ca12779a29feb62b
prerequisite-patch-id: d856608825e2294297db5d7f88f8c180f3e5a1f2
prerequisite-patch-id: 92bcbc9772fb4d248157bcf35e799ac37be8ee45
prerequisite-patch-id: 6f4b1aac459bb54523ade0e87c04e9d6c45bd9f5
prerequisite-patch-id: 090ee7a3112a4ecb03805b23ed10e2c96b3b34ed
-- 
2.17.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 0/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-28 22:44 ` Jim Quinlan
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-28 22:44 UTC (permalink / raw)
  To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	Cyril Brulebois, bcm-kernel-feedback-list, jim2101024,
	james.quinlan
  Cc: Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	Lorenzo Pieralisi, Rob Herring

v2 -- Replace using the condition "bus->number == 1" and instead
      use "pci_is_root_bus(bus->parent)".  Although I initially
      planned to get/enable regulators under and port device,
      this became more complicated for this regression fix.
   -- Corrected the commit message in v1 to say "Root Port DT
      node" rather than "Endpoint DT node".
   -- brcm_pcie_add_bus() now returns 0 if there is an error in
      error in calling pci_subdev_regulators_add_bus().  Instead,
      we dev_err() and turn on our refusal mode instead.
   -- The pci_subdev_regulators_remove_bus() function now calls
      regulator_bulk_free() in addtion to regulator_bulk_disable().
      I noticed that this call was missing after Bjorn had me detail
      the call graph for removing the bus when pcie link-up failed.
   -- Rewrote and added some comments.

v1 -- Original

Jim Quinlan (1):
  PCI: brcmstb: Fix regression regarding missing PCIe linkup

 drivers/pci/controller/pcie-brcmstb.c | 43 +++++++++++++++++++--------
 1 file changed, 30 insertions(+), 13 deletions(-)


base-commit: ef1302160bfb19f804451d0e919266703501c875
prerequisite-patch-id: 23a425390a4226bd70bbff459148c80f5e28379c
prerequisite-patch-id: e3f2875124b46b2b1cf9ea28883bf0c864b79479
prerequisite-patch-id: 9cdd706ee2038c7b393c4d65ff76a1873df1ca03
prerequisite-patch-id: 332ac90be6e4e4110e27bdd1caaff212c129f547
prerequisite-patch-id: 32a74f87cbfe9e8d52c34a4edeee6d271925665a
prerequisite-patch-id: f57cdf7ec7080bb8c95782bc7c3ec672db8ec1ce
prerequisite-patch-id: 18dc9236aed47f708f5c854afd832f3c80be5ea7
prerequisite-patch-id: dd147c6854c4ca12a9a8bd4f5714968a59d60e4e
prerequisite-patch-id: 0905430e81a95900a1366916fe2940b848317a7c
prerequisite-patch-id: 710896210c50354d87f6025fe0bd1b89981138eb
prerequisite-patch-id: 97d3886cb911cb12ef3d514fdfff2a0ab11e8570
prerequisite-patch-id: 241f1e1878fc177d941f4982ca12779a29feb62b
prerequisite-patch-id: d856608825e2294297db5d7f88f8c180f3e5a1f2
prerequisite-patch-id: 92bcbc9772fb4d248157bcf35e799ac37be8ee45
prerequisite-patch-id: 6f4b1aac459bb54523ade0e87c04e9d6c45bd9f5
prerequisite-patch-id: 090ee7a3112a4ecb03805b23ed10e2c96b3b34ed
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-28 22:44 ` Jim Quinlan
@ 2022-05-28 22:44   ` Jim Quinlan
  -1 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-28 22:44 UTC (permalink / raw)
  To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	Cyril Brulebois, bcm-kernel-feedback-list, jim2101024,
	james.quinlan
  Cc: Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")

introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
root port DT node described in [2] was missing, no linkup would be attempted,
and subsequent accesses would cause a panic because this particular PCIe HW
causes a CPU abort on illegal accesses (instead of returning 0xffffffff).

We fix this by allowing the DT root port node to be missing, as it behaved
before the original patchset messed things up.

In addition, two small changes are made:

  1. Having pci_subdev_regulators_remove_bus() call
     regulator_bulk_free() in addtion to regulator_bulk_disable().
  2. Having brcm_pcie_add_bus() return 0 if there is an
     error in calling pci_subdev_regulators_add_bus().
     Instead, we dev_err() and turn on our refusal mode instead.

It would be best if this commit were tested by someone with a Rpi CM4
platform, as that is how the regression was found.  I have only emulated
the problem and fix on different platform.

Note that a bisection identified

commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

as the first failing commit.  This commit is a regression, but is unrelated
and was fixed by a subsequent commit in the original patchset.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
[2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml

Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
---
 drivers/pci/controller/pcie-brcmstb.c | 43 +++++++++++++++++++--------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index ba5c120816b2..0839325f79ab 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -540,29 +540,42 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
 
 static int brcm_pcie_add_bus(struct pci_bus *bus)
 {
-	struct device *dev = &bus->dev;
-	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
+	struct brcm_pcie *pcie;
 	int ret;
 
-	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
+	/*
+	 * Right now we only alloc/enable regulators and initiate pcie link
+	 * when under the root port bus of the current domain.  In the
+	 * future we may want to alloc/enable regulators under any port
+	 * device (e.g. a switch).
+	 */
+	if (!bus->parent || !pci_is_root_bus(bus->parent))
 		return 0;
 
 	ret = pci_subdev_regulators_add_bus(bus);
-	if (ret)
-		return ret;
+	if (ret) {
+		dev_err(pcie->dev, "failed to alloc/enable regulators\n");
+		goto err;
+	}
 
-	/* Grab the regulators for suspend/resume */
+	/* Save the regulators for RC suspend/resume */
+	pcie = (struct brcm_pcie *) bus->sysdata;
 	pcie->sr = bus->dev.driver_data;
 
+	/* Attempt PCIe link-up */
+	if (brcm_pcie_linkup(pcie) == 0)
+		return 0;
+err:
 	/*
-	 * If we have failed linkup there is no point to return an error as
-	 * currently it will cause a WARNING() from pci_alloc_child_bus().
-	 * We return 0 and turn on the "refusal_mode" so that any further
-	 * accesses to the pci_dev just get 0xffffffff
+	 * If we have failed linkup or have an error when turning on
+	 * regulators, there is no point to return an error value to the
+	 * caller (pci_alloc_child_bus()) as it will summarily execute a
+	 * WARNING().  Instead, we turn on our "refusal_mode" and return 0
+	 * so that any further PCIe accesses succeed but do nothing (reads
+	 * return 0xffffffff).  If we do not turn on refusal mode, our
+	 * unforgiving PCIe HW will signal a CPU abort.
 	 */
-	if (brcm_pcie_linkup(pcie) != 0)
-		pcie->refusal_mode = true;
-
+	pcie->refusal_mode = true;
 	return 0;
 }
 
@@ -570,13 +583,17 @@ static void pci_subdev_regulators_remove_bus(struct pci_bus *bus)
 {
 	struct device *dev = &bus->dev;
 	struct subdev_regulators *sr = dev->driver_data;
+	struct brcm_pcie *pcie;
 
 	if (!sr || !bus->parent || !pci_is_root_bus(bus->parent))
 		return;
 
 	if (regulator_bulk_disable(sr->num_supplies, sr->supplies))
 		dev_err(dev, "failed to disable regulators for downstream device\n");
+	regulator_bulk_free(sr->num_supplies, sr->supplies);
 	dev->driver_data = NULL;
+	pcie = (struct brcm_pcie *) bus->sysdata;
+	pcie->sr = NULL;
 }
 
 /* Limits operation to a specific generation (1, 2, or 3) */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-28 22:44   ` Jim Quinlan
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-28 22:44 UTC (permalink / raw)
  To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	Cyril Brulebois, bcm-kernel-feedback-list, jim2101024,
	james.quinlan
  Cc: Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")

introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
root port DT node described in [2] was missing, no linkup would be attempted,
and subsequent accesses would cause a panic because this particular PCIe HW
causes a CPU abort on illegal accesses (instead of returning 0xffffffff).

We fix this by allowing the DT root port node to be missing, as it behaved
before the original patchset messed things up.

In addition, two small changes are made:

  1. Having pci_subdev_regulators_remove_bus() call
     regulator_bulk_free() in addtion to regulator_bulk_disable().
  2. Having brcm_pcie_add_bus() return 0 if there is an
     error in calling pci_subdev_regulators_add_bus().
     Instead, we dev_err() and turn on our refusal mode instead.

It would be best if this commit were tested by someone with a Rpi CM4
platform, as that is how the regression was found.  I have only emulated
the problem and fix on different platform.

Note that a bisection identified

commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")

as the first failing commit.  This commit is a regression, but is unrelated
and was fixed by a subsequent commit in the original patchset.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
[2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml

Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
---
 drivers/pci/controller/pcie-brcmstb.c | 43 +++++++++++++++++++--------
 1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index ba5c120816b2..0839325f79ab 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -540,29 +540,42 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
 
 static int brcm_pcie_add_bus(struct pci_bus *bus)
 {
-	struct device *dev = &bus->dev;
-	struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
+	struct brcm_pcie *pcie;
 	int ret;
 
-	if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
+	/*
+	 * Right now we only alloc/enable regulators and initiate pcie link
+	 * when under the root port bus of the current domain.  In the
+	 * future we may want to alloc/enable regulators under any port
+	 * device (e.g. a switch).
+	 */
+	if (!bus->parent || !pci_is_root_bus(bus->parent))
 		return 0;
 
 	ret = pci_subdev_regulators_add_bus(bus);
-	if (ret)
-		return ret;
+	if (ret) {
+		dev_err(pcie->dev, "failed to alloc/enable regulators\n");
+		goto err;
+	}
 
-	/* Grab the regulators for suspend/resume */
+	/* Save the regulators for RC suspend/resume */
+	pcie = (struct brcm_pcie *) bus->sysdata;
 	pcie->sr = bus->dev.driver_data;
 
+	/* Attempt PCIe link-up */
+	if (brcm_pcie_linkup(pcie) == 0)
+		return 0;
+err:
 	/*
-	 * If we have failed linkup there is no point to return an error as
-	 * currently it will cause a WARNING() from pci_alloc_child_bus().
-	 * We return 0 and turn on the "refusal_mode" so that any further
-	 * accesses to the pci_dev just get 0xffffffff
+	 * If we have failed linkup or have an error when turning on
+	 * regulators, there is no point to return an error value to the
+	 * caller (pci_alloc_child_bus()) as it will summarily execute a
+	 * WARNING().  Instead, we turn on our "refusal_mode" and return 0
+	 * so that any further PCIe accesses succeed but do nothing (reads
+	 * return 0xffffffff).  If we do not turn on refusal mode, our
+	 * unforgiving PCIe HW will signal a CPU abort.
 	 */
-	if (brcm_pcie_linkup(pcie) != 0)
-		pcie->refusal_mode = true;
-
+	pcie->refusal_mode = true;
 	return 0;
 }
 
@@ -570,13 +583,17 @@ static void pci_subdev_regulators_remove_bus(struct pci_bus *bus)
 {
 	struct device *dev = &bus->dev;
 	struct subdev_regulators *sr = dev->driver_data;
+	struct brcm_pcie *pcie;
 
 	if (!sr || !bus->parent || !pci_is_root_bus(bus->parent))
 		return;
 
 	if (regulator_bulk_disable(sr->num_supplies, sr->supplies))
 		dev_err(dev, "failed to disable regulators for downstream device\n");
+	regulator_bulk_free(sr->num_supplies, sr->supplies);
 	dev->driver_data = NULL;
+	pcie = (struct brcm_pcie *) bus->sysdata;
+	pcie->sr = NULL;
 }
 
 /* Limits operation to a specific generation (1, 2, or 3) */
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-28 22:44   ` Jim Quinlan
@ 2022-05-29  1:15     ` Cyril Brulebois
  -1 siblings, 0 replies; 20+ messages in thread
From: Cyril Brulebois @ 2022-05-29  1:15 UTC (permalink / raw)
  To: Jim Quinlan
  Cc: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	bcm-kernel-feedback-list, james.quinlan, Florian Fainelli,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

[-- Attachment #1: Type: text/plain, Size: 6986 bytes --]

Hi Jim,

Jim Quinlan <jim2101024@gmail.com> (2022-05-28):
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> root port DT node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> 
> We fix this by allowing the DT root port node to be missing, as it behaved
> before the original patchset messed things up.
> 
> In addition, two small changes are made:
> 
>   1. Having pci_subdev_regulators_remove_bus() call
>      regulator_bulk_free() in addtion to regulator_bulk_disable().
>   2. Having brcm_pcie_add_bus() return 0 if there is an
>      error in calling pci_subdev_regulators_add_bus().
>      Instead, we dev_err() and turn on our refusal mode instead.
> 
> It would be best if this commit were tested by someone with a Rpi CM4
> platform, as that is how the regression was found.  I have only emulated
> the problem and fix on different platform.

Testing is less flawless than it was with the earlier version, but this
might be related to the fact master has moved a lot since then (from
v5.18-rcX to open merge window).

Overall, it's still a net win over the status quo (broken boot).


Applying your patch on 664a393a2663a0f62fc1b18157ccae33dcdbb8c8 and
performing cold boots is mostly fine:
 - without anything on the PCIe slot;
 - with a PCIe→quad-USB extension board, a USB keyboard and a USB stick
   (both work fine).

However, with an empty PCIe slot, I'm no longer able to perform the
following (which was rock solid, and has been used in all my testing up
to now):
 - boot the exact same Debian stable image as before (running v5.10.y if
   that matters);
 - deploy the patched kernel;
 - enable serial console;
 - reboot into the patched kernel.

PCI-related messages, a call trace, and broken storage:

    [    3.425331] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
    [    3.425353] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
    [    3.425388] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
    [    3.425420] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
    [    3.426229] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
    [    3.426243] pci_bus 0000:00: root bus resource [bus 00-ff]
    [    3.426255] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
    [    3.426303] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
    [    3.426398] pci 0000:00:00.0: PME# supported from D0 D3hot
    [    3.428797] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
    [    3.745909] brcm-pcie fd500000.pcie: link down
    [    3.747915] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
    [    3.747944] pci 0000:00:00.0: PCI bridge to [bus 01]
    [    3.748294] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
    [    3.748691] pcieport 0000:00:00.0: AER: enabled with IRQ 23
    [    3.749201] pci_bus 0000:01: busn_res: [bus 01] is released
    [    3.749462] pci_bus 0000:00: busn_res: [bus 00-ff] is released
    …
    [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
    [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
    [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
    [    5.617358] Call trace:
    [    5.617362]  dump_backtrace+0xc0/0x130
    [    5.617386]  show_stack+0x24/0x70
    [    5.617396]  dump_stack_lvl+0x68/0x84
    [    5.617415]  dump_stack+0x18/0x34
    [    5.617426]  __report_bad_irq+0x54/0x16c
    [    5.617436]  note_interrupt+0x324/0x41c
    [    5.617445]  handle_irq_event+0xc0/0x180
    [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
    [    5.617468]  generic_handle_domain_irq+0x38/0x50
    [    5.617481]  gic_handle_irq+0x68/0xa0
    [    5.617489]  call_on_irq_stack+0x2c/0x60
    [    5.617500]  do_interrupt_handler+0x88/0x90
    [    5.617511]  el0_interrupt+0x58/0x124
    [    5.617526]  __el0_irq_handler_common+0x18/0x2c
    [    5.617538]  el0t_64_irq_handler+0x10/0x20
    [    5.617549]  el0t_64_irq+0x18c/0x190
    [    5.617558] handlers:
    [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
    [    5.617613] Disabling IRQ #35
    …
    [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
    [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
    [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
    [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
    [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
    [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
    [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
    [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
    [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
    [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
    [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
    [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
    [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
    [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
    [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
    [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
    [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
    [   15.582016] mmc0: sdhci: ============================================

This last part gets repeated over and over, and storage (external SD
card) never comes up. I can share fuller logs if that's desirable. I can
also test booting with irqpoll if that's desirable. Or anything else that
might help.


I did check that applying the same patch on top of the v5.18 tag gives
good results (cold boots and reboots are fine, with or without an empty
PCIe slot, as that was the case during earlier test sessions), so I'd
guess something changed since then, and makes reboots more brittle than
they used to be.

I can also check applying the v1 patch on top of master and compare
results, to give a different perspective.

But I'd also be happy to get some directions as to which test(s) would
be most beneficial, which would help me cut down on combinatorics.


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-29  1:15     ` Cyril Brulebois
  0 siblings, 0 replies; 20+ messages in thread
From: Cyril Brulebois @ 2022-05-29  1:15 UTC (permalink / raw)
  To: Jim Quinlan
  Cc: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	bcm-kernel-feedback-list, james.quinlan, Florian Fainelli,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list


[-- Attachment #1.1: Type: text/plain, Size: 6986 bytes --]

Hi Jim,

Jim Quinlan <jim2101024@gmail.com> (2022-05-28):
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> root port DT node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> 
> We fix this by allowing the DT root port node to be missing, as it behaved
> before the original patchset messed things up.
> 
> In addition, two small changes are made:
> 
>   1. Having pci_subdev_regulators_remove_bus() call
>      regulator_bulk_free() in addtion to regulator_bulk_disable().
>   2. Having brcm_pcie_add_bus() return 0 if there is an
>      error in calling pci_subdev_regulators_add_bus().
>      Instead, we dev_err() and turn on our refusal mode instead.
> 
> It would be best if this commit were tested by someone with a Rpi CM4
> platform, as that is how the regression was found.  I have only emulated
> the problem and fix on different platform.

Testing is less flawless than it was with the earlier version, but this
might be related to the fact master has moved a lot since then (from
v5.18-rcX to open merge window).

Overall, it's still a net win over the status quo (broken boot).


Applying your patch on 664a393a2663a0f62fc1b18157ccae33dcdbb8c8 and
performing cold boots is mostly fine:
 - without anything on the PCIe slot;
 - with a PCIe→quad-USB extension board, a USB keyboard and a USB stick
   (both work fine).

However, with an empty PCIe slot, I'm no longer able to perform the
following (which was rock solid, and has been used in all my testing up
to now):
 - boot the exact same Debian stable image as before (running v5.10.y if
   that matters);
 - deploy the patched kernel;
 - enable serial console;
 - reboot into the patched kernel.

PCI-related messages, a call trace, and broken storage:

    [    3.425331] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
    [    3.425353] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
    [    3.425388] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
    [    3.425420] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
    [    3.426229] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
    [    3.426243] pci_bus 0000:00: root bus resource [bus 00-ff]
    [    3.426255] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
    [    3.426303] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
    [    3.426398] pci 0000:00:00.0: PME# supported from D0 D3hot
    [    3.428797] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
    [    3.745909] brcm-pcie fd500000.pcie: link down
    [    3.747915] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
    [    3.747944] pci 0000:00:00.0: PCI bridge to [bus 01]
    [    3.748294] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
    [    3.748691] pcieport 0000:00:00.0: AER: enabled with IRQ 23
    [    3.749201] pci_bus 0000:01: busn_res: [bus 01] is released
    [    3.749462] pci_bus 0000:00: busn_res: [bus 00-ff] is released
    …
    [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
    [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
    [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
    [    5.617358] Call trace:
    [    5.617362]  dump_backtrace+0xc0/0x130
    [    5.617386]  show_stack+0x24/0x70
    [    5.617396]  dump_stack_lvl+0x68/0x84
    [    5.617415]  dump_stack+0x18/0x34
    [    5.617426]  __report_bad_irq+0x54/0x16c
    [    5.617436]  note_interrupt+0x324/0x41c
    [    5.617445]  handle_irq_event+0xc0/0x180
    [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
    [    5.617468]  generic_handle_domain_irq+0x38/0x50
    [    5.617481]  gic_handle_irq+0x68/0xa0
    [    5.617489]  call_on_irq_stack+0x2c/0x60
    [    5.617500]  do_interrupt_handler+0x88/0x90
    [    5.617511]  el0_interrupt+0x58/0x124
    [    5.617526]  __el0_irq_handler_common+0x18/0x2c
    [    5.617538]  el0t_64_irq_handler+0x10/0x20
    [    5.617549]  el0t_64_irq+0x18c/0x190
    [    5.617558] handlers:
    [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
    [    5.617613] Disabling IRQ #35
    …
    [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
    [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
    [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
    [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
    [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
    [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
    [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
    [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
    [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
    [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
    [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
    [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
    [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
    [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
    [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
    [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
    [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
    [   15.582016] mmc0: sdhci: ============================================

This last part gets repeated over and over, and storage (external SD
card) never comes up. I can share fuller logs if that's desirable. I can
also test booting with irqpoll if that's desirable. Or anything else that
might help.


I did check that applying the same patch on top of the v5.18 tag gives
good results (cold boots and reboots are fine, with or without an empty
PCIe slot, as that was the case during earlier test sessions), so I'd
guess something changed since then, and makes reboots more brittle than
they used to be.

I can also check applying the v1 patch on top of master and compare
results, to give a different perspective.

But I'd also be happy to get some directions as to which test(s) would
be most beneficial, which would help me cut down on combinatorics.


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-29  1:15     ` Cyril Brulebois
@ 2022-05-29  2:11       ` Jim Quinlan
  -1 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-29  2:11 UTC (permalink / raw)
  To: Cyril Brulebois
  Cc: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, James Dutton,
	bcm-kernel-feedback-list, Jim Quinlan, Florian Fainelli,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Sat, May 28, 2022 at 9:15 PM Cyril Brulebois <kibi@debian.org> wrote:
>
> Hi Jim,
>
> Jim Quinlan <jim2101024@gmail.com> (2022-05-28):
> > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")vpcie3v3-supply
> >
> > introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> > root port DT node described in [2] was missing, no linkup would be attempted,
> > and subsequent accesses would cause a panic because this particular PCIe HW
> > causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> >
> > We fix this by allowing the DT root port node to be missing, as it behaved
> > before the original patchset messed things up.
> >
> > In addition, two small changes are made:
> >
> >   1. Having pci_subdev_regulators_remove_bus() call
> >      regulator_bulk_free() in addtion to regulator_bulk_disable().
> >   2. Having brcm_pcie_add_bus() return 0 if there is an
> >      error in calling pci_subdev_regulators_add_bus().
> >      Instead, we dev_err() and turn on our refusal mode instead.
> >
> > It would be best if this commit were tested by someone with a Rpi CM4
> > platform, as that is how the regression was found.  I have only emulated
> > the problem and fix on different platform.
>
> Testing is less flawless than it was with the earlier version, but this
> might be related to the fact master has moved a lot since then (from
> v5.18-rcX to open merge window).
>
> Overall, it's still a net win over the status quo (broken boot).
>
>
> Applying your patch on 664a393a2663a0f62fc1b18157ccae33dcdbb8c8 and
> performing cold boots is mostly fine:
>  - without anything on the PCIe slot;
>  - with a PCIe→quad-USB extension board, a USB keyboard and a USB stick
>    (both work fine).
>
> However, with an empty PCIe slot, I'm no longer able to perform the
> following (which was rock solid, and has been used in all my testing up
> to now):
>  - boot the exact same Debian stable image as before (running v5.10.y if
>    that matters);
>  - deploy the patched kernel;
>  - enable serial console;
>  - reboot into the patched kernel.
Hi Cyril,

Thanks for the quick response.  As you may have guessed our CM4 did
not arrive yet although we did get the CM4 IO
board.  I don't know if you have the bandwidth to try one or both of
these tests:

(1) Same experiment except remove the V2 commit and use my V1 commit.
Unless this is exactly what you tried before.
(2) Same experiment except commenting out the call I made to
regulator_bulk_free().


>
> PCI-related messages, a call trace, and broken storage:
>
>     [    3.425331] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
>     [    3.425353] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
>     [    3.425388] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
>     [    3.425420] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
>     [    3.426229] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
>     [    3.426243] pci_bus 0000:00: root bus resource [bus 00-ff]
>     [    3.426255] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
>     [    3.426303] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
>     [    3.426398] pci 0000:00:00.0: PME# supported from D0 D3hot
>     [    3.428797] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>     [    3.745909] brcm-pcie fd500000.pcie: link down
>     [    3.747915] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>     [    3.747944] pci 0000:00:00.0: PCI bridge to [bus 01]
>     [    3.748294] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
>     [    3.748691] pcieport 0000:00:00.0: AER: enabled with IRQ 23
>     [    3.749201] pci_bus 0000:01: busn_res: [bus 01] is released
>     [    3.749462] pci_bus 0000:00: busn_res: [bus 00-ff] is released
>     …
Does "..." here mean you removed some lines or that it hung?  If you
removed lines can you please post the
full bootlog?  I do not need to see the mmc0 sdhci errors though.

>     [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
>     [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
>     [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>     [    5.617358] Call trace:
>     [    5.617362]  dump_backtrace+0xc0/0x130
>     [    5.617386]  show_stack+0x24/0x70
>     [    5.617396]  dump_stack_lvl+0x68/0x84
>     [    5.617415]  dump_stack+0x18/0x34
>     [    5.617426]  __report_bad_irq+0x54/0x16c
>     [    5.617436]  note_interrupt+0x324/0x41c
>     [    5.617445]  handle_irq_event+0xc0/0x180
>     [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
>     [    5.617468]  generic_handle_domain_irq+0x38/0x50
>     [    5.617481]  gic_handle_irq+0x68/0xa0
>     [    5.617489]  call_on_irq_stack+0x2c/0x60
>     [    5.617500]  do_interrupt_handler+0x88/0x90
>     [    5.617511]  el0_interrupt+0x58/0x124
>     [    5.617526]  __el0_irq_handler_common+0x18/0x2c
>     [    5.617538]  el0t_64_irq_handler+0x10/0x20
>     [    5.617549]  el0t_64_irq+0x18c/0x190
>     [    5.617558] handlers:
>     [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
>     [    5.617613] Disabling IRQ #35
>     …
>     [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
>     [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>     [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>     [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
>     [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
>     [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
>     [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
>     [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
>     [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
>     [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
>     [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>     [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
>     [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
>     [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
>     [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>     [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
>     [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
>     [   15.582016] mmc0: sdhci: ============================================
>
> This last part gets repeated over and over, and storage (external SD
> card) never comes up. I can share fuller logs if that's desirable. I can
the
full bootlog?

>     [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
>     [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
>     [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>     [    5.617358] Call trace:
>     [    5.617362]  dump_backtrace+0xc0/0x130
>     [    5.617386]  show_stack+0x24/0x70
>     [    5.617396]  dump_stack_lvl+0x68/0x84
>     [    5.617415]  dump_stack+0x18/0x34
>     [    5.617426]  __report_bad_irq+0x54/0x16c
>     [    5.617436]  note_interrupt+0x324/0x41c
>     [    5.617445]  handle_irq_event+0xc0/0x180
>     [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
>     [    5.617468]  generic_handle_domain_irq+0x38/0x50
>     [    5.617481]  gic_handle_irq+0x68/0xa0
>     [    5.617489]  call_on_irq_stack+0x2c/0x60
>     [    5.617500]  do_interrupt_handler+0x88/0x90
>     [    5.617511]  el0_interrupt+0x58/0x124
>     [    5.617526]  __el0_irq_handler_common+0x18/0x2c
>     [    5.617538]  el0t_64_irq_handler+0x10/0x20
>     [    5.617549]  el0t_64_irq+0x18c/0x190
>     [    5.617558] handlers:
>     [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
>     [    5.617613] Disabling IRQ #35
>     …
>     [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
>     [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>     [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>     [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
>     [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
>     [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
>     [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
>     [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
>     [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
>     [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
>     [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>     [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
>     [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
>     [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
>     [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>     [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
>     [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
>     [   15.582016] mmc0: sdhci: ============================================
>> also test booting with irqpoll if that's desirable. Or anything else that
> might help.
I have seen this before and I do not think it is related to my V2
commit. Nonetheless, let us
assume that it is related until we can prove otherwise.

>
>
> I did check that applying the same patch on top of the v5.18 tag gives
> good results (cold boots and reboots are fine, with or without an empty
> PCIe slot, as that was the case during earlier test sessions), so I'd
> guess something changed since then, and makes reboots more brittle than
> they used to be.
Okay, that's why I'd like to see if my v1 works and the v2 doesn't.  That
will tell me a lot.

Kind regards,
Jim Quinlan
Broadcom STB
>
> I can also check applying the v1 patch on top of master and compare
> results, to give a different perspective.
>
> But I'd also be happy to get some directions as to which test(s) would
> be most beneficial, which would help me cut down on combinatorics.
>
>
> Cheers,
> --
> Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-29  2:11       ` Jim Quinlan
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-29  2:11 UTC (permalink / raw)
  To: Cyril Brulebois
  Cc: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, James Dutton,
	bcm-kernel-feedback-list, Jim Quinlan, Florian Fainelli,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Sat, May 28, 2022 at 9:15 PM Cyril Brulebois <kibi@debian.org> wrote:
>
> Hi Jim,
>
> Jim Quinlan <jim2101024@gmail.com> (2022-05-28):
> > commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")vpcie3v3-supply
> >
> > introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> > root port DT node described in [2] was missing, no linkup would be attempted,
> > and subsequent accesses would cause a panic because this particular PCIe HW
> > causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
> >
> > We fix this by allowing the DT root port node to be missing, as it behaved
> > before the original patchset messed things up.
> >
> > In addition, two small changes are made:
> >
> >   1. Having pci_subdev_regulators_remove_bus() call
> >      regulator_bulk_free() in addtion to regulator_bulk_disable().
> >   2. Having brcm_pcie_add_bus() return 0 if there is an
> >      error in calling pci_subdev_regulators_add_bus().
> >      Instead, we dev_err() and turn on our refusal mode instead.
> >
> > It would be best if this commit were tested by someone with a Rpi CM4
> > platform, as that is how the regression was found.  I have only emulated
> > the problem and fix on different platform.
>
> Testing is less flawless than it was with the earlier version, but this
> might be related to the fact master has moved a lot since then (from
> v5.18-rcX to open merge window).
>
> Overall, it's still a net win over the status quo (broken boot).
>
>
> Applying your patch on 664a393a2663a0f62fc1b18157ccae33dcdbb8c8 and
> performing cold boots is mostly fine:
>  - without anything on the PCIe slot;
>  - with a PCIe→quad-USB extension board, a USB keyboard and a USB stick
>    (both work fine).
>
> However, with an empty PCIe slot, I'm no longer able to perform the
> following (which was rock solid, and has been used in all my testing up
> to now):
>  - boot the exact same Debian stable image as before (running v5.10.y if
>    that matters);
>  - deploy the patched kernel;
>  - enable serial console;
>  - reboot into the patched kernel.
Hi Cyril,

Thanks for the quick response.  As you may have guessed our CM4 did
not arrive yet although we did get the CM4 IO
board.  I don't know if you have the bandwidth to try one or both of
these tests:

(1) Same experiment except remove the V2 commit and use my V1 commit.
Unless this is exactly what you tried before.
(2) Same experiment except commenting out the call I made to
regulator_bulk_free().


>
> PCI-related messages, a call trace, and broken storage:
>
>     [    3.425331] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
>     [    3.425353] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
>     [    3.425388] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
>     [    3.425420] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
>     [    3.426229] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
>     [    3.426243] pci_bus 0000:00: root bus resource [bus 00-ff]
>     [    3.426255] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
>     [    3.426303] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
>     [    3.426398] pci 0000:00:00.0: PME# supported from D0 D3hot
>     [    3.428797] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>     [    3.745909] brcm-pcie fd500000.pcie: link down
>     [    3.747915] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>     [    3.747944] pci 0000:00:00.0: PCI bridge to [bus 01]
>     [    3.748294] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
>     [    3.748691] pcieport 0000:00:00.0: AER: enabled with IRQ 23
>     [    3.749201] pci_bus 0000:01: busn_res: [bus 01] is released
>     [    3.749462] pci_bus 0000:00: busn_res: [bus 00-ff] is released
>     …
Does "..." here mean you removed some lines or that it hung?  If you
removed lines can you please post the
full bootlog?  I do not need to see the mmc0 sdhci errors though.

>     [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
>     [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
>     [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>     [    5.617358] Call trace:
>     [    5.617362]  dump_backtrace+0xc0/0x130
>     [    5.617386]  show_stack+0x24/0x70
>     [    5.617396]  dump_stack_lvl+0x68/0x84
>     [    5.617415]  dump_stack+0x18/0x34
>     [    5.617426]  __report_bad_irq+0x54/0x16c
>     [    5.617436]  note_interrupt+0x324/0x41c
>     [    5.617445]  handle_irq_event+0xc0/0x180
>     [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
>     [    5.617468]  generic_handle_domain_irq+0x38/0x50
>     [    5.617481]  gic_handle_irq+0x68/0xa0
>     [    5.617489]  call_on_irq_stack+0x2c/0x60
>     [    5.617500]  do_interrupt_handler+0x88/0x90
>     [    5.617511]  el0_interrupt+0x58/0x124
>     [    5.617526]  __el0_irq_handler_common+0x18/0x2c
>     [    5.617538]  el0t_64_irq_handler+0x10/0x20
>     [    5.617549]  el0t_64_irq+0x18c/0x190
>     [    5.617558] handlers:
>     [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
>     [    5.617613] Disabling IRQ #35
>     …
>     [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
>     [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>     [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>     [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
>     [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
>     [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
>     [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
>     [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
>     [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
>     [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
>     [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>     [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
>     [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
>     [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
>     [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>     [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
>     [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
>     [   15.582016] mmc0: sdhci: ============================================
>
> This last part gets repeated over and over, and storage (external SD
> card) never comes up. I can share fuller logs if that's desirable. I can
the
full bootlog?

>     [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
>     [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
>     [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>     [    5.617358] Call trace:
>     [    5.617362]  dump_backtrace+0xc0/0x130
>     [    5.617386]  show_stack+0x24/0x70
>     [    5.617396]  dump_stack_lvl+0x68/0x84
>     [    5.617415]  dump_stack+0x18/0x34
>     [    5.617426]  __report_bad_irq+0x54/0x16c
>     [    5.617436]  note_interrupt+0x324/0x41c
>     [    5.617445]  handle_irq_event+0xc0/0x180
>     [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
>     [    5.617468]  generic_handle_domain_irq+0x38/0x50
>     [    5.617481]  gic_handle_irq+0x68/0xa0
>     [    5.617489]  call_on_irq_stack+0x2c/0x60
>     [    5.617500]  do_interrupt_handler+0x88/0x90
>     [    5.617511]  el0_interrupt+0x58/0x124
>     [    5.617526]  __el0_irq_handler_common+0x18/0x2c
>     [    5.617538]  el0t_64_irq_handler+0x10/0x20
>     [    5.617549]  el0t_64_irq+0x18c/0x190
>     [    5.617558] handlers:
>     [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
>     [    5.617613] Disabling IRQ #35
>     …
>     [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
>     [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>     [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>     [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
>     [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
>     [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
>     [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
>     [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
>     [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
>     [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
>     [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>     [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
>     [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
>     [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
>     [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>     [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
>     [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
>     [   15.582016] mmc0: sdhci: ============================================
>> also test booting with irqpoll if that's desirable. Or anything else that
> might help.
I have seen this before and I do not think it is related to my V2
commit. Nonetheless, let us
assume that it is related until we can prove otherwise.

>
>
> I did check that applying the same patch on top of the v5.18 tag gives
> good results (cold boots and reboots are fine, with or without an empty
> PCIe slot, as that was the case during earlier test sessions), so I'd
> guess something changed since then, and makes reboots more brittle than
> they used to be.
Okay, that's why I'd like to see if my v1 works and the v2 doesn't.  That
will tell me a lot.

Kind regards,
Jim Quinlan
Broadcom STB
>
> I can also check applying the v1 patch on top of master and compare
> results, to give a different perspective.
>
> But I'd also be happy to get some directions as to which test(s) would
> be most beneficial, which would help me cut down on combinatorics.
>
>
> Cheers,
> --
> Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-28 22:44   ` Jim Quinlan
@ 2022-05-29 16:52     ` Jim Quinlan
  -1 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-29 16:52 UTC (permalink / raw)
  To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, James Dutton,
	Cyril Brulebois, bcm-kernel-feedback-list, Jim Quinlan,
	Jim Quinlan, Thorsten Leemhuis
  Cc: Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Sat, May 28, 2022 at 6:44 PM Jim Quinlan <jim2101024@gmail.com> wrote:
>
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")

> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> root port DT node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
>
> We fix this by allowing the DT root port node to be missing, as it behaved
> before the original patchset messed things up.
>
> In addition, two small changes are made:
>
>   1. Having pci_subdev_regulators_remove_bus() call
>      regulator_bulk_free() in addtion to regulator_bulk_disable().
>   2. Having brcm_pcie_add_bus() return 0 if there is an
>      error in calling pci_subdev_regulators_add_bus().
>      Instead, we dev_err() and turn on our refusal mode instead.
>
> It would be best if this commit were tested by someone with a Rpi CM4
> platform, as that is how the regression was found.  I have only emulated
> the problem and fix on different platform.
>
> Note that a bisection identified
>
> commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
>
> as the first failing commit.  This commit is a regression, but is unrelated
> and was fixed by a subsequent commit in the original patchset.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
>
> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Thorston -- I forgot to replace the bugzilla link; I'll get it on V3.  -- Jim

> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 43 +++++++++++++++++++--------
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index ba5c120816b2..0839325f79ab 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -540,29 +540,42 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -       struct device *dev = &bus->dev;
> -       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> +       struct brcm_pcie *pcie;
>         int ret;
>
> -       if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +       /*
> +        * Right now we only alloc/enable regulators and initiate pcie link
> +        * when under the root port bus of the current domain.  In the
> +        * future we may want to alloc/enable regulators under any port
> +        * device (e.g. a switch).
> +        */
> +       if (!bus->parent || !pci_is_root_bus(bus->parent))
>                 return 0;
>
>         ret = pci_subdev_regulators_add_bus(bus);
> -       if (ret)
> -               return ret;
> +       if (ret) {
> +               dev_err(pcie->dev, "failed to alloc/enable regulators\n");
> +               goto err;
> +       }
>
> -       /* Grab the regulators for suspend/resume */
> +       /* Save the regulators for RC suspend/resume */
> +       pcie = (struct brcm_pcie *) bus->sysdata;
>         pcie->sr = bus->dev.driver_data;
>
> +       /* Attempt PCIe link-up */
> +       if (brcm_pcie_linkup(pcie) == 0)
> +               return 0;
> +err:
>         /*
> -        * If we have failed linkup there is no point to return an error as
> -        * currently it will cause a WARNING() from pci_alloc_child_bus().
> -        * We return 0 and turn on the "refusal_mode" so that any further
> -        * accesses to the pci_dev just get 0xffffffff
> +        * If we have failed linkup or have an error when turning on
> +        * regulators, there is no point to return an error value to the
> +        * caller (pci_alloc_child_bus()) as it will summarily execute a
> +        * WARNING().  Instead, we turn on our "refusal_mode" and return 0
> +        * so that any further PCIe accesses succeed but do nothing (reads
> +        * return 0xffffffff).  If we do not turn on refusal mode, our
> +        * unforgiving PCIe HW will signal a CPU abort.
>          */
> -       if (brcm_pcie_linkup(pcie) != 0)
> -               pcie->refusal_mode = true;
> -
> +       pcie->refusal_mode = true;
>         return 0;
>  }
>
> @@ -570,13 +583,17 @@ static void pci_subdev_regulators_remove_bus(struct pci_bus *bus)
>  {
>         struct device *dev = &bus->dev;
>         struct subdev_regulators *sr = dev->driver_data;
> +       struct brcm_pcie *pcie;
>
>         if (!sr || !bus->parent || !pci_is_root_bus(bus->parent))
>                 return;
>
>         if (regulator_bulk_disable(sr->num_supplies, sr->supplies))
>                 dev_err(dev, "failed to disable regulators for downstream device\n");
> +       regulator_bulk_free(sr->num_supplies, sr->supplies);
>         dev->driver_data = NULL;
> +       pcie = (struct brcm_pcie *) bus->sysdata;
> +       pcie->sr = NULL;
>  }
>
>  /* Limits operation to a specific generation (1, 2, or 3) */
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-29 16:52     ` Jim Quinlan
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-05-29 16:52 UTC (permalink / raw)
  To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, James Dutton,
	Cyril Brulebois, bcm-kernel-feedback-list, Jim Quinlan,
	Jim Quinlan, Thorsten Leemhuis
  Cc: Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Sat, May 28, 2022 at 6:44 PM Jim Quinlan <jim2101024@gmail.com> wrote:
>
> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")

> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
> root port DT node described in [2] was missing, no linkup would be attempted,
> and subsequent accesses would cause a panic because this particular PCIe HW
> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
>
> We fix this by allowing the DT root port node to be missing, as it behaved
> before the original patchset messed things up.
>
> In addition, two small changes are made:
>
>   1. Having pci_subdev_regulators_remove_bus() call
>      regulator_bulk_free() in addtion to regulator_bulk_disable().
>   2. Having brcm_pcie_add_bus() return 0 if there is an
>      error in calling pci_subdev_regulators_add_bus().
>      Instead, we dev_err() and turn on our refusal mode instead.
>
> It would be best if this commit were tested by someone with a Rpi CM4
> platform, as that is how the regression was found.  I have only emulated
> the problem and fix on different platform.
>
> Note that a bisection identified
>
> commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
>
> as the first failing commit.  This commit is a regression, but is unrelated
> and was fixed by a subsequent commit in the original patchset.
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
>
> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
Thorston -- I forgot to replace the bugzilla link; I'll get it on V3.  -- Jim

> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 43 +++++++++++++++++++--------
>  1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index ba5c120816b2..0839325f79ab 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -540,29 +540,42 @@ static int pci_subdev_regulators_add_bus(struct pci_bus *bus)
>
>  static int brcm_pcie_add_bus(struct pci_bus *bus)
>  {
> -       struct device *dev = &bus->dev;
> -       struct brcm_pcie *pcie = (struct brcm_pcie *) bus->sysdata;
> +       struct brcm_pcie *pcie;
>         int ret;
>
> -       if (!dev->of_node || !bus->parent || !pci_is_root_bus(bus->parent))
> +       /*
> +        * Right now we only alloc/enable regulators and initiate pcie link
> +        * when under the root port bus of the current domain.  In the
> +        * future we may want to alloc/enable regulators under any port
> +        * device (e.g. a switch).
> +        */
> +       if (!bus->parent || !pci_is_root_bus(bus->parent))
>                 return 0;
>
>         ret = pci_subdev_regulators_add_bus(bus);
> -       if (ret)
> -               return ret;
> +       if (ret) {
> +               dev_err(pcie->dev, "failed to alloc/enable regulators\n");
> +               goto err;
> +       }
>
> -       /* Grab the regulators for suspend/resume */
> +       /* Save the regulators for RC suspend/resume */
> +       pcie = (struct brcm_pcie *) bus->sysdata;
>         pcie->sr = bus->dev.driver_data;
>
> +       /* Attempt PCIe link-up */
> +       if (brcm_pcie_linkup(pcie) == 0)
> +               return 0;
> +err:
>         /*
> -        * If we have failed linkup there is no point to return an error as
> -        * currently it will cause a WARNING() from pci_alloc_child_bus().
> -        * We return 0 and turn on the "refusal_mode" so that any further
> -        * accesses to the pci_dev just get 0xffffffff
> +        * If we have failed linkup or have an error when turning on
> +        * regulators, there is no point to return an error value to the
> +        * caller (pci_alloc_child_bus()) as it will summarily execute a
> +        * WARNING().  Instead, we turn on our "refusal_mode" and return 0
> +        * so that any further PCIe accesses succeed but do nothing (reads
> +        * return 0xffffffff).  If we do not turn on refusal mode, our
> +        * unforgiving PCIe HW will signal a CPU abort.
>          */
> -       if (brcm_pcie_linkup(pcie) != 0)
> -               pcie->refusal_mode = true;
> -
> +       pcie->refusal_mode = true;
>         return 0;
>  }
>
> @@ -570,13 +583,17 @@ static void pci_subdev_regulators_remove_bus(struct pci_bus *bus)
>  {
>         struct device *dev = &bus->dev;
>         struct subdev_regulators *sr = dev->driver_data;
> +       struct brcm_pcie *pcie;
>
>         if (!sr || !bus->parent || !pci_is_root_bus(bus->parent))
>                 return;
>
>         if (regulator_bulk_disable(sr->num_supplies, sr->supplies))
>                 dev_err(dev, "failed to disable regulators for downstream device\n");
> +       regulator_bulk_free(sr->num_supplies, sr->supplies);
>         dev->driver_data = NULL;
> +       pcie = (struct brcm_pcie *) bus->sysdata;
> +       pcie->sr = NULL;
>  }
>
>  /* Limits operation to a specific generation (1, 2, or 3) */
> --
> 2.17.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-29 16:52     ` Jim Quinlan
@ 2022-05-29 17:36       ` Thorsten Leemhuis
  -1 siblings, 0 replies; 20+ messages in thread
From: Thorsten Leemhuis @ 2022-05-29 17:36 UTC (permalink / raw)
  To: Jim Quinlan, linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
	James Dutton, Cyril Brulebois, bcm-kernel-feedback-list,
	Jim Quinlan
  Cc: Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list



On 29.05.22 18:52, Jim Quinlan wrote:
> On Sat, May 28, 2022 at 6:44 PM Jim Quinlan <jim2101024@gmail.com> wrote:
>>
>> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
>> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
>> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
>> root port DT node described in [2] was missing, no linkup would be attempted,
>> and subsequent accesses would cause a panic because this particular PCIe HW
>> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
>>
>> We fix this by allowing the DT root port node to be missing, as it behaved
>> before the original patchset messed things up.
>>
>> In addition, two small changes are made:
>>
>>   1. Having pci_subdev_regulators_remove_bus() call
>>      regulator_bulk_free() in addtion to regulator_bulk_disable().
>>   2. Having brcm_pcie_add_bus() return 0 if there is an
>>      error in calling pci_subdev_regulators_add_bus().
>>      Instead, we dev_err() and turn on our refusal mode instead.
>>
>> It would be best if this commit were tested by someone with a Rpi CM4
>> platform, as that is how the regression was found.  I have only emulated
>> the problem and fix on different platform.
>>
>> Note that a bisection identified
>>
>> commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
>>
>> as the first failing commit.  This commit is a regression, but is unrelated
>> and was fixed by a subsequent commit in the original patchset.
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
>> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
>>
>> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
>> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> Thorston -- I forgot to replace the bugzilla link; I'll get it on V3.  -- Jim

Don't worry to much about these details that might matter for regzbot
normally (the automatic handling of links to bugzilla ticket is sadly
dysfunctional currently anyway). Just getting the issue fixed in the
not-to-distant future is what I mainly care about. :-D

Ciao
Thorsten

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-29 17:36       ` Thorsten Leemhuis
  0 siblings, 0 replies; 20+ messages in thread
From: Thorsten Leemhuis @ 2022-05-29 17:36 UTC (permalink / raw)
  To: Jim Quinlan, linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
	James Dutton, Cyril Brulebois, bcm-kernel-feedback-list,
	Jim Quinlan
  Cc: Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list



On 29.05.22 18:52, Jim Quinlan wrote:
> On Sat, May 28, 2022 at 6:44 PM Jim Quinlan <jim2101024@gmail.com> wrote:
>>
>> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
> 
>> Signed-off-by: Jim Quinlan <jim2101024@gmail.com>
>> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
>> root port DT node described in [2] was missing, no linkup would be attempted,
>> and subsequent accesses would cause a panic because this particular PCIe HW
>> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
>>
>> We fix this by allowing the DT root port node to be missing, as it behaved
>> before the original patchset messed things up.
>>
>> In addition, two small changes are made:
>>
>>   1. Having pci_subdev_regulators_remove_bus() call
>>      regulator_bulk_free() in addtion to regulator_bulk_disable().
>>   2. Having brcm_pcie_add_bus() return 0 if there is an
>>      error in calling pci_subdev_regulators_add_bus().
>>      Instead, we dev_err() and turn on our refusal mode instead.
>>
>> It would be best if this commit were tested by someone with a Rpi CM4
>> platform, as that is how the regression was found.  I have only emulated
>> the problem and fix on different platform.
>>
>> Note that a bisection identified
>>
>> commit 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
>>
>> as the first failing commit.  This commit is a regression, but is unrelated
>> and was fixed by a subsequent commit in the original patchset.
>>
>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=215925
>> [2] Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
>>
>> Fixes: 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
>> Fixes: 830aa6f29f07 ("PCI: brcmstb: Split brcm_pcie_setup() into two funcs")
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215925
> Thorston -- I forgot to replace the bugzilla link; I'll get it on V3.  -- Jim

Don't worry to much about these details that might matter for regzbot
normally (the automatic handling of links to bugzilla ticket is sadly
dysfunctional currently anyway). Just getting the issue fixed in the
not-to-distant future is what I mainly care about. :-D

Ciao
Thorsten

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-29  1:15     ` Cyril Brulebois
@ 2022-05-30 10:12       ` Stefan Wahren
  -1 siblings, 0 replies; 20+ messages in thread
From: Stefan Wahren @ 2022-05-30 10:12 UTC (permalink / raw)
  To: Cyril Brulebois, Jim Quinlan
  Cc: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	bcm-kernel-feedback-list, james.quinlan, Florian Fainelli,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

Hi Cyril,

Am 29.05.22 um 03:15 schrieb Cyril Brulebois:
> Hi Jim,
>
> Jim Quinlan <jim2101024@gmail.com> (2022-05-28):
>> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
>>
>> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
>> root port DT node described in [2] was missing, no linkup would be attempted,
>> and subsequent accesses would cause a panic because this particular PCIe HW
>> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
>>
>> We fix this by allowing the DT root port node to be missing, as it behaved
>> before the original patchset messed things up.
>>
>> In addition, two small changes are made:
>>
>>    1. Having pci_subdev_regulators_remove_bus() call
>>       regulator_bulk_free() in addtion to regulator_bulk_disable().
>>    2. Having brcm_pcie_add_bus() return 0 if there is an
>>       error in calling pci_subdev_regulators_add_bus().
>>       Instead, we dev_err() and turn on our refusal mode instead.
>>
>> It would be best if this commit were tested by someone with a Rpi CM4
>> platform, as that is how the regression was found.  I have only emulated
>> the problem and fix on different platform.
> Testing is less flawless than it was with the earlier version, but this
> might be related to the fact master has moved a lot since then (from
> v5.18-rcX to open merge window).
>
> Overall, it's still a net win over the status quo (broken boot).
>
>
> Applying your patch on 664a393a2663a0f62fc1b18157ccae33dcdbb8c8 and
> performing cold boots is mostly fine:
>   - without anything on the PCIe slot;
>   - with a PCIe→quad-USB extension board, a USB keyboard and a USB stick
>     (both work fine).
>
> However, with an empty PCIe slot, I'm no longer able to perform the
> following (which was rock solid, and has been used in all my testing up
> to now):
>   - boot the exact same Debian stable image as before (running v5.10.y if
>     that matters);
>   - deploy the patched kernel;
>   - enable serial console;
>   - reboot into the patched kernel.
>
> PCI-related messages, a call trace, and broken storage:
>
>      [    3.425331] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
>      [    3.425353] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
>      [    3.425388] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
>      [    3.425420] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
>      [    3.426229] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
>      [    3.426243] pci_bus 0000:00: root bus resource [bus 00-ff]
>      [    3.426255] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
>      [    3.426303] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
>      [    3.426398] pci 0000:00:00.0: PME# supported from D0 D3hot
>      [    3.428797] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>      [    3.745909] brcm-pcie fd500000.pcie: link down
>      [    3.747915] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>      [    3.747944] pci 0000:00:00.0: PCI bridge to [bus 01]
>      [    3.748294] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
>      [    3.748691] pcieport 0000:00:00.0: AER: enabled with IRQ 23
>      [    3.749201] pci_bus 0000:01: busn_res: [bus 01] is released
>      [    3.749462] pci_bus 0000:00: busn_res: [bus 00-ff] is released
>      …
>      [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
>      [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
>      [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>      [    5.617358] Call trace:
>      [    5.617362]  dump_backtrace+0xc0/0x130
>      [    5.617386]  show_stack+0x24/0x70
>      [    5.617396]  dump_stack_lvl+0x68/0x84
>      [    5.617415]  dump_stack+0x18/0x34
>      [    5.617426]  __report_bad_irq+0x54/0x16c
>      [    5.617436]  note_interrupt+0x324/0x41c
>      [    5.617445]  handle_irq_event+0xc0/0x180
>      [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
>      [    5.617468]  generic_handle_domain_irq+0x38/0x50
>      [    5.617481]  gic_handle_irq+0x68/0xa0
>      [    5.617489]  call_on_irq_stack+0x2c/0x60
>      [    5.617500]  do_interrupt_handler+0x88/0x90
>      [    5.617511]  el0_interrupt+0x58/0x124
>      [    5.617526]  __el0_irq_handler_common+0x18/0x2c
>      [    5.617538]  el0t_64_irq_handler+0x10/0x20
>      [    5.617549]  el0t_64_irq+0x18c/0x190
>      [    5.617558] handlers:
>      [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
>      [    5.617613] Disabling IRQ #35
>      …
>      [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
>      [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>      [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>      [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
>      [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
>      [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
>      [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
>      [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
>      [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
>      [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
>      [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>      [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
>      [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
>      [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
>      [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>      [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
>      [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
>      [   15.582016] mmc0: sdhci: ============================================
>
> This last part gets repeated over and over, and storage (external SD
> card) never comes up. I can share fuller logs if that's desirable. I can
> also test booting with irqpoll if that's desirable. Or anything else that
> might help.
>
>
> I did check that applying the same patch on top of the v5.18 tag gives
> good results (cold boots and reboots are fine, with or without an empty
> PCIe slot, as that was the case during earlier test sessions), so I'd
> guess something changed since then, and makes reboots more brittle than
> they used to be.

i think we should better trust the results based on the v5.18 tag. 
During the merge window, regressions from other subsystems are possible.

Best regards

>
> I can also check applying the v1 patch on top of master and compare
> results, to give a different perspective.
>
> But I'd also be happy to get some directions as to which test(s) would
> be most beneficial, which would help me cut down on combinatorics.
>
>
> Cheers,
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-05-30 10:12       ` Stefan Wahren
  0 siblings, 0 replies; 20+ messages in thread
From: Stefan Wahren @ 2022-05-30 10:12 UTC (permalink / raw)
  To: Cyril Brulebois, Jim Quinlan
  Cc: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas, james.dutton,
	bcm-kernel-feedback-list, james.quinlan, Florian Fainelli,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

Hi Cyril,

Am 29.05.22 um 03:15 schrieb Cyril Brulebois:
> Hi Jim,
>
> Jim Quinlan <jim2101024@gmail.com> (2022-05-28):
>> commit 93e41f3fca3d ("PCI: brcmstb: Add control of subdevice voltage regulators")
>>
>> introduced a regression on the PCIe RPi4 Compute Module.  If the PCIe
>> root port DT node described in [2] was missing, no linkup would be attempted,
>> and subsequent accesses would cause a panic because this particular PCIe HW
>> causes a CPU abort on illegal accesses (instead of returning 0xffffffff).
>>
>> We fix this by allowing the DT root port node to be missing, as it behaved
>> before the original patchset messed things up.
>>
>> In addition, two small changes are made:
>>
>>    1. Having pci_subdev_regulators_remove_bus() call
>>       regulator_bulk_free() in addtion to regulator_bulk_disable().
>>    2. Having brcm_pcie_add_bus() return 0 if there is an
>>       error in calling pci_subdev_regulators_add_bus().
>>       Instead, we dev_err() and turn on our refusal mode instead.
>>
>> It would be best if this commit were tested by someone with a Rpi CM4
>> platform, as that is how the regression was found.  I have only emulated
>> the problem and fix on different platform.
> Testing is less flawless than it was with the earlier version, but this
> might be related to the fact master has moved a lot since then (from
> v5.18-rcX to open merge window).
>
> Overall, it's still a net win over the status quo (broken boot).
>
>
> Applying your patch on 664a393a2663a0f62fc1b18157ccae33dcdbb8c8 and
> performing cold boots is mostly fine:
>   - without anything on the PCIe slot;
>   - with a PCIe→quad-USB extension board, a USB keyboard and a USB stick
>     (both work fine).
>
> However, with an empty PCIe slot, I'm no longer able to perform the
> following (which was rock solid, and has been used in all my testing up
> to now):
>   - boot the exact same Debian stable image as before (running v5.10.y if
>     that matters);
>   - deploy the patched kernel;
>   - enable serial console;
>   - reboot into the patched kernel.
>
> PCI-related messages, a call trace, and broken storage:
>
>      [    3.425331] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
>      [    3.425353] brcm-pcie fd500000.pcie:   No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
>      [    3.425388] brcm-pcie fd500000.pcie:      MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
>      [    3.425420] brcm-pcie fd500000.pcie:   IB MEM 0x0000000000..0x003fffffff -> 0x0400000000
>      [    3.426229] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
>      [    3.426243] pci_bus 0000:00: root bus resource [bus 00-ff]
>      [    3.426255] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
>      [    3.426303] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
>      [    3.426398] pci 0000:00:00.0: PME# supported from D0 D3hot
>      [    3.428797] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
>      [    3.745909] brcm-pcie fd500000.pcie: link down
>      [    3.747915] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
>      [    3.747944] pci 0000:00:00.0: PCI bridge to [bus 01]
>      [    3.748294] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
>      [    3.748691] pcieport 0000:00:00.0: AER: enabled with IRQ 23
>      [    3.749201] pci_bus 0000:01: busn_res: [bus 01] is released
>      [    3.749462] pci_bus 0000:00: busn_res: [bus 00-ff] is released
>      …
>      [    5.617308] irq 35: nobody cared (try booting with the "irqpoll" option)
>      [    5.617335] CPU: 0 PID: 127 Comm: systemd-udevd Not tainted 5.18.0+ #1
>      [    5.617350] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
>      [    5.617358] Call trace:
>      [    5.617362]  dump_backtrace+0xc0/0x130
>      [    5.617386]  show_stack+0x24/0x70
>      [    5.617396]  dump_stack_lvl+0x68/0x84
>      [    5.617415]  dump_stack+0x18/0x34
>      [    5.617426]  __report_bad_irq+0x54/0x16c
>      [    5.617436]  note_interrupt+0x324/0x41c
>      [    5.617445]  handle_irq_event+0xc0/0x180
>      [    5.617460]  handle_fasteoi_irq+0xc8/0x1fc
>      [    5.617468]  generic_handle_domain_irq+0x38/0x50
>      [    5.617481]  gic_handle_irq+0x68/0xa0
>      [    5.617489]  call_on_irq_stack+0x2c/0x60
>      [    5.617500]  do_interrupt_handler+0x88/0x90
>      [    5.617511]  el0_interrupt+0x58/0x124
>      [    5.617526]  __el0_irq_handler_common+0x18/0x2c
>      [    5.617538]  el0t_64_irq_handler+0x10/0x20
>      [    5.617549]  el0t_64_irq+0x18c/0x190
>      [    5.617558] handlers:
>      [    5.617563] [<(____ptrval____)>] sdhci_irq [sdhci] threaded [<(____ptrval____)>] sdhci_thread_irq [sdhci]
>      [    5.617613] Disabling IRQ #35
>      …
>      [   15.581894] mmc0: Timeout waiting for hardware cmd interrupt.
>      [   15.581914] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
>      [   15.581920] mmc0: sdhci: Sys addr:  0x00000000 | Version:  0x00001002
>      [   15.581931] mmc0: sdhci: Blk size:  0x00000000 | Blk cnt:  0x00000000
>      [   15.581937] mmc0: sdhci: Argument:  0x00000c00 | Trn mode: 0x00000000
>      [   15.581944] mmc0: sdhci: Present:   0x1fff0000 | Host ctl: 0x00000001
>      [   15.581951] mmc0: sdhci: Power:     0x0000000f | Blk gap:  0x00000080
>      [   15.581957] mmc0: sdhci: Wake-up:   0x00000000 | Clock:    0x00007d07
>      [   15.581964] mmc0: sdhci: Timeout:   0x00000000 | Int stat: 0x00018000
>      [   15.581971] mmc0: sdhci: Int enab:  0x00ff1003 | Sig enab: 0x00ff1003
>      [   15.581976] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
>      [   15.581982] mmc0: sdhci: Caps:      0x45ee6432 | Caps_1:   0x0000a525
>      [   15.581988] mmc0: sdhci: Cmd:       0x0000341a | Max curr: 0x00080008
>      [   15.581996] mmc0: sdhci: Resp[0]:   0x00000000 | Resp[1]:  0x00000000
>      [   15.582001] mmc0: sdhci: Resp[2]:   0x00000000 | Resp[3]:  0x00000000
>      [   15.582005] mmc0: sdhci: Host ctl2: 0x00000000
>      [   15.582011] mmc0: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x00000000
>      [   15.582016] mmc0: sdhci: ============================================
>
> This last part gets repeated over and over, and storage (external SD
> card) never comes up. I can share fuller logs if that's desirable. I can
> also test booting with irqpoll if that's desirable. Or anything else that
> might help.
>
>
> I did check that applying the same patch on top of the v5.18 tag gives
> good results (cold boots and reboots are fine, with or without an empty
> PCIe slot, as that was the case during earlier test sessions), so I'd
> guess something changed since then, and makes reboots more brittle than
> they used to be.

i think we should better trust the results based on the v5.18 tag. 
During the merge window, regressions from other subsystems are possible.

Best regards

>
> I can also check applying the v1 patch on top of master and compare
> results, to give a different perspective.
>
> But I'd also be happy to get some directions as to which test(s) would
> be most beneficial, which would help me cut down on combinatorics.
>
>
> Cheers,
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-05-30 10:12       ` Stefan Wahren
@ 2022-06-02 19:17         ` Cyril Brulebois
  -1 siblings, 0 replies; 20+ messages in thread
From: Cyril Brulebois @ 2022-06-02 19:17 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Jim Quinlan, linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
	james.dutton, bcm-kernel-feedback-list, james.quinlan,
	Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

[-- Attachment #1: Type: text/plain, Size: 946 bytes --]

Hi Stefan,

Stefan Wahren <stefan.wahren@i2se.com> (2022-05-30):
> i think we should better trust the results based on the v5.18 tag. During
> the merge window, regressions from other subsystems are possible.

Alright, that looks like a great plan.

Before getting your answer, I had tried taking the reboot part out of
the equation, but I found out that even cold boots might fail with the
mmc storage.

I haven't been able to conduct a systematic testing of the patch on top
of the v5.18 tag yet (say, 10 or 20 cold boots, and the same with
reboots) due to strong work constraints these past few days, but that's
definitely still on my short term todo list (hopefully before the end of
the week).

Sorry I didn't manage to get that lined up before Bjorn's pull request.


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-06-02 19:17         ` Cyril Brulebois
  0 siblings, 0 replies; 20+ messages in thread
From: Cyril Brulebois @ 2022-06-02 19:17 UTC (permalink / raw)
  To: Stefan Wahren
  Cc: Jim Quinlan, linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
	james.dutton, bcm-kernel-feedback-list, james.quinlan,
	Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list


[-- Attachment #1.1: Type: text/plain, Size: 946 bytes --]

Hi Stefan,

Stefan Wahren <stefan.wahren@i2se.com> (2022-05-30):
> i think we should better trust the results based on the v5.18 tag. During
> the merge window, regressions from other subsystems are possible.

Alright, that looks like a great plan.

Before getting your answer, I had tried taking the reboot part out of
the equation, but I found out that even cold boots might fail with the
mmc storage.

I haven't been able to conduct a systematic testing of the patch on top
of the v5.18 tag yet (say, 10 or 20 cold boots, and the same with
reboots) due to strong work constraints these past few days, but that's
definitely still on my short term todo list (hopefully before the end of
the week).

Sorry I didn't manage to get that lined up before Bjorn's pull request.


Cheers,
-- 
Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-06-02 19:17         ` Cyril Brulebois
@ 2022-06-02 20:05           ` Bjorn Helgaas
  -1 siblings, 0 replies; 20+ messages in thread
From: Bjorn Helgaas @ 2022-06-02 20:05 UTC (permalink / raw)
  To: Cyril Brulebois
  Cc: Stefan Wahren, Jim Quinlan, linux-pci, Nicolas Saenz Julienne,
	Bjorn Helgaas, james.dutton, bcm-kernel-feedback-list,
	james.quinlan, Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Thu, Jun 02, 2022 at 09:17:57PM +0200, Cyril Brulebois wrote:

> ...
> Sorry I didn't manage to get that lined up before Bjorn's pull request.

No rush, if I had been on the ball, I would have done the reverts
earlier so v5.18 didn't release with the regression.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-06-02 20:05           ` Bjorn Helgaas
  0 siblings, 0 replies; 20+ messages in thread
From: Bjorn Helgaas @ 2022-06-02 20:05 UTC (permalink / raw)
  To: Cyril Brulebois
  Cc: Stefan Wahren, Jim Quinlan, linux-pci, Nicolas Saenz Julienne,
	Bjorn Helgaas, james.dutton, bcm-kernel-feedback-list,
	james.quinlan, Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Thu, Jun 02, 2022 at 09:17:57PM +0200, Cyril Brulebois wrote:

> ...
> Sorry I didn't manage to get that lined up before Bjorn's pull request.

No rush, if I had been on the ball, I would have done the reverts
earlier so v5.18 didn't release with the regression.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
  2022-06-02 19:17         ` Cyril Brulebois
@ 2022-06-04 20:59           ` Jim Quinlan
  -1 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-06-04 20:59 UTC (permalink / raw)
  To: Cyril Brulebois
  Cc: Stefan Wahren, linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
	James Dutton, bcm-kernel-feedback-list, Jim Quinlan,
	Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Thu, Jun 2, 2022 at 3:18 PM Cyril Brulebois <kibi@debian.org> wrote:
>stm32mp157c-ev1.dts
> Hi Stefan,
>
> Stefan Wahren <stefan.wahren@i2se.com> (2022-05-30):
> > i think we should better trust the results based on the v5.18 tag. During
> > the merge window, regressions from other subsystems are possible.
>
> Alright, that looks like a great plan.
>
> Before getting your answer, I had tried taking the reboot part out of
> the equation, but I found out that even cold boots might fail with the
> mmc storage.

Hi Cyril,

FWIW, I can deliberately reproduce the errors you observed by using an
incorrect "interrupts" property for
the sdhci device's DT node.  It can also be triggered by removing its
"clocks" property.

However, if I do one of the above,  the error will occur on every
boot, but in your case (I think) you are seeing it
sporadically.  So that is peculiar.  I've looked at the recent
upstream commits for changes in the  sdhci driver
and also the relevant DT node and do not see anything obvious that
might cause this.

BTW,  when you observe this error, can you please do a  "cat
/proc/interrupts" and post the results?

Thanks & regards,
Jim Quinlan
Broadcom STB


>
> I haven't been able to conduct a systematic testing of the patch on top sdhci driver or the relevant
DT node
> of the v5.18 tag yet (say, 10 or 20 cold boots, and the same with
> reboots) due to strong work constraints these past few days, but that's
> definitely still on my short term todo list (hopefully before the end of
> the week).
>
> Sorry I didn't manage to get that lined up before Bjorn's pull request.
>
>
> Cheers,
> --
> Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup
@ 2022-06-04 20:59           ` Jim Quinlan
  0 siblings, 0 replies; 20+ messages in thread
From: Jim Quinlan @ 2022-06-04 20:59 UTC (permalink / raw)
  To: Cyril Brulebois
  Cc: Stefan Wahren, linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
	James Dutton, bcm-kernel-feedback-list, Jim Quinlan,
	Florian Fainelli, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	open list

On Thu, Jun 2, 2022 at 3:18 PM Cyril Brulebois <kibi@debian.org> wrote:
>stm32mp157c-ev1.dts
> Hi Stefan,
>
> Stefan Wahren <stefan.wahren@i2se.com> (2022-05-30):
> > i think we should better trust the results based on the v5.18 tag. During
> > the merge window, regressions from other subsystems are possible.
>
> Alright, that looks like a great plan.
>
> Before getting your answer, I had tried taking the reboot part out of
> the equation, but I found out that even cold boots might fail with the
> mmc storage.

Hi Cyril,

FWIW, I can deliberately reproduce the errors you observed by using an
incorrect "interrupts" property for
the sdhci device's DT node.  It can also be triggered by removing its
"clocks" property.

However, if I do one of the above,  the error will occur on every
boot, but in your case (I think) you are seeing it
sporadically.  So that is peculiar.  I've looked at the recent
upstream commits for changes in the  sdhci driver
and also the relevant DT node and do not see anything obvious that
might cause this.

BTW,  when you observe this error, can you please do a  "cat
/proc/interrupts" and post the results?

Thanks & regards,
Jim Quinlan
Broadcom STB


>
> I haven't been able to conduct a systematic testing of the patch on top sdhci driver or the relevant
DT node
> of the v5.18 tag yet (say, 10 or 20 cold boots, and the same with
> reboots) due to strong work constraints these past few days, but that's
> definitely still on my short term todo list (hopefully before the end of
> the week).
>
> Sorry I didn't manage to get that lined up before Bjorn's pull request.
>
>
> Cheers,
> --
> Cyril Brulebois (kibi@debian.org)            <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-06-04 21:01 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-28 22:44 [PATCH v2 0/1] PCI: brcmstb: Fix regression regarding missing PCIe linkup Jim Quinlan
2022-05-28 22:44 ` Jim Quinlan
2022-05-28 22:44 ` [PATCH v2 1/1] " Jim Quinlan
2022-05-28 22:44   ` Jim Quinlan
2022-05-29  1:15   ` Cyril Brulebois
2022-05-29  1:15     ` Cyril Brulebois
2022-05-29  2:11     ` Jim Quinlan
2022-05-29  2:11       ` Jim Quinlan
2022-05-30 10:12     ` Stefan Wahren
2022-05-30 10:12       ` Stefan Wahren
2022-06-02 19:17       ` Cyril Brulebois
2022-06-02 19:17         ` Cyril Brulebois
2022-06-02 20:05         ` Bjorn Helgaas
2022-06-02 20:05           ` Bjorn Helgaas
2022-06-04 20:59         ` Jim Quinlan
2022-06-04 20:59           ` Jim Quinlan
2022-05-29 16:52   ` Jim Quinlan
2022-05-29 16:52     ` Jim Quinlan
2022-05-29 17:36     ` Thorsten Leemhuis
2022-05-29 17:36       ` Thorsten Leemhuis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.