linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
@ 2021-07-30  8:18 Krzysztof Hałasa
  2021-07-30  9:49 ` Pali Rohár
  2022-01-20 23:00 ` Bjorn Helgaas
  0 siblings, 2 replies; 12+ messages in thread
From: Krzysztof Hałasa @ 2021-07-30  8:18 UTC (permalink / raw)
  To: Pali Rohár
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Lorenzo Pieralisi,
	Gregory Clement, Andrew Lunn, Krzysztof Wilczyński

Pali, et al,

I'm encountering a problem booting an i.MX6-based device (Gateworks
Ventana SBC). This is apparently a known issue:
https://lkml.org/lkml/2020/7/16/388

Do you guys know of a fix for this? Booting this machine reliably is
lately impossible.

First, it spews a warning:

pcieport 0000:00:00.0: PME: Signaling with IRQ 310

sysfs: cannot create duplicate filename '/devices/platform/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:e0'
CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events_unbound async_run_entry_fn

unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x40/0x4c
dump_stack_lvl from sysfs_warn_dup+0x54/0x60
sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
pci_create_resource_files from pci_bus_add_device+0x20/0x8c
pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
pci_bus_add_devices from pci_host_probe+0x3c/0x90
pci_host_probe from dw_pcie_host_init+0x200/0x4b4
dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
imx6_pcie_probe from platform_probe+0x80/0xc0
platform_probe from really_probe+0x158/0x324
really_probe from __driver_probe_device+0x84/0xe4
__driver_probe_device from driver_probe_device+0x34/0xd0
driver_probe_device from __driver_attach_async_helper+0x20/0x38
__driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
async_run_entry_fn from process_one_work+0x164/0x3b0
process_one_work from worker_thread+0x2c/0x52c
worker_thread from kthread+0x110/0x154
kthread from ret_from_fork+0x14/0x24

pcieport 0000:01:00.0: enabling device (0140 -> 0143)

Then:
Unable to handle kernel paging request at virtual address 6f736572 (ASCII = "reso")
pgd = (ptrval)
[6f736572] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in:
CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Workqueue: events_unbound async_run_entry_fn
PC is at string_nocheck+0x20/0xa0
LR is at string+0x54/0x64
Process kworker/u8:0 (pid: 7, stack limit = 0x(ptrval))

string_nocheck from string+0x54/0x64
string from vsnprintf+0x1c4/0x48c
vsnprintf from vprintk_store+0x80/0x33c
vprintk_store from vprintk_emit+0x6c/0x1e0
vprintk_emit from vprintk_default+0x20/0x28
vprintk_default from printk+0x1c/0x2c
printk from sysfs_warn_dup+0x50/0x60
sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
pci_create_resource_files from pci_bus_add_device+0x20/0x8c
pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
pci_bus_add_devices from pci_host_probe+0x3c/0x90
pci_host_probe from dw_pcie_host_init+0x200/0x4b4
dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
imx6_pcie_probe from platform_probe+0x80/0xc0
platform_probe from really_probe+0x158/0x324
really_probe from __driver_probe_device+0x84/0xe4
__driver_probe_device from driver_probe_device+0x34/0xd0
driver_probe_device from __driver_attach_async_helper+0x20/0x38
__driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
async_run_entry_fn from process_one_work+0x164/0x3b0
process_one_work from worker_thread+0x2c/0x52c
worker_thread from kthread+0x110/0x154
kthread from ret_from_fork+0x14/0x24
Code: e1dd42b2 e58d301c e3540000 0a000019 (e5d2e000)
--
Krzysztof "Chris" Hałasa

Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2021-07-30  8:18 PCI: Race condition in pci_create_sysfs_dev_files (can't boot) Krzysztof Hałasa
@ 2021-07-30  9:49 ` Pali Rohár
  2022-01-20 23:00 ` Bjorn Helgaas
  1 sibling, 0 replies; 12+ messages in thread
From: Pali Rohár @ 2021-07-30  9:49 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, Lorenzo Pieralisi,
	Gregory Clement, Andrew Lunn, Krzysztof Wilczyński

On Friday 30 July 2021 10:18:44 Krzysztof Hałasa wrote:
> Pali, et al,
> 
> I'm encountering a problem booting an i.MX6-based device (Gateworks
> Ventana SBC). This is apparently a known issue:
> https://lkml.org/lkml/2020/7/16/388
> 
> Do you guys know of a fix for this? Booting this machine reliably is
> lately impossible.

Hello Krzysztof Hałasa! This is known issue and Krzysztof Wilczyński is
working on it... Just it will take some time as fixing it is not so
easy.

> First, it spews a warning:
> 
> pcieport 0000:00:00.0: PME: Signaling with IRQ 310
> 
> sysfs: cannot create duplicate filename '/devices/platform/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:e0'
> CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: events_unbound async_run_entry_fn
> 
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x40/0x4c
> dump_stack_lvl from sysfs_warn_dup+0x54/0x60
> sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
> sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
> sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
> pci_create_resource_files from pci_bus_add_device+0x20/0x8c
> pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
> pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
> pci_bus_add_devices from pci_host_probe+0x3c/0x90
> pci_host_probe from dw_pcie_host_init+0x200/0x4b4
> dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
> imx6_pcie_probe from platform_probe+0x80/0xc0
> platform_probe from really_probe+0x158/0x324
> really_probe from __driver_probe_device+0x84/0xe4
> __driver_probe_device from driver_probe_device+0x34/0xd0
> driver_probe_device from __driver_attach_async_helper+0x20/0x38
> __driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
> async_run_entry_fn from process_one_work+0x164/0x3b0
> process_one_work from worker_thread+0x2c/0x52c
> worker_thread from kthread+0x110/0x154
> kthread from ret_from_fork+0x14/0x24
> 
> pcieport 0000:01:00.0: enabling device (0140 -> 0143)
> 
> Then:
> Unable to handle kernel paging request at virtual address 6f736572 (ASCII = "reso")
> pgd = (ptrval)
> [6f736572] *pgd=00000000
> Internal error: Oops: 5 [#1] SMP ARM
> Modules linked in:
> CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.14.0-rc3+ #40
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Workqueue: events_unbound async_run_entry_fn
> PC is at string_nocheck+0x20/0xa0
> LR is at string+0x54/0x64
> Process kworker/u8:0 (pid: 7, stack limit = 0x(ptrval))
> 
> string_nocheck from string+0x54/0x64
> string from vsnprintf+0x1c4/0x48c
> vsnprintf from vprintk_store+0x80/0x33c
> vprintk_store from vprintk_emit+0x6c/0x1e0
> vprintk_emit from vprintk_default+0x20/0x28
> vprintk_default from printk+0x1c/0x2c
> printk from sysfs_warn_dup+0x50/0x60
> sysfs_warn_dup from sysfs_add_file_mode_ns+0x154/0x1a8
> sysfs_add_file_mode_ns from sysfs_create_bin_file+0x60/0x8c
> sysfs_create_bin_file from pci_create_resource_files+0xf4/0x140
> pci_create_resource_files from pci_bus_add_device+0x20/0x8c
> pci_bus_add_device from pci_bus_add_devices+0x3c/0x80
> pci_bus_add_devices from pci_bus_add_devices+0x70/0x80
> pci_bus_add_devices from pci_host_probe+0x3c/0x90
> pci_host_probe from dw_pcie_host_init+0x200/0x4b4
> dw_pcie_host_init from imx6_pcie_probe+0x338/0x668
> imx6_pcie_probe from platform_probe+0x80/0xc0
> platform_probe from really_probe+0x158/0x324
> really_probe from __driver_probe_device+0x84/0xe4
> __driver_probe_device from driver_probe_device+0x34/0xd0
> driver_probe_device from __driver_attach_async_helper+0x20/0x38
> __driver_attach_async_helper from async_run_entry_fn+0x24/0xb4
> async_run_entry_fn from process_one_work+0x164/0x3b0
> process_one_work from worker_thread+0x2c/0x52c
> worker_thread from kthread+0x110/0x154
> kthread from ret_from_fork+0x14/0x24
> Code: e1dd42b2 e58d301c e3540000 0a000019 (e5d2e000)
> --
> Krzysztof "Chris" Hałasa
> 
> Sieć Badawcza Łukasiewicz
> Przemysłowy Instytut Automatyki i Pomiarów PIAP
> Al. Jerozolimskie 202, 02-486 Warszawa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2021-07-30  8:18 PCI: Race condition in pci_create_sysfs_dev_files (can't boot) Krzysztof Hałasa
  2021-07-30  9:49 ` Pali Rohár
@ 2022-01-20 23:00 ` Bjorn Helgaas
  2022-01-21  7:14   ` Krzysztof Hałasa
  1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2022-01-20 23:00 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński

On Fri, Jul 30, 2021 at 10:18:44AM +0200, Krzysztof Hałasa wrote:
> Pali, et al,
> 
> I'm encountering a problem booting an i.MX6-based device (Gateworks
> Ventana SBC). This is apparently a known issue:
> https://lkml.org/lkml/2020/7/16/388
> 
> Do you guys know of a fix for this? Booting this machine reliably is
> lately impossible.

Hi Krzysztof, is this still an issue?

e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and
similar patches appeared in v5.13, and they should resolve things like
this:

  sysfs: cannot create duplicate filename '/devices/platform/soc/d0070000.pcie/pci0000:00/0000:00:00.0/config'

But your report is different because it looks like we have a problem
when creating a *directory*:

> sysfs: cannot create duplicate filename '/devices/platform/soc/1ffc000.pcie/pci0000:00/0000:00:00.0/0000:e0'

and then it seems that we corrupted a pointer:

> Unable to handle kernel paging request at virtual address 6f736572 (ASCII = "reso")

which is even more worrisome.

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-01-20 23:00 ` Bjorn Helgaas
@ 2022-01-21  7:14   ` Krzysztof Hałasa
  2022-01-21 16:57     ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Krzysztof Hałasa @ 2022-01-21  7:14 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński

Hi Bjorn, Krzysztof,

Bjorn Helgaas <helgaas@kernel.org> writes:

> On Fri, Jul 30, 2021 at 10:18:44AM +0200, Krzysztof Hałasa wrote:
>> I'm encountering a problem booting an i.MX6-based device (Gateworks
>> Ventana SBC). This is apparently a known issue:
>> https://lkml.org/lkml/2020/7/16/388

> Hi Krzysztof, is this still an issue?

Well... I'm still using i.MX6 with 5.14 and Krzysiek Wilczyński's "PCI:
Race condition in pci_create_sysfs_dev_files (can't boot)" patch (which
fixes the problem). It seems parts of this patch are now in mainline,
but most of it is still missing. So I guess the problem isn't fixed.

Perhaps because the DEC Alpha part of the patch is not yet ready?
Krzysztof?

> e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and
> similar patches appeared in v5.13,

Right, but they hadn't fixed the problem. 5.13 was released in June, and
I have started using the patch in August (apparently with pre-5.14
first, then with final 5.14).
-- 
Krzysztof "Chris" Hałasa

Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-01-21  7:14   ` Krzysztof Hałasa
@ 2022-01-21 16:57     ` Bjorn Helgaas
  2022-01-31 11:56       ` Krzysztof Hałasa
  0 siblings, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2022-01-21 16:57 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński

On Fri, Jan 21, 2022 at 08:14:46AM +0100, Krzysztof Hałasa wrote:
> Hi Bjorn, Krzysztof,
> 
> Bjorn Helgaas <helgaas@kernel.org> writes:
> 
> > On Fri, Jul 30, 2021 at 10:18:44AM +0200, Krzysztof Hałasa wrote:
> >> I'm encountering a problem booting an i.MX6-based device (Gateworks
> >> Ventana SBC). This is apparently a known issue:
> >> https://lkml.org/lkml/2020/7/16/388
> 
> > Hi Krzysztof, is this still an issue?
> 
> Well... I'm still using i.MX6 with 5.14 and Krzysiek Wilczyński's "PCI:
> Race condition in pci_create_sysfs_dev_files (can't boot)" patch (which
> fixes the problem). It seems parts of this patch are now in mainline,
> but most of it is still missing. So I guess the problem isn't fixed.
> 
> Perhaps because the DEC Alpha part of the patch is not yet ready?
> Krzysztof?
> 
> > e1d3f3268b0e ("PCI/sysfs: Convert "config" to static attribute") and
> > similar patches appeared in v5.13,
> 
> Right, but they hadn't fixed the problem. 5.13 was released in June, and
> I have started using the patch in August (apparently with pre-5.14
> first, then with final 5.14).

Thanks.  e1d3f3268b0e and related patches converted individual files
("config", "rom", "vpd", etc) to static attributes, but since the
problem you're seeing is with a directory, it's likely different.

I opened this bugzilla report to try to keep this from getting lost:
https://bugzilla.kernel.org/show_bug.cgi?id=215515

Bjorn

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-01-21 16:57     ` Bjorn Helgaas
@ 2022-01-31 11:56       ` Krzysztof Hałasa
  2022-02-08 23:40         ` Bjorn Helgaas
  0 siblings, 1 reply; 12+ messages in thread
From: Krzysztof Hałasa @ 2022-01-31 11:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński

Bjorn Helgaas <helgaas@kernel.org> writes:

> Thanks.  e1d3f3268b0e and related patches converted individual files
> ("config", "rom", "vpd", etc) to static attributes, but since the
> problem you're seeing is with a directory, it's likely different.
>
> I opened this bugzilla report to try to keep this from getting lost:
> https://bugzilla.kernel.org/show_bug.cgi?id=215515

Ok, thanks.
-- 
Krzysztof "Chris" Hałasa

Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-01-31 11:56       ` Krzysztof Hałasa
@ 2022-02-08 23:40         ` Bjorn Helgaas
  2022-02-11 12:31           ` Krzysztof Hałasa
  2022-02-15  6:35           ` Krzysztof Hałasa
  0 siblings, 2 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2022-02-08 23:40 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński, Koen Vandeputte, Dexuan Cui

[+cc Koen, Dexuan from this old thread:
https://lore.kernel.org/all/20200716110423.xtfyb3n6tn5ixedh@pali/]

On Mon, Jan 31, 2022 at 12:56:18PM +0100, Krzysztof Hałasa wrote:
> Bjorn Helgaas <helgaas@kernel.org> writes:
> 
> > Thanks.  e1d3f3268b0e and related patches converted individual files
> > ("config", "rom", "vpd", etc) to static attributes, but since the
> > problem you're seeing is with a directory, it's likely different.
> >
> > I opened this bugzilla report to try to keep this from getting lost:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215515
> 
> Ok, thanks.

Koen collected some interesting logs at
https://lore.kernel.org/all/cd4812f0-1de3-0582-936c-ba30906595af@citymesh.com/
They're from v5.10, which was before all of Krzysztof W's nice work
converting to static attributes, but Koen's log shows the error
happening in the pci_sysfs_init() initcall, which is *after*
imx6_pcie_probe():

  imx6_pcie_probe                # probably device initcall (level 6)
    ...
      pci_create_sysfs_dev_files

  pci_sysfs_init                 # late initcall (level 7)
    pci_create_sysfs_dev_files
      "sysfs: cannot create duplicate filename"

Krzysztof, you're running v5.14, which includes Krzysztof W's work,
but that shouldn't affect the imx6_pcie_probe()/pci_sysfs_init()
ordering.  Your log shows the error in imx6_pcie_probe().

Would you mind adding the patch below and attaching a complete dmesg
log to the bugzilla?

Bjorn

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 7bbf2673c7f2..6a2b62fe7704 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -1200,6 +1200,9 @@ static int pci_create_attr(struct pci_dev *pdev, int num, int write_combine)
 	res_attr->attr.mode = 0600;
 	res_attr->size = pci_resource_len(pdev, num);
 	res_attr->private = (void *)(unsigned long)num;
+	pci_info(pdev, "pci_create_attr:");
+	print_hex_dump(KERN_INFO, "", DUMP_PREFIX_OFFSET, 16, 1,
+			res_attr, 64, true);
 	retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr);
 	if (retval)
 		kfree(res_attr);
@@ -1380,6 +1383,9 @@ static const struct attribute_group pci_dev_reset_attr_group = {
 
 int __must_check pci_create_sysfs_dev_files(struct pci_dev *pdev)
 {
+	pci_info(pdev, "pci_create_sysfs_dev_files\n");
+	dump_stack();
+
 	if (!sysfs_initialized)
 		return -EACCES;
 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-02-08 23:40         ` Bjorn Helgaas
@ 2022-02-11 12:31           ` Krzysztof Hałasa
  2022-02-15  6:35           ` Krzysztof Hałasa
  1 sibling, 0 replies; 12+ messages in thread
From: Krzysztof Hałasa @ 2022-02-11 12:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński, Koen Vandeputte, Dexuan Cui

Bjorn,

> Krzysztof, you're running v5.14, which includes Krzysztof W's work,
> but that shouldn't affect the imx6_pcie_probe()/pci_sysfs_init()
> ordering.  Your log shows the error in imx6_pcie_probe().
>
> Would you mind adding the patch below and attaching a complete dmesg
> log to the bugzilla?

Sure. In fact I was using 5.16-rc4 (with the media tree) lately, without
the fix from the other Krzysztof, and, as I can now see, it worked
without the problem in question.

I switched it to 5.14 I was using previously, reverted the fix, added
your patch, and tried again. It booted 100+ times without experiencing
the problem, too.

I will try again with the previous 5.14-rc3+ (which needed the fix) next
week. Perhaps it was an unrelated bug affecting 5.14-rc only (fixed
before final 5.14)? I haven't checked back then if 5.14 needed the fix,
most probably I've simply rebased from rc3+ to final.

The other possibility is changed .config. Or, I don't know, the gcc
versions. We will see.
-- 
Krzysztof "Chris" Hałasa

Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-02-08 23:40         ` Bjorn Helgaas
  2022-02-11 12:31           ` Krzysztof Hałasa
@ 2022-02-15  6:35           ` Krzysztof Hałasa
  2022-04-01 13:50             ` Koen Vandeputte
  1 sibling, 1 reply; 12+ messages in thread
From: Krzysztof Hałasa @ 2022-02-15  6:35 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński, Koen Vandeputte, Dexuan Cui

Hi Bjorn,

Bjorn Helgaas <helgaas@kernel.org> writes:

> Koen collected some interesting logs at
> https://lore.kernel.org/all/cd4812f0-1de3-0582-936c-ba30906595af@citymesh.com/
> They're from v5.10, which was before all of Krzysztof W's nice work
> converting to static attributes, but Koen's log shows the error
> happening in the pci_sysfs_init() initcall, which is *after*
> imx6_pcie_probe():
>
>   imx6_pcie_probe                # probably device initcall (level 6)
>     ...
>       pci_create_sysfs_dev_files
>
>   pci_sysfs_init                 # late initcall (level 7)
>     pci_create_sysfs_dev_files
>       "sysfs: cannot create duplicate filename"

Well, imx6_pcie_probe() is called indirectly by
platform_driver_register(). I guess it doesn't know about the initcall
ordering, after it's registered.

It looks like the problem is the imx6_pcie_probe() (via
dw_pcie_host_init() -> pci_host_probe()) is interfering with
pci_sysfs_init(). This may eventually cause some invalid memory access
as well.

BTW I thought for a moment that maybe 5.14 is free from this. I was
wrong. The problem doesn't manifest itself on my custom i.MX6 device
(using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
use any PCI devices there). It does on Ventana SBC from Gateworks,
though. BTW the above (and below) is v5.16.

It goes like this:
[0.096212] do_initcall_level: 6
[0.105625] imx6_pcie_init
[0.106106] imx6_pcie_probe       <<<<<<<<<<<<<<<<<<<<<
[0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 ranges:

[0.322613] imx6q-pcie 1ffc000.pcie: Link up
[0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
[0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]

[0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
[0.405262] do_initcall_level: 7

[0.410393] pci_sysfs_init        <<<<<<<<<<<<<<<<<<<<<
[0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files

[0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
[0.410551] [<80918710>] (pci_sysfs_init)
[0.410568] [<8010166c>] (do_one_initcall)

[0.410717] pci_sysfs_init END    <<<<<<<<<<<<<<<<<<<<<

[0.533843] [<803f1c74>] (pci_bus_add_devices)
[0.533862] [<803f574c>] (pci_host_probe)
[0.533879] [<80414310>] (dw_pcie_host_init)
[0.533895] [<80681ac8>] (imx6_pcie_probe)
[0.533915] [<8045e9e4>] (platform_probe)
(Repeats multiple times, I guess for each PCI device)

[0.543893] imx6_pcie_probe END   <<<<<<<<<<<<<<<<<<<<<

[0.692244] do_initcall_level: 7 END
-- 
Krzysztof "Chris" Hałasa

Sieć Badawcza Łukasiewicz
Przemysłowy Instytut Automatyki i Pomiarów PIAP
Al. Jerozolimskie 202, 02-486 Warszawa

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-02-15  6:35           ` Krzysztof Hałasa
@ 2022-04-01 13:50             ` Koen Vandeputte
  2022-04-06 14:08               ` Koen Vandeputte
  0 siblings, 1 reply; 12+ messages in thread
From: Koen Vandeputte @ 2022-04-01 13:50 UTC (permalink / raw)
  To: Krzysztof Hałasa, Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński, Dexuan Cui


On 15.02.22 07:35, Krzysztof Hałasa wrote:
> Hi Bjorn,
>
> Bjorn Helgaas <helgaas@kernel.org> writes:
>
>> Koen collected some interesting logs at
>> https://lore.kernel.org/all/cd4812f0-1de3-0582-936c-ba30906595af@citymesh.com/
>> They're from v5.10, which was before all of Krzysztof W's nice work
>> converting to static attributes, but Koen's log shows the error
>> happening in the pci_sysfs_init() initcall, which is *after*
>> imx6_pcie_probe():
>>
>>    imx6_pcie_probe                # probably device initcall (level 6)
>>      ...
>>        pci_create_sysfs_dev_files
>>
>>    pci_sysfs_init                 # late initcall (level 7)
>>      pci_create_sysfs_dev_files
>>        "sysfs: cannot create duplicate filename"
> Well, imx6_pcie_probe() is called indirectly by
> platform_driver_register(). I guess it doesn't know about the initcall
> ordering, after it's registered.
>
> It looks like the problem is the imx6_pcie_probe() (via
> dw_pcie_host_init() -> pci_host_probe()) is interfering with
> pci_sysfs_init(). This may eventually cause some invalid memory access
> as well.
>
> BTW I thought for a moment that maybe 5.14 is free from this. I was
> wrong. The problem doesn't manifest itself on my custom i.MX6 device
> (using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
> use any PCI devices there). It does on Ventana SBC from Gateworks,
> though. BTW the above (and below) is v5.16.
>
> It goes like this:
> [0.096212] do_initcall_level: 6
> [0.105625] imx6_pcie_init
> [0.106106] imx6_pcie_probe       <<<<<<<<<<<<<<<<<<<<<
> [0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 ranges:
>
> [0.322613] imx6q-pcie 1ffc000.pcie: Link up
> [0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
> [0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
>
> [0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
> [0.405262] do_initcall_level: 7
>
> [0.410393] pci_sysfs_init        <<<<<<<<<<<<<<<<<<<<<
> [0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
>
> [0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
> [0.410551] [<80918710>] (pci_sysfs_init)
> [0.410568] [<8010166c>] (do_one_initcall)
>
> [0.410717] pci_sysfs_init END    <<<<<<<<<<<<<<<<<<<<<
>
> [0.533843] [<803f1c74>] (pci_bus_add_devices)
> [0.533862] [<803f574c>] (pci_host_probe)
> [0.533879] [<80414310>] (dw_pcie_host_init)
> [0.533895] [<80681ac8>] (imx6_pcie_probe)
> [0.533915] [<8045e9e4>] (platform_probe)
> (Repeats multiple times, I guess for each PCI device)
>
> [0.543893] imx6_pcie_probe END   <<<<<<<<<<<<<<<<<<<<<
>
> [0.692244] do_initcall_level: 7 END


Hi all,

Any update on this topic?
I just tested kernel 5.15 on imx6 (gateworks Ventana 5200) and as soon 
as I connect a pcie device on one of the ports,

following happens:

https://pastebin.com/raw/mgfSvTRB

Any idea if this is related?


Thanks,

Koen


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-04-01 13:50             ` Koen Vandeputte
@ 2022-04-06 14:08               ` Koen Vandeputte
  2022-04-13 12:52                 ` Koen Vandeputte
  0 siblings, 1 reply; 12+ messages in thread
From: Koen Vandeputte @ 2022-04-06 14:08 UTC (permalink / raw)
  To: Krzysztof Hałasa, Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński, Dexuan Cui, Petr Štetiar,
	Piotr Dymacz


On 01.04.22 15:50, Koen Vandeputte wrote:
>
> On 15.02.22 07:35, Krzysztof Hałasa wrote:
>> Hi Bjorn,
>>
>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>
>>> Koen collected some interesting logs at
>>> https://lore.kernel.org/all/cd4812f0-1de3-0582-936c-ba30906595af@citymesh.com/ 
>>>
>>> They're from v5.10, which was before all of Krzysztof W's nice work
>>> converting to static attributes, but Koen's log shows the error
>>> happening in the pci_sysfs_init() initcall, which is *after*
>>> imx6_pcie_probe():
>>>
>>>    imx6_pcie_probe                # probably device initcall (level 6)
>>>      ...
>>>        pci_create_sysfs_dev_files
>>>
>>>    pci_sysfs_init                 # late initcall (level 7)
>>>      pci_create_sysfs_dev_files
>>>        "sysfs: cannot create duplicate filename"
>> Well, imx6_pcie_probe() is called indirectly by
>> platform_driver_register(). I guess it doesn't know about the initcall
>> ordering, after it's registered.
>>
>> It looks like the problem is the imx6_pcie_probe() (via
>> dw_pcie_host_init() -> pci_host_probe()) is interfering with
>> pci_sysfs_init(). This may eventually cause some invalid memory access
>> as well.
>>
>> BTW I thought for a moment that maybe 5.14 is free from this. I was
>> wrong. The problem doesn't manifest itself on my custom i.MX6 device
>> (using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
>> use any PCI devices there). It does on Ventana SBC from Gateworks,
>> though. BTW the above (and below) is v5.16.
>>
>> It goes like this:
>> [0.096212] do_initcall_level: 6
>> [0.105625] imx6_pcie_init
>> [0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
>> [0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 
>> ranges:
>>
>> [0.322613] imx6q-pcie 1ffc000.pcie: Link up
>> [0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
>> [0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
>>
>> [0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
>> [0.405262] do_initcall_level: 7
>>
>> [0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
>> [0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
>>
>> [0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
>> [0.410551] [<80918710>] (pci_sysfs_init)
>> [0.410568] [<8010166c>] (do_one_initcall)
>>
>> [0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<
>>
>> [0.533843] [<803f1c74>] (pci_bus_add_devices)
>> [0.533862] [<803f574c>] (pci_host_probe)
>> [0.533879] [<80414310>] (dw_pcie_host_init)
>> [0.533895] [<80681ac8>] (imx6_pcie_probe)
>> [0.533915] [<8045e9e4>] (platform_probe)
>> (Repeats multiple times, I guess for each PCI device)
>>
>> [0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<
>>
>> [0.692244] do_initcall_level: 7 END
>
>
> Hi all,
>
> Any update on this topic?
> I just tested kernel 5.15 on imx6 (gateworks Ventana 5200) and as soon 
> as I connect a pcie device on one of the ports,
>
> following happens:
>
> https://pastebin.com/raw/mgfSvTRB
>
> Any idea if this is related?
>
>
> Thanks,
>
> Koen
>
Hi all,

I tested a bit more today and simply let the board reboot all day long.
After roughly 20 reboots, it suddenly booted once stable without any 
errors/warnings.

Looks like a race condition ..

Any idea?

Thanks,

Koen


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
  2022-04-06 14:08               ` Koen Vandeputte
@ 2022-04-13 12:52                 ` Koen Vandeputte
  0 siblings, 0 replies; 12+ messages in thread
From: Koen Vandeputte @ 2022-04-13 12:52 UTC (permalink / raw)
  To: Krzysztof Hałasa, Bjorn Helgaas
  Cc: Pali Rohár, Bjorn Helgaas, linux-pci, linux-kernel,
	Lorenzo Pieralisi, Gregory Clement, Andrew Lunn,
	Krzysztof Wilczyński, Dexuan Cui, Petr Štetiar,
	Piotr Dymacz


On 06.04.22 16:08, Koen Vandeputte wrote:
>
> On 01.04.22 15:50, Koen Vandeputte wrote:
>>
>> On 15.02.22 07:35, Krzysztof Hałasa wrote:
>>> Hi Bjorn,
>>>
>>> Bjorn Helgaas <helgaas@kernel.org> writes:
>>>
>>>> Koen collected some interesting logs at
>>>> https://lore.kernel.org/all/cd4812f0-1de3-0582-936c-ba30906595af@citymesh.com/ 
>>>>
>>>> They're from v5.10, which was before all of Krzysztof W's nice work
>>>> converting to static attributes, but Koen's log shows the error
>>>> happening in the pci_sysfs_init() initcall, which is *after*
>>>> imx6_pcie_probe():
>>>>
>>>>    imx6_pcie_probe                # probably device initcall (level 6)
>>>>      ...
>>>>        pci_create_sysfs_dev_files
>>>>
>>>>    pci_sysfs_init                 # late initcall (level 7)
>>>>      pci_create_sysfs_dev_files
>>>>        "sysfs: cannot create duplicate filename"
>>> Well, imx6_pcie_probe() is called indirectly by
>>> platform_driver_register(). I guess it doesn't know about the initcall
>>> ordering, after it's registered.
>>>
>>> It looks like the problem is the imx6_pcie_probe() (via
>>> dw_pcie_host_init() -> pci_host_probe()) is interfering with
>>> pci_sysfs_init(). This may eventually cause some invalid memory access
>>> as well.
>>>
>>> BTW I thought for a moment that maybe 5.14 is free from this. I was
>>> wrong. The problem doesn't manifest itself on my custom i.MX6 device
>>> (using Tinyrex CPU module from Voipac/Fedevel, perhaps because I don't
>>> use any PCI devices there). It does on Ventana SBC from Gateworks,
>>> though. BTW the above (and below) is v5.16.
>>>
>>> It goes like this:
>>> [0.096212] do_initcall_level: 6
>>> [0.105625] imx6_pcie_init
>>> [0.106106] imx6_pcie_probe <<<<<<<<<<<<<<<<<<<<<
>>> [0.106412] imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 
>>> ranges:
>>>
>>> [0.322613] imx6q-pcie 1ffc000.pcie: Link up
>>> [0.322776] imx6q-pcie 1ffc000.pcie: PCI host bridge to bus 0000:00
>>> [0.322790] pci_bus 0000:00: root bus resource [bus 00-ff]
>>>
>>> [0.405251] do_initcall_level: 6 ENDs but imx6_pcie_probe() still active
>>> [0.405262] do_initcall_level: 7
>>>
>>> [0.410393] pci_sysfs_init <<<<<<<<<<<<<<<<<<<<<
>>> [0.410423] pci 0000:00:00.0: pci_create_sysfs_dev_files
>>>
>>> [0.410532] [<8068091c>] (pci_create_sysfs_dev_files)
>>> [0.410551] [<80918710>] (pci_sysfs_init)
>>> [0.410568] [<8010166c>] (do_one_initcall)
>>>
>>> [0.410717] pci_sysfs_init END <<<<<<<<<<<<<<<<<<<<<
>>>
>>> [0.533843] [<803f1c74>] (pci_bus_add_devices)
>>> [0.533862] [<803f574c>] (pci_host_probe)
>>> [0.533879] [<80414310>] (dw_pcie_host_init)
>>> [0.533895] [<80681ac8>] (imx6_pcie_probe)
>>> [0.533915] [<8045e9e4>] (platform_probe)
>>> (Repeats multiple times, I guess for each PCI device)
>>>
>>> [0.543893] imx6_pcie_probe END <<<<<<<<<<<<<<<<<<<<<
>>>
>>> [0.692244] do_initcall_level: 7 END
>>
>>
>> Hi all,
>>
>> Any update on this topic?
>> I just tested kernel 5.15 on imx6 (gateworks Ventana 5200) and as 
>> soon as I connect a pcie device on one of the ports,
>>
>> following happens:
>>
>> https://pastebin.com/raw/mgfSvTRB
>>
>> Any idea if this is related?
>>
>>
>> Thanks,
>>
>> Koen
>>
> Hi all,
>
> I tested a bit more today and simply let the board reboot all day long.
> After roughly 20 reboots, it suddenly booted once stable without any 
> errors/warnings.
>
> Looks like a race condition ..
>
> Any idea?
>
> Thanks,
>
> Koen
>
As an additional addendum:

This issue is seen on a Gateworks Ventana gw5200 which has a PLX bridge.
I also have a GW5100 which is identical but without the PLX bridge, and 
it works fine every time.

So when a PCI device is sitting behind a bridge, the issue is triggered.


Hope this helps to easily reproduce.

Koen


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-04-13 12:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-30  8:18 PCI: Race condition in pci_create_sysfs_dev_files (can't boot) Krzysztof Hałasa
2021-07-30  9:49 ` Pali Rohár
2022-01-20 23:00 ` Bjorn Helgaas
2022-01-21  7:14   ` Krzysztof Hałasa
2022-01-21 16:57     ` Bjorn Helgaas
2022-01-31 11:56       ` Krzysztof Hałasa
2022-02-08 23:40         ` Bjorn Helgaas
2022-02-11 12:31           ` Krzysztof Hałasa
2022-02-15  6:35           ` Krzysztof Hałasa
2022-04-01 13:50             ` Koen Vandeputte
2022-04-06 14:08               ` Koen Vandeputte
2022-04-13 12:52                 ` Koen Vandeputte

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).