All of lore.kernel.org
 help / color / mirror / Atom feed
* Beaglebone Ethernet Probe Failure In 6.8+
@ 2024-04-17 15:42 Colin Foster
  2024-04-17 19:30 ` Andrew Lunn
  0 siblings, 1 reply; 8+ messages in thread
From: Colin Foster @ 2024-04-17 15:42 UTC (permalink / raw)
  To: netdev

Hello,

I'm chasing down an issue in recent kernels. My setup is slightly
unconventional: a BBB with ETH0 as a CPU port to a DSA switch that is
controlled by SPI. I'll have hardware next week, but think it is worth
getting a discussion going.

The commit in question is commit df16c1c51d81 ("net: phy: mdio_device:
Reset device only when necessary"). This seems to cause a probe error of
the MDIO device. A dump_stack was added where the reset is skipped.

SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5

(actual dmesg is below)

Because this failure happens much earlier than DSA, I suspect is isn't
isolated to me and my setup - but I'm not positive at the moment.

I suspect one of the following:

1. There's an issue with my setup / configuration.

2. This is an issue for every BBB device, but probe failures don't
actually break functionality.


Depending on which of those is the case, I'll either need to:

A. revert the patch because it is causing probe failures

B. determine why the probe is failing in the MDIO driver and try to fix
that

C. Introduce an API to force resets, regardless of the previous state,
and apply that to the failure cases.


I assume the path forward is option B... but if the issue is more
widespread, options A or C might be the correct path.


I'll be able to test on hardware again next week if there's more
information needed.



[    1.539656] mdio_bus 4a101000.mdio:00: using DT '/ocp/interconnect@4a000000/segment@0/target-module@100000/switch@0/mdio@1000/ethernet-phy@0' for 'reset' GPIO lookup
[    1.539911] of_get_named_gpiod_flags: parsed 'reset-gpios' property of node '/ocp/interconnect@4a000000/segment@0/target-module@100000/switch@0/mdio@1000/ethernet-phy@0[0]' - status (0)
[    1.540193] gpio gpiochip0: Persistence not supported for GPIO 8
[    1.548962] CPU: 0 PID: 25 Comm: kworker/u2:2 Not tainted 6.7.0-rc3-00667-gdf16c1c51d81-dirty #1407
[    1.548991] Hardware name: Generic AM33XX (Flattened Device Tree)
[    1.549004] Workqueue: events_unbound deferred_probe_work_func
[    1.549044] Backtrace: 
[    1.549055]  dump_backtrace from show_stack+0x20/0x24
[    1.549098]  show_stack from dump_stack_lvl+0x60/0x78
[    1.549127]  dump_stack_lvl from dump_stack+0x18/0x1c
[    1.549156]  dump_stack from mdio_device_reset+0xc4/0xfc
[    1.549183]  mdio_device_reset from phy_probe+0x6c/0x488
[    1.549209]  phy_probe from really_probe+0xd8/0x2e8
[    1.549243]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.549266]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.549291]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.549317]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.549356]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.549384]  __device_attach from device_initial_probe+0x1c/0x20
[    1.549408]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.549422]  bus_probe_device from device_add+0x5a8/0x778
[    1.549450]  device_add from phy_device_register+0x50/0x90
[    1.549485]  phy_device_register from fwnode_mdiobus_phy_device_register+0xd0/0x114
[    1.549511]  fwnode_mdiobus_phy_device_register from fwnode_mdiobus_register_phy+0x16c/0x1c0
[    1.549539]  fwnode_mdiobus_register_phy from __of_mdiobus_register+0x150/0x38c
[    1.549568]  __of_mdiobus_register from davinci_mdio_probe+0x2ac/0x474
[    1.549613]  davinci_mdio_probe from platform_probe+0x6c/0xcc
[    1.549646]  platform_probe from really_probe+0xd8/0x2e8
[    1.549671]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.549695]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.549720]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.549745]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.549774]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.549802]  __device_attach from device_initial_probe+0x1c/0x20
[    1.549825]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.549839]  bus_probe_device from device_add+0x5a8/0x778
[    1.549867]  device_add from of_device_add+0x44/0x4c
[    1.549909]  of_device_add from of_platform_device_create_pdata+0xa0/0xd0
[    1.549927]  of_platform_device_create_pdata from of_platform_bus_create+0x1b4/0x384
[    1.549956]  of_platform_bus_create from of_platform_populate+0x80/0xe4
[    1.549991]  of_platform_populate from devm_of_platform_populate+0x60/0xa8
[    1.550026]  devm_of_platform_populate from cpsw_probe+0x214/0xd2c
[    1.550060]  cpsw_probe from platform_probe+0x6c/0xcc
[    1.550096]  platform_probe from really_probe+0xd8/0x2e8
[    1.550121]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.550145]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.550170]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.550196]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.550224]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.550252]  __device_attach from device_initial_probe+0x1c/0x20
[    1.550274]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.550288]  bus_probe_device from device_add+0x5a8/0x778
[    1.550316]  device_add from of_device_add+0x44/0x4c
[    1.550353]  of_device_add from of_platform_device_create_pdata+0xa0/0xd0
[    1.550369]  of_platform_device_create_pdata from of_platform_bus_create+0x1b4/0x384
[    1.550398]  of_platform_bus_create from of_platform_populate+0x80/0xe4
[    1.550433]  of_platform_populate from sysc_probe+0xff0/0x148c
[    1.550479]  sysc_probe from platform_probe+0x6c/0xcc
[    1.550516]  platform_probe from really_probe+0xd8/0x2e8
[    1.550541]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.550565]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.550589]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.550615]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.550644]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.550673]  __device_attach from device_initial_probe+0x1c/0x20
[    1.550696]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.550709]  bus_probe_device from device_add+0x5a8/0x778
[    1.550738]  device_add from of_device_add+0x44/0x4c
[    1.550773]  of_device_add from of_platform_device_create_pdata+0xa0/0xd0
[    1.550790]  of_platform_device_create_pdata from of_platform_bus_create+0x1b4/0x384
[    1.550819]  of_platform_bus_create from of_platform_populate+0x80/0xe4
[    1.550854]  of_platform_populate from simple_pm_bus_probe+0xd8/0xfc
[    1.550890]  simple_pm_bus_probe from platform_probe+0x6c/0xcc
[    1.550919]  platform_probe from really_probe+0xd8/0x2e8
[    1.550944]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.550968]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.550992]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.551019]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.551047]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.551075]  __device_attach from device_initial_probe+0x1c/0x20
[    1.551098]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.551111]  bus_probe_device from device_add+0x5a8/0x778
[    1.551139]  device_add from of_device_add+0x44/0x4c
[    1.551175]  of_device_add from of_platform_device_create_pdata+0xa0/0xd0
[    1.551192]  of_platform_device_create_pdata from of_platform_bus_create+0x1b4/0x384
[    1.551220]  of_platform_bus_create from of_platform_populate+0x80/0xe4
[    1.551255]  of_platform_populate from simple_pm_bus_probe+0xd8/0xfc
[    1.551291]  simple_pm_bus_probe from platform_probe+0x6c/0xcc
[    1.551320]  platform_probe from really_probe+0xd8/0x2e8
[    1.551345]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.551369]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.551394]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.551419]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.551448]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.551476]  __device_attach from device_initial_probe+0x1c/0x20
[    1.551499]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.551513]  bus_probe_device from device_add+0x5a8/0x778
[    1.551541]  device_add from of_device_add+0x44/0x4c
[    1.551577]  of_device_add from of_platform_device_create_pdata+0xa0/0xd0
[    1.551593]  of_platform_device_create_pdata from of_platform_bus_create+0x1b4/0x384
[    1.551622]  of_platform_bus_create from of_platform_populate+0x80/0xe4
[    1.551657]  of_platform_populate from simple_pm_bus_probe+0xd8/0xfc
[    1.551693]  simple_pm_bus_probe from platform_probe+0x6c/0xcc
[    1.551722]  platform_probe from really_probe+0xd8/0x2e8
[    1.551747]  really_probe from __driver_probe_device+0x98/0x1b0
[    1.551770]  __driver_probe_device from driver_probe_device+0x40/0x114
[    1.551795]  driver_probe_device from __device_attach_driver+0xa4/0x10c
[    1.551821]  __device_attach_driver from bus_for_each_drv+0x94/0xec
[    1.551850]  bus_for_each_drv from __device_attach+0xbc/0x1e0
[    1.551878]  __device_attach from device_initial_probe+0x1c/0x20
[    1.551901]  device_initial_probe from bus_probe_device+0x98/0x9c
[    1.551915]  bus_probe_device from deferred_probe_work_func+0x88/0xb4
[    1.551940]  deferred_probe_work_func from process_one_work+0x170/0x43c
[    1.551971]  process_one_work from worker_thread+0x2c4/0x4ec
[    1.552006]  worker_thread from kthread+0x114/0x148
[    1.552040]  kthread from ret_from_fork+0x14/0x28
[    1.552056] Exception stack(0xe006dfb0 to 0xe006dff8)
[    1.552069] dfa0:                                     00000000 00000000 00000000 00000000
[    1.552081] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.552091] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.552103]  r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c037b070 r4:c21d6140
[    1.553623] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
[    1.553762] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
[    1.554978] cpsw-switch 4a100000.switch: initialized cpsw ale version 1.4
[    1.555011] cpsw-switch 4a100000.switch: ALE Table size 1024
[    1.555210] cpsw-switch 4a100000.switch: cpts: overflow check period 500 (jiffies)
[    1.555234] cpsw-switch 4a100000.switch: CPTS: ref_clk_freq:250000000 calc_mult:2147483648 calc_shift:29 error:0 nsec/sec
[    1.555343] cpsw-switch 4a100000.switch: Detected MACID = 24:76:25:76:35:37
[    1.558098] cpsw-switch 4a100000.switch: initialized (regs 0x4a100000, pool size 256) hw_ver:0019010C 1.12 (0)
[    1.600301] debugfs: Directory '49000000.dma' with parent 'dmaengine' already present!
[    1.600356] edma 49000000.dma: TI EDMA DMA engine driver



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-17 15:42 Beaglebone Ethernet Probe Failure In 6.8+ Colin Foster
@ 2024-04-17 19:30 ` Andrew Lunn
  2024-04-17 23:52   ` Colin Foster
  2024-04-23  4:00   ` Colin Foster
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Lunn @ 2024-04-17 19:30 UTC (permalink / raw)
  To: Colin Foster; +Cc: netdev

On Wed, Apr 17, 2024 at 10:42:02AM -0500, Colin Foster wrote:
> Hello,
> 
> I'm chasing down an issue in recent kernels. My setup is slightly
> unconventional: a BBB with ETH0 as a CPU port to a DSA switch that is
> controlled by SPI. I'll have hardware next week, but think it is worth
> getting a discussion going.
> 
> The commit in question is commit df16c1c51d81 ("net: phy: mdio_device:
> Reset device only when necessary"). This seems to cause a probe error of
> the MDIO device. A dump_stack was added where the reset is skipped.
> 
> SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5

Can you confirm this EIO is this one:

https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/ti/davinci_mdio.c#L440

It would be good to check the value of USERACCESS_ACK, and what the
datasheet says about it.

The MDIO bus itself has no real way of telling if there is a device on
the bus at a given address, and so if the devices actually transfers
anything on a read. So if the resets are wrong, the device is still in
reset, or coming out of reset but not yet ready, you should just read
0xffff. Returning EIO would indicate some other issue.

> Because this failure happens much earlier than DSA, I suspect is isn't
> isolated to me and my setup - but I'm not positive at the moment.
> 
> I suspect one of the following:
> 
> 1. There's an issue with my setup / configuration.
> 
> 2. This is an issue for every BBB device, but probe failures don't
> actually break functionality.
> 
> 
> Depending on which of those is the case, I'll either need to:
> 
> A. revert the patch because it is causing probe failures
> 
> B. determine why the probe is failing in the MDIO driver and try to fix
> that
> 
> C. Introduce an API to force resets, regardless of the previous state,
> and apply that to the failure cases.
> 
> 
> I assume the path forward is option B... but if the issue is more
> widespread, options A or C might be the correct path.

I would prefer B, at least lets try to understand the
problem. Depending on what we find, we might need A, but lets decided
that later.

> [    1.553623] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> [    1.553762] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
> [    1.554978] cpsw-switch 4a100000.switch: initialized cpsw ale version 1.4
> [    1.555011] cpsw-switch 4a100000.switch: ALE Table size 1024
> [    1.555210] cpsw-switch 4a100000.switch: cpts: overflow check period 500 (jiffies)
> [    1.555234] cpsw-switch 4a100000.switch: CPTS: ref_clk_freq:250000000 calc_mult:2147483648 calc_shift:29 error:0 nsec/sec
> [    1.555343] cpsw-switch 4a100000.switch: Detected MACID = 24:76:25:76:35:37
> [    1.558098] cpsw-switch 4a100000.switch: initialized (regs 0x4a100000, pool size 256) hw_ver:0019010C 1.12 (0)

So despite the -EIO, it finds the PHY, and the switch seems to probe
O.K?

	Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-17 19:30 ` Andrew Lunn
@ 2024-04-17 23:52   ` Colin Foster
  2024-04-23  4:00   ` Colin Foster
  1 sibling, 0 replies; 8+ messages in thread
From: Colin Foster @ 2024-04-17 23:52 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev

Hi Andrew,

On Wed, Apr 17, 2024 at 09:30:58PM +0200, Andrew Lunn wrote:
> On Wed, Apr 17, 2024 at 10:42:02AM -0500, Colin Foster wrote:
> > Hello,
> > 
> > I'm chasing down an issue in recent kernels. My setup is slightly
> > unconventional: a BBB with ETH0 as a CPU port to a DSA switch that is
> > controlled by SPI. I'll have hardware next week, but think it is worth
> > getting a discussion going.
> > 
> > The commit in question is commit df16c1c51d81 ("net: phy: mdio_device:
> > Reset device only when necessary"). This seems to cause a probe error of
> > the MDIO device. A dump_stack was added where the reset is skipped.
> > 
> > SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> 
> Can you confirm this EIO is this one:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/ti/davinci_mdio.c#L440
> 
> It would be good to check the value of USERACCESS_ACK, and what the
> datasheet says about it.
> 
> The MDIO bus itself has no real way of telling if there is a device on
> the bus at a given address, and so if the devices actually transfers
> anything on a read. So if the resets are wrong, the device is still in
> reset, or coming out of reset but not yet ready, you should just read
> 0xffff. Returning EIO would indicate some other issue.

I'll look into this next week when I have hardware again.

> 
> > Because this failure happens much earlier than DSA, I suspect is isn't
> > isolated to me and my setup - but I'm not positive at the moment.
> > 
> > I suspect one of the following:
> > 
> > 1. There's an issue with my setup / configuration.
> > 
> > 2. This is an issue for every BBB device, but probe failures don't
> > actually break functionality.
> > 
> > 
> > Depending on which of those is the case, I'll either need to:
> > 
> > A. revert the patch because it is causing probe failures
> > 
> > B. determine why the probe is failing in the MDIO driver and try to fix
> > that
> > 
> > C. Introduce an API to force resets, regardless of the previous state,
> > and apply that to the failure cases.
> > 
> > 
> > I assume the path forward is option B... but if the issue is more
> > widespread, options A or C might be the correct path.
> 
> I would prefer B, at least lets try to understand the
> problem. Depending on what we find, we might need A, but lets decided
> that later.

Agreed.

> 
> > [    1.553623] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> > [    1.553762] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
> > [    1.554978] cpsw-switch 4a100000.switch: initialized cpsw ale version 1.4
> > [    1.555011] cpsw-switch 4a100000.switch: ALE Table size 1024
> > [    1.555210] cpsw-switch 4a100000.switch: cpts: overflow check period 500 (jiffies)
> > [    1.555234] cpsw-switch 4a100000.switch: CPTS: ref_clk_freq:250000000 calc_mult:2147483648 calc_shift:29 error:0 nsec/sec
> > [    1.555343] cpsw-switch 4a100000.switch: Detected MACID = 24:76:25:76:35:37
> > [    1.558098] cpsw-switch 4a100000.switch: initialized (regs 0x4a100000, pool size 256) hw_ver:0019010C 1.12 (0)
> 
> So despite the -EIO, it finds the PHY, and the switch seems to probe
> O.K?

Yes. The issue I face is actually down the line when I enable the DSA
ports. I haven't diagnosed it yet, but a separate reset happens from
within phy_init_hw.

Here I've kept the dump_stack() from the patch, but removed the
return, so it is functional.

This is why it seems like it might be a bug that everyone is seeing, but
nobody is noticing... I hope to know more next week.

[    8.581463] EXT4-fs (mmcblk0p2): re-mounted 084255e0-9101-48d6-af17-9601fd9c5a1d r/w. Quota mode: disabled.
[   32.500235] cpsw-switch 4a100000.switch: starting ndev. mode: dual_mac
[   32.522610] CPU: 0 PID: 166 Comm: ip Not tainted 6.7.0-rc3-00667-gdf16c1c51d81-dirty #1408
[   32.530962] Hardware name: Generic AM33XX (Flattened Device Tree)
[   32.537090] Backtrace: 
[   32.539561]  dump_backtrace from show_stack+0x20/0x24
[   32.550363]  show_stack from dump_stack_lvl+0x60/0x78
[   32.555461]  dump_stack_lvl from dump_stack+0x18/0x1c
[   32.566238]  dump_stack from mdio_device_reset+0xc4/0x108
[   32.571685]  mdio_device_reset from phy_init_hw+0x20/0xb8
[   32.580713]  phy_init_hw from phy_attach_direct+0x148/0x340
[   32.589911]  phy_attach_direct from phy_connect_direct+0x2c/0x68
[   32.607416]  phy_connect_direct from of_phy_connect+0x54/0x7c
[   32.618889]  of_phy_connect from cpsw_ndo_open+0x30c/0x4e4
[   32.630096]  cpsw_ndo_open from __dev_open+0xfc/0x1b0
[   32.645608]  __dev_open from __dev_change_flags+0x198/0x218
[   32.656909]  __dev_change_flags from dev_change_flags+0x28/0x64
[   32.670656]  dev_change_flags from do_setlink+0x258/0xed4
[   32.681789]  do_setlink from rtnl_newlink+0x544/0x87c
[   32.697294]  rtnl_newlink from rtnetlink_rcv_msg+0x138/0x318
[   32.713408]  rtnetlink_rcv_msg from netlink_rcv_skb+0xc8/0x12c
[   32.729702]  netlink_rcv_skb from rtnetlink_rcv+0x20/0x24
[   32.740825]  rtnetlink_rcv from netlink_unicast+0x1b0/0x2a4
[   32.746435]  netlink_unicast from netlink_sendmsg+0x1a4/0x408
[   32.760001]  netlink_sendmsg from ____sys_sendmsg+0xb8/0x2c4
[   32.776110]  ____sys_sendmsg from ___sys_sendmsg+0x7c/0xb4
[   32.792046]  ___sys_sendmsg from sys_sendmsg+0x60/0xa8
[   32.803952]  sys_sendmsg from ret_fast_syscall+0x0/0x1c
[   32.809212] Exception stack(0xe0c3dfa8 to 0xe0c3dff0)
[   32.814295] dfa0:                   00000002 0054ecc8 00000003 bec65790 00000000 00000000
[   32.822514] dfc0: 00000002 0054ecc8 b6f54880 00000128 00000000 00000001 bec65f32 bec65f35
[   32.830731] dfe0: 00000128 bec65748 b6e4e52f b6dcce06
[   32.835809]  r6:b6f54880 r5:0054ecc8 r4:00000002
[   32.979240] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver (mii_bus:phy_addr=4a101000.mdio:00, irq=POLL)
[   32.994721] 8021q: adding VLAN 0 to HW filter on device eth0
[   33.020751] ocelot-ext-switch ocelot-ext-switch.5.auto swp1: configuring for phy/internal link mode
[   33.055444] ocelot-ext-switch ocelot-ext-switch.5.auto swp2: configuring for phy/internal link mode
[   33.089784] ocelot-ext-switch ocelot-ext-switch.5.auto swp3: configuring for phy/internal link mode
[   33.124241] ocelot-ext-switch ocelot-ext-switch.5.auto swp4: configuring for phy/qsgmii link mode
[   33.161283] ocelot-ext-switch ocelot-ext-switch.5.auto swp5: configuring for phy/qsgmii link mode
[   33.198704] ocelot-ext-switch ocelot-ext-switch.5.auto swp6: configuring for phy/qsgmii link mode
[   33.235518] ocelot-ext-switch ocelot-ext-switch.5.auto swp7: configuring for phy/qsgmii link mode


Colin Foster

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-17 19:30 ` Andrew Lunn
  2024-04-17 23:52   ` Colin Foster
@ 2024-04-23  4:00   ` Colin Foster
  2024-04-23 13:52     ` Andrew Lunn
  1 sibling, 1 reply; 8+ messages in thread
From: Colin Foster @ 2024-04-23  4:00 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev, Andrew Halaney

Hi Andrew L,

(I CC'd Andrew Hanley, original author, for visibility)

On Wed, Apr 17, 2024 at 09:30:58PM +0200, Andrew Lunn wrote:
> On Wed, Apr 17, 2024 at 10:42:02AM -0500, Colin Foster wrote:
> > Hello,
> > 
> > I'm chasing down an issue in recent kernels. My setup is slightly
> > unconventional: a BBB with ETH0 as a CPU port to a DSA switch that is
> > controlled by SPI. I'll have hardware next week, but think it is worth
> > getting a discussion going.
> > 
> > The commit in question is commit df16c1c51d81 ("net: phy: mdio_device:
> > Reset device only when necessary"). This seems to cause a probe error of
> > the MDIO device. A dump_stack was added where the reset is skipped.
> > 
> > SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> 
> Can you confirm this EIO is this one:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/ti/davinci_mdio.c#L440

Yes, I can confirm this is the EIO.

> 
> It would be good to check the value of USERACCESS_ACK, and what the
> datasheet says about it.

The register value is 0x0020ffff

The datasheet is https://www.ti.com/lit/ug/spruh73q/spruh73q.pdf and if
you search for 14-260 you'll find the bit definition table, but there
isn't much there for that bit...

The patch I threw in:

--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -437,7 +437,10 @@ static int davinci_mdio_read(struct mii_bus *bus, int phy_id, int phy_reg)
                        break;

                reg = readl(&data->regs->user[0].access);
+               printk("davinci mdio reg is 0x%08x\n", reg);
                ret = (reg & USERACCESS_ACK) ? (reg & USERACCESS_DATA) : -EIO;
+               if (ret == -EIO)
+                   printk("ret is this EIO\n");
                break;
        }


The print:

[    1.537767] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
[    1.538111] davinci mdio reg is 0x20400007
[    1.538372] davinci mdio reg is 0x2060c0f1
[    1.549523] davinci mdio reg is 0x03a0ffff
[    1.549551] ret is this EIO
[    1.549806] davinci mdio reg is 0x0020ffff
[    1.549821] ret is this EIO
[    1.550471] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
[    1.550592] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720

Without the mdiodev->reset_state patch, I see the following:

[    1.537817] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
[    1.538165] davinci mdio reg is 0x20400007
[    1.538426] davinci mdio reg is 0x2060c0f1
[    1.558442] davinci mdio reg is 0x23a00090
[    1.558717] davinci mdio reg is 0x20207809
[    1.559681] davinci mdio reg is 0x21c0ffff
[    1.559996] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720


For sanity, I went back and confirmed it still fails in 6.9-rc5.

Since I'm just using a Beaglebone, I was hoping to find an online CI
test report similar to this:
https://qa-reports.linaro.org/lkft/linux-next-master/
just to confirm that I'm not the only one seeing this. My search was
unsuccessful so far.

I think I'm getting a better understanding of the problem... maybe what
is going on is obvious to someone else's eyes at this point. I'll have
to call it a day for now.

Colin Foster

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-23  4:00   ` Colin Foster
@ 2024-04-23 13:52     ` Andrew Lunn
  2024-04-23 20:07       ` Andrew Halaney
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Lunn @ 2024-04-23 13:52 UTC (permalink / raw)
  To: Colin Foster; +Cc: netdev, Andrew Halaney

On Mon, Apr 22, 2024 at 11:00:51PM -0500, Colin Foster wrote:
> Hi Andrew L,
> 
> (I CC'd Andrew Hanley, original author, for visibility)
> 
> On Wed, Apr 17, 2024 at 09:30:58PM +0200, Andrew Lunn wrote:
> > On Wed, Apr 17, 2024 at 10:42:02AM -0500, Colin Foster wrote:
> > > Hello,
> > > 
> > > I'm chasing down an issue in recent kernels. My setup is slightly
> > > unconventional: a BBB with ETH0 as a CPU port to a DSA switch that is
> > > controlled by SPI. I'll have hardware next week, but think it is worth
> > > getting a discussion going.
> > > 
> > > The commit in question is commit df16c1c51d81 ("net: phy: mdio_device:
> > > Reset device only when necessary"). This seems to cause a probe error of
> > > the MDIO device. A dump_stack was added where the reset is skipped.
> > > 
> > > SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> > 
> > Can you confirm this EIO is this one:
> > 
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/ti/davinci_mdio.c#L440
> 
> Yes, I can confirm this is the EIO.
> 
> > 
> > It would be good to check the value of USERACCESS_ACK, and what the
> > datasheet says about it.
> 
> The register value is 0x0020ffff

The 0xffff is the value read from the bus. That probably means the PHY
did not answer, although it could legitimately return 0xffff to a
read. More important is bit 29: "Acknowledge. This bit is set if the
PHY acknowledged the read transaction." It is 0, so it thinks the PHY
did not respond.

> The patch I threw in:
> 
> --- a/drivers/net/ethernet/ti/davinci_mdio.c
> +++ b/drivers/net/ethernet/ti/davinci_mdio.c
> @@ -437,7 +437,10 @@ static int davinci_mdio_read(struct mii_bus *bus, int phy_id, int phy_reg)
>                         break;
> 
>                 reg = readl(&data->regs->user[0].access);
> +               printk("davinci mdio reg is 0x%08x\n", reg);
>                 ret = (reg & USERACCESS_ACK) ? (reg & USERACCESS_DATA) : -EIO;
> +               if (ret == -EIO)
> +                   printk("ret is this EIO\n");
>                 break;
>         }
> 
> 
> The print:
> 
> [    1.537767] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
> [    1.538111] davinci mdio reg is 0x20400007

This is a read of register 2, and the register has value 0x0007

> [    1.538372] davinci mdio reg is 0x2060c0f1

This is a read of register 3, and the register has value 0xc0f1.

These are the ID registers, and match SMSC LAN8710/LAN8720.

> [    1.549523] davinci mdio reg is 0x03a0ffff

Register 0x1d. Not one of the standard registers. I don't know what is
happening here.

> [    1.549551] ret is this EIO
> [    1.549806] davinci mdio reg is 0x0020ffff

Register 1, basic mode status register.

> [    1.549821] ret is this EIO

In these two last transactions, the ACK bit is not set.

> [    1.550471] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> [    1.550592] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
> 
> Without the mdiodev->reset_state patch, I see the following:
> 
> [    1.537817] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
> [    1.538165] davinci mdio reg is 0x20400007
> [    1.538426] davinci mdio reg is 0x2060c0f1

Same as above.

> [    1.558442] davinci mdio reg is 0x23a00090
> [    1.558717] davinci mdio reg is 0x20207809
> [    1.559681] davinci mdio reg is 0x21c0ffff

In all these cases, we see the ACK bit set. 

So the PHY is responding to registers 2 and 3, the ID registers. But
it seems to be failing to respond to other registers. At a guess, i
would say it is still coming out of reset. Does the datasheet for the
LAN8710/LAN8720 say anything about how long a reset takes? Can you get
a logic analyser onto the reset line and MDIO bus and see how
different the timing is? It might be you need to add some delay values
to the reset in DT.

   Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-23 13:52     ` Andrew Lunn
@ 2024-04-23 20:07       ` Andrew Halaney
  2024-04-29  1:58         ` Colin Foster
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Halaney @ 2024-04-23 20:07 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Colin Foster, netdev

On Tue, Apr 23, 2024 at 03:52:35PM +0200, Andrew Lunn wrote:
> On Mon, Apr 22, 2024 at 11:00:51PM -0500, Colin Foster wrote:
> > Hi Andrew L,
> > 
> > (I CC'd Andrew Hanley, original author, for visibility)
> > 
> > On Wed, Apr 17, 2024 at 09:30:58PM +0200, Andrew Lunn wrote:
> > > On Wed, Apr 17, 2024 at 10:42:02AM -0500, Colin Foster wrote:
> > > > Hello,
> > > > 
> > > > I'm chasing down an issue in recent kernels. My setup is slightly
> > > > unconventional: a BBB with ETH0 as a CPU port to a DSA switch that is
> > > > controlled by SPI. I'll have hardware next week, but think it is worth
> > > > getting a discussion going.
> > > > 
> > > > The commit in question is commit df16c1c51d81 ("net: phy: mdio_device:
> > > > Reset device only when necessary"). This seems to cause a probe error of
> > > > the MDIO device. A dump_stack was added where the reset is skipped.
> > > > 
> > > > SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> > > 
> > > Can you confirm this EIO is this one:
> > > 
> > > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/ti/davinci_mdio.c#L440
> > 
> > Yes, I can confirm this is the EIO.
> > 
> > > 
> > > It would be good to check the value of USERACCESS_ACK, and what the
> > > datasheet says about it.
> > 
> > The register value is 0x0020ffff
> 
> The 0xffff is the value read from the bus. That probably means the PHY
> did not answer, although it could legitimately return 0xffff to a
> read. More important is bit 29: "Acknowledge. This bit is set if the
> PHY acknowledged the read transaction." It is 0, so it thinks the PHY
> did not respond.
> 
> > The patch I threw in:
> > 
> > --- a/drivers/net/ethernet/ti/davinci_mdio.c
> > +++ b/drivers/net/ethernet/ti/davinci_mdio.c
> > @@ -437,7 +437,10 @@ static int davinci_mdio_read(struct mii_bus *bus, int phy_id, int phy_reg)
> >                         break;
> > 
> >                 reg = readl(&data->regs->user[0].access);
> > +               printk("davinci mdio reg is 0x%08x\n", reg);
> >                 ret = (reg & USERACCESS_ACK) ? (reg & USERACCESS_DATA) : -EIO;
> > +               if (ret == -EIO)
> > +                   printk("ret is this EIO\n");
> >                 break;
> >         }
> > 
> > 
> > The print:
> > 
> > [    1.537767] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
> > [    1.538111] davinci mdio reg is 0x20400007
> 
> This is a read of register 2, and the register has value 0x0007
> 
> > [    1.538372] davinci mdio reg is 0x2060c0f1
> 
> This is a read of register 3, and the register has value 0xc0f1.
> 
> These are the ID registers, and match SMSC LAN8710/LAN8720.
> 
> > [    1.549523] davinci mdio reg is 0x03a0ffff
> 
> Register 0x1d. Not one of the standard registers. I don't know what is
> happening here.
> 
> > [    1.549551] ret is this EIO
> > [    1.549806] davinci mdio reg is 0x0020ffff
> 
> Register 1, basic mode status register.
> 
> > [    1.549821] ret is this EIO
> 
> In these two last transactions, the ACK bit is not set.
> 
> > [    1.550471] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> > [    1.550592] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
> > 
> > Without the mdiodev->reset_state patch, I see the following:
> > 
> > [    1.537817] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
> > [    1.538165] davinci mdio reg is 0x20400007
> > [    1.538426] davinci mdio reg is 0x2060c0f1
> 
> Same as above.
> 
> > [    1.558442] davinci mdio reg is 0x23a00090
> > [    1.558717] davinci mdio reg is 0x20207809
> > [    1.559681] davinci mdio reg is 0x21c0ffff
> 
> In all these cases, we see the ACK bit set. 
> 
> So the PHY is responding to registers 2 and 3, the ID registers. But
> it seems to be failing to respond to other registers. At a guess, i
> would say it is still coming out of reset. Does the datasheet for the
> LAN8710/LAN8720 say anything about how long a reset takes? Can you get
> a logic analyser onto the reset line and MDIO bus and see how
> different the timing is? It might be you need to add some delay values
> to the reset in DT.

For what its worth, I think that this theory makes sense if reverting the patch
highlighted above makes this go away. Before that patch, you'd see a
flow like this:

    net: phy: mdio_device: Reset device only when necessary

    Currently the phy reset sequence is as shown below for a
    devicetree described mdio phy on boot:

    1. Assert the phy_device's reset as part of registering
    2. Deassert the phy_device's reset as part of registering
    3. Deassert the phy_device's reset as part of phy_probe
    4. Deassert the phy_device's reset as part of phy_hw_init

Which means whatever the deassert time was tripled in
practice before you got around to phy_hw_init() (which if I understand
is when things start reporting no ACK above).

I am not sure what devicetree upstream would be the one to look at for
your beaglebone, but microchip's datasheet for the LAN8720A has
"TABLE 5-8: POWER-ON NRST & ..." section detailing some reset requirements:

    https://ww1.microchip.com/downloads/en/devicedoc/00002165b.pdf

If I read it right, assert time needs to be >= 100 us, and
deassert... is not so clear to me unfortunately. Maybe for starters
triple your value and see if things work ok (just based on the 3
repeated deasserts going down to 1 with the patch applied)? Hopefully
longer term the actual deassert timing can be confirmed.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-23 20:07       ` Andrew Halaney
@ 2024-04-29  1:58         ` Colin Foster
  2024-04-29 12:36           ` Andrew Lunn
  0 siblings, 1 reply; 8+ messages in thread
From: Colin Foster @ 2024-04-29  1:58 UTC (permalink / raw)
  To: Andrew Halaney; +Cc: Andrew Lunn, netdev, linux-omap

Hi Andrew L and Andrew H,

Sorry for the delayed response. I couldn't get to testing anything until
just now.

On Tue, Apr 23, 2024 at 03:07:15PM -0500, Andrew Halaney wrote:
> On Tue, Apr 23, 2024 at 03:52:35PM +0200, Andrew Lunn wrote:
> > On Mon, Apr 22, 2024 at 11:00:51PM -0500, Colin Foster wrote:
> > 
> > In these two last transactions, the ACK bit is not set.
> > 
> > > [    1.550471] SMSC LAN8710/LAN8720: probe of 4a101000.mdio:00 failed with error -5
> > > [    1.550592] davinci_mdio 4a101000.mdio: phy[0]: device 4a101000.mdio:00, driver SMSC LAN8710/LAN8720
> > > 
> > > Without the mdiodev->reset_state patch, I see the following:
> > > 
> > > [    1.537817] davinci_mdio 4a101000.mdio: davinci mdio revision 1.6, bus freq 1000000
> > > [    1.538165] davinci mdio reg is 0x20400007
> > > [    1.538426] davinci mdio reg is 0x2060c0f1
> > 
> > Same as above.
> > 
> > > [    1.558442] davinci mdio reg is 0x23a00090
> > > [    1.558717] davinci mdio reg is 0x20207809
> > > [    1.559681] davinci mdio reg is 0x21c0ffff
> > 
> > In all these cases, we see the ACK bit set. 
> > 
> > So the PHY is responding to registers 2 and 3, the ID registers. But
> > it seems to be failing to respond to other registers. At a guess, i
> > would say it is still coming out of reset. Does the datasheet for the
> > LAN8710/LAN8720 say anything about how long a reset takes? Can you get
> > a logic analyser onto the reset line and MDIO bus and see how
> > different the timing is? It might be you need to add some delay values
> > to the reset in DT.

I don't think I'll be able to get onto those lines. But I do think this
is the right tree to bark up. I also found some kernelci logs that
suggest I'm not the only one seeing this issue:

https://storage.kernelci.org/mainline/master/v6.9-rc5/arm/multi_v7_defconfig/gcc-10/lab-cip/baseline-beaglebone-black.html

There might be ways to navigate the kernelci database that I'm not aware
of, but I couldn't reasonably say "before 6.8 it didn't happen, and
after 6.8 it did." I'm not sure that matters at this point though.

> 
> For what its worth, I think that this theory makes sense if reverting the patch
> highlighted above makes this go away. Before that patch, you'd see a
> flow like this:
> 
>     net: phy: mdio_device: Reset device only when necessary
> 
>     Currently the phy reset sequence is as shown below for a
>     devicetree described mdio phy on boot:
> 
>     1. Assert the phy_device's reset as part of registering
>     2. Deassert the phy_device's reset as part of registering
>     3. Deassert the phy_device's reset as part of phy_probe
>     4. Deassert the phy_device's reset as part of phy_hw_init
> 
> Which means whatever the deassert time was tripled in
> practice before you got around to phy_hw_init() (which if I understand
> is when things start reporting no ACK above).
> 
> I am not sure what devicetree upstream would be the one to look at for
> your beaglebone, but microchip's datasheet for the LAN8720A has
> "TABLE 5-8: POWER-ON NRST & ..." section detailing some reset requirements:
> 
>     https://ww1.microchip.com/downloads/en/devicedoc/00002165b.pdf
> 
> If I read it right, assert time needs to be >= 100 us, and
> deassert... is not so clear to me unfortunately. Maybe for starters
> triple your value and see if things work ok (just based on the 3
> repeated deasserts going down to 1 with the patch applied)? Hopefully
> longer term the actual deassert timing can be confirmed.

I went all in and did a 100ms delay before returning from the resets of
3 and 4 you mention. Sure enough, everything worked! It certainly should
be understood and optimized. I added the linux-omap list to this thread
(please let me know if there were others I should've CC'd on any of
these emails).

Either way, thank you both for helping me understand this! I hope to be
able to fix the issue, but at the very least I hope it is considered
"reported".


Colin Foster

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Beaglebone Ethernet Probe Failure In 6.8+
  2024-04-29  1:58         ` Colin Foster
@ 2024-04-29 12:36           ` Andrew Lunn
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Lunn @ 2024-04-29 12:36 UTC (permalink / raw)
  To: Colin Foster; +Cc: Andrew Halaney, netdev, linux-omap

> I went all in and did a 100ms delay before returning from the resets of
> 3 and 4 you mention. Sure enough, everything worked! It certainly should
> be understood and optimized. I added the linux-omap list to this thread
> (please let me know if there were others I should've CC'd on any of
> these emails).

It probably has little to do with the OMAP, it is the Microchip PHY
you need more information about.

100ms is a very long time. The data sheet says:

  Note: For the first 16us after coming out of reset, the RMII
  interface will run at 2.5 MHz. After this time, it will switch to 25
  MHz if auto-negotiation is enabled.

I did not find anything else. I would try a deassert time the same as
the asset time as a starting point. It could also be that you ask
Linux for 100us sleep, and Linux actually gives your 1ms because of
the resolution of the timers. But i doubt many application will care
about a 1ms delay.

    Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-04-29 12:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-17 15:42 Beaglebone Ethernet Probe Failure In 6.8+ Colin Foster
2024-04-17 19:30 ` Andrew Lunn
2024-04-17 23:52   ` Colin Foster
2024-04-23  4:00   ` Colin Foster
2024-04-23 13:52     ` Andrew Lunn
2024-04-23 20:07       ` Andrew Halaney
2024-04-29  1:58         ` Colin Foster
2024-04-29 12:36           ` Andrew Lunn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.