From: Francesco Dolcini <francesco.dolcini@toradex.com>
To: Joakim Zhang <qiangqing.zhang@nxp.com>,
	Andrew Lunn <andrew@lunn.ch>,
	Heiner Kallweit <hkallweit1@gmail.com>,
	Russell King <linux@armlinux.org.uk>
Cc: netdev@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>,
	"David S. Miller" <davem@davemloft.net>,
	Fabio Estevam <festevam@gmail.com>,
	Tim Harvey <tharvey@gateworks.com>
Subject: FEC MDIO read timeout on linkup
Date: Fri, 22 Apr 2022 17:26:12 +0200
Message-ID: <20220422152612.GA510015@francesco-nb.int.toradex.com>

Hello all,
I have recently been trying to debug an issue where the FEC driver
reports an MDIO read timeout during linkup [0]. Initially I was working
with an old 5.4 kernel, but today I tried the current master
(5.18.0-rc3-00080-gd569e86915b7) and the issue is still there.

I'm also aware of the old discussions on the topic, and I tried to
increase the timeout without success (although I'm not sure this is
relevant with the newer polling solution).

The issue was reproduced on an apalis-imx6, which has a KSZ9131
Ethernet PHY connected to the FEC MAC.

There was no load on the machine; all 4 cores were idle during my test.

From the code and the trace below, the read that times out is issued
from drivers/net/phy/micrel.c:kszphy_handle_interrupt().

Could this be some sort of race condition? Any suggestions for
debugging this?

Here is the stack trace:

[  146.195696] fec 2188000.ethernet eth0: MDIO read timeout
[  146.201779] ------------[ cut here ]------------
[  146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
[  146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
[  146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
[  146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  146.236257]  unwind_backtrace from show_stack+0x10/0x14
[  146.241640]  show_stack from dump_stack_lvl+0x58/0x70
[  146.246841]  dump_stack_lvl from __warn+0xb4/0x24c
[  146.251772]  __warn from warn_slowpath_fmt+0x5c/0xd4
[  146.256873]  warn_slowpath_fmt from phy_error+0x24/0x6c
[  146.262249]  phy_error from kszphy_handle_interrupt+0x40/0x48
[  146.268159]  kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
[  146.274417]  irq_thread_fn from irq_thread+0xf0/0x1dc
[  146.279605]  irq_thread from kthread+0xe4/0x104
[  146.284267]  kthread from ret_from_fork+0x14/0x28
[  146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
[  146.294448] 1fa0:                                     00000000 00000000 00000000 00000000
[  146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  146.318262] irq event stamp: 12325
[  146.321780] hardirqs last  enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
[  146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
[  146.338259] softirqs last  enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
[  146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
[  146.354447] ---[ end trace 0000000000000000 ]---


The issue is not systematic; however, with the following script it is
pretty easy to trigger (within minutes):

```
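#!/bin/bash
# Loop forever over the 10/100 half/full duplex settings on eth0;
# exit with status 1 if the link does not come back up in time.

count=0

# Poll until the link is reported up; give up after 600 * 0.1 s = 60 s.
wait_link_or_exit()
{
	tmo=600
	while ! ethtool eth0 | grep -qF 'Link detected: yes'
	do
		sleep 0.1
		tmo=$((tmo - 1))
		[ "$tmo" -gt 0 ] || exit 1
	done
}

while true
do
	count=$((count + 1))
	echo "run $count"

	# Change the speed/duplex setting and wait for the link each time.
	ethtool -s eth0 speed 10 duplex half autoneg on
	wait_link_or_exit

	ethtool -s eth0 speed 10 duplex full autoneg on
	wait_link_or_exit

	ethtool -s eth0 speed 100 duplex half autoneg on
	wait_link_or_exit

	ethtool -s eth0 speed 100 duplex full autoneg on
	wait_link_or_exit
done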
```
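
As a possible refinement (a sketch only, not part of the test as it was
run), the loop could stop as soon as the kernel logs the timeout, so
that the run count at failure is recorded. The helper name
has_mdio_timeout is hypothetical; it only looks for the message text
seen in the trace above on its standard input.

```bash
#!/bin/bash

# Hypothetical helper, not in the original script: succeeds (exit 0) if
# the kernel log text on stdin contains the FEC timeout message.
has_mdio_timeout()
{
	grep -qF 'MDIO read timeout'
}
```

It could be called at the end of each loop iteration, for example as
`dmesg | has_mdio_timeout && exit 0`, assuming the user running the
script is allowed to read the kernel log.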

Francesco

[0] https://lore.kernel.org/all/20220325140808.GA1047855@francesco-nb.int.toradex.com/


