netdev.vger.kernel.org archive mirror
* FEC MDIO timeout and polled IO
From: Francesco Dolcini @ 2022-03-25 14:08 UTC
  To: Andrew Lunn, Russell King; +Cc: netdev, fugang.duan, Chris Healy

Hello Andrew and all,
I was recently debugging an issue in the FEC driver: about 2% of the
time the driver fails with "MDIO read timeout" at boot on a 5.4
kernel.

This issue is not new and reappears from time to time; it seems that
the previous interrupt-based mechanism is rather easy to break.

I backported your patch
f166f890c8f0 (net: ethernet: fec: Replace interrupt driven MDIO with polled IO, 2020-05-02)
to kernel 5.4 and it seems to fix the issue (I was able to do 470
power cycles, while before it was failing after a couple of hundred
cycles at best).

Shouldn't this patch be backported to kernel 5.4? 

Francesco
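
The 470 clean power cycles reported above can be put in rough perspective with a short calculation. This is only a sketch: it assumes each boot fails independently with a fixed probability, which the thread does not establish, and the helper name `prob_no_failures` is made up for illustration. The two rates tried are the two figures mentioned in the message, "about 2%" and one failure per couple of hundred cycles (taken here as 1/200).

```python
def prob_no_failures(per_boot_failure_rate: float, boots: int) -> float:
    """Probability of zero failures in `boots` independent boots.

    Assumes every boot fails independently with the same probability,
    which is a simplification and not something the thread confirms.
    """
    if not 0.0 <= per_boot_failure_rate <= 1.0:
        raise ValueError("failure rate must be in [0, 1]")
    if boots < 0:
        raise ValueError("boots must be non-negative")
    return (1.0 - per_boot_failure_rate) ** boots


if __name__ == "__main__":
    # Two readings of the failure rate quoted in the message.
    for rate in (0.02, 1 / 200):
        p_clean = prob_no_failures(rate, 470)
        print(f"failure rate {rate:.3f}: P(470 clean boots) = {p_clean:.2e}")
```

Under the 2% reading, 470 clean boots by luck alone would be very unlikely (less than 1 in 10,000). Under the 1-in-200 reading the chance is a little under 10%, so the test is encouraging but not conclusive on its own.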



* Re: FEC MDIO timeout and polled IO
From: Andrew Lunn @ 2022-03-25 15:17 UTC
  To: Francesco Dolcini; +Cc: Russell King, netdev, fugang.duan, Chris Healy

Hi Francesco

This patch was purely a performance boost; it was not a bug fix in any
way. That change also caused a lot of pain. There are at least two
different implementations of the MDIO bus in the FEC, and they
behave slightly differently. So what worked for me with the Vybrid
broke some other platforms. It took an NXP software engineer talking
to their hardware guys to figure out how to do this correctly. Which
is why you will see a complicated patch history.

I personally would not recommend a back port, unless you can test the
back port on a wide range of SoCs with the FEC.

If you are getting timeouts, I would suggest you look at whatever else
is happening in the system during boot. Are interrupts getting
disabled for too long? Is something blocking the running of the
completion?

Or just update to v5.15.

   Andrew
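
For readers unfamiliar with the distinction discussed above, the polled approach can be modelled in a few lines of userspace code. This is a toy sketch, not the driver code: `FakeMdioController`, `wait_for_mii_event`, the `MII_EVENT` bit position, and the timeout values are all assumptions chosen for illustration. The idea is that the caller itself repeatedly reads an event register until a "transfer done" bit appears or a deadline passes, so it does not depend on an interrupt handler being run in time.

```python
import time

MII_EVENT = 1 << 23  # illustrative bit position, not taken from the thread


class FakeMdioController:
    """Toy controller whose event register reports a finished transfer."""

    def __init__(self, reads_until_done):
        # None models a transfer that never completes.
        self._reads_until_done = reads_until_done
        self._reads = 0

    def read_event_register(self) -> int:
        self._reads += 1
        done = (self._reads_until_done is not None
                and self._reads >= self._reads_until_done)
        return MII_EVENT if done else 0


def wait_for_mii_event(controller, timeout_s=0.03, delay_s=0.0) -> int:
    """Poll the event register until MII_EVENT is set.

    Returns the number of register reads performed, or raises
    TimeoutError if the bit is not seen before the deadline.
    """
    deadline = time.monotonic() + timeout_s
    reads = 0
    while True:
        reads += 1
        if controller.read_event_register() & MII_EVENT:
            return reads
        if time.monotonic() >= deadline:
            raise TimeoutError("MDIO read timeout")
        if delay_s:
            time.sleep(delay_s)


if __name__ == "__main__":
    print(wait_for_mii_event(FakeMdioController(reads_until_done=3)))
```

The trade-off in this model is that the waiting code spends CPU time spinning, in exchange for not relying on anything else in the system to wake it up.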


* Re: FEC MDIO timeout and polled IO
From: Francesco Dolcini @ 2022-03-25 15:33 UTC
  To: Andrew Lunn
  Cc: Francesco Dolcini, Russell King, netdev, fugang.duan, Chris Healy

Hello Andrew

On Fri, Mar 25, 2022 at 04:17:13PM +0100, Andrew Lunn wrote:
> Hi Francesco
> 
> This patch was purely a performance boost; it was not a bug fix in any
> way. That change also caused a lot of pain. There are at least two
> different implementations of the MDIO bus in the FEC, and they
> behave slightly differently. So what worked for me with the Vybrid
> broke some other platforms. It took an NXP software engineer talking
> to their hardware guys to figure out how to do this correctly. Which
> is why you will see a complicated patch history.
> 
> I personally would not recommend a back port, unless you can test the
> back port on a wide range of SoCs with the FEC.
I can test quite a few i.MX SoCs, but more than those use this
driver. I do not see a reason to push for such a change if you do
not think it is a good idea.

> If you are getting timeouts, I would suggest you look at whatever else
> is happening in the system during boot. Are interrupts getting
> disabled for too long? Is something blocking the running of the
> completion?
I tried to do some debugging, but it was incredibly painful given that
the issue manifests itself only after a couple of hundred boots. I also
tried the very simple workaround of doubling the timeout, but it did
not help.

To make things worse, the issue started to appear after updating to a
more recent 5.4 kernel patch version.

> Or just update to v5.15.
I will probably just keep your patch in our tree until we are able to
migrate to a newer kernel; it seems to work pretty well (and yes, I
also took this [0]).

Thanks a lot,
Francesco!


[0] 0f0011824921 (net: fec: fix MDIO probing for some FEC hardware blocks, 2020-10-28)
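
The observation that doubling the timeout did not help can be illustrated with a small model of an interrupt-plus-completion wait. This is a sketch under assumptions, not a diagnosis: the thread does not establish the root cause, and `wait_for_completion` and the timings are hypothetical. The model only shows that a longer timeout rescues a completion that is signalled late, but cannot rescue one that is never signalled.

```python
import threading


def wait_for_completion(signal_after_s, timeout_s) -> bool:
    """Model a waiter blocked on a completion signalled by an interrupt.

    signal_after_s: delay before the simulated handler signals, or None
                    to model an event that is never delivered.
    Returns True if the completion arrived before the timeout.
    """
    done = threading.Event()
    timer = None
    if signal_after_s is not None:
        # The timer stands in for the interrupt handler.
        timer = threading.Timer(signal_after_s, done.set)
        timer.daemon = True
        timer.start()
    try:
        return done.wait(timeout_s)
    finally:
        if timer is not None:
            timer.cancel()


if __name__ == "__main__":
    # Late completion: a longer timeout turns the failure into a success.
    print(wait_for_completion(signal_after_s=0.15, timeout_s=0.05))
    print(wait_for_completion(signal_after_s=0.05, timeout_s=0.5))
    # Lost completion: no timeout value helps.
    print(wait_for_completion(signal_after_s=None, timeout_s=0.05))
    print(wait_for_completion(signal_after_s=None, timeout_s=0.10))
```

In this model, a failure that persists after the timeout is lengthened points toward the second case, though on real hardware other explanations are possible.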


