linux-kernel.vger.kernel.org archive mirror
* i.MX28 based system losing eth0 on boot
@ 2014-05-06 16:44 Brian Lilly
  2014-05-06 18:11 ` Uwe Kleine-König
  2014-05-07  3:17 ` Fabio Estevam
  0 siblings, 2 replies; 15+ messages in thread
From: Brian Lilly @ 2014-05-06 16:44 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: David S. Miller, Fabio Estevam, Jim Baxter, Frank Li,
	Fugang Duan, netdev, linux-kernel

Uwe:

With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
come up, then brought right back down with an MDIO rx timeout moments
after.  Adding back in the removed code keeps the interface alive and
it's working afterward without trouble.  I've tested the re-inserted
code in 3.12, 3.14 without issue on our boards.

Is there something else that can be done to prevent the MDIO timeouts?
We are using basically the same schematic for networking as the
imx28evk.

Any thoughts on how to resolve this?

Thanks,
Brian Lilly


* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 16:44 i.MX28 based system losing eth0 on boot Brian Lilly
@ 2014-05-06 18:11 ` Uwe Kleine-König
  2014-05-06 18:39   ` Florian Fainelli
  2014-05-06 19:12   ` Brian Lilly
  2014-05-07  3:17 ` Fabio Estevam
  1 sibling, 2 replies; 15+ messages in thread
From: Uwe Kleine-König @ 2014-05-06 18:11 UTC (permalink / raw)
  To: Brian Lilly
  Cc: David S. Miller, Fabio Estevam, Jim Baxter, Frank Li,
	Fugang Duan, netdev, linux-kernel, kernel

Hello Brian,

On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
> come up, then brought right back down with an MDIO rx timeout moments
> after.  Adding back in the removed code keeps the interface alive and
> it's working afterward without trouble.  I've tested the re-inserted
> code in 3.12, 3.14 without issue on our boards.
So you can reliably trigger that problem? You're just doing

	ifconfig eth0 1.2.3.4 up

(or equivalent) and the interface goes down without further
interference with the above mentioned commit? The exact error you're
seeing is

	MDIO read timeout

(with some prefix saying something about fec and eth0 I think)?

Is this error also present with a264b981f2 reverted, just without
affecting eth0's functionality? Does the timeout always happen, or only
on specific addresses?

This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
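
For orientation, here is a minimal userspace sketch of the kind of bounded
wait that a larger FEC_MII_TIMEOUT would lengthen. It is not the driver
code: struct model_mdio, model_mdio_wait() and the polling step are made up
for illustration, and the 30000 us value is only an assumption about the
driver's current default.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's timeout constant, in microseconds. */
#define MODEL_MII_TIMEOUT_US 30000

/*
 * Hypothetical model of one MDIO transaction: it "completes" once
 * complete_after_us microseconds have passed on a simulated clock.
 */
struct model_mdio {
	unsigned int complete_after_us;
};

/*
 * Advance the simulated clock in step_us increments until the transaction
 * completes or the budget is used up.
 * Returns 0 on completion, -ETIMEDOUT if the budget ran out first.
 */
static int model_mdio_wait(const struct model_mdio *m,
			   unsigned int timeout_us, unsigned int step_us)
{
	unsigned int elapsed_us = 0;

	if (step_us == 0)
		return -EINVAL;

	while (elapsed_us <= timeout_us) {
		if (elapsed_us >= m->complete_after_us)
			return 0;
		elapsed_us += step_us;
	}
	return -ETIMEDOUT;
}
```

In this model a transaction that needs 45 ms fails against a 30 ms budget
and succeeds against a 60 ms one, which is all the experiment would show:
whether the PHY answers late or not at all.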
 
> Is there something else that can be done to prevent the MDIO timeouts?
> We are using basically the same schematic for networking as the
> imx28evk.
Hard to say, but assuming it works just fine on the imx28evk for you,
too, there seems to be some hardware difference that makes your machine
fail. (That doesn't mean it's not fixable in software.)

I don't know if an MDIO read error is intended to make the device go
down, maybe one of the netdev guys can answer that.
Assuming that it's not intended, instrument the code, find out how that
timeout makes your device go down and find the wrong branch. I'd start
with adding stackdumps when the mdio timeout happens and when
fec_enet_start_xmit is called with fep->link == 0.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 18:11 ` Uwe Kleine-König
@ 2014-05-06 18:39   ` Florian Fainelli
  2014-05-06 19:12   ` Brian Lilly
  1 sibling, 0 replies; 15+ messages in thread
From: Florian Fainelli @ 2014-05-06 18:39 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Brian Lilly, David S. Miller, Fabio Estevam, Jim Baxter,
	Frank Li, Fugang Duan, netdev, linux-kernel, kernel

2014-05-06 11:11 GMT-07:00 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>:
> Hello Brian,
>
> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>> come up, then brought right back down with an MDIO rx timeout moments
>> after.  Adding back in the removed code keeps the interface alive and
>> it's working afterward without trouble.  I've tested the re-inserted
>> code in 3.12, 3.14 without issue on our boards.
> So you can reliably trigger that problem? You're just doing
>
>         ifconfig eth0 1.2.3.4 up
>
> (or equivalent) and the interface goes down without further
> interference with the above mentioned commit? The exact error you're
> seeing is
>
>         MDIO read timeout
>
> (with some prefix saying something about fec and eth0 I think)?
>
> This error is also present with a264b981f2 reverted, just doesn't affect
> eth0 being functional? Does the timeout always happen, or only on
> specific addresses?
>
> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>
>> Is there something else that can be done to prevent the MDIO timeouts?
>> We are using basically the same schematic for networking as the
>> imx28evk.
> Hard to say, but assuming it works just fine on the imx28evk for you,
> too, there seems to be some hardware difference that makes your machine
> fail. (That doesn't mean it's not fixable in software.)
>
> I don't know if a mdio read error is intended to make the device go
> down, maybe one of the netdev guys can answer that.

What is likely happening is that you are failing auto-negotiation
(phy_read_status() returns < 0) because of the MDIO timeout, so we never
call netif_carrier_on(), and so the link is not UP. The reason for
that could be a genuine MDIO read timeout from the bus, or your PHY
might be slightly bogus and need more time to complete
auto-negotiation, or anything that resembles that. There is some
special MDIO timeout logic in the FEC driver that I would seriously
audit as it seems to be bogus, or it seems at the very least that the
MDIO timeouts are known and need to be worked around.
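
The hypothesis above can be sketched as a small userspace model. This is
a simplification under stated assumptions, not the libphy code: every
model_* name is hypothetical, and the only point is that a failed status
read returns early, so the carrier flag is never set.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for a PHY device. */
struct model_phy {
	int (*read_status)(struct model_phy *phy); /* < 0 on bus error */
	bool link;                                 /* filled in by read_status */
};

/* Hypothetical stand-in for the network device's carrier state. */
struct model_netdev {
	bool carrier_ok;
};

/*
 * One link-polling step. If the status read fails, bail out before the
 * carrier state is touched, so a link that never reads back successfully
 * is never reported as up.
 */
static int model_poll_link(struct model_phy *phy, struct model_netdev *dev)
{
	int err = phy->read_status(phy);

	if (err < 0)
		return err;

	dev->carrier_ok = phy->link;
	return 0;
}

/* Status read that always fails the way a timed-out MDIO read would. */
static int model_read_status_timeout(struct model_phy *phy)
{
	(void)phy;
	return -ETIMEDOUT;
}

/* Status read that succeeds and reports link up. */
static int model_read_status_link_up(struct model_phy *phy)
{
	phy->link = true;
	return 0;
}
```

With model_read_status_timeout() plugged in, model_poll_link() returns
-ETIMEDOUT and carrier_ok stays false, which matches the "link is not
ready" symptom in spirit.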

> Assuming that it's not intended, instrument the code, find out how that
> timeout makes your device go down and find the wrong branch. I'd start
> with adding stackdumps when the mdio timeout happens and when
> fec_enet_start_xmit is called with fep->link == 0.

I would also double check fec_enet_adjust_link() which seems to handle
a case where we have a MDIO bus timeout, and tries to do something
that looks incorrect to me. PHY_HALTED basically corresponds to
phy_stop() being called, which means that you won't be running the
adjust_link callback, so I wonder how this situation is actually
happening.

>
> Best regards
> Uwe
>
> --
> Pengutronix e.K.                           | Uwe Kleine-König            |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |



-- 
Florian


* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 18:11 ` Uwe Kleine-König
  2014-05-06 18:39   ` Florian Fainelli
@ 2014-05-06 19:12   ` Brian Lilly
  2014-05-06 19:24     ` Florian Fainelli
  1 sibling, 1 reply; 15+ messages in thread
From: Brian Lilly @ 2014-05-06 19:12 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: David S. Miller, Fabio Estevam, Jim Baxter, Frank Li,
	Fugang Duan, netdev, linux-kernel, kernel

It is happening during boot up:

<snip, kernel 3.12 >

Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
(mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc (v1.21.1) started

Sending discover...

[   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
[   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sending discover...

Sending select for 10.10.10.217...
Lease of 10.10.10.217 obtained, lease time 86400
/etc/udhcpc.d/50default: Adding DNS 10.10.10.13
[   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
done.
Starting rpcbind daemon...done.
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
Mon Apr 14 22:40:00 UTC 2014
INIT: Entering runlevel: 5
Starting Xserver
Starting system message bus: dbus.
Starting Connection Manager
Starting wpa_supplicant
Successfully initialized wpa_supplicant
Starting Dropbear SSH server
[   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
[   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout

With a different kernel (3.14):

[   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
[   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
[   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
[   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
[   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout

Afterward I have to ifdown eth0, ifup eth0 and then it functions
normally, without reverting the commit.

root@cfa100xx:~# ifdown eth0
[ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
root@cfa100xx:~# ifup eth0
udhcpc (v1.21.1) started
Sending discover...
[ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
Sending discover...
Sending select for 10.10.10.217...
Lease of 10.10.10.217 obtained, lease time 86400
ip: RTNETLINK answers: File exists

--
Brian




* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 19:12   ` Brian Lilly
@ 2014-05-06 19:24     ` Florian Fainelli
  2014-05-06 21:40       ` Brian Lilly
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2014-05-06 19:24 UTC (permalink / raw)
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
> It is happening during boot up:
>
> <snip, kernel 3.12 >
>
> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]

Note that the SMSC PHY driver is picked up here, and that specific
driver implements a different phy_read_status() callback due to how
the PHY operates. The PHY driver also overrides the config_init()
callback to perform some PHY-specific initialization. See below for
more.

> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> udhcpc (v1.21.1) started
>
> Sending discover...
>
> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> Sending discover...
>
> Sending select for 10.10.10.217...
> Lease of 10.10.10.217 obtained, lease time 86400
> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
> done.
> Starting rpcbind daemon...done.
> net.ipv4.conf.default.rp_filter = 1
> net.ipv4.conf.all.rp_filter = 1
> Mon Apr 14 22:40:00 UTC 2014
> INIT: Entering runlevel: 5
> Starting Xserver
> Starting system message bus: dbus.
> Starting Connection Manager
> Starting wpa_supplicant
> Successfully initialized wpa_supplicant
> Starting Dropbear SSH server
> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)

The correct PHY driver is selected here...

> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout

But we are still seeing MDIO read timeouts, which is not great.

>
> With a different kernel (3.14):
>
> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)

Here, the Generic PHY driver has been selected, which will use the
MII_BMSR register contents to determine the Link status and
parameters. You might want to make sure that your board selects the
appropriate PHY driver, such that we are not chasing two issues here.
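
As a rough illustration of what "using the MII_BMSR register contents"
means, the sketch below decodes two status bits from a raw BMSR value. The
bit positions follow the standard MII register layout; the MODEL_* names
and the struct are hypothetical and only mirror the idea, not the generic
PHY driver's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the basic mode status register (standard MII layout). */
#define MODEL_BMSR_LSTATUS      0x0004 /* link status */
#define MODEL_BMSR_ANEGCOMPLETE 0x0020 /* auto-negotiation complete */

/* Hypothetical summary of the two bits of interest. */
struct model_link_state {
	bool link_up;
	bool aneg_done;
};

/* Decode a raw 16-bit BMSR value as read over MDIO. */
static struct model_link_state model_decode_bmsr(uint16_t bmsr)
{
	struct model_link_state st;

	st.link_up = (bmsr & MODEL_BMSR_LSTATUS) != 0;
	st.aneg_done = (bmsr & MODEL_BMSR_ANEGCOMPLETE) != 0;
	return st;
}
```

For example, 0x782d decodes to link up with auto-negotiation complete,
while 0x7809 decodes to link down. If the read itself times out there is
no value to decode at all, which is the case being debugged here.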

> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout

It would also be helpful to print the registers that were accessed,
so that you can correlate this with the exact steps in the PHY
library state machine. Please also retry the experiment with the SMSC
PHY driver enabled, as it does some PHY specific initialization that
seems to be relevant. Then we are hopefully left with only the MDIO
timeout issue and not the PHY mis-configuration + MDIO timeout.
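
One way to picture the register tracing suggested above is a thin wrapper
that records every read together with its result. The following is a
userspace sketch only: model_traced_read(), model_fake_read() and the
message format are invented for illustration, and in the driver the same
idea would be a debug print next to the existing MDIO read and write
paths.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical signature of a bus read: returns the value or < 0 on error. */
typedef int (*model_mdio_read_fn)(int phy_addr, int regnum);

/* Format one access record into buf, covering both success and failure. */
static int model_format_access(char *buf, size_t len,
			       int phy_addr, int regnum, int ret)
{
	if (ret < 0)
		return snprintf(buf, len, "mdio read phy=%d reg=0x%02x failed: %d",
				phy_addr, (unsigned int)regnum, ret);

	return snprintf(buf, len, "mdio read phy=%d reg=0x%02x -> 0x%04x",
			phy_addr, (unsigned int)regnum, (unsigned int)ret);
}

/* Perform the read through read_fn and leave a trace record in buf. */
static int model_traced_read(model_mdio_read_fn read_fn,
			     int phy_addr, int regnum,
			     char *buf, size_t len)
{
	int ret = read_fn(phy_addr, regnum);

	model_format_access(buf, len, phy_addr, regnum, ret);
	return ret;
}

/* Fake bus for experimenting: register 1 answers, everything else times out. */
static int model_fake_read(int phy_addr, int regnum)
{
	(void)phy_addr;
	return regnum == 1 ? 0x782d : -ETIMEDOUT;
}
```

With such a trace, each "MDIO read timeout" line can be paired with the
register number that was being read, and from there with the step of the
PHY state machine that issued it.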

>
> Afterward I have to ifdown eth0, ifup eth0 and then it functions
> normally, without reverting the commit.
>
> root@cfa100xx:~# ifdown eth0
> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> root@cfa100xx:~# ifup eth0
> udhcpc (v1.21.1) started
> Sending discover...
> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
> Sending discover...
> Sending select for 10.10.10.217...
> Lease of 10.10.10.217 obtained, lease time 86400
> ip: RTNETLINK answers: File exists
>



-- 
Florian


* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 19:24     ` Florian Fainelli
@ 2014-05-06 21:40       ` Brian Lilly
  2014-05-06 22:06         ` Florian Fainelli
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Lilly @ 2014-05-06 21:40 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

The PHY on board is the SMSC LAN8720

With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw

[   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
[   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
[   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
[   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout

With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv

[   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
[   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
[   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
[SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
[   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
[   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout



* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 21:40       ` Brian Lilly
@ 2014-05-06 22:06         ` Florian Fainelli
  2014-05-06 22:27           ` Brian Lilly
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2014-05-06 22:06 UTC (permalink / raw)
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
> The PHY on board is the SMSC LAN8720
>
> With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw
>
> [   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
> [   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
> [   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
> [   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout
>
> With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv
>
> [   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
> [   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> [   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
> [   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
> [   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
> [   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> [   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout

Thanks for trying this. At least the behavior is consistent no matter
which PHY driver we are using. Just to rule out a potential PHY power-down
issue, could you try to revert the following commit
be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
when going to HALTED") and see if that works better for you?

Thanks!

>
> On Tue, May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>> It is happening during boot up:
>>>
>>> <snip, kernel 3.12 >
>>>
>>> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
>>> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
>>
>> Note that the SMSC PHY driver is picked up here, and that specific
>> driver implements a different phy_read_status() callback due to how
>> the PHY operates. The PHY driver also overrides the config_init()
>> callback to perform some PHY-specific initialization. See below for
>> more.
>>
>>> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> udhcpc (v1.21.1) started
>>>
>>> Sending discover...
>>>
>>> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> Sending discover...
>>>
>>> Sending select for 10.10.10.217...
>>> Lease of 10.10.10.217 obtained, lease time 86400
>>> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
>>> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>> done.
>>> Starting rpcbind daemon...done.
>>> net.ipv4.conf.default.rp_filter = 1
>>> net.ipv4.conf.all.rp_filter = 1
>>> Mon Apr 14 22:40:00 UTC 2014
>>> INIT: Entering runlevel: 5
>>> Starting Xserver
>>> Starting system message bus: dbus.
>>> Starting Connection Manager
>>> Starting wpa_supplicant
>>> Successfully initialized wpa_supplicant
>>> Starting Dropbear SSH server
>>> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>
>> The correct PHY driver is selected here...
>>
>>> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
>>> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
>>
>> But we are still seeing MDIO read timeouts, which is not great.
>>
>>>
>>> With a different kernel (3.14):
>>>
>>> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>
>> Here, the Generic PHY driver has been selected, which will use the
>> MII_BMSR register contents to determine the Link status and
>> parameters. You might want to make sure that your board selects the
>> appropriate PHY driver, such that we are not chasing two issues here.
>>
>>> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
>>> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
>>> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
>>> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout
>>
>> It would also be helpful to print the registers that were accessed,
>> such that you could correlate this with the exact steps in the PHY
>> library state machine. Please also retry the experiment with the SMSC
>> PHY driver enabled, as it does some PHY specific initialization that
>> seems to be relevant. Then we are hopefully left with only the MDIO
>> timeout issue and not the PHY mis-configuration + MDIO timeout.
>>
>>>
>>> Afterward I have to ifdown eth0, ifup eth0 and then it functions
>>> normally, without reverting the commit.
>>>
>>> root@cfa100xx:~# ifdown eth0
>>> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> root@cfa100xx:~# ifup eth0
>>> udhcpc (v1.21.1) started
>>> Sending discover...
>>> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>> Sending discover...
>>> Sending select for 10.10.10.217...
>>> Lease of 10.10.10.217 obtained, lease time 86400
>>> ip: RTNETLINK answers: File exists
>>>
>>> --
>>> Brian
>>>
>>>
>>> On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König
>>> <u.kleine-koenig@pengutronix.de> wrote:
>>>> Hello Brian,
>>>>
>>>> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>>>>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>>>>> come up, then brought right back down with an MDIO rx timeout moments
>>>>> after.  Adding back in the removed code keeps the interface alive and
>>>>> it's working afterward without trouble.  I've tested the re-inserted
>>>>> code in 3.12, 3.14 without issue on our boards.
>>>> So you can reliably trigger that problem? You're just doing
>>>>
>>>>         ifconfig eth0 1.2.3.4 up
>>>>
>>>> (or equivalent) and the interface goes down without further
>>>> interference with the above mentioned commit? The exact error you're
>>>> seeing is
>>>>
>>>>         MDIO read timeout
>>>>
>>>> (with some prefix saying something about fec and eth0 I think)?
>>>>
>>>> This error is also present with a264b981f2 reverted, just doesn't affect
>>>> eth0 being functional? Does the timeout always happen, or only on
>>>> specific addresses?
>>>>
>>>> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>>>>
>>>>> Is there something else that can be done to prevent the MDIO timeouts?
>>>>> We are using basically the same schematic for networking as the
>>>>> imx28evk.
>>>> Hard to say, but assuming it works just fine on the imx28evk for you,
>>>> too, there seems to be some hardware difference that makes your machine
>>>> fail. (That doesn't mean it's not fixable in software.)
>>>>
>>>> I don't know if a mdio read error is intended to make the device go
>>>> down, maybe one of the netdev guys can answer that.
>>>> Assuming that it's not intended, instrument the code, find out how that
>>>> timeout makes your device go down and find the wrong branch. I'd start
>>>> with adding stackdumps when the mdio timeout happens and when
>>>> fec_enet_start_xmit is called with fep->link == 0.
>>>>
>>>> Best regards
>>>> Uwe
>>>>
>>>> --
>>>> Pengutronix e.K.                           | Uwe Kleine-König            |
>>>> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Florian



-- 
Florian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 22:06         ` Florian Fainelli
@ 2014-05-06 22:27           ` Brian Lilly
  2014-05-07  3:07             ` Florian Fainelli
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Lilly @ 2014-05-06 22:27 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

It would appear that I don't have that commit.  I could move to 3.14
to see if it makes a difference, but the last couple of responses have
been on 3.12.18 -- or perhaps I'm missing something else.

Please let me know if you have any questions.

Thank you.

Brian Lilly
Crystalfontz America, Incorporated
12412 East Saltese Road
Spokane Valley, WA 99216
brian@crystalfontz.com http://www.crystalfontz.com
Twitter: @Crystalfontz
US toll-free (888) 206-9720 voice (509) 892-1200


On Tue, May 6, 2014 at 3:06 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>> The PHY on board is the SMSC LAN8720
>>
>> With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw
>>
>> [   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>> [   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>> [   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>> [   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>> [   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
>> [   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout
>>
>> With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv
>>
>> [   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>> [   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>> [   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>> [   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>> [   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>> [   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
>> [   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>> [   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout
>
> Thanks for trying this, at least this is consistent no matter which
> PHY driver we are using. Just to rule out a potential PHY power-down
> issue, could you try to revert the following commit
> be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
> when going to HALTED") and see if that works better for you?
>
> Thanks!
>
>>
>> On Tue, May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>> It is happening during boot up:
>>>>
>>>> <snip, kernel 3.12 >
>>>>
>>>> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
>>>> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
>>>
>>> Note that the SMSC PHY driver is picked up here, and that specific
>>> driver implements a different phy_read_status() callback due to how
>>> the PHY operates. The PHY driver also overrides the config_init()
>>> callback to perform some PHY-specific initialization. See below for
>>> more.
>>>
>>>> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> udhcpc (v1.21.1) started
>>>>
>>>> Sending discover...
>>>>
>>>> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>> Sending discover...
>>>>
>>>> Sending select for 10.10.10.217...
>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
>>>> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>> done.
>>>> Starting rpcbind daemon...done.
>>>> net.ipv4.conf.default.rp_filter = 1
>>>> net.ipv4.conf.all.rp_filter = 1
>>>> Mon Apr 14 22:40:00 UTC 2014
>>>> INIT: Entering runlevel: 5
>>>> Starting Xserver
>>>> Starting system message bus: dbus.
>>>> Starting Connection Manager
>>>> Starting wpa_supplicant
>>>> Successfully initialized wpa_supplicant
>>>> Starting Dropbear SSH server
>>>> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>
>>> The correct PHY driver is selected here...
>>>
>>>> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
>>>> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
>>>
>>> But we are still seeing MDIO read timeouts, which is not great.
>>>
>>>>
>>>> With a different kernel (3.14):
>>>>
>>>> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>
>>> Here, the Generic PHY driver has been selected, which will use the
>>> MII_BMSR register contents to determine the Link status and
>>> parameters. You might want to make sure that your board selects the
>>> appropriate PHY driver, such that we are not chasing two issues here.
>>>
>>>> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
>>>> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
>>>> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
>>>> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout
>>>
>>> It would also be helpful to print the registers that were accessed,
>>> such that you could correlate this with the exact steps in the PHY
>>> library state machine. Please also retry the experiment with the SMSC
>>> PHY driver enabled, as it does some PHY specific initialization that
>>> seems to be relevant. Then we are hopefully left with only the MDIO
>>> timeout issue and not the PHY mis-configuration + MDIO timeout.
>>>
>>>>
>>>> Afterward I have to ifdown eth0, ifup eth0 and then it functions
>>>> normally, without reverting the commit.
>>>>
>>>> root@cfa100xx:~# ifdown eth0
>>>> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> root@cfa100xx:~# ifup eth0
>>>> udhcpc (v1.21.1) started
>>>> Sending discover...
>>>> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>> Sending discover...
>>>> Sending select for 10.10.10.217...
>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>> ip: RTNETLINK answers: File exists
>>>>
>>>> --
>>>> Brian
>>>>
>>>>
>>>> On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König
>>>> <u.kleine-koenig@pengutronix.de> wrote:
>>>>> Hello Brian,
>>>>>
>>>>> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>>>>>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>>>>>> come up, then brought right back down with an MDIO rx timeout moments
>>>>>> after.  Adding back in the removed code keeps the interface alive and
>>>>>> it's working afterward without trouble.  I've tested the re-inserted
>>>>>> code in 3.12, 3.14 without issue on our boards.
>>>>> So you can reliably trigger that problem? You're just doing
>>>>>
>>>>>         ifconfig eth0 1.2.3.4 up
>>>>>
>>>>> (or equivalent) and the interface goes down without further
>>>>> interference with the above mentioned commit? The exact error you're
>>>>> seeing is
>>>>>
>>>>>         MDIO read timeout
>>>>>
>>>>> (with some prefix saying something about fec and eth0 I think)?
>>>>>
>>>>> This error is also present with a264b981f2 reverted, just doesn't affect
>>>>> eth0 being functional? Does the timeout always happen, or only on
>>>>> specific addresses?
>>>>>
>>>>> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>>>>>
>>>>>> Is there something else that can be done to prevent the MDIO timeouts?
>>>>>> We are using basically the same schematic for networking as the
>>>>>> imx28evk.
>>>>> Hard to say, but assuming it works just fine on the imx28evk for you,
>>>>> too, there seems to be some hardware difference that makes your machine
>>>>> fail. (That doesn't mean it's not fixable in software.)
>>>>>
>>>>> I don't know if a mdio read error is intended to make the device go
>>>>> down, maybe one of the netdev guys can answer that.
>>>>> Assuming that it's not intended, instrument the code, find out how that
>>>>> timeout makes your device go down and find the wrong branch. I'd start
>>>>> with adding stackdumps when the mdio timeout happens and when
>>>>> fec_enet_start_xmit is called with fep->link == 0.
>>>>>
>>>>> Best regards
>>>>> Uwe
>>>>>
>>>>> --
>>>>> Pengutronix e.K.                           | Uwe Kleine-König            |
>>>>> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Florian
>
>
>
> --
> Florian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 22:27           ` Brian Lilly
@ 2014-05-07  3:07             ` Florian Fainelli
  2014-05-07 19:16               ` Brian Lilly
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2014-05-07  3:07 UTC (permalink / raw)
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

2014-05-06 15:27 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
> It would appear that I don't have that commit.  I could move to 3.14
> to see if it makes a difference, but the last couple of responses have
> been on 3.12.18 -- or perhaps I'm missing something else.

I did miss that you were also seeing the problem in 3.12. At this
point, I believe that either the driver was working around a potential
PHY bug that is not covered by the SMSC PHY driver, or the MDIO
timeout is simply not long enough, or your MDIO interrupts fire
much later than the timeout allows, or these interrupts are
not reliable.

You could probably try to ignore the timeout and see if you get
sensible data out of the MDIO bus regardless.
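
To make that experiment concrete, here is a toy Python model of the
decision, not driver code. FakeMdioBus, mdio_read() and looks_sensible()
are hypothetical names, the 30000 us default is only an assumed value for
the driver's MDIO timeout, and 0x782D is just an example status value with
the link bit set. The idea is that a late completion can still leave valid
data behind, while a bus that never answered would typically read back as
all ones.

```python
from dataclasses import dataclass

ETIMEDOUT = 110        # Linux errno value, used here only as an error marker
BMSR_LSTATUS = 0x0004  # link-status bit in the standard MII status register


@dataclass
class FakeMdioBus:
    """Toy stand-in for the MDIO bus; not a kernel data structure."""
    completion_delay_us: int  # hypothetical: time until the transfer completes
    data_register: int        # value the PHY left in the data register


def mdio_read(bus, timeout_us=30000, ignore_timeout=False):
    """Return the 16-bit register value, or -ETIMEDOUT when strict."""
    timed_out = bus.completion_delay_us > timeout_us
    if timed_out and not ignore_timeout:
        return -ETIMEDOUT
    # Experiment: hand back whatever is in the data register anyway.
    return bus.data_register & 0xFFFF


def looks_sensible(value):
    """All zeros or all ones usually means nothing drove the bus."""
    return value not in (0x0000, 0xFFFF)


bus = FakeMdioBus(completion_delay_us=45000, data_register=0x782D)
strict = mdio_read(bus)
relaxed = mdio_read(bus, ignore_timeout=True)
print(strict)
print(hex(relaxed), looks_sensible(relaxed), bool(relaxed & BMSR_LSTATUS))
```

If the relaxed read keeps returning plausible values on the real board,
that would point at late or missed completion events rather than at a
PHY that is not responding.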

> Please let me know if you have any questions.
>
> Thank you.
>
> Brian Lilly
> Crystalfontz America, Incorporated
> 12412 East Saltese Road
> Spokane Valley, WA 99216
> brian@crystalfontz.com http://www.crystalfontz.com
> Twitter: @Crystalfontz
> US toll-free (888) 206-9720 voice (509) 892-1200
>
>
> On Tue, May 6, 2014 at 3:06 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> 2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>> The PHY on board is the SMSC LAN8720
>>>
>>> With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw
>>>
>>> [   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> [   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>> [   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>> [   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> [   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
>>> [   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout
>>>
>>> With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv
>>>
>>> [   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> [   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>> [   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>> [   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>> [   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>> [   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
>>> [   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> [   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout
>>
>> Thanks for trying this, at least this is consistent no matter which
>> PHY driver we are using. Just to rule out a potential PHY power-down
>> issue, could you try to revert the following commit
>> be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
>> when going to HALTED") and see if that works better for you?
>>
>> Thanks!
>>
>>>
>>> On Tue, May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>> 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>>> It is happening during boot up:
>>>>>
>>>>> <snip, kernel 3.12 >
>>>>>
>>>>> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
>>>>> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
>>>>
>>>> Note that the SMSC PHY driver is picked up here, and that specific
>>>> driver implements a different phy_read_status() callback due to how
>>>> the PHY operates. The PHY driver also overrides the config_init()
>>>> callback to perform some PHY-specific initialization. See below for
>>>> more.
>>>>
>>>>> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>> udhcpc (v1.21.1) started
>>>>>
>>>>> Sending discover...
>>>>>
>>>>> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>> Sending discover...
>>>>>
>>>>> Sending select for 10.10.10.217...
>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
>>>>> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>> done.
>>>>> Starting rpcbind daemon...done.
>>>>> net.ipv4.conf.default.rp_filter = 1
>>>>> net.ipv4.conf.all.rp_filter = 1
>>>>> Mon Apr 14 22:40:00 UTC 2014
>>>>> INIT: Entering runlevel: 5
>>>>> Starting Xserver
>>>>> Starting system message bus: dbus.
>>>>> Starting Connection Manager
>>>>> Starting wpa_supplicant
>>>>> Successfully initialized wpa_supplicant
>>>>> Starting Dropbear SSH server
>>>>> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>
>>>> The correct PHY driver is selected here...
>>>>
>>>>> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>
>>>> But we are still seeing MDIO read timeouts, which is not great.
>>>>
>>>>>
>>>>> With a different kernel (3.14):
>>>>>
>>>>> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>
>>>> Here, the Generic PHY driver has been selected, which will use the
>>>> MII_BMSR register contents to determine the Link status and
>>>> parameters. You might want to make sure that your board selects the
>>>> appropriate PHY driver, such that we are not chasing two issues here.
>>>>
>>>>> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
>>>>> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>
>>>> It would also be helpful to print the registers that were accessed,
>>>> such that you could correlate this with the exact steps in the PHY
>>>> library state machine. Please also retry the experiment with the SMSC
>>>> PHY driver enabled, as it does some PHY specific initialization that
>>>> seems to be relevant. Then we are hopefully left with only the MDIO
>>>> timeout issue and not the PHY mis-configuration + MDIO timeout.
>>>>
>>>>>
>>>>> Afterward I have to ifdown eth0, ifup eth0 and then it functions
>>>>> normally, without reverting the commit.
>>>>>
>>>>> root@cfa100xx:~# ifdown eth0
>>>>> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> root@cfa100xx:~# ifup eth0
>>>>> udhcpc (v1.21.1) started
>>>>> Sending discover...
>>>>> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>> Sending discover...
>>>>> Sending select for 10.10.10.217...
>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>> ip: RTNETLINK answers: File exists
>>>>>
>>>>> --
>>>>> Brian
>>>>>
>>>>>
>>>>> On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König
>>>>> <u.kleine-koenig@pengutronix.de> wrote:
>>>>>> Hello Brian,
>>>>>>
>>>>>> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>>>>>>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>>>>>>> come up, then brought right back down with an MDIO rx timeout moments
>>>>>>> after.  Adding back in the removed code keeps the interface alive and
>>>>>>> it's working afterward without trouble.  I've tested the re-inserted
>>>>>>> code in 3.12, 3.14 without issue on our boards.
>>>>>> So you can reliably trigger that problem? You're just doing
>>>>>>
>>>>>>         ifconfig eth0 1.2.3.4 up
>>>>>>
>>>>>> (or equivalent) and the interface goes down without further
>>>>>> interference with the above mentioned commit? The exact error you're
>>>>>> seeing is
>>>>>>
>>>>>>         MDIO read timeout
>>>>>>
>>>>>> (with some prefix saying something about fec and eth0 I think)?
>>>>>>
>>>>>> This error is also present with a264b981f2 reverted, just doesn't affect
>>>>>> eth0 being functional? Does the timeout always happen, or only on
>>>>>> specific addresses?
>>>>>>
>>>>>> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>>>>>>
>>>>>>> Is there something else that can be done to prevent the MDIO timeouts?
>>>>>>> We are using basically the same schematic for networking as the
>>>>>>> imx28evk.
>>>>>> Hard to say, but assuming it works just fine on the imx28evk for you,
>>>>>> too, there seems to be some hardware difference that makes your machine
>>>>>> fail. (That doesn't mean it's not fixable in software.)
>>>>>>
>>>>>> I don't know if a mdio read error is intended to make the device go
>>>>>> down, maybe one of the netdev guys can answer that.
>>>>>> Assuming that it's not intended, instrument the code, find out how that
>>>>>> timeout makes your device go down and find the wrong branch. I'd start
>>>>>> with adding stackdumps when the mdio timeout happens and when
>>>>>> fec_enet_start_xmit is called with fep->link == 0.
>>>>>>
>>>>>> Best regards
>>>>>> Uwe
>>>>>>
>>>>>> --
>>>>>> Pengutronix e.K.                           | Uwe Kleine-König            |
>>>>>> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>> --
>>>> Florian
>>
>>
>>
>> --
>> Florian



-- 
Florian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: i.MX28 based system losing eth0 on boot
  2014-05-06 16:44 i.MX28 based system losing eth0 on boot Brian Lilly
  2014-05-06 18:11 ` Uwe Kleine-König
@ 2014-05-07  3:17 ` Fabio Estevam
  2014-05-07 19:00   ` Brian Lilly
  1 sibling, 1 reply; 15+ messages in thread
From: Fabio Estevam @ 2014-05-07  3:17 UTC (permalink / raw)
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel

Brian,

On Tue, May 6, 2014 at 1:44 PM, Brian Lilly <brian@crystalfontz.com> wrote:
> Uwe:
>
> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
> come up, then brought right back down with an MDIO rx timeout moments
> after.  Adding back in the removed code keeps the interface alive and
> it's working afterward without trouble.  I've tested the re-inserted
> code in 3.12, 3.14 without issue on our boards.
>
> Is there something else that can be done to prevent the MDIO timeouts?
> We are using basically the same schematic for networking as the
> imx28evk.
>
> Any thoughts on how to resolve this?

Could you try Russell's latest FEC patches, available at:
 http://ftp.arm.linux.org.uk/cgit/linux-arm.git/log/?h=fec-testing

In particular this one could help with your "MDIO timeout" issue:
http://ftp.arm.linux.org.uk/cgit/linux-arm.git/commit/?h=fec-testing&id=ec1fac3de70b16c69d3edc9f223e91d56b1915de

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: i.MX28 based system losing eth0 on boot
  2014-05-07  3:17 ` Fabio Estevam
@ 2014-05-07 19:00   ` Brian Lilly
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Lilly @ 2014-05-07 19:00 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel

Moving forward to 3.15.0-rc4 merged with Russell's FEC patches makes it much
noisier (http://pastebin.com/17TyyMPn):

Populating dev cache
Configuring network interfaces... [   26.268156] fec 800f0000.ethernet
eth0: MDIO write timeout
[   26.298087] fec 800f0000.ethernet eth0: MDIO read timeout
[   26.328074] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.358077] fec 800f0000.ethernet eth0: MDIO read timeout
[   26.388070] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.393631] fec 800f0000.ethernet eth0: could not attach to PHY
ip: SIOCSIFFLAGS: Connection timed out
Starting rpcbind daemon...rpcbind: cannot create socket for udp6
rpcbind: cannot create socket for tcp6
done.
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
INIT: Entering runlevel: 5
Starting Xserver
Starting system message bus: dbus.
Starting Connection Manager
Starting wpa_supplicant
Successfully initialized wpa_supplicant
Starting Dropbear SSH server: dropbear.
starting Busybox UDHCP Server: u[   31.129045] fec 800f0000.ethernet
eth0: MDIO write timeout
dhcpd... [   31.158388] fec 800f0000.ethernet eth0: MDIO read timeout
[   31.188437] fec 800f0000.ethernet eth0: MDIO write timeout
done.
[   31.218260] fec 800f0000.ethernet eth0: MDIO read timeout
[   31.248256] fec 800f0000.ethernet eth0: MDIO write timeout
[   31.253830] fec 800f0000.ethernet eth0: could not attach to PHY
Starting syslogd/klogd: done

from dmesg:

[   26.268156] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.298087] fec 800f0000.ethernet eth0: MDIO read timeout
[   26.328074] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.358077] fec 800f0000.ethernet eth0: MDIO read timeout
[   26.388070] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.393631] fec 800f0000.ethernet eth0: could not attach to PHY
[   31.129045] fec 800f0000.ethernet eth0: MDIO write timeout
[   31.158388] fec 800f0000.ethernet eth0: MDIO read timeout
[   31.188437] fec 800f0000.ethernet eth0: MDIO write timeout
[   31.218260] fec 800f0000.ethernet eth0: MDIO read timeout
[   31.248256] fec 800f0000.ethernet eth0: MDIO write timeout
[   31.253830] fec 800f0000.ethernet eth0: could not attach to PHY
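
The timestamps in the first burst above are spaced almost exactly 30 ms
apart, which would be consistent with every MDIO access waiting out the
full timeout before failing. That reading assumes the driver's MDIO
timeout is 30 ms in this tree, which has not been confirmed in this
thread. A small Python sketch to compute the gaps from the dmesg lines
(parse() and timeout_gaps_ms() are illustrative helper names):

```python
import re

# First burst of timeouts, copied from the dmesg output above.
DMESG = """\
[   26.268156] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.298087] fec 800f0000.ethernet eth0: MDIO read timeout
[   26.328074] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.358077] fec 800f0000.ethernet eth0: MDIO read timeout
[   26.388070] fec 800f0000.ethernet eth0: MDIO write timeout
[   26.393631] fec 800f0000.ethernet eth0: could not attach to PHY
"""

LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def parse(text):
    """Return (timestamp_seconds, message) pairs for each dmesg line."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def timeout_gaps_ms(events):
    """Gaps in milliseconds between consecutive MDIO timeout messages."""
    stamps = [t for t, msg in events if "MDIO" in msg and "timeout" in msg]
    return [round((b - a) * 1000, 1) for a, b in zip(stamps, stamps[1:])]


events = parse(DMESG)
gaps = timeout_gaps_ms(events)
print(gaps)
```

The same helpers can be pointed at a full dmesg capture to check whether
the spacing holds for the second burst as well.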

I can go back and cull the timeout bits in 3.12 or 3.14 and report
back if you think that it'd be helpful ...

Please let me know if you have any questions.

Thank you.

Brian Lilly
Crystalfontz America, Incorporated
12412 East Saltese Road
Spokane Valley, WA 99216
brian@crystalfontz.com http://www.crystalfontz.com
Twitter: @Crystalfontz
US toll-free (888) 206-9720 voice (509) 892-1200


On Tue, May 6, 2014 at 8:17 PM, Fabio Estevam <festevam@gmail.com> wrote:
> Brian,
>
> On Tue, May 6, 2014 at 1:44 PM, Brian Lilly <brian@crystalfontz.com> wrote:
>> Uwe:
>>
>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>> come up, then brought right back down with an MDIO rx timeout moments
>> after.  Adding back in the removed code keeps the interface alive and
>> it's working afterward without trouble.  I've tested the re-inserted
>> code in 3.12, 3.14 without issue on our boards.
>>
>> Is there something else that can be done to prevent the MDIO timeouts?
>> We are using basically the same schematic for networking as the
>> imx28evk.
>>
>> Any thoughts on how to resolve this?
>
> Could you try the latest Russell's FEC patches available at?
>  http://ftp.arm.linux.org.uk/cgit/linux-arm.git/log/?h=fec-testing
>
> In particular this one could help with your "MDIO timeout" issue:
> http://ftp.arm.linux.org.uk/cgit/linux-arm.git/commit/?h=fec-testing&id=ec1fac3de70b16c69d3edc9f223e91d56b1915de

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: i.MX28 based system losing eth0 on boot
  2014-05-07  3:07             ` Florian Fainelli
@ 2014-05-07 19:16               ` Brian Lilly
  2014-05-07 19:34                 ` Florian Fainelli
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Lilly @ 2014-05-07 19:16 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

Also, in 3.14, commenting out both "return -ETIMEDOUT" instances in
fec_main.c results in a working interface.

Please let me know if you have any questions.

Thank you.

Brian Lilly
Crystalfontz America, Incorporated
12412 East Saltese Road
Spokane Valley, WA 99216
brian@crystalfontz.com http://www.crystalfontz.com
Twitter: @Crystalfontz
US toll-free (888) 206-9720 voice (509) 892-1200


On Tue, May 6, 2014 at 8:07 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-05-06 15:27 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>> It would appear that I don't have that commit.  I could move to 3.14
>> to see if it makes a difference, but the last couple of responses have
>> been on 3.12.18 -- or perhaps I'm missing something else.
>
> I did miss that you were also seeing the problem in 3.12. At that
> point, I believe that the driver was working around a potential PHY
> bug that is not covered by the SMSC PHY driver, or that the MDIO
> timeout is simply not long enough, or that your MDIO interrupts fire
> much longer than what the timeout allows, or that these interrupts are
> not reliable.
>
> You could probably try to ignore the timeout and see if you get
> sensible data out of the MDIO bus regardless.
>
>> Please let me know if you have any questions.
>>
>> Thank you.
>>
>> Brian Lilly
>> Crystalfontz America, Incorporated
>> 12412 East Saltese Road
>> Spokane Valley, WA 99216
>> brian@crystalfontz.com http://www.crystalfontz.com
>> Twitter: @Crystalfontz
>> US toll-free (888) 206-9720 voice (509) 892-1200
>>
>>
>> On Tue, May 6, 2014 at 3:06 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> 2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>> The PHY on board is the SMSC LAN8720
>>>>
>>>> With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw
>>>>
>>>> [   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> [   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> [   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>> [   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>> [   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>> [   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> [   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
>>>> [   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> [   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>
>>>> With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv
>>>>
>>>> [   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> [   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> [   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>> [   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>> [   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>> [   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>> [   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
>>>> [   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>> [   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout
>>>
>>> Thanks for trying this, at least this is consistent no matter which
>>> PHY driver we are using. Just to rule out a potential PHY power-down
>>> issue, could you try to revert the following commit
>>> be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
>>> when going to HALTED") and see if that works better for you?
>>>
>>> Thanks!
>>>
>>>>
>>>> On Tue, May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>> 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>>>> It is happening during boot up:
>>>>>>
>>>>>> <snip, kernel 3.12 >
>>>>>>
>>>>>> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
>>>>>> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
>>>>>
>>>>> Note that the SMSC PHY driver is picked up here, and that specific
>>>>> driver implements a different phy_read_status() callback due to how
>>>>> the PHY operates. The PHY driver also overrides the config_init()
>>>>> callback to perform some PHY-specific initialization. See below for
>>>>> more.
>>>>>
>>>>>> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> udhcpc (v1.21.1) started
>>>>>>
>>>>>> Sending discover...
>>>>>>
>>>>>> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>> Sending discover...
>>>>>>
>>>>>> Sending select for 10.10.10.217...
>>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>>> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
>>>>>> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>>> done.
>>>>>> Starting rpcbind daemon...done.
>>>>>> net.ipv4.conf.default.rp_filter = 1
>>>>>> net.ipv4.conf.all.rp_filter = 1
>>>>>> Mon Apr 14 22:40:00 UTC 2014
>>>>>> INIT: Entering runlevel: 5
>>>>>> Starting Xserver
>>>>>> Starting system message bus: dbus.
>>>>>> Starting Connection Manager
>>>>>> Starting wpa_supplicant
>>>>>> Successfully initialized wpa_supplicant
>>>>>> Starting Dropbear SSH server
>>>>>> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>
>>>>> The correct PHY driver is selected here...
>>>>>
>>>>>> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>
>>>>> But we are still seeing MDIO read timeouts, which is not great.
>>>>>
>>>>>>
>>>>>> With a different kernel (3.14):
>>>>>>
>>>>>> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>
>>>>> Here, the Generic PHY driver has been selected, which will use the
>>>>> MII_BMSR register contents to determine the Link status and
>>>>> parameters. You might want to make sure that your board selects the
>>>>> appropriate PHY driver, such that we are not chasing two issues here.
>>>>>
>>>>>> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
>>>>>> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>
>>>>> It would also be helpful to print the register that were accessed,
>>>>> such that you could correlate this with the exact steps in the PHY
>>>>> library state machine. Please also retry the experiment with the SMSC
>>>>> PHY driver enabled, as it does some PHY specific initialization that
>>>>> seems to be relevant. Then we are hopefully left with only the MDIO
>>>>> timeout issue and not the PHY mis-configuration + MDIO timeout.
>>>>>
>>>>>>
>>>>>> Afterward I have to ifdown eth0, ifup eth0 and then it functions
>>>>>> normally, without reverting the commit.
>>>>>>
>>>>>> root@cfa100xx:~# ifdown eth0
>>>>>> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> root@cfa100xx:~# ifup eth0
>>>>>> udhcpc (v1.21.1) started
>>>>>> Sending discover...
>>>>>> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>> Sending discover...
>>>>>> Sending select for 10.10.10.217...
>>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>>> ip: RTNETLINK answers: File exists
>>>>>>
>>>>>> --
>>>>>> Brian
>>>>>>
>>>>>>
>>>>>> On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König
>>>>>> <u.kleine-koenig@pengutronix.de> wrote:
>>>>>>> Hello Brian,
>>>>>>>
>>>>>>> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>>>>>>>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>>>>>>>> come up, then brought right back down with an MDIO rx timeout moments
>>>>>>>> after.  Adding back in the removed code keeps the interface alive and
>>>>>>>> it's working afterward without trouble.  I've tested the re-inserted
>>>>>>>> code in 3.12, 3.14 without issue on our boards.
>>>>>>> So you can reliably trigger that problem? You're just doing
>>>>>>>
>>>>>>>         ifconfig eth0 1.2.3.4 up
>>>>>>>
>>>>>>> (or equivalent) and the interface goes down without further
>>>>>>> interference with the above mentioned commit? The exact error you're
>>>>>>> seeing is
>>>>>>>
>>>>>>>         MDIO read timeout
>>>>>>>
>>>>>>> (with some prefix saying something about fec and eth0 I think)?
>>>>>>>
>>>>>>> This error is also present with a264b981f2 reverted, just doesn't affect
>>>>>>> eth0 being functional? Does the timeout always happen, or only on
>>>>>>> specific addresses?
>>>>>>>
>>>>>>> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>>>>>>>
>>>>>>>> Is there something else that can be done to prevent the MDIO timeouts?
>>>>>>>> We are using basically the same schematic for networking as the
>>>>>>>> imx28evk.
>>>>>>> Hard to say, but assuming it works just fine on the imx28evk for you,
>>>>>>> too, there seems to be some hardware difference that makes your machine
>>>>>>> fail. (That doesn't mean it's not fixable in software.)
>>>>>>>
>>>>>>> I don't know if a mdio read error is intended to make the device go
>>>>>>> down, maybe one the the netdev guys can answer that.
>>>>>>> Assuming that it's not intended, instrument the code, find out how that
>>>>>>> timeout makes your device go down and find the wrong branch. I'd start
>>>>>>> with adding stackdumps when the mdio timeout happens and when
>>>>>>> fec_enet_start_xmit is called with fep->link == 0.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Uwe
>>>>>>>
>>>>>>> --
>>>>>>> Pengutronix e.K.                           | Uwe Kleine-König            |
>>>>>>> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Florian
>>>
>>>
>>>
>>> --
>>> Florian
>
>
>
> --
> Florian

* Re: i.MX28 based system losing eth0 on boot
  2014-05-07 19:16               ` Brian Lilly
@ 2014-05-07 19:34                 ` Florian Fainelli
  2014-05-07 19:51                   ` Brian Lilly
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Fainelli @ 2014-05-07 19:34 UTC (permalink / raw)
  To: Brian Lilly
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

2014-05-07 12:16 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
> Also, in 3.14, commenting out both "return -ETIMEDOUT" instances in
> fec_main.c results in a working interface.
> Please let me know if you have any questions.

At this point, you could probably instrument the interrupt handler and
see if you get FEC_MDIO interrupt causes at all?
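
A rough sketch of that kind of instrumentation, written as a user-space
model rather than as driver code: count how often the MII-done cause
shows up in the event word the interrupt handler reads. The bit value
and the names `MII_EVENT_BIT`, `irq_stats` and `account_irq` are
assumptions made for illustration; the real definitions should be taken
from the driver's own header.

```c
#include <stdint.h>

/* Hypothetical bit for the "MII transfer done" interrupt cause;
 * check the driver's header for the real definition. */
#define MII_EVENT_BIT 0x00800000u

struct irq_stats {
    unsigned long total;      /* every interrupt seen */
    unsigned long mii_events; /* interrupts that carried the MII cause */
};

/* Call once per interrupt with the raw event word read from the
 * controller. Returns 1 if the MII cause was set, 0 otherwise. */
static int account_irq(struct irq_stats *st, uint32_t int_events)
{
    st->total++;
    if (int_events & MII_EVENT_BIT) {
        st->mii_events++;
        return 1;
    }
    return 0;
}
```

If `mii_events` stays at zero while the timeouts are logged, the
completion interrupt is probably not arriving at all; if it keeps
incrementing, the interrupt is likely arriving later than the timeout
allows.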

>
> Thank you.
>
> Brian Lilly
> Crystalfontz America, Incorporated
> 12412 East Saltese Road
> Spokane Valley, WA 99216
> brian@crystalfontz.com http://www.crystalfontz.com
> Twitter: @Crystalfontz
> US toll-free (888) 206-9720 voice (509) 892-1200
>
>
> On Tue, May 6, 2014 at 8:07 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> 2014-05-06 15:27 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>> It would appear that I don't have that commit.  I could move to 3.14
>>> to see if it makes a difference, but the last couple of responses have
>>> been on 3.12.18 -- or perhaps I'm missing something else.
>>
>> I did miss that you were also seeing the problem in 3.12. At that
>> point, I believe that the driver was working around a potential PHY
>> bug that is not covered by the SMSC PHY driver, or that the MDIO
>> timeout is simply not long enough, or that your MDIO interrupts fire
>> much longer than what the timeout allows, or that these interrupts are
>> not reliable.
>>
>> You could probably try to ignore the timeout and see if you get
>> sensible data out of the MDIO bus regardless.
>>
>>> Please let me know if you have any questions.
>>>
>>> Thank you.
>>>
>>> Brian Lilly
>>> Crystalfontz America, Incorporated
>>> 12412 East Saltese Road
>>> Spokane Valley, WA 99216
>>> brian@crystalfontz.com http://www.crystalfontz.com
>>> Twitter: @Crystalfontz
>>> US toll-free (888) 206-9720 voice (509) 892-1200
>>>
>>>
>>> On Tue, May 6, 2014 at 3:06 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>> 2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>>> The PHY on board is the SMSC LAN8720
>>>>>
>>>>> With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw
>>>>>
>>>>> [   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> [   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>> [   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>> [   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>> [   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>> [   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> [   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>> [   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>> [   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>
>>>>> With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv
>>>>>
>>>>> [   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> [   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>> [   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>> [   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>> [   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>> [   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>> [   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>> [   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>> [   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>
>>>> Thanks for trying this, at least this is consistent no matter which
>>>> PHY driver we are using. Just to rule out a potential PHY power-down
>>>> issue, could you try to revert the following commit
>>>> be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
>>>> when going to HALTED") and see if that works better for you?
>>>>
>>>> Thanks!
>>>>
>>>>>
>>>>> On Tue, May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>>> 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>>>>> It is happening during boot up:
>>>>>>>
>>>>>>> <snip, kernel 3.12 >
>>>>>>>
>>>>>>> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
>>>>>>> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
>>>>>>
>>>>>> Note that the SMSC PHY driver is picked up here, and that specific
>>>>>> driver implements a different phy_read_status() callback due to how
>>>>>> the PHY operates. The PHY driver also overrides the config_init()
>>>>>> callback to perform some PHY-specific initialization. See below for
>>>>>> more.
>>>>>>
>>>>>>> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>>> udhcpc (v1.21.1) started
>>>>>>>
>>>>>>> Sending discover...
>>>>>>>
>>>>>>> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>>> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>>> Sending discover...
>>>>>>>
>>>>>>> Sending select for 10.10.10.217...
>>>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>>>> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
>>>>>>> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>>>> done.
>>>>>>> Starting rpcbind daemon...done.
>>>>>>> net.ipv4.conf.default.rp_filter = 1
>>>>>>> net.ipv4.conf.all.rp_filter = 1
>>>>>>> Mon Apr 14 22:40:00 UTC 2014
>>>>>>> INIT: Entering runlevel: 5
>>>>>>> Starting Xserver
>>>>>>> Starting system message bus: dbus.
>>>>>>> Starting Connection Manager
>>>>>>> Starting wpa_supplicant
>>>>>>> Successfully initialized wpa_supplicant
>>>>>>> Starting Dropbear SSH server
>>>>>>> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>
>>>>>> The correct PHY driver is selected here...
>>>>>>
>>>>>>> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>>> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>
>>>>>> But we are still seeing MDIO read timeouts, which is not great.
>>>>>>
>>>>>>>
>>>>>>> With a different kernel (3.14):
>>>>>>>
>>>>>>> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>>> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>
>>>>>> Here, the Generic PHY driver has been selected, which will use the
>>>>>> MII_BMSR register contents to determine the Link status and
>>>>>> parameters. You might want to make sure that your board selects the
>>>>>> appropriate PHY driver, such that we are not chasing two issues here.
>>>>>>
>>>>>>> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
>>>>>>> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>
>>>>>> It would also be helpful to print the register that were accessed,
>>>>>> such that you could correlate this with the exact steps in the PHY
>>>>>> library state machine. Please also retry the experiment with the SMSC
>>>>>> PHY driver enabled, as it does some PHY specific initialization that
>>>>>> seems to be relevant. Then we are hopefully left with only the MDIO
>>>>>> timeout issue and not the PHY mis-configuration + MDIO timeout.
>>>>>>
>>>>>>>
>>>>>>> Afterward I have to ifdown eth0, ifup eth0 and then it functions
>>>>>>> normally, without reverting the commit.
>>>>>>>
>>>>>>> root@cfa100xx:~# ifdown eth0
>>>>>>> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>> root@cfa100xx:~# ifup eth0
>>>>>>> udhcpc (v1.21.1) started
>>>>>>> Sending discover...
>>>>>>> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>>> Sending discover...
>>>>>>> Sending select for 10.10.10.217...
>>>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>>>> ip: RTNETLINK answers: File exists
>>>>>>>
>>>>>>> --
>>>>>>> Brian
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König
>>>>>>> <u.kleine-koenig@pengutronix.de> wrote:
>>>>>>>> Hello Brian,
>>>>>>>>
>>>>>>>> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>>>>>>>>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>>>>>>>>> come up, then brought right back down with an MDIO rx timeout moments
>>>>>>>>> after.  Adding back in the removed code keeps the interface alive and
>>>>>>>>> it's working afterward without trouble.  I've tested the re-inserted
>>>>>>>>> code in 3.12, 3.14 without issue on our boards.
>>>>>>>> So you can reliably trigger that problem? You're just doing
>>>>>>>>
>>>>>>>>         ifconfig eth0 1.2.3.4 up
>>>>>>>>
>>>>>>>> (or equivalent) and the interface goes down without further
>>>>>>>> interference with the above mentioned commit? The exact error you're
>>>>>>>> seeing is
>>>>>>>>
>>>>>>>>         MDIO read timeout
>>>>>>>>
>>>>>>>> (with some prefix saying something about fec and eth0 I think)?
>>>>>>>>
>>>>>>>> This error is also present with a264b981f2 reverted, just doesn't affect
>>>>>>>> eth0 being functional? Does the timeout always happen, or only on
>>>>>>>> specific addresses?
>>>>>>>>
>>>>>>>> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>>>>>>>>
>>>>>>>>> Is there something else that can be done to prevent the MDIO timeouts?
>>>>>>>>> We are using basically the same schematic for networking as the
>>>>>>>>> imx28evk.
>>>>>>>> Hard to say, but assuming it works just fine on the imx28evk for you,
>>>>>>>> too, there seems to be some hardware difference that makes your machine
>>>>>>>> fail. (That doesn't mean it's not fixable in software.)
>>>>>>>>
>>>>>>>> I don't know if a mdio read error is intended to make the device go
>>>>>>>> down, maybe one the the netdev guys can answer that.
>>>>>>>> Assuming that it's not intended, instrument the code, find out how that
>>>>>>>> timeout makes your device go down and find the wrong branch. I'd start
>>>>>>>> with adding stackdumps when the mdio timeout happens and when
>>>>>>>> fec_enet_start_xmit is called with fep->link == 0.
>>>>>>>>
>>>>>>>> Best regards
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pengutronix e.K.                           | Uwe Kleine-König            |
>>>>>>>> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Florian
>>>>
>>>>
>>>>
>>>> --
>>>> Florian
>>
>>
>>
>> --
>> Florian



-- 
Florian

* Re: i.MX28 based system losing eth0 on boot
  2014-05-07 19:34                 ` Florian Fainelli
@ 2014-05-07 19:51                   ` Brian Lilly
  2014-05-08  1:47                     ` fugang.duan
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Lilly @ 2014-05-07 19:51 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio Estevam,
	Jim Baxter, Frank Li, Fugang Duan, netdev, linux-kernel, kernel

Florian:

Thank you for your help.

After doubling the timeout length it worked.
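
To illustrate why a longer timeout can hide the symptom, here is a
minimal user-space model, not the driver's code: a transfer that needs
longer than the wait budget is reported as timed out, and the same
transfer succeeds once the budget is doubled. The struct, the function
name and the microsecond figures in the usage note are hypothetical,
and the polling loop only stands in for the driver's
wait-for-completion logic.

```c
#include <errno.h>

/* Hypothetical model of one MDIO transfer: it finishes `latency_us`
 * microseconds after it is started. */
struct mdio_xfer {
    unsigned int latency_us;
};

/* Wait up to `timeout_us` for the transfer to finish.
 * Returns 0 on completion, -ETIMEDOUT if the budget runs out first. */
static int wait_mdio_done(const struct mdio_xfer *x, unsigned int timeout_us)
{
    unsigned int waited_us = 0;
    const unsigned int step_us = 10; /* polling granularity of the model */

    while (waited_us < x->latency_us) {
        if (waited_us >= timeout_us)
            return -ETIMEDOUT;
        waited_us += step_us;
    }
    return 0;
}
```

With an assumed latency of 45000 us, `wait_mdio_done()` returns
-ETIMEDOUT for a 30000 us budget and 0 for a 60000 us budget. That only
masks a slow response; it does not explain it.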

I managed to get my hands on an imx28evk board and compared our
component load against theirs.  They have a 1.5k pull-up on ENET_MDIO
to +3.3V, which wasn't present on our board.  Adding a 1.5k pull-up
resistor on ENET_MDIO solves the problem, and the board boots as
expected without patching anything.

Sorry for the trouble on this.

Apparently our EE had some question as to whether or not the pull-up
was necessary, so it went into the schematic and got a footprint on
the board, but was marked DNP (do not populate), which of course left
it off the assembled board and out of the BOM.

<facepalm>

On Wed, May 7, 2014 at 12:34 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-05-07 12:16 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>> Also, in 3.14, commenting out both "return -ETIMEDOUT" instances in
>> fec_main.c results in a working interface.
>> Please let me know if you have any questions.
>
> At this point, you could probably instrument the interrupt handler and
> see if you get FEC_MDIO interrupt causes at all?
>
>>
>> Thank you.
>>
>> Brian Lilly
>> Crystalfontz America, Incorporated
>> 12412 East Saltese Road
>> Spokane Valley, WA 99216
>> brian@crystalfontz.com http://www.crystalfontz.com
>> Twitter: @Crystalfontz
>> US toll-free (888) 206-9720 voice (509) 892-1200
>>
>>
>> On Tue, May 6, 2014 at 8:07 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> 2014-05-06 15:27 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>> It would appear that I don't have that commit.  I could move to 3.14
>>>> to see if it makes a difference, but the last couple of responses have
>>>> been on 3.12.18 -- or perhaps I'm missing something else.
>>>
>>> I did miss that you were also seeing the problem in 3.12. At that
>>> point, I believe that the driver was working around a potential PHY
>>> bug that is not covered by the SMSC PHY driver, or that the MDIO
>>> timeout is simply not long enough, or that your MDIO interrupts fire
>>> much longer than what the timeout allows, or that these interrupts are
>>> not reliable.
>>>
>>> You could probably try to ignore the timeout and see if you get
>>> sensible data out of the MDIO bus regardless.
>>>
>>>> Please let me know if you have any questions.
>>>>
>>>> Thank you.
>>>>
>>>> Brian Lilly
>>>> Crystalfontz America, Incorporated
>>>> 12412 East Saltese Road
>>>> Spokane Valley, WA 99216
>>>> brian@crystalfontz.com http://www.crystalfontz.com
>>>> Twitter: @Crystalfontz
>>>> US toll-free (888) 206-9720 voice (509) 892-1200
>>>>
>>>>
>>>> On Tue, May 6, 2014 at 3:06 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>> 2014-05-06 14:40 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>>>> The PHY on board is the SMSC LAN8720
>>>>>>
>>>>>> With the generic PHY driver selected:  http://pastebin.com/A4MH4Ptw
>>>>>>
>>>>>> [   28.828761] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> [   28.840626] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> [   30.827536] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>> [   30.833739] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>> [   32.986999] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>>> [   37.316421] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> [   38.345047] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>> [   39.506210] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> [   40.374961] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>
>>>>>> With the SMSC PHY driver selected:  http://pastebin.com/DhdDyrMv
>>>>>>
>>>>>> [   28.778974] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> [   28.791742] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> [   30.773078] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>> [   30.779286] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>> [   32.934692] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>>> [   37.242162] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>> [   38.270611] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>> [   39.415256] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>> [   40.300454] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>
>>>>> Thanks for trying this, at least this is consistent no matter which
>>>>> PHY driver we are using. Just to rule out a potential PHY power-down
>>>>> issue, could you try to revert the following commit
>>>>> be9dad1f9f26604fb71c0d53ccb39a8f1d425807 ("net: phy: suspend phydev
>>>>> when going to HALTED") and see if that works better for you?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>>
>>>>>> On Tue, May 6, 2014 at 12:24 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>>>> 2014-05-06 12:12 GMT-07:00 Brian Lilly <brian@crystalfontz.com>:
>>>>>>>> It is happening during boot up:
>>>>>>>>
>>>>>>>> <snip, kernel 3.12 >
>>>>>>>>
>>>>>>>> Configuring network interfaces... [   35.117114] fec 800f0000.ethernet
>>>>>>>> eth0: Freescale FEC PHY driver [SMSC LAN8710/LAN8720]
>>>>>>>
>>>>>>> Note that the SMSC PHY driver is picked up here, and that specific
>>>>>>> driver implements a different phy_read_status() callback due to how
>>>>>>> the PHY operates. The PHY driver also overrides the config_init()
>>>>>>> callback to perform some PHY-specific initialization. See below for
>>>>>>> more.
>>>>>>>
>>>>>>>> (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>>> [   35.129967] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>>>> udhcpc (v1.21.1) started
>>>>>>>>
>>>>>>>> Sending discover...
>>>>>>>>
>>>>>>>> [   37.113901] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>>>> [   37.120134] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>>>>>>> Sending discover...
>>>>>>>>
>>>>>>>> Sending select for 10.10.10.217...
>>>>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>>>>> /etc/udhcpc.d/50default: Adding DNS 10.10.10.13
>>>>>>>> [   39.319957] IPv6: ADDRCONF(NETDEV_UP): usb0: link is not ready
>>>>>>>> done.
>>>>>>>> Starting rpcbind daemon...done.
>>>>>>>> net.ipv4.conf.default.rp_filter = 1
>>>>>>>> net.ipv4.conf.all.rp_filter = 1
>>>>>>>> Mon Apr 14 22:40:00 UTC 2014
>>>>>>>> INIT: Entering runlevel: 5
>>>>>>>> Starting Xserver
>>>>>>>> Starting system message bus: dbus.
>>>>>>>> Starting Connection Manager
>>>>>>>> Starting wpa_supplicant
>>>>>>>> Successfully initialized wpa_supplicant
>>>>>>>> Starting Dropbear SSH server
>>>>>>>> [   44.754915] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>>> [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>>
>>>>>>> The correct PHY driver is selected here...
>>>>>>>
>>>>>>>> [   45.781364] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>>> [   46.826170] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
>>>>>>>> [   47.811385] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>>
>>>>>>> But we are still seeing MDIO read timeouts, which is not great.
>>>>>>>
>>>>>>>>
>>>>>>>> With a different kernel (3.14):
>>>>>>>>
>>>>>>>> [   28.989897] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>>> [   30.991210] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>>>> [   37.369372] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>>
>>>>>>> Here, the Generic PHY driver has been selected, which will use the
>>>>>>> MII_BMSR register contents to determine the Link status and
>>>>>>> parameters. You might want to make sure that your board selects the
>>>>>>> appropriate PHY driver, such that we are not chasing two issues here.
>>>>>>>
>>>>>>>> [   38.398346] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>>> [   39.438412] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>>> [   39.468419] fec 800f0000.ethernet eth0: MDIO write timeout
>>>>>>>> [   40.498848] fec 800f0000.ethernet eth0: MDIO read timeout
>>>>>>>
>>>>>>> It would also be helpful to print the register that were accessed,
>>>>>>> such that you could correlate this with the exact steps in the PHY
>>>>>>> library state machine. Please also retry the experiment with the SMSC
>>>>>>> PHY driver enabled, as it does some PHY specific initialization that
>>>>>>> seems to be relevant. Then we are hopefully left with only the MDIO
>>>>>>> timeout issue and not the PHY mis-configuration + MDIO timeout.
>>>>>>>
>>>>>>>>
>>>>>>>> Afterward I have to ifdown eth0, ifup eth0 and then it functions
>>>>>>>> normally, without reverting the commit.
>>>>>>>>
>>>>>>>> root@cfa100xx:~# ifdown eth0
>>>>>>>> [ 1154.679658] fec 800f0000.ethernet eth0: Freescale FEC PHY driver
>>>>>>>> [Generic PHY] (mii_bus:phy_addr=800f0000.etherne:00, irq=-1)
>>>>>>>> root@cfa100xx:~# ifup eth0
>>>>>>>> udhcpc (v1.21.1) started
>>>>>>>> Sending discover...
>>>>>>>> [ 1156.679547] libphy: 800f0000.etherne:00 - Link is Up - 100/Full
>>>>>>>> Sending discover...
>>>>>>>> Sending select for 10.10.10.217...
>>>>>>>> Lease of 10.10.10.217 obtained, lease time 86400
>>>>>>>> ip: RTNETLINK answers: File exists
>>>>>>>>
>>>>>>>> --
>>>>>>>> Brian
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 6, 2014 at 11:11 AM, Uwe Kleine-König
>>>>>>>> <u.kleine-koenig@pengutronix.de> wrote:
>>>>>>>>> Hello Brian,
>>>>>>>>>
>>>>>>>>> On Tue, May 06, 2014 at 09:44:34AM -0700, Brian Lilly wrote:
>>>>>>>>>> With commit a264b981f2c76e281ef27e7232774bf6c54ec865 we're having eth0
>>>>>>>>>> come up, then brought right back down with an MDIO rx timeout moments
>>>>>>>>>> after.  Adding back in the removed code keeps the interface alive and
>>>>>>>>>> it's working afterward without trouble.  I've tested the re-inserted
>>>>>>>>>> code in 3.12, 3.14 without issue on our boards.
>>>>>>>>> So you can reliably trigger that problem? You're just doing
>>>>>>>>>
>>>>>>>>>         ifconfig eth0 1.2.3.4 up
>>>>>>>>>
>>>>>>>>> (or equivalent) and the interface goes down without further
>>>>>>>>> interference with the above mentioned commit? The exact error you're
>>>>>>>>> seeing is
>>>>>>>>>
>>>>>>>>>         MDIO read timeout
>>>>>>>>>
>>>>>>>>> (with some prefix saying something about fec and eth0 I think)?
>>>>>>>>>
>>>>>>>>> This error is also present with a264b981f2 reverted, just doesn't affect
>>>>>>>>> eth0 being functional? Does the timeout always happen, or only on
>>>>>>>>> specific addresses?
>>>>>>>>>
>>>>>>>>> This is not a proper fix, but does it help to increment FEC_MII_TIMEOUT?
>>>>>>>>>
>>>>>>>>>> Is there something else that can be done to prevent the MDIO timeouts?
>>>>>>>>>> We are using basically the same schematic for networking as the
>>>>>>>>>> imx28evk.
>>>>>>>>> Hard to say, but assuming it works just fine on the imx28evk for you,
>>>>>>>>> too, there seems to be some hardware difference that makes your machine
>>>>>>>>> fail. (That doesn't mean it's not fixable in software.)
>>>>>>>>>
>>>>>>>>> I don't know if a mdio read error is intended to make the device go
>>>>>>>>> down, maybe one of the netdev guys can answer that.
>>>>>>>>> Assuming that it's not intended, instrument the code, find out how that
>>>>>>>>> timeout makes your device go down and find the wrong branch. I'd start
>>>>>>>>> with adding stackdumps when the mdio timeout happens and when
>>>>>>>>> fec_enet_start_xmit is called with fep->link == 0.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Uwe
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Pengutronix e.K.                           | Uwe Kleine-König            |
>>>>>>>>> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Florian
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Florian
>>>
>>>
>>>
>>> --
>>> Florian
>
>
>
> --
> Florian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: i.MX28 based system losing eth0 on boot
  2014-05-07 19:51                   ` Brian Lilly
@ 2014-05-08  1:47                     ` fugang.duan
  0 siblings, 0 replies; 15+ messages in thread
From: fugang.duan @ 2014-05-08  1:47 UTC (permalink / raw)
  To: Brian Lilly, Florian Fainelli
  Cc: Uwe Kleine-König, David S. Miller, Fabio.Estevam,
	Jim Baxter, Frank.Li, netdev, linux-kernel, kernel

From: Brian Lilly <brian@crystalfontz.com>
Date: Thursday, May 08, 2014 3:52 AM

>To: Florian Fainelli
>Cc: Uwe Kleine-König; David S. Miller; Estevam Fabio-R49496; Jim Baxter; Li Frank-
>B20596; Duan Fugang-B38611; netdev; linux-kernel@vger.kernel.org; kernel
>Subject: Re: i.MX28 based system losing eth0 on boot
>
>Florian:
>
>Thank you for your help.
>
>After doubling the timeout length it worked.
>
>I managed to get my hands on an imx28evk board and compared our component load
>versus theirs, to find they have a 1.5k pull-up on ENET_MDIO to +3.3v which wasn't
>present on our board.  Adding a 1.5k pull-up resistor on ENET_MDIO solves the
>problem, and boots as expected without patching anything.
>
>Sorry for the trouble on this.
>
>Apparently our EE had some question as to whether or not the pull-up was necessary,
>and put it in the schematic, and the footprint on the board, but marked it as a
>DNP, which of course left it off the board and out of the BOM.
[...]
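
To illustrate why doubling the timeout could mask the problem, here is a minimal user-space sketch of a read that polls for completion within a fixed budget. It is not the FEC driver's code: `fake_mdio`, `fake_mdio_done` and `fake_mdio_read` are hypothetical names, and the "ticks" are an assumed stand-in for real time.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an MDIO controller: the transfer is reported
 * complete only after `ticks_until_done` polls have elapsed. */
struct fake_mdio {
	int ticks_until_done;
	unsigned short data;
};

/* Returns true once the simulated transfer has finished. */
bool fake_mdio_done(struct fake_mdio *bus)
{
	if (bus->ticks_until_done > 0) {
		bus->ticks_until_done--;
		return false;
	}
	return true;
}

/* Poll for completion at most `timeout_ticks` times. Returns the 16-bit
 * register value on success, or -ETIMEDOUT if the budget runs out. */
int fake_mdio_read(struct fake_mdio *bus, int timeout_ticks)
{
	for (int i = 0; i < timeout_ticks; i++) {
		if (fake_mdio_done(bus))
			return bus->data;
	}
	return -ETIMEDOUT;
}
```

A transfer that needs 15 ticks fails with a budget of 10 and succeeds with a budget of 20, which is roughly the behaviour reported above: a longer timeout hides a bus that is slower than it should be, without explaining why it is slow.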

Yes, a 1.5K pull-up on MDIO is necessary; otherwise the data written to and read from the PHY registers is not correct, because the drive strength is not sufficient.

Thanks,
Andy
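
As a rough back-of-the-envelope check of the pull-up's effect, the sketch below estimates the 10%-90% rise time of a line that is pulled high only through a resistor, using t = ln(9) * R * C (about 2.2 RC). The 50 pF bus capacitance and the 47k "weak pull-up only" case are assumptions for illustration, not values measured on either board, and `rc_rise_time_ns` is a hypothetical helper.

```c
/* 10%-90% rise time of an RC node in nanoseconds: t = ln(9) * R * C.
 * ln(9) is approximately 2.197. */
double rc_rise_time_ns(double r_ohms, double c_farads)
{
	return 2.197 * r_ohms * c_farads * 1e9;
}

/* Assumed example values (not taken from the thread):
 *   rc_rise_time_ns(1500.0, 50e-12)   1.5k pull-up, 50 pF bus
 *   rc_rise_time_ns(47000.0, 50e-12)  weak 47k pull-up only, 50 pF bus
 */
```

Under these assumptions the 1.5k case gives about 165 ns and the 47k case about 5.2 us. A 2.5 MHz MDC clock has a 400 ns period, so only the first case lets the line settle within one clock.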

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-05-08  2:02 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-06 16:44 i.MX28 based system losing eth0 on boot Brian Lilly
2014-05-06 18:11 ` Uwe Kleine-König
2014-05-06 18:39   ` Florian Fainelli
2014-05-06 19:12   ` Brian Lilly
2014-05-06 19:24     ` Florian Fainelli
2014-05-06 21:40       ` Brian Lilly
2014-05-06 22:06         ` Florian Fainelli
2014-05-06 22:27           ` Brian Lilly
2014-05-07  3:07             ` Florian Fainelli
2014-05-07 19:16               ` Brian Lilly
2014-05-07 19:34                 ` Florian Fainelli
2014-05-07 19:51                   ` Brian Lilly
2014-05-08  1:47                     ` fugang.duan
2014-05-07  3:17 ` Fabio Estevam
2014-05-07 19:00   ` Brian Lilly
