* net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
@ 2022-02-02 20:18 ` Erico Nunes
  0 siblings, 0 replies; 49+ messages in thread
From: Erico Nunes @ 2022-02-02 20:18 UTC (permalink / raw)
  To: Alexandre Torgue, Giuseppe Cavallaro, Jerome Brunet, Jose Abreu,
	Kevin Hilman, Martin Blumenstingl, Neil Armstrong, linux-amlogic,
	netdev

Hello,

I've been tracking down an issue with network interfaces from
meson8b-dwmac sometimes not coming up properly at boot.
The target systems are AML-S805X-CC boards (Amlogic S805X SoC); I have
a group of them as part of a CI test farm that uses nfsroot.

After hopefully ruling out potential platform/firmware and network
issues, I bisected the kernel and found that this commit makes a big
difference:

  46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved link config in mac_link_up()

With a kernel before that commit, I am able to submit hundreds of test
jobs and the boards always start the network interface properly.

After that commit, around 30% of the jobs start hitting this:

  [    2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
  [    2.183505] meson8b-dwmac c9410000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
  [    2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
  [    2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
  [    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
  [    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
  [    3.783162] Sending DHCP requests ...... timed out!
  [   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
  [   93.685712] IP-Config: Retrying forever (NFS root)...
  [   93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
  [   93.763266] meson8b-dwmac c9410000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0
  [   93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
  [   93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
  [   93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
  [   93.807459] random: fast init done
  [   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off

This still happens with a kernel from master, currently 5.17-rc2 (less
frequently but still often hit by CI test jobs).
The jobs still usually get to work after restarting the interface a
couple of times, but sometimes it takes 3-4 attempts.
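
For anyone who wants to measure how often a board hits this, the failure
signature in the log above (a "Link is Up" line followed by a DHCP timeout)
can be counted from a saved dmesg. The following is only a rough sketch, not
something used in the CI farm: the function name count_failed_attempts is
made up for illustration, and the two patterns are simply copied from the
log excerpt, so they may need adjusting for other kernels or drivers.

```python
import re

# Patterns copied from the log excerpt in this thread (assumed to be stable).
LINK_UP = re.compile(r"Link is Up")
DHCP_TIMEOUT = re.compile(r"Sending DHCP requests .*timed out!")


def count_failed_attempts(dmesg_text: str) -> int:
    """Count link-up events that were followed by a DHCP timeout.

    Hypothetical helper: each "Link is Up" that is followed by a DHCP
    timeout before the next "Link is Up" counts as one failed bring-up.
    """
    failed = 0
    link_up_seen = False
    for line in dmesg_text.splitlines():
        if LINK_UP.search(line):
            link_up_seen = True
        elif link_up_seen and DHCP_TIMEOUT.search(line):
            failed += 1
            link_up_seen = False
    return failed
```

Running it over the dmesg of each job would give a per-job count of failed
bring-ups, which could help compare kernels before and after the commit.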

Here is one example and full dmesg:
https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw

Note that DHCP does not seem to be the issue here. Besides the fact
that the problem only happens since the mentioned commit under the
same setup, I also tried setting up the boards with a static IP, but then
the interfaces just don't communicate at all from boot.

For test purposes I attempted to revert
46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
does not apply trivially anymore, and by trying to revert it manually
I haven't been able to get a working interface.

Any advice on how to further debug or fix this?

Thanks

Erico

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-02-02 20:18 ` Erico Nunes
@ 2022-02-03 13:53   ` Vyacheslav
  -1 siblings, 0 replies; 49+ messages in thread
From: Vyacheslav @ 2022-02-03 13:53 UTC (permalink / raw)
  To: Erico Nunes, Alexandre Torgue, Giuseppe Cavallaro, Jerome Brunet,
	Jose Abreu, Kevin Hilman, Martin Blumenstingl, Neil Armstrong,
	linux-amlogic, netdev

Hi

I have the same problem with meson8b on an Amlogic S905W SoC.
Running "ethtool -r" fixes the problem after startup.
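
As a rough illustration of that workaround, the restart could be retried
from a script until the interface actually passes traffic. This is only a
sketch under assumptions: "ethtool -r <iface>" restarts auto-negotiation and
normally needs root, while recover_link, is_online, and the default attempt
count and delay are hypothetical names and values chosen for the example.

```python
import subprocess
import time


def restart_autoneg(iface, run=subprocess.run):
    """Run `ethtool -r <iface>` to restart auto-negotiation."""
    cmd = ["ethtool", "-r", iface]
    run(cmd, check=True)
    return cmd


def recover_link(iface, is_online, attempts=5, delay=5.0,
                 run=subprocess.run, sleep=time.sleep):
    """Restart auto-negotiation until is_online() reports success.

    is_online is a caller-supplied check (hypothetical), for example a
    ping to the gateway. Returns the number of restarts that were needed,
    or -1 if the interface is still offline after `attempts` restarts.
    """
    for n in range(attempts + 1):
        if is_online():
            return n
        if n == attempts:
            break
        restart_autoneg(iface, run=run)
        sleep(delay)
    return -1
```

The run and sleep parameters are only there so the logic can be exercised
without touching a real interface. Note that this does not help the nfsroot
case, where userspace is not yet available when DHCP times out.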

02.02.2022 23:18, Erico Nunes wrote:
> Hello,
> 
> I've been tracking down an issue with network interfaces from
> meson8b-dwmac sometimes not coming up properly at boot.
> The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
> a group of them as part of a CI test farm that uses nfsroot.
> 
> After hopefully ruling out potential platform/firmware and network
> issues I managed to bisect this commit in the kernel to make a big
> difference:
> 
>    46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
> link config in mac_link_up()
> 
> With a kernel before that commit, I am able to submit hundreds of test
> jobs and the boards always start the network interface properly.
> 
> After that commit, around 30% of the jobs start hitting this:
> 
>    [    2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>    [    2.183505] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
>    [    2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
>    [    2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>    [    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
>    [    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
>    [    3.783162] Sending DHCP requests ...... timed out!
>    [   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
>    [   93.685712] IP-Config: Retrying forever (NFS root)...
>    [   93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>    [   93.763266] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
>    [   93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
>    [   93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>    [   93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
>    [   93.807459] random: fast init done
>    [   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> 
> This still happens with a kernel from master, currently 5.17-rc2 (less
> frequently but still often hit by CI test jobs).
> The jobs still usually get to work after restarting the interface a
> couple of times, but sometimes it takes 3-4 attempts.
> 
> Here is one example and full dmesg:
> https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
> 
> Note that DHCP does not seem to be an issue here, besides the fact
> that the problem only happens since the mentioned commit under the
> same setup, I did try to set up the boards to use a static ip but then
> the interfaces just don't communicate at all from boot.
> 
> For test purposes I attempted to revert
> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
> does not apply trivially anymore, and by trying to revert it manually
> I haven't been able to get a working interface.
> 
> Any advice on how to further debug or fix this?
> 
> Thanks
> 
> Erico
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-02-02 20:18 ` Erico Nunes
  (?)
@ 2022-02-07 10:41   ` Jerome Brunet
  -1 siblings, 0 replies; 49+ messages in thread
From: Jerome Brunet @ 2022-02-07 10:41 UTC (permalink / raw)
  To: Erico Nunes, Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu,
	Kevin Hilman, Martin Blumenstingl, Neil Armstrong, linux-amlogic,
	netdev, linux-rockchip, linux-sunxi


On Wed 02 Feb 2022 at 21:18, Erico Nunes <nunes.erico@gmail.com> wrote:

> Hello,
>
> I've been tracking down an issue with network interfaces from
> meson8b-dwmac sometimes not coming up properly at boot.
> The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
> a group of them as part of a CI test farm that uses nfsroot.
>
> After hopefully ruling out potential platform/firmware and network
> issues I managed to bisect this commit in the kernel to make a big
> difference:
>
>   46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
> link config in mac_link_up()
>
> With a kernel before that commit, I am able to submit hundreds of test
> jobs and the boards always start the network interface properly.
>
> After that commit, around 30% of the jobs start hitting this:
>
>   [    2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>   [    2.183505] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
>   [    2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
>   [    2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>   [    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
>   [    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
>   [    3.783162] Sending DHCP requests ...... timed out!
>   [   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
>   [   93.685712] IP-Config: Retrying forever (NFS root)...
>   [   93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>   [   93.763266] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
>   [   93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
>   [   93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>   [   93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
>   [   93.807459] random: fast init done
>   [   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
>
> This still happens with a kernel from master, currently 5.17-rc2 (less
> frequently but still often hit by CI test jobs).
> The jobs still usually get to work after restarting the interface a
> couple of times, but sometimes it takes 3-4 attempts.
>
> Here is one example and full dmesg:
> https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
>
> Note that DHCP does not seem to be an issue here, besides the fact
> that the problem only happens since the mentioned commit under the
> same setup, I did try to set up the boards to use a static ip but then
> the interfaces just don't communicate at all from boot.
>
> For test purposes I attempted to revert
> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
> does not apply trivially anymore, and by trying to revert it manually
> I haven't been able to get a working interface.
>
> Any advice on how to further debug or fix this?

Hi Erico,

Thanks a lot for digging into this topic.
I'm seeing exactly the same behavior on the g12 based khadas-vim3:

* Boot stalled waiting for DHCP - with an NFS based filesystem
* Every minute, the network driver gets reset and tries again

Sometimes it works on the first attempt, sometimes it takes up to 5
attempts. Eventually, it reaches the prompt, which might be why it went
unnoticed so far.

I think that NFS just makes the problem easier to see.
On devices with an eMMC based filesystem, I noticed that, sometimes, I
had to unplug/plug the ethernet cable to make it go.

So far, the problem is reported on all the Amlogic SoC generations we
support. I think a way forward is to ask the other users of
stmmac whether they have this problem or not - adding the Allwinner and
Rockchip MLs.

Since the commit you have identified is in the generic part of the
stmmac code, maybe Jose can help us understand what is going on.

>
> Thanks
>
> Erico


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-02-07 10:41   ` Jerome Brunet
  (?)
@ 2022-02-20 16:51     ` Erico Nunes
  -1 siblings, 0 replies; 49+ messages in thread
From: Erico Nunes @ 2022-02-20 16:51 UTC (permalink / raw)
  To: Jerome Brunet
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Martin Blumenstingl, Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On Mon, Feb 7, 2022 at 11:56 AM Jerome Brunet <jbrunet@baylibre.com> wrote:
>
>
> On Wed 02 Feb 2022 at 21:18, Erico Nunes <nunes.erico@gmail.com> wrote:
>
> > Hello,
> >
> > I've been tracking down an issue with network interfaces from
> > meson8b-dwmac sometimes not coming up properly at boot.
> > The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
> > a group of them as part of a CI test farm that uses nfsroot.
> >
> > After hopefully ruling out potential platform/firmware and network
> > issues I managed to bisect this commit in the kernel to make a big
> > difference:
> >
> >   46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
> > link config in mac_link_up()
> >
> > With a kernel before that commit, I am able to submit hundreds of test
> > jobs and the boards always start the network interface properly.
> >
> > After that commit, around 30% of the jobs start hitting this:
> >
> >   [    2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY
> > [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> >   [    2.183505] meson8b-dwmac c9410000.ethernet eth0: Register
> > MEM_TYPE_PAGE_POOL RxQ-0
> >   [    2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety
> > Features support found
> >   [    2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> >   [    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for
> > phy/rmii link mode
> >   [    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> > 100Mbps/Full - flow control off
> >   [    3.783162] Sending DHCP requests ...... timed out!
> >   [   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
> >   [   93.685712] IP-Config: Retrying forever (NFS root)...
> >   [   93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY
> > [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> >   [   93.763266] meson8b-dwmac c9410000.ethernet eth0: Register
> > MEM_TYPE_PAGE_POOL RxQ-0
> >   [   93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety
> > Features support found
> >   [   93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> >   [   93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for
> > phy/rmii link mode
> >   [   93.807459] random: fast init done
> >   [   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> > 100Mbps/Full - flow control off
> >
> > This still happens with a kernel from master, currently 5.17-rc2 (less
> > frequently but still often hit by CI test jobs).
> > The jobs still usually get to work after restarting the interface a
> > couple of times, but sometimes it takes 3-4 attempts.
> >
> > Here is one example and full dmesg:
> > https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
> >
> > Note that DHCP does not seem to be an issue here, besides the fact
> > that the problem only happens since the mentioned commit under the
> > same setup, I did try to set up the boards to use a static ip but then
> > the interfaces just don't communicate at all from boot.
> >
> > For test purposes I attempted to revert
> > 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
> > does not apply trivially anymore, and by trying to revert it manually
> > I haven't been able to get a working interface.
> >
> > Any advice on how to further debug or fix this?
>
> Hi Erico,
>
> Thanks a lot for digging into this topic.
> I'm seeing exactly the same behavior on the g12 based khadas-vim3:
>
> * Boot stalled waiting for DHCP - with an NFS based filesystem
> * Every minute, the network driver gets reset and tries again
>
> Sometimes it works on the first attempt, sometimes it takes up to 5
> attempts. Eventually, it reaches the prompt which might be why it went
> unnoticed so far.
>
> I think that NFS just makes the problem easier to see.
> On devices with an eMMC based filesystem, I noticed that, sometimes, I
> had to unplug/plug the ethernet cable to make it go.
>
> So far, the problem has been reported on all the Amlogic SoC generations
> we support. I think a way forward is to ask the other users of
> stmmac whether they have this problem or not - adding the Allwinner and
> Rockchip MLs.
>
> Since the commit you have identified is in the generic part of the
> stmmac code, maybe Jose can help us understand what is going on.

Hi all,

thanks for the feedback so far, good to know that this is not only on
my board farm.

Any more feedback about this from the people in cc?

Thanks

Erico
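
The failure signature in the quoted log (the link is reported up, then
the DHCP requests time out) could be flagged automatically in CI logs.
Below is a minimal sketch in Python, not something taken from the actual
test farm: the function name find_dhcp_failures and the SAMPLE string are
made up for illustration, and it assumes the quote markers and the mail
client's line wrapping have already been removed so that each kernel
message sits on one line.

```python
import re

# Kernel log lines look like "[    3.762108] message". Quote markers and
# mail-client wrapping are assumed to have been stripped beforehand.
LINE_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+(.*)$")


def find_dhcp_failures(log_text):
    """Return timestamps of 'Link is Up' events followed by a DHCP timeout."""
    failures = []
    last_link_up = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if "Link is Up" in message:
            last_link_up = timestamp
        elif "Link is Down" in message:
            last_link_up = None
        elif "Sending DHCP requests" in message and "timed out" in message:
            # Only count timeouts that happened while the link claimed to be up.
            if last_link_up is not None:
                failures.append(last_link_up)
                last_link_up = None
    return failures


# Hypothetical input: a few lines from the log above, unwrapped by hand.
SAMPLE = """\
[    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for phy/rmii link mode
[    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
[    3.783162] Sending DHCP requests ...... timed out!
[   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
[   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
"""

print(find_dhcp_failures(SAMPLE))
```

Counting the returned entries per job would give a rough failure rate to
compare between kernels, under the assumption that a DHCP timeout while
the link is up is always this bug and not a network-side problem.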

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-02-20 16:51     ` Erico Nunes
  (?)
@ 2022-02-22  2:30       ` Samuel Holland
  -1 siblings, 0 replies; 49+ messages in thread
From: Samuel Holland @ 2022-02-22  2:30 UTC (permalink / raw)
  To: Erico Nunes, Jerome Brunet
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Martin Blumenstingl, Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On 2/20/22 10:51 AM, Erico Nunes wrote:
> On Mon, Feb 7, 2022 at 11:56 AM Jerome Brunet <jbrunet@baylibre.com> wrote:
>>
>>
>> On Wed 02 Feb 2022 at 21:18, Erico Nunes <nunes.erico@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I've been tracking down an issue with network interfaces from
>>> meson8b-dwmac sometimes not coming up properly at boot.
>>> The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
>>> a group of them as part of a CI test farm that uses nfsroot.
>>>
>>> After hopefully ruling out potential platform/firmware and network
>>> issues I managed to bisect this commit in the kernel to make a big
>>> difference:
>>>
>>>   46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
>>> link config in mac_link_up()
>>>
>>> With a kernel before that commit, I am able to submit hundreds of test
>>> jobs and the boards always start the network interface properly.
>>>
>>> After that commit, around 30% of the jobs start hitting this:
>>>
>>>   [    2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY
>>> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>>>   [    2.183505] meson8b-dwmac c9410000.ethernet eth0: Register
>>> MEM_TYPE_PAGE_POOL RxQ-0
>>>   [    2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety
>>> Features support found
>>>   [    2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>>>   [    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for
>>> phy/rmii link mode
>>>   [    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
>>> 100Mbps/Full - flow control off
>>>   [    3.783162] Sending DHCP requests ...... timed out!
>>>   [   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
>>>   [   93.685712] IP-Config: Retrying forever (NFS root)...
>>>   [   93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY
>>> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>>>   [   93.763266] meson8b-dwmac c9410000.ethernet eth0: Register
>>> MEM_TYPE_PAGE_POOL RxQ-0
>>>   [   93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety
>>> Features support found
>>>   [   93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>>>   [   93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for
>>> phy/rmii link mode
>>>   [   93.807459] random: fast init done
>>>   [   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
>>> 100Mbps/Full - flow control off
>>>
>>> This still happens with a kernel from master, currently 5.17-rc2 (less
>>> frequently but still often hit by CI test jobs).
>>> The jobs still usually get to work after restarting the interface a
>>> couple of times, but sometimes it takes 3-4 attempts.
>>>
>>> Here is one example and full dmesg:
>>> https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
>>>
>>> Note that DHCP does not seem to be the issue here: besides the fact
>>> that the problem only happens since the mentioned commit under the
>>> same setup, I also tried setting up the boards with a static IP, but
>>> then the interfaces just don't communicate at all from boot.
>>>
>>> For test purposes I attempted to revert
>>> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
>>> does not apply trivially anymore, and by trying to revert it manually
>>> I haven't been able to get a working interface.
>>>
>>> Any advice on how to further debug or fix this?
>>
>> Hi Erico,
>>
>> Thanks a lot for digging into this topic.
>> I'm seeing exactly the same behavior on the g12 based khadas-vim3:
>>
>> * Boot stalled waiting for DHCP - with an NFS based filesystem
>> * Every minute, the network driver gets reset and tries again
>>
>> Sometimes it works on the first attempt, sometimes it takes up to 5
>> attempts. Eventually, it reaches the prompt which might be why it went
>> unnoticed so far.
>>
>> I think that NFS just makes the problem easier to see.
>> On devices with an eMMC based filesystem, I noticed that, sometimes, I
>> had to unplug/plug the ethernet cable to make it go.
>>
>> So far, the problem has been reported on all the Amlogic SoC generations
>> we support. I think a way forward is to ask the other users of
>> stmmac whether they have this problem or not - adding the Allwinner and
>> Rockchip MLs.
>>
>> Since the commit you have identified is in the generic part of the
>> stmmac code, maybe Jose can help us understand what is going on.
> 
> Hi all,
> 
> thanks for the feedback so far, good to know that this is not only on
> my board farm.
> 
> Any more feedback about this from the people in cc?

The commit in question appears to have been merged in v5.7. I have been using
kernels newer than that (including up to v5.17-rc) on various Allwinner
platforms -- A64, H3, H6, D1 -- and I have not seen anything similar. I
don't remember seeing reports of others having Ethernet issues at boot on
Allwinner boards either.

The only issue that's come up recently for us was related to runtime PM, but
that issue was traced to a different commit (5ec55823438e), merged a year
later than the one you referenced here.

Regards,
Samuel

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-02-20 16:51     ` Erico Nunes
  (?)
@ 2022-02-26 13:53       ` Heiner Kallweit
  -1 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-02-26 13:53 UTC (permalink / raw)
  To: Erico Nunes, Jerome Brunet
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Martin Blumenstingl, Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On 20.02.2022 17:51, Erico Nunes wrote:
> On Mon, Feb 7, 2022 at 11:56 AM Jerome Brunet <jbrunet@baylibre.com> wrote:
>>
>>
>> On Wed 02 Feb 2022 at 21:18, Erico Nunes <nunes.erico@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I've been tracking down an issue with network interfaces from
>>> meson8b-dwmac sometimes not coming up properly at boot.
>>> The target systems are AML-S805X-CC boards (Amlogic S805X SoC), I have
>>> a group of them as part of a CI test farm that uses nfsroot.
>>>
>>> After hopefully ruling out potential platform/firmware and network
>>> issues I managed to bisect this commit in the kernel to make a big
>>> difference:
>>>
>>>   46f69ded988d2311e3be2e4c3898fc0edd7e6c5a net: stmmac: Use resolved
>>> link config in mac_link_up()
>>>
>>> With a kernel before that commit, I am able to submit hundreds of test
>>> jobs and the boards always start the network interface properly.
>>>
>>> After that commit, around 30% of the jobs start hitting this:
>>>
>>>   [    2.178078] meson8b-dwmac c9410000.ethernet eth0: PHY
>>> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>>>   [    2.183505] meson8b-dwmac c9410000.ethernet eth0: Register
>>> MEM_TYPE_PAGE_POOL RxQ-0
>>>   [    2.200784] meson8b-dwmac c9410000.ethernet eth0: No Safety
>>> Features support found
>>>   [    2.202713] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>>>   [    2.209825] meson8b-dwmac c9410000.ethernet eth0: configuring for
>>> phy/rmii link mode
>>>   [    3.762108] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
>>> 100Mbps/Full - flow control off
>>>   [    3.783162] Sending DHCP requests ...... timed out!
>>>   [   93.680402] meson8b-dwmac c9410000.ethernet eth0: Link is Down
>>>   [   93.685712] IP-Config: Retrying forever (NFS root)...
>>>   [   93.756540] meson8b-dwmac c9410000.ethernet eth0: PHY
>>> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
>>>   [   93.763266] meson8b-dwmac c9410000.ethernet eth0: Register
>>> MEM_TYPE_PAGE_POOL RxQ-0
>>>   [   93.779340] meson8b-dwmac c9410000.ethernet eth0: No Safety
>>> Features support found
>>>   [   93.781336] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
>>>   [   93.788088] meson8b-dwmac c9410000.ethernet eth0: configuring for
>>> phy/rmii link mode
>>>   [   93.807459] random: fast init done
>>>   [   95.353076] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
>>> 100Mbps/Full - flow control off
>>>
>>> This still happens with a kernel from master, currently 5.17-rc2 (less
>>> frequently but still often hit by CI test jobs).
>>> The jobs still usually get to work after restarting the interface a
>>> couple of times, but sometimes it takes 3-4 attempts.
>>>
>>> Here is one example and full dmesg:
>>> https://gitlab.freedesktop.org/enunes/mesa/-/jobs/16452399/raw
>>>
>>> Note that DHCP does not seem to be the issue here: besides the fact
>>> that the problem only happens since the mentioned commit under the
>>> same setup, I did try to set up the boards to use a static IP, but then
>>> the interfaces just don't communicate at all from boot.
>>>
>>> For test purposes I attempted to revert
>>> 46f69ded988d2311e3be2e4c3898fc0edd7e6c5a on top of master but that
>>> does not apply trivially anymore, and by trying to revert it manually
>>> I haven't been able to get a working interface.
>>>
>>> Any advice on how to further debug or fix this?
>>
>> Hi Erico,
>>
>> Thanks a lot for digging into this topic.
>> I'm seeing exactly the same behavior on the g12 based khadas-vim3:
>>
>> * Boot stalled waiting for DHCP - with an NFS based filesystem
>> * Every minute, the network driver gets a reset and tries again
>>
>> Sometimes it works on the first attempt, sometimes it takes up to 5
>> attempts. Eventually, it reaches the prompt which might be why it went
>> unnoticed so far.
>>
>> I think that NFS just makes the problem easier to see.
>> On devices with an eMMC based filesystem, I noticed that, sometimes, I
>> had to unplug/plug the ethernet cable to make it go.
>>
>> So far, the problem is reported on all the Amlogic SoC generations we
>> support. I think a way forward is to ask the other users of
>> stmmac whether they have this problem or not - adding Allwinner and
>> Rockchip ML.
>>
>> Since the commit you have identified is in the generic part of the
>> stmmac code, maybe Jose can help us understand what is going on.
> 
> Hi all,
> 
> thanks for the feedback so far, good to know that this is not only on
> my board farm.
> 
> Any more feedback about this from the people in cc?
> 
> Thanks
> 
> Erico

Just to rule out that the PHY may be involved:
- Does the issue occur with internal and/or external PHY?
- Issue still occurs in PHY polling mode? (disable PHY interrupt in dts)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-02-26 13:53       ` Heiner Kallweit
  (?)
@ 2022-03-02 10:33         ` Erico Nunes
  -1 siblings, 0 replies; 49+ messages in thread
From: Erico Nunes @ 2022-03-02 10:33 UTC (permalink / raw)
  To: Heiner Kallweit, Jerome Brunet
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Martin Blumenstingl, Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On Sat, Feb 26, 2022 at 2:53 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> Just to rule out that the PHY may be involved:
> - Does the issue occur with internal and/or external PHY?

My target boards have the internal phy only. It is not possible for me
at the moment to test it with an external phy.

> - Issue still occurs in PHY polling mode? (disable PHY interrupt in dts)

Thanks for suggesting this. I did tests with this and it seems to be a
workaround.
With phy interrupt on recent kernels (around v5.17-rc3) I'm able to
reproduce the issue relatively easily over a batch of a hundred jobs.
With my tests with the phy in polling mode, I have not been able to
reproduce so far, even with several hundred jobs.

For completeness I also tested 46f69ded988d (from my initial analysis):
setting the phy to polling mode there does not make a difference, the
issue still reproduces. So it may have been a different bug. Though I
guess at this point we can disregard that and focus on the current
kernel.

I tried adding a few debug prints and delays to the interrupt code path in
drivers/net/phy/meson-gxl.c but nothing gave me useful info so far.

Do you have more advice on how to proceed from here?

Thanks

Erico

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-02 10:33         ` Erico Nunes
  (?)
@ 2022-03-02 11:01           ` Heiner Kallweit
  -1 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-03-02 11:01 UTC (permalink / raw)
  To: Erico Nunes, Jerome Brunet, Martin Blumenstingl
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On 02.03.2022 11:33, Erico Nunes wrote:
> On Sat, Feb 26, 2022 at 2:53 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> Just to rule out that the PHY may be involved:
>> - Does the issue occur with internal and/or external PHY?
> 
> My target boards have the internal phy only. It is not possible for me
> at the moment to test it with an external phy.
> 
>> - Issue still occurs in PHY polling mode? (disable PHY interrupt in dts)
> 
> Thanks for suggesting this. I did tests with this and it seems to be a
> workaround.
> With phy interrupt on recent kernels (around v5.17-rc3) I'm able to
> reproduce the issue relatively easily over a batch of a hundred jobs.
> With my tests with the phy in polling mode, I have not been able to
> reproduce so far, even with several hundred jobs.
> 
It's my understanding that in the problem case the "aneg complete"
interrupt fires, but no data flows.
This might indicate a timing issue. According to the meson PHY driver
(I don't have the datasheet) the PHY doesn't have a "link up" interrupt
source, just the mentioned "aneg complete".

Below is an experimental patch that delays the link-up processing
a little and drops interrupt sources that aren't needed.
Could you please test it with PHY interrupts enabled?


By the way, to all:
I found that interrupt mode is broken in fixed (aneg disabled) mode,
because link-up isn't signaled. Experiments showed that irq source
bit 7 can be used to fix this, but this bit isn't documented in the
driver.

> For completeness I also tested 46f69ded988d (from my initial analysis)
> and setting the phy to polling mode there does not make a difference,
> issue still reproduces. So it may have been a different bug. Though I
> guess at this point we can disregard that and focus on the current
> kernel.
> 
> I tried adding a few debugs and delays to the interrupt code path in
> drivers/net/phy/meson-gxl.c but nothing gave me useful info so far.
> 
> Do you have more advice on how to proceed from here?
> 
> Thanks
> 
> Erico

Heiner


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index 7e7904fee..0acb3a99a 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -7,6 +7,7 @@
  * Author: Neil Armstrong <narmstrong@baylibre.com>
  */
 #include <linux/kernel.h>
+#include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/mii.h>
 #include <linux/ethtool.h>
@@ -209,12 +210,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
 		if (ret)
 			return ret;
 
-		val = INTSRC_ANEG_PR
-			| INTSRC_PARALLEL_FAULT
-			| INTSRC_ANEG_LP_ACK
-			| INTSRC_LINK_DOWN
-			| INTSRC_REMOTE_FAULT
-			| INTSRC_ANEG_COMPLETE;
+		val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
 		ret = phy_write(phydev, INTSRC_MASK, val);
 	} else {
 		val = 0;
@@ -240,6 +236,9 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	if (irq_status == 0)
 		return IRQ_NONE;
 
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		msleep(100);
+
 	phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
-- 
2.35.1
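
The effect of the patch on the interrupt mask and on the handler can be
modelled in user space. This is only a sketch: the bit positions are
assumed to match the INTSRC_* definitions in drivers/net/phy/meson-gxl.c
(bits 1 to 6) and should be checked against the tree in use, and
model_mask_old(), model_mask_new() and model_needs_delay() are
hypothetical names, not driver functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the driver's INTSRC_* bit definitions. */
#define INTSRC_ANEG_PR		(1u << 1)
#define INTSRC_PARALLEL_FAULT	(1u << 2)
#define INTSRC_ANEG_LP_ACK	(1u << 3)
#define INTSRC_LINK_DOWN	(1u << 4)
#define INTSRC_REMOTE_FAULT	(1u << 5)
#define INTSRC_ANEG_COMPLETE	(1u << 6)

/* Mask written to INTSRC_MASK before the patch: all six sources. */
static uint16_t model_mask_old(void)
{
	return INTSRC_ANEG_PR | INTSRC_PARALLEL_FAULT | INTSRC_ANEG_LP_ACK |
	       INTSRC_LINK_DOWN | INTSRC_REMOTE_FAULT | INTSRC_ANEG_COMPLETE;
}

/* Mask after the patch: only link-down and aneg-complete remain. */
static uint16_t model_mask_new(void)
{
	return INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
}

/* True when the patched handler would sleep before triggering the
 * PHY state machine, i.e. when "aneg complete" is flagged. */
static bool model_needs_delay(uint32_t irq_status)
{
	assert(irq_status <= 0xffffu);	/* MDIO registers are 16 bits wide */
	return (irq_status & INTSRC_ANEG_COMPLETE) != 0;
}
```

Under these assumptions the old mask is 0x7e and the new one is 0x50,
and a status with only the link-down bit set does not trigger the delay.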


^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-02 11:01           ` Heiner Kallweit
  (?)
@ 2022-03-02 13:39             ` Jerome Brunet
  -1 siblings, 0 replies; 49+ messages in thread
From: Jerome Brunet @ 2022-03-02 13:39 UTC (permalink / raw)
  To: Heiner Kallweit, Erico Nunes, Martin Blumenstingl
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi


On Wed 02 Mar 2022 at 12:01, Heiner Kallweit <hkallweit1@gmail.com> wrote:

> On 02.03.2022 11:33, Erico Nunes wrote:
>> On Sat, Feb 26, 2022 at 2:53 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>> Just to rule out that the PHY may be involved:
>>> - Does the issue occur with internal and/or external PHY?
>> 
>> My target boards have the internal phy only. It is not possible for me
>> at the moment to test it with an external phy.
>> 
>>> - Issue still occurs in PHY polling mode? (disable PHY interrupt in dts)
>> 
>> Thanks for suggesting this. I did tests with this and it seems to be a
>> workaround.
>> With phy interrupt on recent kernels (around v5.17-rc3) I'm able to
>> reproduce the issue relatively easily over a batch of a hundred jobs.
>> With my tests with the phy in polling mode, I have not been able to
>> reproduce so far, even with several hundred jobs.
>> 
> It's my understanding that in the problem case the "aneg complete"
> interrupt fires, but no data flows.
> This might indicate a timing issue. According to the meson PHY driver
> (I don't have the datasheet) the PHY doesn't have a "link up" interrupt
> source, just the mentioned "aneg complete".
>
> Below I send an experimental patch that delays the link up processing
> a little and eliminates not needed interrupt sources.
> Could you please test it with PHY interrupts enabled?
>
>
> By the way, to all:
> I found that interrupt mode is broken in fixed (aneg disabled) mode,
> because link-up isn't signaled. Experiments showed that irq source
> bit 7 can be used to fix this, but this bit isn't documented in the
> driver.
>
>> For completeness I also tested 46f69ded988d (from my initial analysis)
>> and setting the phy to polling mode there does not make a difference,
>> issue still reproduces. So it may have been a different bug. Though I
>> guess at this point we can disregard that and focus on the current
>> kernel.
>> 
>> I tried adding a few debugs and delays to the interrupt code path in
>> drivers/net/phy/meson-gxl.c but nothing gave me useful info so far.
>> 
>> Do you have more advice on how to proceed from here?
>> 
>> Thanks
>> 
>> Erico
>
> Heiner

Hi,

I did some tests on my side as well, mostly with v5.10.93 ATM.
It is true that I can recall seeing this issue only on boards using the
internal PHY (g12 and gxl boards for me - I don't have meson8b boards).

I tried on the u200 (g12 based). Being the ref design it has both
the internal and external interfaces and I can choose.

To my surprise, I could not reproduce the issue on it with the internal
PHY ... until I noticed that eMMC was initialising more or less at the
same time as the network.

I disabled the eMMC, out of curiosity, and the issue was back.
Like Heiner, I suspect a timing issue - at this stage, I can't tell if it
is PHY related though.

I also tried with the external phy and could not reproduce. Unfortunately,
as we can see from the first test on the u200, not reproducing is not
really a proof and it is difficult to conclude.

Like Erico, I tried bisecting but I ended up on a BT merge ... Clearly
inconclusive :(

Disabling the IRQ is an interesting test but, on my side, I have mixed
results (on the libretech-cc this time):

* I first tried quickly while bisecting, on commit
  5.6.0-rc3-01434-g8d4ccd7770e7:
  - With IRQ => NOK
  - POLL => NOK

Seeing Erico's report, I thought maybe I mixed things up so I tried again,
double-checked the IRQ was disabled ... still broken. There was another
commit where I reproduced it without the IRQ, but I lost it.

* I also tried on v5.10.93:
  - With IRQ => NOK
  - POLL => OK ... (well, I got bored before the issue showed up)

It seems that switching to polling, in some cases, changes the timings
just enough to hide the issue ... but not always. Unless I forgot to
consider something else? Ideas?

If I understand the proposed patch correctly, it is mostly about the phy
IRQ. Since I reproduce without the IRQ, I suppose it is not the
problem we were looking for (it might still be a problem worth fixing -
the phy is not "rock-solid" when it comes to aneg - I already tried
stabilising it a few years ago).

TBH, it bothers me that I reproduced w/o the IRQ. The idea makes
sense :/

>
>
> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
> index 7e7904fee..0acb3a99a 100644
> --- a/drivers/net/phy/meson-gxl.c
> +++ b/drivers/net/phy/meson-gxl.c
> @@ -7,6 +7,7 @@
>   * Author: Neil Armstrong <narmstrong@baylibre.com>
>   */
>  #include <linux/kernel.h>
> +#include <linux/delay.h>
>  #include <linux/module.h>
>  #include <linux/mii.h>
>  #include <linux/ethtool.h>
> @@ -209,12 +210,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>  		if (ret)
>  			return ret;
>  
> -		val = INTSRC_ANEG_PR
> -			| INTSRC_PARALLEL_FAULT
> -			| INTSRC_ANEG_LP_ACK
> -			| INTSRC_LINK_DOWN
> -			| INTSRC_REMOTE_FAULT
> -			| INTSRC_ANEG_COMPLETE;
> +		val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>  		ret = phy_write(phydev, INTSRC_MASK, val);
>  	} else {
>  		val = 0;
> @@ -240,6 +236,9 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>  	if (irq_status == 0)
>  		return IRQ_NONE;
>  
> +	if (irq_status & INTSRC_ANEG_COMPLETE)
> +		msleep(100);
> +
>  	phy_trigger_machine(phydev);
>  
>  	return IRQ_HANDLED;


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-02 13:39             ` Jerome Brunet
  (?)
@ 2022-03-02 16:34               ` Heiner Kallweit
  -1 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-03-02 16:34 UTC (permalink / raw)
  To: Jerome Brunet, Erico Nunes, Martin Blumenstingl
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On 02.03.2022 14:39, Jerome Brunet wrote:
> 
> On Wed 02 Mar 2022 at 12:01, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 02.03.2022 11:33, Erico Nunes wrote:
>>> On Sat, Feb 26, 2022 at 2:53 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>> Just to rule out that the PHY may be involved:
>>>> - Does the issue occur with internal and/or external PHY?
>>>
>>> My target boards have the internal phy only. It is not possible for me
>>> at the moment to test it with an external phy.
>>>
>>>> - Issue still occurs in PHY polling mode? (disable PHY interrupt in dts)
>>>
>>> Thanks for suggesting this. I did tests with this and it seems to be a
>>> workaround.
>>> With phy interrupt on recent kernels (around v5.17-rc3) I'm able to
>>> reproduce the issue relatively easily over a batch of a hundred jobs.
>>> With my tests with the phy in polling mode, I have not been able to
>>> reproduce so far, even with several hundred jobs.
>>>
>> It's my understanding that in the problem case the "aneg complete"
>> interrupt fires, but no data flows.
>> This might indicate a timing issue. According to the meson PHY driver
>> (I don't have the datasheet) the PHY doesn't have a "link up" interrupt
>> source, just the mentioned "aneg complete".
>>
>> Below I send an experimental patch that delays the link up processing
>> a little and eliminates not needed interrupt sources.
>> Could you please test it with PHY interrupts enabled?
>>
>>
>> By the way, to all:
>> I found that interrupt mode is broken in fixed (aneg disabled) mode,
>> because link-up isn't signaled. Experiments showed that irq source
>> bit 7 can be used to fix this, but this bit isn't documented in the
>> driver.
>>
>>> For completeness I also tested 46f69ded988d (from my initial analysis)
>>> and setting the phy to polling mode there does not make a difference,
>>> issue still reproduces. So it may have been a different bug. Though I
>>> guess at this point we can disregard that and focus on the current
>>> kernel.
>>>
>>> I tried adding a few debugs and delays to the interrupt code path in
>>> drivers/net/phy/meson-gxl.c but nothing gave me useful info so far.
>>>
>>> Do you have more advice on how to proceed from here?
>>>
>>> Thanks
>>>
>>> Erico
>>
>> Heiner
> 
> Hi,
> 
> I did some tests on my side as well, mostly with v5.10.93 ATM.
> It is true that I can recall seeing this issue only on boards using the
> internal PHY (g12 and gxl boards for me - I don't have meson8b boards).
> 
> I tried on the u200 (g12 based). Being the ref design it has both
> the internal and external interfaces and I can choose.
> 
> To my surprise, I could not reproduce the issue on it with the internal
> PHY ... until I noticed that eMMC was initialising more or less at the
> same time as the network.
> 
> I disabled the eMMC, out of curiosity, and the issue was back.
> Like Heiner, I suspect a timing issue - at this stage, I can't tell if it
> is PHY related though.
> 
> I also tried with the external phy and could not reproduce. Unfortunately,
> as we can see from the first test on the u200, not reproducing is not
> really a proof and it is difficult to conclude.
> 
> Like Erico, I tried bisecting but I ended up on a BT merge ... Clearly
> inconclusive :(
> 
> Disabling the IRQ is an interesting test but, on my side, I have mixed
> results (on the libretech-cc this time):
> 
> * I first tried quickly while bisecting, on commit
>   5.6.0-rc3-01434-g8d4ccd7770e7:
>   - With IRQ => NOK
>   - POLL => NOK
> 
> Seeing Erico's report, I thought maybe I mixed things up so I tried again,
> double-checked the IRQ was disabled ... still broken. There was another
> commit where I reproduced it without the IRQ, but I lost it.
> 
> * I also tried on v5.10.93:
>   - With IRQ => NOK
>   - POLL => OK ... (well, I got bored before the issue showed up)
> 
> It seems that switching to polling, in some cases, changes the timings
> just enough to hide the issue ... but not always. Unless I forgot to
> consider something else? Ideas?
> 
When using polling the time difference between aneg complete and
PHY state machine run is random in the interval 0 .. 1s.
Hence there's a certain chance that the difference is too small
to avoid the issue.
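
To put a rough number on the polling argument, here is a small user-space
sketch (not kernel code). It assumes a 1 s poll interval, that aneg completes
at a uniformly random point within that interval, and a hypothetical settle
time the PHY would need after "aneg complete" before data flows. The function
names are made up for illustration, and the 100 ms value used in the note
below is only borrowed from the test patch; it is not a measured figure.

```c
#include <assert.h>

/*
 * Illustration only: user space, not kernel code.
 * Assumption: aneg completes at a uniformly random time within one poll
 * interval, so the delay until the next poll is uniform in [0, poll_ms).
 */

/*
 * Probability (0.0 .. 1.0) that the next poll runs less than settle_ms
 * after aneg complete, i.e. too early under the settle-time hypothesis.
 */
static double poll_too_early_probability(unsigned int poll_ms,
					 unsigned int settle_ms)
{
	assert(poll_ms > 0);

	if (settle_ms >= poll_ms)
		return 1.0;

	return (double)settle_ms / (double)poll_ms;
}

/* Expected number of affected boots in a batch, under the same model. */
static double expected_failures(unsigned int boots, unsigned int poll_ms,
				unsigned int settle_ms)
{
	return boots * poll_too_early_probability(poll_ms, settle_ms);
}
```

Under these assumptions, poll_too_early_probability(1000, 100) is 0.1, so
about 10 out of 100 boots would still run the state machine too early. That
would be consistent with polling hiding the issue in some test batches but
not in others, though the real settle time (if there is one) is unknown.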

> If I understand the proposed patch correctly, it is mostly about the phy
> IRQ. Since I reproduce without the IRQ, I suppose it is not the
> problem we were looking for (it might still be a problem worth fixing -
> the phy is not "rock-solid" when it comes to aneg - I already tried
> stabilising it a few years ago).

Below is a slightly improved version of the test patch. It doesn't sleep
in the (threaded) interrupt handler and lets the workqueue do it.

Maybe Amlogic is aware of a potentially related silicon issue?

> 
> TBH, it bothers me that I reproduced w/o the IRQ. The idea makes
> sense :/
> 
>>
[...]
> 


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index 7e7904fee..a3318ae01 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
 		if (ret)
 			return ret;
 
-		val = INTSRC_ANEG_PR
-			| INTSRC_PARALLEL_FAULT
-			| INTSRC_ANEG_LP_ACK
-			| INTSRC_LINK_DOWN
-			| INTSRC_REMOTE_FAULT
-			| INTSRC_ANEG_COMPLETE;
+		val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
 		ret = phy_write(phydev, INTSRC_MASK, val);
 	} else {
 		val = 0;
@@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	if (irq_status == 0)
 		return IRQ_NONE;
 
-	phy_trigger_machine(phydev);
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
@ 2022-03-02 16:34               ` Heiner Kallweit
  0 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-03-02 16:34 UTC (permalink / raw)
  To: Jerome Brunet, Erico Nunes, Martin Blumenstingl
  Cc: Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On 02.03.2022 14:39, Jerome Brunet wrote:
> 
> On Wed 02 Mar 2022 at 12:01, Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> On 02.03.2022 11:33, Erico Nunes wrote:
>>> On Sat, Feb 26, 2022 at 2:53 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>> Just to rule out that the PHY may be involved:
>>>> - Does the issue occur with internal and/or external PHY?
>>>
>>> My target boards have the internal phy only. It is not possible for me
>>> at the moment to test it with an external phy.
>>>
>>>> - Issue still occurs in PHY polling mode? (disable PHY interrupt in dts)
>>>
>>> Thanks for suggesting this. I did tests with this and it seems to be a
>>> workaround.
>>> With phy interrupt on recent kernels (around v5.17-rc3) I'm able to
>>> reproduce the issue relatively easily over a batch of a hundred jobs.
>>> With my tests with the phy in polling mode, I have not been able to
>>> reproduce so far, even with several hundred jobs.
>>>
>> It's my understanding that in the problem case the "aneg complete"
>> interrupt fires, but no data flows.
>> This might indicate a timing issue. According to the meson PHY driver
>> (I don't have the datasheet) the PHY doesn't have a "link up" interrupt
>> source, just the mentioned "aneg complete".
>>
>> Below I send an experimental patch that delays the link up processing
>> a little and eliminates not needed interrupt sources.
>> Could you please test it with PHY interrupts enabled?
>>
>>
>> By the way, to all:
>> I found that interrupt mode is broken in fixed (aneg disabled) mode,
>> because link-up isn't signaled. Experiments showed that irq source
>> bit 7 can be used to fix this, but this bit isn't documented in the
>> driver.
>>
>>> For completeness I also tested 46f69ded988d (from my initial analysis)
>>> and setting the phy to polling mode there does not make a difference,
>>> issue still reproduces. So it may have been a different bug. Though I
>>> guess at this point we can disregard that and focus on the current
>>> kernel.
>>>
>>> I tried adding a few debugs and delays to the interrupt code path in
>>> drivers/net/phy/meson-gxl.c but nothing gave me useful info so far.
>>>
>>> Do you have more advice on how to proceed from here?
>>>
>>> Thanks
>>>
>>> Erico
>>
>> Heiner
> 
> Hi,
> 
> I did some tests on my side as well, mostly with v5.10.93 ATM.
> It is true that I can recall seeing this issue only on boards using the
> internal PHY (g12 and gxl boards for me - I don't have meson8b boards).
> 
> I tried on the u200 (g12 based). Being the ref design it has both
> the internal and external interfaces and I can choose.
> 
> To my surprise, I could not reproduce the issue on it with the internal
> PHY ... until I noticed that eMMC was initialising more or less at the
> same time as the network.
> 
> I disabled the eMMC, out of curiosity, and the issue was back.
> Like Heiner, I suspect a timing issue - at this stage, I can't tell if it
> is PHY related though.
> 
> I also tried with the external phy and could not reproduce. Unfortunately,
> as we can see from the first test on the u200, not reproducing is not
> really a proof and it is difficult to conclude.
> 
> Like Erico, I tried bisecting but I ended up on a BT merge ... Clearly
> inconclusive :(
> 
> Disabling the IRQ is an interesting test but, on my side, I have mixed
> results (on the libretech-cc this time):
> 
> * I first tried quickly while bisecting, on commit
>   5.6.0-rc3-01434-g8d4ccd7770e7:
>   - With IRQ => NOK
>   - POLL => NOK
> 
> Seeing Erico's report, I thought maybe I mixed things up so I tried again,
> double-checked that IRQs were disabled ... still broken. There was another
> commit where I reproduced it without IRQ, but I lost track of it.
> 
> * I also tried on v5.10.93:
>   - With IRQ => NOK
>   - POLL => OK ... (well, I got bored before the issue showed up)
> 
> It seems that switching to polling, in some cases, changes the timings
> just enough to hide the issue ... but not always. Unless I forgot to
> consider something else? Ideas?
> 
When using polling the time difference between aneg complete and
PHY state machine run is random in the interval 0 .. 1s.
Hence there's a certain chance that the difference is too small
to avoid the issue.
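
As a rough illustration of the argument above (a sketch, not driver code):
if the gap between aneg completion and the next polled state machine run is
assumed to be uniformly distributed over the polling interval, the chance of
running too early is the ratio of the PHY's required settle time to that
interval. The function name and the settle-time parameter are hypothetical;
how long the PHY really needs is not established in this thread.

```c
#include <assert.h>

/*
 * Rough model, not driver code: with polling, the PHY state machine runs
 * at a fixed interval, so the gap between aneg completion and the next run
 * is assumed to be uniformly distributed in [0, poll_interval_ms).
 *
 * Returns the chance (0.0 .. 1.0) that the state machine runs sooner than
 * settle_ms after aneg completion, i.e. too early for the PHY.
 */
double chance_poll_runs_too_early(unsigned int settle_ms,
				  unsigned int poll_interval_ms)
{
	assert(poll_interval_ms > 0);

	/* A settle time longer than the interval can never be satisfied. */
	if (settle_ms >= poll_interval_ms)
		return 1.0;

	return (double)settle_ms / (double)poll_interval_ms;
}
```

Taking the 100 ms delay from the test patch as a guess for the settle time
and a 1 s polling interval, the model gives a chance of roughly 0.1 per
link-up, which would be consistent with polling hiding the issue often but
not always.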

> If I understand the proposed patch correctly, it is mostly about the phy
> IRQ. Since I reproduce without the IRQ, I suppose it is not the
> problem we were looking for (might still be a problem worth fixing -
> the phy is not "rock-solid" when it comes to aneg - I already tried
> stabilising it a few years ago)

Below is a slightly improved version of the test patch. It doesn't sleep
in the (threaded) interrupt handler and lets the workqueue do it.
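
The deferral decision can be modelled in isolation as follows. This is only
a sketch: the helper names are hypothetical, and the bit positions are
assumptions that should be checked against the INTSRC_* definitions in
drivers/net/phy/meson-gxl.c. It relies on phy_trigger_machine() being
equivalent to queueing the state machine with a delay of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions; verify against the INTSRC_* defines in meson-gxl.c. */
#define INTSRC_LINK_DOWN	(1u << 4)
#define INTSRC_ANEG_COMPLETE	(1u << 6)

/* Delay the test patch applies after aneg completion, in milliseconds. */
#define ANEG_SETTLE_MS		100u

/*
 * Hypothetical helper: how long to wait before running the PHY state
 * machine for a given interrupt status. A result of 0 corresponds to
 * phy_trigger_machine(), anything else to phy_queue_state_machine()
 * with that delay converted to jiffies.
 */
unsigned int state_machine_delay_ms(uint16_t irq_status)
{
	if (irq_status & INTSRC_ANEG_COMPLETE)
		return ANEG_SETTLE_MS;

	return 0;
}

/* Self-check covering the cases handled by the test patch. */
void check_state_machine_delay(void)
{
	assert(state_machine_delay_ms(INTSRC_LINK_DOWN) == 0);
	assert(state_machine_delay_ms(INTSRC_ANEG_COMPLETE) == ANEG_SETTLE_MS);
	assert(state_machine_delay_ms(INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE)
	       == ANEG_SETTLE_MS);
}
```

Because the delay is handed to the workqueue, the threaded interrupt handler
can return immediately instead of sleeping.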

Maybe Amlogic is aware of a potentially related silicon issue?

> 
> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
> sense :/
> 
>>
[...]
> 


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index 7e7904fee..a3318ae01 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
 		if (ret)
 			return ret;
 
-		val = INTSRC_ANEG_PR
-			| INTSRC_PARALLEL_FAULT
-			| INTSRC_ANEG_LP_ACK
-			| INTSRC_LINK_DOWN
-			| INTSRC_REMOTE_FAULT
-			| INTSRC_ANEG_COMPLETE;
+		val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
 		ret = phy_write(phydev, INTSRC_MASK, val);
 	} else {
 		val = 0;
@@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	if (irq_status == 0)
 		return IRQ_NONE;
 
-	phy_trigger_machine(phydev);
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
-- 
2.35.1



_______________________________________________
linux-amlogic mailing list
linux-amlogic@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic

^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-02 16:34               ` Heiner Kallweit
  (?)
@ 2022-03-06  9:40                 ` Erico Nunes
  -1 siblings, 0 replies; 49+ messages in thread
From: Erico Nunes @ 2022-03-06  9:40 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jerome Brunet, Martin Blumenstingl, Alexandre Torgue,
	Giuseppe Cavallaro, Jose Abreu, Kevin Hilman, Neil Armstrong,
	linux-amlogic, netdev, open list:ARM/Rockchip SoC...,
	linux-sunxi

On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> When using polling the time difference between aneg complete and
> PHY state machine run is random in the interval 0 .. 1s.
> Hence there's a certain chance that the difference is too small
> to avoid the issue.
>
> > If I understand the proposed patch correctly, it is mostly about the phy
> > IRQ. Since I reproduce without the IRQ, I suppose it is not the
> > problem we were looking for (might still be a problem worth fixing -
> > the phy is not "rock-solid" when it comes to aneg - I already tried
> > stabilising it a few years ago)
>
> Below is a slightly improved version of the test patch. It doesn't sleep
> in the (threaded) interrupt handler and lets the workqueue do it.
>
> Maybe Amlogic is aware of a potentially related silicon issue?
>
> >
> > TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
> > sense :/
> >
> >>
> [...]
> >
>
>
> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
> index 7e7904fee..a3318ae01 100644
> --- a/drivers/net/phy/meson-gxl.c
> +++ b/drivers/net/phy/meson-gxl.c
> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>                 if (ret)
>                         return ret;
>
> -               val = INTSRC_ANEG_PR
> -                       | INTSRC_PARALLEL_FAULT
> -                       | INTSRC_ANEG_LP_ACK
> -                       | INTSRC_LINK_DOWN
> -                       | INTSRC_REMOTE_FAULT
> -                       | INTSRC_ANEG_COMPLETE;
> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>                 ret = phy_write(phydev, INTSRC_MASK, val);
>         } else {
>                 val = 0;
> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>         if (irq_status == 0)
>                 return IRQ_NONE;
>
> -       phy_trigger_machine(phydev);
> +       if (irq_status & INTSRC_ANEG_COMPLETE)
> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
> +       else
> +               phy_trigger_machine(phydev);
>
>         return IRQ_HANDLED;
>  }
> --
> 2.35.1

I did a lot of testing with this patch, and it seems to improve things.
For me it completely resolves the original, more easily reproducible
issue, where I would see "Link is Up" but the interface did not really
work.
In over a thousand jobs, that never reproduced again with this patch.

I do see a different issue now, but it is even less frequent and
harder to reproduce. In those over a thousand jobs, I have seen it
only about 4 times.
The difference is that now when the issue happens, the link is not
even reported as Up. The output is a bit different than the original
one, but it is consistently the same output in all instances where it
reproduced. Looks like this (note that there is no longer Link is
Down/Link is Up):

[    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
[0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
[    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
MEM_TYPE_PAGE_POOL RxQ-0
[    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
Features support found
[    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
[    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
phy/rmii link mode
[   22.227444] Waiting up to 100 more seconds for network.
[   42.231440] Waiting up to 80 more seconds for network.
[   62.235437] Waiting up to 60 more seconds for network.
[   82.239437] Waiting up to 40 more seconds for network.
[  102.243439] Waiting up to 20 more seconds for network.
[  122.243446] Sending DHCP requests ...
[  130.113944] random: fast init done
[  134.219441] ... timed out!
[  194.559562] IP-Config: Retrying forever (NFS root)...
[  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
[0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
[  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
MEM_TYPE_PAGE_POOL RxQ-0
[  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
Features support found
[  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
[  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
phy/rmii link mode
[  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
100Mbps/Full - flow control off
[  196.339463] Sending DHCP requests ., OK
...


I don't remember seeing an output like this one in the previous tests.
Is there any further improvement we can do to the patch based on this?

Thanks

Erico

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-06  9:40                 ` Erico Nunes
  (?)
@ 2022-03-06 12:56                   ` Heiner Kallweit
  -1 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-03-06 12:56 UTC (permalink / raw)
  To: Erico Nunes
  Cc: Jerome Brunet, Martin Blumenstingl, Alexandre Torgue,
	Giuseppe Cavallaro, Jose Abreu, Kevin Hilman, Neil Armstrong,
	linux-amlogic, netdev, open list:ARM/Rockchip SoC...,
	linux-sunxi

On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we were looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>>                 if (ret)
>>                         return ret;
>>
>> -               val = INTSRC_ANEG_PR
>> -                       | INTSRC_PARALLEL_FAULT
>> -                       | INTSRC_ANEG_LP_ACK
>> -                       | INTSRC_LINK_DOWN
>> -                       | INTSRC_REMOTE_FAULT
>> -                       | INTSRC_ANEG_COMPLETE;
>> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>>                 ret = phy_write(phydev, INTSRC_MASK, val);
>>         } else {
>>                 val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>>         if (irq_status == 0)
>>                 return IRQ_NONE;
>>
>> -       phy_trigger_machine(phydev);
>> +       if (irq_status & INTSRC_ANEG_COMPLETE)
>> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> +       else
>> +               phy_trigger_machine(phydev);
>>
>>         return IRQ_HANDLED;
>>  }
>> --
>> 2.35.1
> 
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
> 
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
> 
> [    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [   22.227444] Waiting up to 100 more seconds for network.
> [   42.231440] Waiting up to 80 more seconds for network.
> [   62.235437] Waiting up to 60 more seconds for network.
> [   82.239437] Waiting up to 40 more seconds for network.
> [  102.243439] Waiting up to 20 more seconds for network.
> [  122.243446] Sending DHCP requests ...
> [  130.113944] random: fast init done
> [  134.219441] ... timed out!
> [  194.559562] IP-Config: Retrying forever (NFS root)...
> [  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [  196.339463] Sending DHCP requests ., OK
> ...
> 
> 
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
> 
> Thanks
> 
> Erico

Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c

First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is currently in the net tree and should show up in linux-next
at the beginning of the week.

On top, please apply the following (it includes the test patch you're working with).


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
 	return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
 }
 
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
-	phy_write(phydev, TSTCNTL, 0);
-}
-
 static int meson_gxl_read_reg(struct phy_device *phydev,
 			      unsigned int bank, unsigned int reg)
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
 			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
 			TSTCNTL_TEST_MODE |
 			FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_read(phydev, TSTREAD1);
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
+	return phy_read(phydev, TSTREAD1);
 }
 
 static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTWRITE, value);
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
-			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
-			TSTCNTL_TEST_MODE |
-			FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+	return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+			 FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+			 TSTCNTL_TEST_MODE |
+			 FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
 
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
 }
 
 static int meson_gxl_config_init(struct phy_device *phydev)
 {
 	int ret;
 
+	phy_set_bits(phydev, 0x1b, BIT(12));
+	phy_write(phydev, 0x11, 0x0080);
+
+	meson_gxl_open_banks(phydev);
+
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
 	/* Enable fractional PLL */
 	ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
 	if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
 	return 0;
 }
 
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
 		if (!(wol & LPI_STATUS_RSV12) ||
 		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
 			/* Looks like aneg failed after all */
-			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			phydev_warn(phydev, "LPA corruption - aneg restart\n");
 			return genphy_restart_aneg(phydev);
 		}
 	}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	    irq_status == INTSRC_ENERGY_DETECT)
 		return IRQ_HANDLED;
 
-	phy_trigger_machine(phydev);
+	/* Give PHY some time before MAC starts sending data. This works
+	 * around an issue where network doesn't come up properly.
+	 */
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
 
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+	if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+		meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
 		.read_status	= meson_gxl_read_status,
 		.config_intr	= meson_gxl_config_intr,
 		.handle_interrupt = meson_gxl_handle_interrupt,
+		.link_change_notify = meson_gxl_link_change_notify,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	}, {
-- 
2.35.1
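
The condition in meson_gxl_link_change_notify() above can be exercised on
its own with a small model. The enum, constants and function name below are
hypothetical stand-ins for the kernel's enum phy_state and SPEED_* values;
the 0xa900 write to register 0x14 is taken from the vendor driver and its
meaning is not documented in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's enum phy_state and SPEED_* values (assumptions). */
enum model_phy_state {
	MODEL_PHY_NOLINK,
	MODEL_PHY_RUNNING,
};

#define MODEL_SPEED_10	10
#define MODEL_SPEED_100	100

/*
 * Mirrors the test in the link_change_notify hook: the extra analog/DSP
 * register write is only issued once the link is up and running at
 * 100 Mbps.
 */
bool needs_100m_dsp_write(enum model_phy_state state, int speed)
{
	return state == MODEL_PHY_RUNNING && speed == MODEL_SPEED_100;
}

/* Self-check of the three relevant combinations. */
void check_needs_100m_dsp_write(void)
{
	assert(needs_100m_dsp_write(MODEL_PHY_RUNNING, MODEL_SPEED_100));
	assert(!needs_100m_dsp_write(MODEL_PHY_RUNNING, MODEL_SPEED_10));
	assert(!needs_100m_dsp_write(MODEL_PHY_NOLINK, MODEL_SPEED_100));
}
```

Keeping the predicate separate from the register write makes it easy to see
that nothing is written while the link is down or running at 10 Mbps.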



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
@ 2022-03-06 12:56                   ` Heiner Kallweit
  0 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-03-06 12:56 UTC (permalink / raw)
  To: Erico Nunes
  Cc: Jerome Brunet, Martin Blumenstingl, Alexandre Torgue,
	Giuseppe Cavallaro, Jose Abreu, Kevin Hilman, Neil Armstrong,
	linux-amlogic, netdev, open list:ARM/Rockchip SoC...,
	linux-sunxi

On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we were looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>>                 if (ret)
>>                         return ret;
>>
>> -               val = INTSRC_ANEG_PR
>> -                       | INTSRC_PARALLEL_FAULT
>> -                       | INTSRC_ANEG_LP_ACK
>> -                       | INTSRC_LINK_DOWN
>> -                       | INTSRC_REMOTE_FAULT
>> -                       | INTSRC_ANEG_COMPLETE;
>> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>>                 ret = phy_write(phydev, INTSRC_MASK, val);
>>         } else {
>>                 val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>>         if (irq_status == 0)
>>                 return IRQ_NONE;
>>
>> -       phy_trigger_machine(phydev);
>> +       if (irq_status & INTSRC_ANEG_COMPLETE)
>> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> +       else
>> +               phy_trigger_machine(phydev);
>>
>>         return IRQ_HANDLED;
>>  }
>> --
>> 2.35.1
> 
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
> 
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
> 
> [    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [   22.227444] Waiting up to 100 more seconds for network.
> [   42.231440] Waiting up to 80 more seconds for network.
> [   62.235437] Waiting up to 60 more seconds for network.
> [   82.239437] Waiting up to 40 more seconds for network.
> [  102.243439] Waiting up to 20 more seconds for network.
> [  122.243446] Sending DHCP requests ...
> [  130.113944] random: fast init done
> [  134.219441] ... timed out!
> [  194.559562] IP-Config: Retrying forever (NFS root)...
> [  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [  196.339463] Sending DHCP requests ., OK
> ...
> 
> 
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
> 
> Thanks
> 
> Erico

Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c

First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is currently in the net tree and should show up in linux-next
at the beginning of the week.

On top, please apply the following (it includes the test patch you're working with).


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
 	return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
 }
 
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
-	phy_write(phydev, TSTCNTL, 0);
-}
-
 static int meson_gxl_read_reg(struct phy_device *phydev,
 			      unsigned int bank, unsigned int reg)
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
 			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
 			TSTCNTL_TEST_MODE |
 			FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_read(phydev, TSTREAD1);
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
+	return phy_read(phydev, TSTREAD1);
 }
 
 static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTWRITE, value);
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
-			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
-			TSTCNTL_TEST_MODE |
-			FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+	return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+			 FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+			 TSTCNTL_TEST_MODE |
+			 FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
 
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
 }
 
 static int meson_gxl_config_init(struct phy_device *phydev)
 {
 	int ret;
 
+	phy_set_bits(phydev, 0x1b, BIT(12));
+	phy_write(phydev, 0x11, 0x0080);
+
+	meson_gxl_open_banks(phydev);
+
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
 	/* Enable fractional PLL */
 	ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
 	if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
 	return 0;
 }
 
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
 		if (!(wol & LPI_STATUS_RSV12) ||
 		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
 			/* Looks like aneg failed after all */
-			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			phydev_warn(phydev, "LPA corruption - aneg restart\n");
 			return genphy_restart_aneg(phydev);
 		}
 	}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	    irq_status == INTSRC_ENERGY_DETECT)
 		return IRQ_HANDLED;
 
-	phy_trigger_machine(phydev);
+	/* Give PHY some time before MAC starts sending data. This works
+	 * around an issue where network doesn't come up properly.
+	 */
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
 
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+	if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+		meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
 		.read_status	= meson_gxl_read_status,
 		.config_intr	= meson_gxl_config_intr,
 		.handle_interrupt = meson_gxl_handle_interrupt,
+		.link_change_notify = meson_gxl_link_change_notify,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	}, {
-- 
2.35.1
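
For readers following the banked-register part of the patch above: both helpers end up writing a single control word to TSTCNTL that selects the bank, keeps test mode enabled and carries the register address. The sketch below only shows how such a word is composed, in plain userspace C. The field positions are written down from memory of mainline meson-gxl.c and should be treated as assumptions to verify against the tree; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed TSTCNTL layout (check against drivers/net/phy/meson-gxl.c). */
#define TSTCNTL_READ           (1u << 15)
#define TSTCNTL_WRITE          (1u << 14)
#define TSTCNTL_BANK_SHIFT     11      /* REG_BANK_SEL, assumed bits 12:11 */
#define TSTCNTL_BANK_MASK      0x3u
#define TSTCNTL_TEST_MODE      (1u << 10)
#define TSTCNTL_RD_ADDR_SHIFT  5       /* READ_ADDRESS, assumed bits 9:5 */
#define TSTCNTL_WR_ADDR_SHIFT  0       /* WRITE_ADDRESS, assumed bits 4:0 */
#define TSTCNTL_ADDR_MASK      0x1fu

/* Hypothetical helper: control word for writing register 'reg' in 'bank'. */
static uint16_t tstcntl_write_word(unsigned int bank, unsigned int reg)
{
	return (uint16_t)(TSTCNTL_WRITE |
			  ((bank & TSTCNTL_BANK_MASK) << TSTCNTL_BANK_SHIFT) |
			  TSTCNTL_TEST_MODE |
			  ((reg & TSTCNTL_ADDR_MASK) << TSTCNTL_WR_ADDR_SHIFT));
}

/* Hypothetical helper: control word for reading register 'reg' in 'bank'. */
static uint16_t tstcntl_read_word(unsigned int bank, unsigned int reg)
{
	return (uint16_t)(TSTCNTL_READ |
			  ((bank & TSTCNTL_BANK_MASK) << TSTCNTL_BANK_SHIFT) |
			  TSTCNTL_TEST_MODE |
			  ((reg & TSTCNTL_ADDR_MASK) << TSTCNTL_RD_ADDR_SHIFT));
}
```

Under these assumptions, a write to register 0x14 in bank 0 would use the control word 0x4414. Note that the test patch opens the banks once in config_init and never closes them, whereas mainline opens and closes them around every access.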



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-06 12:56                   ` Heiner Kallweit
  (?)
@ 2022-03-09 14:45                     ` Erico Nunes
  -1 siblings, 0 replies; 49+ messages in thread
From: Erico Nunes @ 2022-03-09 14:45 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jerome Brunet, Martin Blumenstingl, Alexandre Torgue,
	Giuseppe Cavallaro, Jose Abreu, Kevin Hilman, Neil Armstrong,
	linux-amlogic, netdev, open list:ARM/Rockchip SoC...,
	linux-sunxi

On Sun, Mar 6, 2022 at 1:56 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> You could try the following (quick and dirty) test patch that fully mimics
> the vendor driver as found here:
> https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c
>
> First apply
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
> This patch is in the net tree currently and should show up in linux-next
> beginning of the week.
>
> On top, please apply the following (it includes the test patch you're working with).

I triggered test jobs with this configuration (latest mainline +
a502a8f0409 + test patch for vendor driver behaviour), and the results
are pretty much the same as with the previous test patch from this
thread only.
That is, I never got the issue with non-functional link up anymore,
but I get the (rare) issue with link not going up.
The reproducibility is still extremely low, in the <1% range.

So at this point, I'm not sure how much more effort to invest into
this. Given the rate is very low and the fallback is it will just
reset the link and proceed to work, I think the situation would
already be much better with the solution from that test patch being
merged. If you propose that as a patch separately, I'm happy to test
the final submitted patch again and provide feedback there. Or if
there is another solution to try, I can try with that too.

Thanks


Erico
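
As a rough cross-check of the numbers quoted in this thread (around 30% of jobs failing originally, about 4 failures in over a thousand jobs with the test patch), the rates work out as below. This is a minimal sketch; the helper name is made up for illustration, and 1000 is used as a lower bound for "over a thousand jobs".

```c
/*
 * Hypothetical helper: percentage of failed jobs.
 * Returns 0 for an empty run to avoid dividing by zero.
 */
static double failure_rate_percent(unsigned int failures, unsigned int jobs)
{
	if (jobs == 0)
		return 0.0;
	return 100.0 * (double)failures / (double)jobs;
}
```

With 4 failures in 1000 jobs this gives 0.4%, which is the "below 1%" range, compared with roughly 30% before the patch.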

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-09 14:45                     ` Erico Nunes
  (?)
@ 2022-03-09 14:57                       ` Jerome Brunet
  -1 siblings, 0 replies; 49+ messages in thread
From: Jerome Brunet @ 2022-03-09 14:57 UTC (permalink / raw)
  To: Erico Nunes, Heiner Kallweit
  Cc: Martin Blumenstingl, Alexandre Torgue, Giuseppe Cavallaro,
	Jose Abreu, Kevin Hilman, Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi


On Wed 09 Mar 2022 at 15:45, Erico Nunes <nunes.erico@gmail.com> wrote:

> On Sun, Mar 6, 2022 at 1:56 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>> You could try the following (quick and dirty) test patch that fully mimics
>> the vendor driver as found here:
>> https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c
>>
>> First apply
>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
>> This patch is in the net tree currently and should show up in linux-next
>> beginning of the week.
>>
>> On top, please apply the following (it includes the test patch you're working with).
>
> I triggered test jobs with this configuration (latest mainline +
> a502a8f0409 + test patch for vendor driver behaviour), and the results
> are pretty much the same as with the previous test patch from this
> thread only.
> That is, I never got the issue with non-functional link up anymore,
> but I get the (rare) issue with link not going up.
> The reproducibility is still extremely low, in the <1% range.

Low reproducibility means the problem is still there, or at least not
understood completely.

I understand the benefit from the user standpoint.

Heiner, if you are going to continue from the test patch you sent,
I would welcome some explanation with each of the changes.

We know very little about this IP and I'm not very comfortable with
tweaking/aligning with AML sdk "blindly" on a driver that has otherwise
been working well so far.

Thx

>
> So at this point, I'm not sure how much more effort to invest into
> this. Given the rate is very low and the fallback is it will just
> reset the link and proceed to work, I think the situation would
> already be much better with the solution from that test patch being
> merged. If you propose that as a patch separately, I'm happy to test
> the final submitted patch again and provide feedback there. Or if
> there is another solution to try, I can try with that too.
>
> Thanks
>
>
> Erico


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-03-09 14:57                       ` Jerome Brunet
  (?)
@ 2022-03-09 20:42                         ` Heiner Kallweit
  -1 siblings, 0 replies; 49+ messages in thread
From: Heiner Kallweit @ 2022-03-09 20:42 UTC (permalink / raw)
  To: Jerome Brunet, Erico Nunes
  Cc: Martin Blumenstingl, Alexandre Torgue, Giuseppe Cavallaro,
	Jose Abreu, Kevin Hilman, Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

On 09.03.2022 15:57, Jerome Brunet wrote:
> 
> On Wed 09 Mar 2022 at 15:45, Erico Nunes <nunes.erico@gmail.com> wrote:
> 
>> On Sun, Mar 6, 2022 at 1:56 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>> You could try the following (quick and dirty) test patch that fully mimics
>>> the vendor driver as found here:
>>> https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c
>>>
>>> First apply
>>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
>>> This patch is in the net tree currently and should show up in linux-next
>>> beginning of the week.
>>>
>>> On top, please apply the following (it includes the test patch you're working with).
>>
>> I triggered test jobs with this configuration (latest mainline +
>> a502a8f0409 + test patch for vendor driver behaviour), and the results
>> are pretty much the same as with the previous test patch from this
>> thread only.
>> That is, I never got the issue with non-functional link up anymore,
>> but I get the (rare) issue with link not going up.
>> The reproducibility is still extremely low, in the <1% range.
> 
> Low reproducibility means the problem is still there, or at least not
> understood completely.
> 
> I understand the benefit from the user standpoint.
> 
> Heiner if you are going to continue from the test patch you sent,
> I would welcome some explanation with each of the changes.
> 
The latest test patch was purely for checking whether we see any
difference in behavior between vendor driver and the mainlined
version. It's in no way meant to be applied to mainline.

> We know very little about this IP and I'm not very comfortable with
> tweaking/aligning with AML sdk "blindly" on a driver that has otherwise
> been working well so far.
> 

This touches one thing I wanted to ask anyway: Supposedly Amlogic
didn't develop its own Ethernet PHY, and if they licensed an existing
IP then it should be similar to some other existing PHY (that may
have a driver in phylib).

Then what I'll do is submit the following small change that brought
the error rate significantly down according to Erico's tests.

-       phy_trigger_machine(phydev);
+       if (irq_status & INTSRC_ANEG_COMPLETE)
+               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+       else
+               phy_trigger_machine(phydev);
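
For readers who want to see the decision in isolation, here is a minimal
user-space sketch of what the change above does. It is not driver code:
the bit position of INTSRC_ANEG_COMPLETE is an assumption made for
illustration, and state_machine_delay_ms() is a hypothetical helper. The
idea is simply that an "autonegotiation complete" interrupt delays the
state machine run by 100 ms (presumably to give the PHY time to settle),
while any other interrupt source runs it right away.

```c
#include <stdint.h>

/* Assumed bit position, for illustration only; the real definition lives
 * in the PHY driver source. */
#define INTSRC_ANEG_COMPLETE (1u << 6)

/* Hypothetical helper: how long (in ms) to wait before running the PHY
 * state machine for a given interrupt status word. */
unsigned int state_machine_delay_ms(uint32_t irq_status)
{
    if (irq_status & INTSRC_ANEG_COMPLETE)
        return 100; /* mirrors phy_queue_state_machine(phydev, 100 ms) */

    return 0;       /* mirrors phy_trigger_machine(): run immediately */
}
```

In the driver itself the returned delay would correspond to the jiffies
argument passed to phy_queue_state_machine().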


> Thx
> 
>>
>> So at this point, I'm not sure how much more effort to invest into
>> this. Given the rate is very low and the fallback is it will just
>> reset the link and proceed to work, I think the situation would
>> already be much better with the solution from that test patch being
>> merged. If you propose that as a patch separately, I'm happy to test
>> the final submitted patch again and provide feedback there. Or if
>> there is another solution to try, I can try with that too.
>>
>> Thanks
>>
>>
>> Erico
> 

Heiner

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
       [not found]                         ` <CACdvmAhcyNXViJgk6o6oAoYvAjAg-NFD74Eym_nGHJx3YAqjzw@mail.gmail.com>
  2022-06-13  9:10                             ` Jerome Brunet
@ 2022-06-13  9:10                             ` Jerome Brunet
  0 siblings, 0 replies; 49+ messages in thread
From: Jerome Brunet @ 2022-06-13  9:10 UTC (permalink / raw)
  To: Da Xue, Heiner Kallweit
  Cc: Erico Nunes, Martin Blumenstingl, Alexandre Torgue,
	Giuseppe Cavallaro, Jose Abreu, Kevin Hilman, Neil Armstrong,
	linux-amlogic, netdev, open list:ARM/Rockchip SoC...,
	linux-sunxi


On Sat 11 Jun 2022 at 17:00, Da Xue <da@lessconfused.com> wrote:

> On Wed, Mar 9, 2022 at 3:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
>  On 09.03.2022 15:57, Jerome Brunet wrote:
>  > 
>  > On Wed 09 Mar 2022 at 15:45, Erico Nunes <nunes.erico@gmail.com> wrote:
>  > 
>  >> On Sun, Mar 6, 2022 at 1:56 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>  >>> You could try the following (quick and dirty) test patch that fully mimics
>  >>> the vendor driver as found here:
>  >>> https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c
>  >>>
>  >>> First apply
>  >>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
>  >>> This patch is in the net tree currently and should show up in linux-next
>  >>> beginning of the week.
>  >>>
>  >>> On top please apply the following (it includes the test patch you're working with).
>  >>
>  >> I triggered test jobs with this configuration (latest mainline +
>  >> a502a8f0409 + test patch for vendor driver behaviour), and the results
>  >> are pretty much the same as with the previous test patch from this
>  >> thread only.
>  >> That is, I never got the issue with non-functional link up anymore,
>  >> but I get the (rare) issue with link not going up.
>  >> The reproducibility is still extremely low, in the <1% range.
>  > 
>  > Low reproducibility means the problem is still there, or at least not
>  > understood completely.
>  > 
>  > I understand the benefit from the user standpoint.
>  > 
>  > Heiner if you are going to continue from the test patch you sent,
>  > I would welcome some explanation with each of the changes.
>  > 
>  The latest test patch was purely for checking whether we see any
>  difference in behavior between vendor driver and the mainlined
>  version. It's in no way meant to be applied to mainline.
>
>  > We know very little about this IP and I'm not very comfortable with
>  > tweaking/aligning with AML sdk "blindly" on a driver that has otherwise
>  > been working well so far.
>  > 
>
>  This touches one thing I wanted to ask anyway: Supposedly Amlogic
>  didn't develop its own Ethernet PHY, and if they licensed an existing
>  IP then it should be similar to some other existing PHY (that may
>  have a driver in phylib).
>
>  Then what I'll do is submit the following small change that brought
>  the error rate significantly down according to Erico's tests.
>
>  -       phy_trigger_machine(phydev);
>  +       if (irq_status & INTSRC_ANEG_COMPLETE)
>  +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>  +       else
>  +               phy_trigger_machine(phydev);
>
>  > Thx
>  > 
>  >>
>  >> So at this point, I'm not sure how much more effort to invest into
>  >> this. Given the rate is very low and the fallback is it will just
>  >> reset the link and proceed to work, I think the situation would
>  >> already be much better with the solution from that test patch being
>  >> merged. If you propose that as a patch separately, I'm happy to test
>  >> the final submitted patch again and provide feedback there. Or if
>  >> there is another solution to try, I can try with that too.
>  >>
>  >> Thanks
>  >>
>  >>
>  >> Erico
>  > 
>
>  Heiner
>
> To help reproduce this problem, I have had this problem for as long as I can remember and it still occurs with this patch.

Same here, on both gxl and g12a. Occurrence remains unchanged.
The issue is even reproduced if the PHY is switched to polling mode, so
the merged change, related to the IRQ handling, is very unlikely to fix
the problem.

>
> This doesn't happen on first boot most of the time. It happens on reboot consistently. I have tested with AML-S805X-CC board, AML-S905X-CC V1, and V2 boards.
>

On my side, I confirm the network never seems to get stuck in u-boot but
it might break in Linux, even on the first boot after a power up from
what I have seen so far.

> I am on u-boot 22.04 with 5.18.3 which includes the patch.
> u-boot brings up ethernet on start and can grab an IP.
> Linux brings up ethernet and can grab an IP.
> reboot
> u-boot can grab an IP.
> Linux does not get anything. 
> I have to do ip link set dev eth0 down && up once or more to get ethernet to work again.
> Sometimes it spams meson8b-dwmac c9410000.ethernet eth0: Reset adapter. If it spams this, ethernet is dead and can't be recovered.

I tried several things, none showing any improvement so far:
* Make sure LPI/EEE is disabled
* Add the ethernet reset from the main controller on the MAC
* Test the various DMA modes of STMMAC
* Port the differences from u-boot and the vendor kernel in the PHY driver

I have also tried to go back in time, as far as v4.19, but the problem is
actually already there. It occurs a lot less though.
Since v5.6+ the occurrence is quite high: approx. 1 in 4 boots.
On v4.19: 1 in 50 boots, up to 1 in 150.
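
To put these rates in perspective for testing, here is a rough sketch of
how many boots are needed to have a given chance of hitting the failure
at least once. It assumes every boot fails independently with the same
probability p, which is an assumption and not something measured here;
boots_needed() is a hypothetical helper.

```c
/* Smallest number of boots n such that the chance of seeing at least one
 * failure, 1 - (1 - p)^n, reaches `confidence`.
 * Assumes independent boots, 0 < p < 1 and 0 < confidence < 1. */
unsigned int boots_needed(double p, double confidence)
{
    double miss = 1.0;   /* probability that no boot has failed so far */
    unsigned int n = 0;

    while (1.0 - miss < confidence) {
        miss *= 1.0 - p; /* one more boot without hitting the problem */
        n++;
    }
    return n;
}
```

Under that assumption, at roughly 1 in 4 it takes about 11 boots to have
a 95% chance of seeing the problem at least once, and at 1 in 50 it takes
about 149, which is why the older kernels are so much harder to test.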

>

When the problem happens:
* link is reported up
* ifconfig / MAC is claiming to be sending packets (Tx increasing - no Rx)
* I see no traffic with wireshark

The packets are getting lost somewhere. Can't say for sure if it is in
the MAC or the PHY.
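
For anyone who wants to detect this state automatically (in a test farm,
for instance), here is a minimal sketch of the check described above:
sample the interface counters twice and flag the case where Tx moves but
Rx does not. It assumes the standard sysfs statistics files under
/sys/class/net/<ifname>/statistics/; the helper names are hypothetical
and the heuristic is only as good as the traffic present on the link
between the two samples.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read one counter (e.g. "tx_packets") from
 * /sys/class/net/<ifname>/statistics/<name>.
 * Returns 0 on success, -1 if the file cannot be read or parsed. */
int read_counter(const char *ifname, const char *name,
                 unsigned long long *out)
{
    char path[256];
    FILE *f;
    int ok;

    snprintf(path, sizeof(path), "/sys/class/net/%s/statistics/%s",
             ifname, name);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Heuristic for the symptom: between two samples the Tx counter
 * increased while the Rx counter did not change at all. */
bool tx_without_rx(unsigned long long tx0, unsigned long long rx0,
                   unsigned long long tx1, unsigned long long rx1)
{
    return tx1 > tx0 && rx1 == rx0;
}
```

A caller would read tx_packets and rx_packets for eth0, wait a few
seconds while something is sending (DHCP requests, for example), read
them again and pass the four values to tx_without_rx().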

> This is fixed via power cycle so I'm assuming some register is not reset or maybe the IP is stuck.
>

`ethtool -r eth0` also seems to work around the problem.
This triggers the restart of so many things that it is close to
unplugging and replugging the ethernet cable :/

> Best,
> Da Xue


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot
  2022-06-13  9:10                             ` Jerome Brunet
  (?)
@ 2022-07-15  5:35                               ` Anand Moon
  -1 siblings, 0 replies; 49+ messages in thread
From: Anand Moon @ 2022-07-15  5:35 UTC (permalink / raw)
  To: Jerome Brunet
  Cc: Da Xue, Heiner Kallweit, Erico Nunes, Martin Blumenstingl,
	Alexandre Torgue, Giuseppe Cavallaro, Jose Abreu, Kevin Hilman,
	Neil Armstrong, linux-amlogic, netdev,
	open list:ARM/Rockchip SoC...,
	linux-sunxi

Hi Jerome

On Mon, 13 Jun 2022 at 15:10, Jerome Brunet <jbrunet@baylibre.com> wrote:
>
>
> On Sat 11 Jun 2022 at 17:00, Da Xue <da@lessconfused.com> wrote:
>
> > On Wed, Mar 9, 2022 at 3:42 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >
> >  On 09.03.2022 15:57, Jerome Brunet wrote:
> >  >
> >  > On Wed 09 Mar 2022 at 15:45, Erico Nunes <nunes.erico@gmail.com> wrote:
> >  >
> >  >> On Sun, Mar 6, 2022 at 1:56 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >  >>> You could try the following (quick and dirty) test patch that fully mimics
> >  >>> the vendor driver as found here:
> >  >>> https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c
> >  >>>
> >  >>> First apply
> >  >>> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
> >  >>> This patch is in the net tree currently and should show up in linux-next
> >  >>> beginning of the week.
> >  >>>
> >  >>> On top please apply the following (it includes the test patch you're working with).
> >  >>
> >  >> I triggered test jobs with this configuration (latest mainline +
> >  >> a502a8f0409 + test patch for vendor driver behaviour), and the results
> >  >> are pretty much the same as with the previous test patch from this
> >  >> thread only.
> >  >> That is, I never got the issue with non-functional link up anymore,
> >  >> but I get the (rare) issue with link not going up.
> >  >> The reproducibility is still extremely low, in the <1% range.
> >  >
> >  > Low reproducibility means the problem is still there, or at least not
> >  > understood completely.
> >  >
> >  > I understand the benefit from the user standpoint.
> >  >
> >  > Heiner if you are going to continue from the test patch you sent,
> >  > I would welcome some explanation with each of the changes.
> >  >
> >  The latest test patch was purely for checking whether we see any
> >  difference in behavior between vendor driver and the mainlined
> >  version. It's in no way meant to be applied to mainline.
> >
> >  > We know very little about this IP and I'm not very comfortable with
> >  > tweaking/aligning with AML sdk "blindly" on a driver that has otherwise
> >  > been working well so far.
> >  >
> >
> >  This touches one thing I wanted to ask anyway: Supposedly Amlogic
> >  didn't develop its own Ethernet PHY, and if they licensed an existing
> >  IP then it should be similar to some other existing PHY (that may
> >  have a driver in phylib).
> >
> >  Then what I'll do is submit the following small change that brought
> >  the error rate significantly down according to Erico's tests.
> >
> >  -       phy_trigger_machine(phydev);
> >  +       if (irq_status & INTSRC_ANEG_COMPLETE)
> >  +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
> >  +       else
> >  +               phy_trigger_machine(phydev);
> >
> >  > Thx
> >  >
> >  >>
> >  >> So at this point, I'm not sure how much more effort to invest into
> >  >> this. Given the rate is very low and the fallback is it will just
> >  >> reset the link and proceed to work, I think the situation would
> >  >> already be much better with the solution from that test patch being
> >  >> merged. If you propose that as a patch separately, I'm happy to test
> >  >> the final submitted patch again and provide feedback there. Or if
> >  >> there is another solution to try, I can try with that too.
> >  >>
> >  >> Thanks
> >  >>
> >  >>
> >  >> Erico
> >  >
> >
> >  Heiner
> >
> > To help reproduce this problem, I have had this problem for as long as I can remember and it still occurs with this patch.
>
> Same here, on both gxl and g12a. Occurrence remains unchanged.
> The issue is even reproduced if the PHY is switched to polling mode, so
> the merged change, related to the IRQ handling, is very unlikely to fix
> the problem.
>
> >
> > This doesn't happen on first boot most of the time. It happens on reboot consistently. I have tested with AML-S805X-CC board, AML-S905X-CC V1, and V2 boards.
> >
>
> On my side, I confirm the network never seems to get stuck in u-boot but
> it might break in Linux, even on the first boot after a power up from
> what I have seen so far.
>
> > I am on u-boot 22.04 with 5.18.3 which includes the patch.
> > u-boot brings up ethernet on start and can grab an IP.
> > Linux brings up ethernet and can grab an IP.
> > reboot
> > u-boot can grab an IP.
> > Linux does not get anything.
> > I have to run `ip link set dev eth0 down` followed by `ip link set dev eth0 up` one or more times to get ethernet working again.
> > Sometimes it spams "meson8b-dwmac c9410000.ethernet eth0: Reset adapter". When it spams this, ethernet is dead and can't be recovered.
>
> I tried several things, none of which has shown any improvement so far:
> * Make sure LPI/EEE is disabled
> * Add the ethernet reset from the main controller on the MAC
> * Test the various DMA modes of STMMAC
> * Port the differences from u-boot and the vendor kernel into the PHY driver
>
> I have also tried going back in time, as far as v4.19, but the problem is
> already there. It occurs a lot less often, though.
> Since v5.6+ the occurrence is quite high: approx. 1 in 4 boots.
> On v4.19: between 1 in 50 and 1 in 150 boots.
>
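Rates this different matter a lot when bisecting, because a handful of good
boots says very little on the older kernels. As a rough sketch, assuming each
boot fails independently with a fixed probability (an assumption, the thread
does not establish it), the number of consecutive clean boots needed before a
kernel can be called good with a given confidence can be estimated as below.
boots_needed() is a hypothetical helper name.

```python
import math


def boots_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that 1 - (1 - failure_rate)**n >= confidence.

    In other words: how many boots must be run so that, if the bug is
    present at the given rate, at least one failure shows up with the
    requested probability. Assumes independent boots.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Rates quoted above: ~1 in 4 (v5.6+), 1 in 50 to 1 in 150 (v4.19).
for label, rate in (("1 in 4", 1 / 4), ("1 in 50", 1 / 50), ("1 in 150", 1 / 150)):
    print(f"{label}: {boots_needed(rate)} boots for 95% confidence")
```

Under that assumption this gives 11 boots at 1 in 4, 149 boots at 1 in 50 and
448 boots at 1 in 150.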
> >
>
> When the problem happens:
> * link is reported up
> * ifconfig / MAC is claiming to be sending packets (Tx increasing - no Rx)
> * I see no traffic with wireshark
>
> The packets are getting lost somewhere. Can't say for sure if it is in
> the MAC or the PHY.
>
> > This is fixed via power cycle so I'm assuming some register is not reset or maybe the IP is stuck.
> >
>
> `ethtool -r eth0` also seems to work around the problem.
> This triggers a restart of so many things that it is close to unplugging and
> replugging the ethernet cable :/
>
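The symptom described above (link up, Tx counter increasing, Rx counter
frozen) can be checked from user space before applying the workaround. The
following is a minimal sketch, assuming the standard sysfs counters under
/sys/class/net/<ifname>/ and an installed ethtool; the function names are
hypothetical. A quiet network can also leave Rx unchanged, so treat a positive
result as a hint, not proof.

```python
import subprocess
from pathlib import Path


def read_counters(ifname: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Read carrier state and packet counters for one interface from sysfs.

    Note: reading 'carrier' raises OSError while the interface is
    administratively down.
    """
    base = Path(sysfs_root) / ifname
    stats = base / "statistics"
    return {
        "carrier": int((base / "carrier").read_text()),
        "tx_packets": int((stats / "tx_packets").read_text()),
        "rx_packets": int((stats / "rx_packets").read_text()),
    }


def looks_stuck(before: dict, after: dict) -> bool:
    """True when the link is up and Tx advanced while Rx did not move."""
    link_up = after["carrier"] == 1
    tx_moving = after["tx_packets"] > before["tx_packets"]
    rx_frozen = after["rx_packets"] == before["rx_packets"]
    return link_up and tx_moving and rx_frozen


def restart_autoneg(ifname: str) -> None:
    """Apply the workaround quoted above: restart auto-negotiation."""
    subprocess.run(["ethtool", "-r", ifname], check=True)
```

Typical use would be to call read_counters("eth0") twice, a few seconds apart
while something like a DHCP request is in flight, and call
restart_autoneg("eth0") only if looks_stuck() returns True.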

Have you tried setting up a regulator for ethernet and
implementing runtime power management?

Best Regards
-Anand
> > Best,
> > Da Xue



end of thread, other threads:[~2022-07-15  5:36 UTC | newest]

Thread overview: 49+ messages
2022-02-02 20:18 net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot Erico Nunes
2022-02-03 13:53 ` Vyacheslav
2022-02-07 10:41 ` Jerome Brunet
2022-02-20 16:51   ` Erico Nunes
2022-02-22  2:30     ` Samuel Holland
2022-02-26 13:53     ` Heiner Kallweit
2022-03-02 10:33       ` Erico Nunes
2022-03-02 11:01         ` Heiner Kallweit
2022-03-02 13:39           ` Jerome Brunet
2022-03-02 16:34             ` Heiner Kallweit
2022-03-06  9:40               ` Erico Nunes
2022-03-06 12:56                 ` Heiner Kallweit
2022-03-09 14:45                   ` Erico Nunes
2022-03-09 14:57                     ` Jerome Brunet
2022-03-09 20:42                       ` Heiner Kallweit
     [not found]                         ` <CACdvmAhcyNXViJgk6o6oAoYvAjAg-NFD74Eym_nGHJx3YAqjzw@mail.gmail.com>
2022-06-13  9:10                           ` Jerome Brunet
2022-07-15  5:35                             ` Anand Moon
