All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] mv88e6xxx: tx regression in v5.3
@ 2019-12-11  9:48 Baruch Siach
  2019-12-11 13:11 ` Andrew Lunn
  0 siblings, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-11  9:48 UTC (permalink / raw)
  To: Andrew Lunn, Vivien Didelot; +Cc: netdev, Denis Odintsov

Hi Andrew, Vivien,

Since kernel v5.3 (tested v5.3.15), the 88E6141 switch on SolidRun
Clearfog GT-8K stopped transmitting packets on switch connected
ports. Kernel v5.2 works fine (tested v5.2.21).

Here are the relevant kernel v5.3 log lines:

[    2.867424] mv88e6085 f412a200.mdio-mii:04: switch 0x3400 detected: Marvell 88E6141, revision 0
[    2.927445] libphy: mdio: probed
[    3.578496] mv88e6085 f412a200.mdio-mii:04 lan2 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:11] driver [Marvell 88E6390]
[    3.595674] mv88e6085 f412a200.mdio-mii:04 lan1 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:12] driver [Marvell 88E6390]
[    3.612797] mv88e6085 f412a200.mdio-mii:04 lan4 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:13] driver [Marvell 88E6390]
[    3.629910] mv88e6085 f412a200.mdio-mii:04 lan3 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:14] driver [Marvell 88E6390]
[    3.646049] mv88e6085 f412a200.mdio-mii:04: configuring for phy/ link mode
[    3.654451] DSA: tree 0 setup
...
[   10.784521] mvpp2 f4000000.ethernet eth2: configuring for fixed/2500base-x link mode
[   10.792401] mvpp2 f4000000.ethernet eth2: Link is Up - 2.5Gbps/Full - flow control off
[   19.817981] mv88e6085 f412a200.mdio-mii:04 lan1: configuring for phy/ link mode
[   19.827083] 8021q: adding VLAN 0 to HW filter on device lan1
[   21.577276] mv88e6085 f412a200.mdio-mii:04 lan1: Link is Up - 100Mbps/Full - flow control rx/tx
[   21.586030] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready

The Tx count on the lan1 interface increments, but the ARP packets don't
show on the network.

Do you have any idea?

Thanks,
baruch

--
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-11  9:48 [BUG] mv88e6xxx: tx regression in v5.3 Baruch Siach
@ 2019-12-11 13:11 ` Andrew Lunn
  2019-12-11 17:07   ` Baruch Siach
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2019-12-11 13:11 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Vivien Didelot, netdev, Denis Odintsov

On Wed, Dec 11, 2019 at 11:48:57AM +0200, Baruch Siach wrote:
> Hi Andrew, Vivien,
> 
> Since kernel v5.3 (tested v5.3.15), the 88E6141 switch on SolidRun
> Clearfog GT-8K stopped transmitting packets on switch connected
> ports. Kernel v5.2 works fine (tested v5.2.21).
> 
> Here are the relevant kernel v5.3 log lines:
> 
> [    2.867424] mv88e6085 f412a200.mdio-mii:04: switch 0x3400 detected: Marvell 88E6141, revision 0
> [    2.927445] libphy: mdio: probed
> [    3.578496] mv88e6085 f412a200.mdio-mii:04 lan2 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:11] driver [Marvell 88E6390]
> [    3.595674] mv88e6085 f412a200.mdio-mii:04 lan1 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:12] driver [Marvell 88E6390]
> [    3.612797] mv88e6085 f412a200.mdio-mii:04 lan4 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:13] driver [Marvell 88E6390]
> [    3.629910] mv88e6085 f412a200.mdio-mii:04 lan3 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:14] driver [Marvell 88E6390]
> [    3.646049] mv88e6085 f412a200.mdio-mii:04: configuring for phy/ link mode
> [    3.654451] DSA: tree 0 setup
> ...
> [   10.784521] mvpp2 f4000000.ethernet eth2: configuring for fixed/2500base-x link mode
> [   10.792401] mvpp2 f4000000.ethernet eth2: Link is Up - 2.5Gbps/Full - flow control off
> [   19.817981] mv88e6085 f412a200.mdio-mii:04 lan1: configuring for phy/ link mode
> [   19.827083] 8021q: adding VLAN 0 to HW filter on device lan1
> [   21.577276] mv88e6085 f412a200.mdio-mii:04 lan1: Link is Up - 100Mbps/Full - flow control rx/tx
> [   21.586030] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready
> 
> The Tx count on the lan1 interface increments, but the ARP packets don't 
> show on the network.

Hi Baruch

I don't know of an issues.

If the MAC TX counter increases, it sounds like it is a PHY issue?
Does 100Mbps/Full make sense for this link?

Probably your best bet is to do a git bisect to find which commit
broke it.

      Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-11 13:11 ` Andrew Lunn
@ 2019-12-11 17:07   ` Baruch Siach
  2019-12-11 17:49     ` Andrew Lunn
  0 siblings, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-11 17:07 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Andrew,

On Wed, Dec 11 2019, Andrew Lunn wrote:

> On Wed, Dec 11, 2019 at 11:48:57AM +0200, Baruch Siach wrote:
>> Hi Andrew, Vivien,
>> 
>> Since kernel v5.3 (tested v5.3.15), the 88E6141 switch on SolidRun
>> Clearfog GT-8K stopped transmitting packets on switch connected
>> ports. Kernel v5.2 works fine (tested v5.2.21).
>> 
>> Here are the relevant kernel v5.3 log lines:
>> 
>> [    2.867424] mv88e6085 f412a200.mdio-mii:04: switch 0x3400 detected: Marvell 88E6141, revision 0
>> [    2.927445] libphy: mdio: probed
>> [    3.578496] mv88e6085 f412a200.mdio-mii:04 lan2 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:11] driver [Marvell 88E6390]
>> [    3.595674] mv88e6085 f412a200.mdio-mii:04 lan1 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:12] driver [Marvell 88E6390]
>> [    3.612797] mv88e6085 f412a200.mdio-mii:04 lan4 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:13] driver [Marvell 88E6390]
>> [    3.629910] mv88e6085 f412a200.mdio-mii:04 lan3 (uninitialized): PHY [!cp1!config-space@f4000000!mdio@12a200!switch0@4!mdio:14] driver [Marvell 88E6390]
>> [    3.646049] mv88e6085 f412a200.mdio-mii:04: configuring for phy/ link mode
>> [    3.654451] DSA: tree 0 setup
>> ...
>> [   10.784521] mvpp2 f4000000.ethernet eth2: configuring for fixed/2500base-x link mode
>> [   10.792401] mvpp2 f4000000.ethernet eth2: Link is Up - 2.5Gbps/Full - flow control off
>> [   19.817981] mv88e6085 f412a200.mdio-mii:04 lan1: configuring for phy/ link mode
>> [   19.827083] 8021q: adding VLAN 0 to HW filter on device lan1
>> [   21.577276] mv88e6085 f412a200.mdio-mii:04 lan1: Link is Up - 100Mbps/Full - flow control rx/tx
>> [   21.586030] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready
>> 
>> The Tx count on the lan1 interface increments, but the ARP packets don't 
>> show on the network.
>
> Hi Baruch
>
> I don't know of an issues.
>
> If the MAC TX counter increases, it sounds like it is a PHY issue?
> Does 100Mbps/Full make sense for this link?

100Mbps switch is what I have at the other side of the link. Works
perfectly with v5.2.

> Probably your best bet is to do a git bisect to find which commit
> broke it.

Bisect points at 7fb5a711545d ("net: dsa: mv88e6xxx: drop adjust_link to
enabled phylink"). Reverting this commit on top of v5.3.15 fixes the
issue (and brings the warning back). As I understand, this basically
reverts the driver migration to phylink. What might be the issue with
phylink?

Thanks,
baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-11 17:07   ` Baruch Siach
@ 2019-12-11 17:49     ` Andrew Lunn
  2019-12-12  8:50       ` Baruch Siach
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2019-12-11 17:49 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

> Bisect points at 7fb5a711545d ("net: dsa: mv88e6xxx: drop adjust_link to
> enabled phylink"). Reverting this commit on top of v5.3.15 fixes the
> issue (and brings the warning back). As I understand, this basically
> reverts the driver migration to phylink. What might be the issue with
> phylink?

That suggests the MAC is wrongly running at 1G, and the PHY at 100M,
which is why it does not work.

You probably want to add #define DEBUG to the top of phylink.c, and
scatter some debug prints in mv88e6xxx_mac_config(). My guess would
be, either mv88e6xxx_mac_config() is existing before configuring the
MAC, or mv88e6xxx_port_setup_mac() has wrongly decided nothing has
changed and so has not configured the MAC.

	Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-11 17:49     ` Andrew Lunn
@ 2019-12-12  8:50       ` Baruch Siach
  2019-12-12 13:14         ` Andrew Lunn
  0 siblings, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-12  8:50 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Andrew,

On Wed, Dec 11, 2019 at 06:49:38PM +0100, Andrew Lunn wrote:
> > Bisect points at 7fb5a711545d ("net: dsa: mv88e6xxx: drop adjust_link to
> > enabled phylink"). Reverting this commit on top of v5.3.15 fixes the
> > issue (and brings the warning back). As I understand, this basically
> > reverts the driver migration to phylink. What might be the issue with
> > phylink?
> 
> That suggests the MAC is wrongly running at 1G, and the PHY at 100M,
> which is why it does not work.

I reproduced the same issue with 1Gb link as well:

[   63.510782] mv88e6085 f412a200.mdio-mii:04 lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[   63.519357] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready

> You probably want to add #define DEBUG to the top of phylink.c, and
> scatter some debug prints in mv88e6xxx_mac_config(). My guess would
> be, either mv88e6xxx_mac_config() is existing before configuring the
> MAC, or mv88e6xxx_port_setup_mac() has wrongly decided nothing has
> changed and so has not configured the MAC.

As you guessed, mv88e6xxx_mac_config() exits early because 
mv88e6xxx_phy_is_internal() returns true for port number 2, and 'mode' is 
MLO_AN_PHY. What is the right MAC/PHY setup flow in this case?

Thanks,
baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12  8:50       ` Baruch Siach
@ 2019-12-12 13:14         ` Andrew Lunn
  2019-12-12 14:19           ` Andreas Tobler
  2019-12-12 15:08           ` Baruch Siach
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Lunn @ 2019-12-12 13:14 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

> As you guessed, mv88e6xxx_mac_config() exits early because 
> mv88e6xxx_phy_is_internal() returns true for port number 2, and 'mode' is 
> MLO_AN_PHY. What is the right MAC/PHY setup flow in this case?

So this goes back to

commit d700ec4118f9d5e88db8f678e7342f28c93037b9
Author: Marek Vasut <marex@denx.de>
Date:   Wed Sep 12 00:15:24 2018 +0200

    net: dsa: mv88e6xxx: Make sure to configure ports with external PHYs
    
    The MV88E6xxx can have external PHYs attached to certain ports and those
    PHYs could even be on different MDIO bus than the one within the switch.
    This patch makes sure that ports with such PHYs are configured correctly
    according to the information provided by the PHY.

@@ -709,13 +717,17 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
        struct mv88e6xxx_chip *chip = ds->priv;
        int speed, duplex, link, pause, err;
 
-       if (mode == MLO_AN_PHY)
+       if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
                return;

The idea being, that the MAC has direct knowledge of the PHY
configuration because it is internal. There is no need to configure
the MAC, it does it itself.

This assumption seems wrong for the switch you have.

I think it is just a optimisation. So we can probably remove this phy
internal test.

And
        } else if (!mv88e6xxx_phy_is_internal(ds, port)) {

also needs to change.

It would be interesting to know if the MAC is completely wrongly
configured, or it is just a subset of parameters.

	    Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 13:14         ` Andrew Lunn
@ 2019-12-12 14:19           ` Andreas Tobler
  2019-12-12 15:08           ` Baruch Siach
  1 sibling, 0 replies; 27+ messages in thread
From: Andreas Tobler @ 2019-12-12 14:19 UTC (permalink / raw)
  To: Andrew Lunn, Baruch Siach
  Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

On 12.12.19 14:14, Andrew Lunn wrote:
>> As you guessed, mv88e6xxx_mac_config() exits early because
>> mv88e6xxx_phy_is_internal() returns true for port number 2, and 'mode' is
>> MLO_AN_PHY. What is the right MAC/PHY setup flow in this case?
> 
> So this goes back to
> 
> commit d700ec4118f9d5e88db8f678e7342f28c93037b9
> Author: Marek Vasut <marex@denx.de>
> Date:   Wed Sep 12 00:15:24 2018 +0200
> 
>      net: dsa: mv88e6xxx: Make sure to configure ports with external PHYs
>      
>      The MV88E6xxx can have external PHYs attached to certain ports and those
>      PHYs could even be on different MDIO bus than the one within the switch.
>      This patch makes sure that ports with such PHYs are configured correctly
>      according to the information provided by the PHY.
> 
> @@ -709,13 +717,17 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
>          struct mv88e6xxx_chip *chip = ds->priv;
>          int speed, duplex, link, pause, err;
>   
> -       if (mode == MLO_AN_PHY)
> +       if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
>                  return;
> 
> The idea being, that the MAC has direct knowledge of the PHY
> configuration because it is internal. There is no need to configure
> the MAC, it does it itself.
> 
> This assumption seems wrong for the switch you have.
> 
> I think it is just a optimisation. So we can probably remove this phy
> internal test.
> 
> And
>          } else if (!mv88e6xxx_phy_is_internal(ds, port)) {
> 
> also needs to change.
> 
> It would be interesting to know if the MAC is completely wrongly
> configured, or it is just a subset of parameters.

Interesting that you mention the patch above.

I jump in because I have an issue since the above patch went in.

In my case (see below) the links are up and running fine, but when I 
disconnect the lan on port 8 or 9, I do not get the IP back. The same 
applies if I set the device down and up again. I loose the ip and do not 
get it again.

If I set the link in the above 'else if' clause to LINK_UNFORCED, I get 
my ip's back and the links are working again.

         } else if (!mv88e6xxx_phy_is_internal(ds, port)) {
-               link = state->link;
+               link = LINK_UNFORCED;
                 speed = state->speed;

I hope I did not hijack the thread an my situation is related. Otherwise 
please let me know.

Thanks,
Andreas

The lan0/1/2 are 100M, lan3/4 are 1G

my switch info from the log, 5.4.2:

mv88e6085 f1072004.mdio-mii:00: switch 0x990 detected: Marvell 
88E6097/88E6097F, revision 2
libphy: mv88e6xxx SMI: probed
mv88e6085 f1072004.mdio-mii:00 lan0 (uninitialized): PHY 
[mv88e6xxx-1:05] driver [Generic PHY]
mv88e6085 f1072004.mdio-mii:00 lan1 (uninitialized): PHY 
[mv88e6xxx-1:06] driver [Generic PHY]
mv88e6085 f1072004.mdio-mii:00 lan2 (uninitialized): PHY 
[mv88e6xxx-1:07] driver [Generic PHY]
mv88e6085 f1072004.mdio-mii:00 lan3 (uninitialized): PHY 
[mv88e6xxx-1:08] driver [Marvell 88E1112]
mv88e6085 f1072004.mdio-mii:00 lan4 (uninitialized): PHY 
[mv88e6xxx-1:09] driver [Marvell 88E1112]




-- 
onway ag
Andreas Tobler
Software Engineer

Stauffacherstrasse 16, CH-8004 Zürich
Tel: +41 55 214 18 42
andreas.tobler@onway.ch
www.onway.ch


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 13:14         ` Andrew Lunn
  2019-12-12 14:19           ` Andreas Tobler
@ 2019-12-12 15:08           ` Baruch Siach
  2019-12-12 15:13             ` Andrew Lunn
  1 sibling, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-12 15:08 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Andrew,

On Thu, Dec 12, 2019 at 02:14:48PM +0100, Andrew Lunn wrote:
> > As you guessed, mv88e6xxx_mac_config() exits early because 
> > mv88e6xxx_phy_is_internal() returns true for port number 2, and 'mode' is 
> > MLO_AN_PHY. What is the right MAC/PHY setup flow in this case?
> 
> So this goes back to
> 
> commit d700ec4118f9d5e88db8f678e7342f28c93037b9
> Author: Marek Vasut <marex@denx.de>
> Date:   Wed Sep 12 00:15:24 2018 +0200
> 
>     net: dsa: mv88e6xxx: Make sure to configure ports with external PHYs
>     
>     The MV88E6xxx can have external PHYs attached to certain ports and those
>     PHYs could even be on different MDIO bus than the one within the switch.
>     This patch makes sure that ports with such PHYs are configured correctly
>     according to the information provided by the PHY.
> 
> @@ -709,13 +717,17 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
>         struct mv88e6xxx_chip *chip = ds->priv;
>         int speed, duplex, link, pause, err;
>  
> -       if (mode == MLO_AN_PHY)
> +       if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
>                 return;
> 
> The idea being, that the MAC has direct knowledge of the PHY
> configuration because it is internal. There is no need to configure
> the MAC, it does it itself.
> 
> This assumption seems wrong for the switch you have.
> 
> I think it is just a optimisation. So we can probably remove this phy
> internal test.
> 
> And
>         } else if (!mv88e6xxx_phy_is_internal(ds, port)) {
> 
> also needs to change.
> 
> It would be interesting to know if the MAC is completely wrongly
> configured, or it is just a subset of parameters.

I compared phylib to phylink calls to mv88e6xxx_port_setup_mac(). It turns out 
that the phylink adds mv88e6xxx_port_setup_mac() call for the cpu port (port 5 
in my case) with these parameters and call stack:

[    4.219148] mv88e6xxx_port_setup_mac: port: 5 link: 0 speed: -1 duplex: 255
[    4.226144] CPU: 2 PID: 21 Comm: kworker/2:0 Not tainted 5.3.15-00003-gb9bb09189d02-dirty #104
[    4.234795] Hardware name: SolidRun ClearFog GT 8K (DT)
[    4.240044] Workqueue: events deferred_probe_work_func
[    4.245205] Call trace:
[    4.247661]  dump_backtrace+0x0/0x128
[    4.251339]  show_stack+0x14/0x1c
[    4.254669]  dump_stack+0xa4/0xd0
[    4.257998]  mv88e6xxx_port_setup_mac+0x78/0x2a0
[    4.262635]  mv88e6xxx_mac_config+0xd0/0x154
[    4.266924]  dsa_port_phylink_mac_config+0x2c/0x38
[    4.271736]  phylink_mac_config+0xe0/0x1cc
[    4.275849]  phylink_start+0xc8/0x224
[    4.279527]  dsa_port_link_register_of+0xe8/0x1b0
[    4.284251]  dsa_register_switch+0x7fc/0x908
[    4.288539]  mv88e6xxx_probe+0x62c/0x66c
[    4.292478]  mdio_probe+0x30/0x5c
[    4.295806]  really_probe+0x1d0/0x280
[    4.299483]  driver_probe_device+0xd4/0xe4
[    4.303596]  __device_attach_driver+0x94/0xa0
[    4.307971]  bus_for_each_drv+0x94/0xb4
[    4.311823]  __device_attach+0xc0/0x12c
[    4.315674]  device_initial_probe+0x10/0x18
[    4.319875]  bus_probe_device+0x2c/0x8c
[    4.323726]  deferred_probe_work_func+0x84/0x98
[    4.328276]  process_one_work+0x19c/0x258
[    4.332303]  process_scheduled_works+0x3c/0x40
[    4.336765]  worker_thread+0x228/0x2f8
[    4.340530]  kthread+0x114/0x124
[    4.343771]  ret_from_fork+0x10/0x18

This hunk removed that mv88e6xxx_port_setup_mac() call, and fixed Tx for me on 
v5.3.15:

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d0a97eb73a37..f0457274b5a9 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
 	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
 		return;
 
+	if (dsa_is_cpu_port(ds, port))
+		return;
+
 	if (mode == MLO_AN_FIXED) {
 		link = LINK_FORCED_UP;
 		speed = state->speed;

Is that the right solution? Or maybe mv88e6xxx_port_setup_mac() should be 
called again when the cpu interface goes up?

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 15:08           ` Baruch Siach
@ 2019-12-12 15:13             ` Andrew Lunn
  2019-12-12 15:23               ` Baruch Siach
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2019-12-12 15:13 UTC (permalink / raw)
  To: Baruch Siach; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

> I compared phylib to phylink calls to mv88e6xxx_port_setup_mac(). It turns out 
> that the phylink adds mv88e6xxx_port_setup_mac() call for the cpu port (port 5 
> in my case) with these parameters and call stack:
> 
> [    4.219148] mv88e6xxx_port_setup_mac: port: 5 link: 0 speed: -1 duplex: 255
> [    4.226144] CPU: 2 PID: 21 Comm: kworker/2:0 Not tainted 5.3.15-00003-gb9bb09189d02-dirty #104
> [    4.234795] Hardware name: SolidRun ClearFog GT 8K (DT)
> [    4.240044] Workqueue: events deferred_probe_work_func
> [    4.245205] Call trace:
> [    4.247661]  dump_backtrace+0x0/0x128
> [    4.251339]  show_stack+0x14/0x1c
> [    4.254669]  dump_stack+0xa4/0xd0
> [    4.257998]  mv88e6xxx_port_setup_mac+0x78/0x2a0
> [    4.262635]  mv88e6xxx_mac_config+0xd0/0x154
> [    4.266924]  dsa_port_phylink_mac_config+0x2c/0x38
> [    4.271736]  phylink_mac_config+0xe0/0x1cc
> [    4.275849]  phylink_start+0xc8/0x224
> [    4.279527]  dsa_port_link_register_of+0xe8/0x1b0
> [    4.284251]  dsa_register_switch+0x7fc/0x908
> [    4.288539]  mv88e6xxx_probe+0x62c/0x66c
> [    4.292478]  mdio_probe+0x30/0x5c
> [    4.295806]  really_probe+0x1d0/0x280
> [    4.299483]  driver_probe_device+0xd4/0xe4
> [    4.303596]  __device_attach_driver+0x94/0xa0
> [    4.307971]  bus_for_each_drv+0x94/0xb4
> [    4.311823]  __device_attach+0xc0/0x12c
> [    4.315674]  device_initial_probe+0x10/0x18
> [    4.319875]  bus_probe_device+0x2c/0x8c
> [    4.323726]  deferred_probe_work_func+0x84/0x98
> [    4.328276]  process_one_work+0x19c/0x258
> [    4.332303]  process_scheduled_works+0x3c/0x40
> [    4.336765]  worker_thread+0x228/0x2f8
> [    4.340530]  kthread+0x114/0x124
> [    4.343771]  ret_from_fork+0x10/0x18
> 
> This hunk removed that mv88e6xxx_port_setup_mac() call, and fixed Tx for me on 
> v5.3.15:
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index d0a97eb73a37..f0457274b5a9 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
>  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
>  		return;
>  
> +	if (dsa_is_cpu_port(ds, port))
> +		return;
> +
>  	if (mode == MLO_AN_FIXED) {
>  		link = LINK_FORCED_UP;
>  		speed = state->speed;
> 
> Is that the right solution?

What needs testing is:

                                        port@0 {
                                                reg = <0>;
                                                label = "cpu";
                                                ethernet = <&fec1>;

                                                fixed-link {
                                                        speed = <100>;
                                                        full-duplex;
                                                };

At some point, there is a call to configure the CPU port to 100Mbps,
because the SoC Ethernet does not support 1G. We need to ensure this
does not break with your change.

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 15:13             ` Andrew Lunn
@ 2019-12-12 15:23               ` Baruch Siach
  2019-12-12 18:17                 ` Baruch Siach
  2019-12-12 18:36                 ` Marek Behun
  0 siblings, 2 replies; 27+ messages in thread
From: Baruch Siach @ 2019-12-12 15:23 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Andrew,

On Thu, Dec 12, 2019 at 04:13:55PM +0100, Andrew Lunn wrote:
> > I compared phylib to phylink calls to mv88e6xxx_port_setup_mac(). It turns out 
> > that the phylink adds mv88e6xxx_port_setup_mac() call for the cpu port (port 5 
> > in my case) with these parameters and call stack:
> > 
> > [    4.219148] mv88e6xxx_port_setup_mac: port: 5 link: 0 speed: -1 duplex: 255
> > [    4.226144] CPU: 2 PID: 21 Comm: kworker/2:0 Not tainted 5.3.15-00003-gb9bb09189d02-dirty #104
> > [    4.234795] Hardware name: SolidRun ClearFog GT 8K (DT)
> > [    4.240044] Workqueue: events deferred_probe_work_func
> > [    4.245205] Call trace:
> > [    4.247661]  dump_backtrace+0x0/0x128
> > [    4.251339]  show_stack+0x14/0x1c
> > [    4.254669]  dump_stack+0xa4/0xd0
> > [    4.257998]  mv88e6xxx_port_setup_mac+0x78/0x2a0
> > [    4.262635]  mv88e6xxx_mac_config+0xd0/0x154
> > [    4.266924]  dsa_port_phylink_mac_config+0x2c/0x38
> > [    4.271736]  phylink_mac_config+0xe0/0x1cc
> > [    4.275849]  phylink_start+0xc8/0x224
> > [    4.279527]  dsa_port_link_register_of+0xe8/0x1b0
> > [    4.284251]  dsa_register_switch+0x7fc/0x908
> > [    4.288539]  mv88e6xxx_probe+0x62c/0x66c
> > [    4.292478]  mdio_probe+0x30/0x5c
> > [    4.295806]  really_probe+0x1d0/0x280
> > [    4.299483]  driver_probe_device+0xd4/0xe4
> > [    4.303596]  __device_attach_driver+0x94/0xa0
> > [    4.307971]  bus_for_each_drv+0x94/0xb4
> > [    4.311823]  __device_attach+0xc0/0x12c
> > [    4.315674]  device_initial_probe+0x10/0x18
> > [    4.319875]  bus_probe_device+0x2c/0x8c
> > [    4.323726]  deferred_probe_work_func+0x84/0x98
> > [    4.328276]  process_one_work+0x19c/0x258
> > [    4.332303]  process_scheduled_works+0x3c/0x40
> > [    4.336765]  worker_thread+0x228/0x2f8
> > [    4.340530]  kthread+0x114/0x124
> > [    4.343771]  ret_from_fork+0x10/0x18
> > 
> > This hunk removed that mv88e6xxx_port_setup_mac() call, and fixed Tx for me on 
> > v5.3.15:
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > index d0a97eb73a37..f0457274b5a9 100644
> > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
> >  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
> >  		return;
> >  
> > +	if (dsa_is_cpu_port(ds, port))
> > +		return;
> > +
> >  	if (mode == MLO_AN_FIXED) {
> >  		link = LINK_FORCED_UP;
> >  		speed = state->speed;
> > 
> > Is that the right solution?
> 
> What needs testing is:
> 
>                                         port@0 {
>                                                 reg = <0>;
>                                                 label = "cpu";
>                                                 ethernet = <&fec1>;
> 
>                                                 fixed-link {
>                                                         speed = <100>;
>                                                         full-duplex;
>                                                 };
> 
> At some point, there is a call to configure the CPU port to 100Mbps,
> because the SoC Ethernet does not support 1G. We need to ensure this
> does not break with your change.

So maybe this:

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d0a97eb73a37..84ca4f36a778 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
 	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
 		return;
 
+	if (mode != MLO_AN_FIXED && dsa_is_cpu_port(ds, port))
+		return;
+
 	if (mode == MLO_AN_FIXED) {
 		link = LINK_FORCED_UP;
 		speed = state->speed;

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 15:23               ` Baruch Siach
@ 2019-12-12 18:17                 ` Baruch Siach
  2019-12-12 18:32                   ` Marek Behun
  2019-12-12 18:36                 ` Marek Behun
  1 sibling, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-12 18:17 UTC (permalink / raw)
  To: Andrew Lunn, Marek Behún
  Cc: Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Andrew, Marek,

On Thu, Dec 12, 2019 at 05:23:57PM +0200, Baruch Siach wrote:
> On Thu, Dec 12, 2019 at 04:13:55PM +0100, Andrew Lunn wrote:
> > > I compared phylib to phylink calls to mv88e6xxx_port_setup_mac(). It turns out 
> > > that the phylink adds mv88e6xxx_port_setup_mac() call for the cpu port (port 5 
> > > in my case) with these parameters and call stack:
> > > 
> > > [    4.219148] mv88e6xxx_port_setup_mac: port: 5 link: 0 speed: -1 duplex: 255
> > > [    4.226144] CPU: 2 PID: 21 Comm: kworker/2:0 Not tainted 5.3.15-00003-gb9bb09189d02-dirty #104
> > > [    4.234795] Hardware name: SolidRun ClearFog GT 8K (DT)
> > > [    4.240044] Workqueue: events deferred_probe_work_func
> > > [    4.245205] Call trace:
> > > [    4.247661]  dump_backtrace+0x0/0x128
> > > [    4.251339]  show_stack+0x14/0x1c
> > > [    4.254669]  dump_stack+0xa4/0xd0
> > > [    4.257998]  mv88e6xxx_port_setup_mac+0x78/0x2a0
> > > [    4.262635]  mv88e6xxx_mac_config+0xd0/0x154
> > > [    4.266924]  dsa_port_phylink_mac_config+0x2c/0x38
> > > [    4.271736]  phylink_mac_config+0xe0/0x1cc
> > > [    4.275849]  phylink_start+0xc8/0x224
> > > [    4.279527]  dsa_port_link_register_of+0xe8/0x1b0
> > > [    4.284251]  dsa_register_switch+0x7fc/0x908
> > > [    4.288539]  mv88e6xxx_probe+0x62c/0x66c
> > > [    4.292478]  mdio_probe+0x30/0x5c
> > > [    4.295806]  really_probe+0x1d0/0x280
> > > [    4.299483]  driver_probe_device+0xd4/0xe4
> > > [    4.303596]  __device_attach_driver+0x94/0xa0
> > > [    4.307971]  bus_for_each_drv+0x94/0xb4
> > > [    4.311823]  __device_attach+0xc0/0x12c
> > > [    4.315674]  device_initial_probe+0x10/0x18
> > > [    4.319875]  bus_probe_device+0x2c/0x8c
> > > [    4.323726]  deferred_probe_work_func+0x84/0x98
> > > [    4.328276]  process_one_work+0x19c/0x258
> > > [    4.332303]  process_scheduled_works+0x3c/0x40
> > > [    4.336765]  worker_thread+0x228/0x2f8
> > > [    4.340530]  kthread+0x114/0x124
> > > [    4.343771]  ret_from_fork+0x10/0x18
> > > 
> > > This hunk removed that mv88e6xxx_port_setup_mac() call, and fixed Tx for me on 
> > > v5.3.15:
> > > 
> > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > > index d0a97eb73a37..f0457274b5a9 100644
> > > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > > @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
> > >  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
> > >  		return;
> > >  
> > > +	if (dsa_is_cpu_port(ds, port))
> > > +		return;
> > > +
> > >  	if (mode == MLO_AN_FIXED) {
> > >  		link = LINK_FORCED_UP;
> > >  		speed = state->speed;
> > > 
> > > Is that the right solution?
> > 
> > What needs testing is:
> > 
> >                                         port@0 {
> >                                                 reg = <0>;
> >                                                 label = "cpu";
> >                                                 ethernet = <&fec1>;
> > 
> >                                                 fixed-link {
> >                                                         speed = <100>;
> >                                                         full-duplex;
> >                                                 };
> > 
> > At some point, there is a call to configure the CPU port to 100Mbps,
> > because the SoC Ethernet does not support 1G. We need to ensure this
> > does not break with your change.
> 
> So maybe this:
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index d0a97eb73a37..84ca4f36a778 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
>  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
>  		return;
>  
> +	if (mode != MLO_AN_FIXED && dsa_is_cpu_port(ds, port))
> +		return;
> +
>  	if (mode == MLO_AN_FIXED) {
>  		link = LINK_FORCED_UP;
>  		speed = state->speed;

This is not enough to fix v5.4, though. Commit 7a3007d22e8dc ("net: dsa: 
mv88e6xxx: fully support SERDES on Topaz family") breaks switch Tx on SolidRun 
Clearfog GT-8K in much the same way. I could not easily revert 7a3007d22e8dc 
on top of v5.4, but this is enough to make Tx work again with v5.4:

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 7b5b73499e37..e7e6400a994e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3191,7 +3191,6 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
 	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
 	.port_link_state = mv88e6352_port_link_state,
 	.port_get_cmode = mv88e6352_port_get_cmode,
-	.port_set_cmode = mv88e6341_port_set_cmode,
 	.port_setup_message_port = mv88e6xxx_setup_message_port,
 	.stats_snapshot = mv88e6390_g1_stats_snapshot,
 	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,

Marek, do you have any idea how to properly fix this?

Thanks,
baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 18:17                 ` Baruch Siach
@ 2019-12-12 18:32                   ` Marek Behun
  2019-12-12 18:48                     ` Baruch Siach
  0 siblings, 1 reply; 27+ messages in thread
From: Marek Behun @ 2019-12-12 18:32 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

On Thu, 12 Dec 2019 20:17:29 +0200
Baruch Siach <baruch@tkos.co.il> wrote:

> This is not enough to fix v5.4, though. Commit 7a3007d22e8dc ("net: dsa: 
> mv88e6xxx: fully support SERDES on Topaz family") breaks switch Tx on SolidRun 
> Clearfog GT-8K in much the same way. I could not easily revert 7a3007d22e8dc 
> on top of v5.4, but this is enough to make Tx work again with v5.4:
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 7b5b73499e37..e7e6400a994e 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -3191,7 +3191,6 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
>  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
>  	.port_link_state = mv88e6352_port_link_state,
>  	.port_get_cmode = mv88e6352_port_get_cmode,
> -	.port_set_cmode = mv88e6341_port_set_cmode,
>  	.port_setup_message_port = mv88e6xxx_setup_message_port,
>  	.stats_snapshot = mv88e6390_g1_stats_snapshot,
>  	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
> 
> Marek, do you have any idea how to properly fix this?
> 
> Thanks,
> baruch
> 

Hi Baruch,

can you tell on which port the port_set_cmode is called when things
break? Because if it is other than port 5, than method
mv88e6341_port_set_cmode does nothing. If it is port 5, do you have cpu
connected to the switch via port 5? What phy mode?

Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 15:23               ` Baruch Siach
  2019-12-12 18:17                 ` Baruch Siach
@ 2019-12-12 18:36                 ` Marek Behun
  2019-12-12 19:06                   ` Baruch Siach
  1 sibling, 1 reply; 27+ messages in thread
From: Marek Behun @ 2019-12-12 18:36 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

On Thu, 12 Dec 2019 17:23:55 +0200
Baruch Siach <baruch@tkos.co.il> wrote:

> Hi Andrew,
> 
> On Thu, Dec 12, 2019 at 04:13:55PM +0100, Andrew Lunn wrote:
> > > I compared phylib to phylink calls to mv88e6xxx_port_setup_mac(). It turns out 
> > > that the phylink adds mv88e6xxx_port_setup_mac() call for the cpu port (port 5 
> > > in my case) with these parameters and call stack:
> > > 
> > > [    4.219148] mv88e6xxx_port_setup_mac: port: 5 link: 0 speed: -1 duplex: 255
> > > [    4.226144] CPU: 2 PID: 21 Comm: kworker/2:0 Not tainted 5.3.15-00003-gb9bb09189d02-dirty #104
> > > [    4.234795] Hardware name: SolidRun ClearFog GT 8K (DT)
> > > [    4.240044] Workqueue: events deferred_probe_work_func
> > > [    4.245205] Call trace:
> > > [    4.247661]  dump_backtrace+0x0/0x128
> > > [    4.251339]  show_stack+0x14/0x1c
> > > [    4.254669]  dump_stack+0xa4/0xd0
> > > [    4.257998]  mv88e6xxx_port_setup_mac+0x78/0x2a0
> > > [    4.262635]  mv88e6xxx_mac_config+0xd0/0x154
> > > [    4.266924]  dsa_port_phylink_mac_config+0x2c/0x38
> > > [    4.271736]  phylink_mac_config+0xe0/0x1cc
> > > [    4.275849]  phylink_start+0xc8/0x224
> > > [    4.279527]  dsa_port_link_register_of+0xe8/0x1b0
> > > [    4.284251]  dsa_register_switch+0x7fc/0x908
> > > [    4.288539]  mv88e6xxx_probe+0x62c/0x66c
> > > [    4.292478]  mdio_probe+0x30/0x5c
> > > [    4.295806]  really_probe+0x1d0/0x280
> > > [    4.299483]  driver_probe_device+0xd4/0xe4
> > > [    4.303596]  __device_attach_driver+0x94/0xa0
> > > [    4.307971]  bus_for_each_drv+0x94/0xb4
> > > [    4.311823]  __device_attach+0xc0/0x12c
> > > [    4.315674]  device_initial_probe+0x10/0x18
> > > [    4.319875]  bus_probe_device+0x2c/0x8c
> > > [    4.323726]  deferred_probe_work_func+0x84/0x98
> > > [    4.328276]  process_one_work+0x19c/0x258
> > > [    4.332303]  process_scheduled_works+0x3c/0x40
> > > [    4.336765]  worker_thread+0x228/0x2f8
> > > [    4.340530]  kthread+0x114/0x124
> > > [    4.343771]  ret_from_fork+0x10/0x18
> > > 
> > > This hunk removed that mv88e6xxx_port_setup_mac() call, and fixed Tx for me on 
> > > v5.3.15:
> > > 
> > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > > index d0a97eb73a37..f0457274b5a9 100644
> > > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > > @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
> > >  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
> > >  		return;
> > >  
> > > +	if (dsa_is_cpu_port(ds, port))
> > > +		return;
> > > +
> > >  	if (mode == MLO_AN_FIXED) {
> > >  		link = LINK_FORCED_UP;
> > >  		speed = state->speed;
> > > 
> > > Is that the right solution?  
> > 
> > What needs testing is:
> > 
> >                                         port@0 {
> >                                                 reg = <0>;
> >                                                 label = "cpu";
> >                                                 ethernet = <&fec1>;
> > 
> >                                                 fixed-link {
> >                                                         speed = <100>;
> >                                                         full-duplex;
> >                                                 };
> > 
> > At some point, there is a call to configure the CPU port to 100Mbps,
> > because the SoC Ethernet does not support 1G. We need to ensure this
> > does not break with your change.  
> 
> So maybe this:
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index d0a97eb73a37..84ca4f36a778 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
>  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
>  		return;
>  
> +	if (mode != MLO_AN_FIXED && dsa_is_cpu_port(ds, port))
> +		return;
> +
>  	if (mode == MLO_AN_FIXED) {
>  		link = LINK_FORCED_UP;
>  		speed = state->speed;
> 
> baruch
> 

No, if your switch is connected to the cpu via 100mbps on the cpu side,
you have to add fixed-link node as Andrew suggested. This is what we
were trying to do with the Topaz patches, so that dsa port nodes
support fixed-link via phylink.

Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 18:32                   ` Marek Behun
@ 2019-12-12 18:48                     ` Baruch Siach
  0 siblings, 0 replies; 27+ messages in thread
From: Baruch Siach @ 2019-12-12 18:48 UTC (permalink / raw)
  To: Marek Behun
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Marek,

On Thu, Dec 12, 2019 at 07:32:18PM +0100, Marek Behun wrote:
> On Thu, 12 Dec 2019 20:17:29 +0200
> Baruch Siach <baruch@tkos.co.il> wrote:
> 
> > This is not enough to fix v5.4, though. Commit 7a3007d22e8dc ("net: dsa: 
> > mv88e6xxx: fully support SERDES on Topaz family") breaks switch Tx on SolidRun 
> > Clearfog GT-8K in much the same way. I could not easily revert 7a3007d22e8dc 
> > on top of v5.4, but this is enough to make Tx work again with v5.4:
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > index 7b5b73499e37..e7e6400a994e 100644
> > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > @@ -3191,7 +3191,6 @@ static const struct mv88e6xxx_ops mv88e6141_ops = {
> >  	.port_disable_pri_override = mv88e6xxx_port_disable_pri_override,
> >  	.port_link_state = mv88e6352_port_link_state,
> >  	.port_get_cmode = mv88e6352_port_get_cmode,
> > -	.port_set_cmode = mv88e6341_port_set_cmode,
> >  	.port_setup_message_port = mv88e6xxx_setup_message_port,
> >  	.stats_snapshot = mv88e6390_g1_stats_snapshot,
> >  	.stats_set_histogram = mv88e6095_g1_stats_set_histogram,
> > 
> > Marek, do you have any idea how to properly fix this?
> 
> can you tell on which port the port_set_cmode is called when things
> break? Because if it is other than port 5, than method
> mv88e6341_port_set_cmode does nothing. If it is port 5, do you have cpu
> connected to the switch via port 5? What phy mode?

This is indeed port 5. It is connected to the cpu (Armada 8040). PHY mode is 
19 (PHY_INTERFACE_MODE_2500BASEX).

I have now tested v5.5-rc1 which behaves just like v5.4. The same "fix" works 
with v5.5-rc1.

Thanks,
baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 18:36                 ` Marek Behun
@ 2019-12-12 19:06                   ` Baruch Siach
  2019-12-12 19:31                     ` Andrew Lunn
  0 siblings, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-12 19:06 UTC (permalink / raw)
  To: Marek Behun
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Marek,

On Thu, Dec 12, 2019 at 07:36:11PM +0100, Marek Behun wrote:
> On Thu, 12 Dec 2019 17:23:55 +0200
> Baruch Siach <baruch@tkos.co.il> wrote:
> > 
> > On Thu, Dec 12, 2019 at 04:13:55PM +0100, Andrew Lunn wrote:
> > > > I compared phylib to phylink calls to mv88e6xxx_port_setup_mac(). It turns out 
> > > > that the phylink adds mv88e6xxx_port_setup_mac() call for the cpu port (port 5 
> > > > in my case) with these parameters and call stack:
> > > > 
> > > > [    4.219148] mv88e6xxx_port_setup_mac: port: 5 link: 0 speed: -1 duplex: 255
> > > > [    4.226144] CPU: 2 PID: 21 Comm: kworker/2:0 Not tainted 5.3.15-00003-gb9bb09189d02-dirty #104
> > > > [    4.234795] Hardware name: SolidRun ClearFog GT 8K (DT)
> > > > [    4.240044] Workqueue: events deferred_probe_work_func
> > > > [    4.245205] Call trace:
> > > > [    4.247661]  dump_backtrace+0x0/0x128
> > > > [    4.251339]  show_stack+0x14/0x1c
> > > > [    4.254669]  dump_stack+0xa4/0xd0
> > > > [    4.257998]  mv88e6xxx_port_setup_mac+0x78/0x2a0
> > > > [    4.262635]  mv88e6xxx_mac_config+0xd0/0x154
> > > > [    4.266924]  dsa_port_phylink_mac_config+0x2c/0x38
> > > > [    4.271736]  phylink_mac_config+0xe0/0x1cc
> > > > [    4.275849]  phylink_start+0xc8/0x224
> > > > [    4.279527]  dsa_port_link_register_of+0xe8/0x1b0
> > > > [    4.284251]  dsa_register_switch+0x7fc/0x908
> > > > [    4.288539]  mv88e6xxx_probe+0x62c/0x66c
> > > > [    4.292478]  mdio_probe+0x30/0x5c
> > > > [    4.295806]  really_probe+0x1d0/0x280
> > > > [    4.299483]  driver_probe_device+0xd4/0xe4
> > > > [    4.303596]  __device_attach_driver+0x94/0xa0
> > > > [    4.307971]  bus_for_each_drv+0x94/0xb4
> > > > [    4.311823]  __device_attach+0xc0/0x12c
> > > > [    4.315674]  device_initial_probe+0x10/0x18
> > > > [    4.319875]  bus_probe_device+0x2c/0x8c
> > > > [    4.323726]  deferred_probe_work_func+0x84/0x98
> > > > [    4.328276]  process_one_work+0x19c/0x258
> > > > [    4.332303]  process_scheduled_works+0x3c/0x40
> > > > [    4.336765]  worker_thread+0x228/0x2f8
> > > > [    4.340530]  kthread+0x114/0x124
> > > > [    4.343771]  ret_from_fork+0x10/0x18
> > > > 
> > > > This hunk removed that mv88e6xxx_port_setup_mac() call, and fixed Tx for me on 
> > > > v5.3.15:
> > > > 
> > > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > > > index d0a97eb73a37..f0457274b5a9 100644
> > > > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > > > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > > > @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
> > > >  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
> > > >  		return;
> > > >  
> > > > +	if (dsa_is_cpu_port(ds, port))
> > > > +		return;
> > > > +
> > > >  	if (mode == MLO_AN_FIXED) {
> > > >  		link = LINK_FORCED_UP;
> > > >  		speed = state->speed;
> > > > 
> > > > Is that the right solution?  
> > > 
> > > What needs testing is:
> > > 
> > >                                         port@0 {
> > >                                                 reg = <0>;
> > >                                                 label = "cpu";
> > >                                                 ethernet = <&fec1>;
> > > 
> > >                                                 fixed-link {
> > >                                                         speed = <100>;
> > >                                                         full-duplex;
> > >                                                 };
> > > 
> > > At some point, there is a call to configure the CPU port to 100Mbps,
> > > because the SoC Ethernet does not support 1G. We need to ensure this
> > > does not break with your change.  
> > 
> > So maybe this:
> > 
> > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > index d0a97eb73a37..84ca4f36a778 100644
> > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > @@ -611,6 +611,9 @@ static void mv88e6xxx_mac_config(struct dsa_switch *ds, int port,
> >  	if ((mode == MLO_AN_PHY) && mv88e6xxx_phy_is_internal(ds, port))
> >  		return;
> >  
> > +	if (mode != MLO_AN_FIXED && dsa_is_cpu_port(ds, port))
> > +		return;
> > +
> >  	if (mode == MLO_AN_FIXED) {
> >  		link = LINK_FORCED_UP;
> >  		speed = state->speed;
> 
> No, if your switch is connected to the cpu via 100mbps on the cpu side,
> you have to add fixed-link node as Andrew suggested. This is what we
> were trying to do with the Topaz patches, so that dsa port nodes
> support fixed-link via phylink.

Thanks for the clarification. This also works without the hunk above:

diff --git a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
index bd881497b872..8f61cae9d3b0 100644
--- a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
+++ b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
@@ -408,6 +408,11 @@ port@5 {
 				reg = <5>;
 				label = "cpu";
 				ethernet = <&cp1_eth2>;
+
+				fixed-link {
+					speed = <2500>;
+					full-duplex;
+				};
 			};
 		};
 
With this I get calls to mv88e6341_port_set_cmode() with PHY mode 0. So the 
other "fix" is still needed.

Thanks,
baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 19:06                   ` Baruch Siach
@ 2019-12-12 19:31                     ` Andrew Lunn
  2019-12-12 19:41                       ` Marek Behun
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2019-12-12 19:31 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Marek Behun, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

> diff --git a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
> index bd881497b872..8f61cae9d3b0 100644
> --- a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
> +++ b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
> @@ -408,6 +408,11 @@ port@5 {
>  				reg = <5>;
>  				label = "cpu";
>  				ethernet = <&cp1_eth2>;
> +
> +				fixed-link {
> +					speed = <2500>;
> +					full-duplex;
> +				};
>  			};
>  		};

The DSA driver is expected to configure the CPU port at its maximum
speed. You should only add a fixed link if you need to slow it down.
I expect 2500 is the maximum speed of this port.

  Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 19:31                     ` Andrew Lunn
@ 2019-12-12 19:41                       ` Marek Behun
  2019-12-12 20:49                         ` Andrew Lunn
  2019-12-15 10:13                         ` Baruch Siach
  0 siblings, 2 replies; 27+ messages in thread
From: Marek Behun @ 2019-12-12 19:41 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Baruch Siach, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

On Thu, 12 Dec 2019 20:31:29 +0100
Andrew Lunn <andrew@lunn.ch> wrote:

> > diff --git a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
> > index bd881497b872..8f61cae9d3b0 100644
> > --- a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
> > +++ b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
> > @@ -408,6 +408,11 @@ port@5 {
> >  				reg = <5>;
> >  				label = "cpu";
> >  				ethernet = <&cp1_eth2>;
> > +
> > +				fixed-link {
> > +					speed = <2500>;
> > +					full-duplex;
> > +				};
> >  			};
> >  		};  
> 
> The DSA driver is expected to configure the CPU port at its maximum
> speed. You should only add a fixed link if you need to slow it down.
> I expect 2500 is the maximum speed of this port.
> 
>   Andrew

Baruch, if the cpu port is in 2500 base-x, remove the fixed-link and do
this:

port@5 {
	reg = <5>;
	label = "cpu";
	ethernet = <&cp1_eth2>;
	phy-mode = "2500base-x";
	managed = "in-band-status";
};

Andrew, if the dsa driver is expected to do that, the code certainly
does not do so. For example in mv88e6xxx_port_set_cmode you have:
 /* Default to a slow mode, so freeing up SERDES interfaces for
  * other ports which might use them for SFPs.
  */
 if (mode == PHY_INTERFACE_MODE_NA)
         mode = PHY_INTERFACE_MODE_1000BASEX;

Marek

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 19:41                       ` Marek Behun
@ 2019-12-12 20:49                         ` Andrew Lunn
  2019-12-15 10:13                         ` Baruch Siach
  1 sibling, 0 replies; 27+ messages in thread
From: Andrew Lunn @ 2019-12-12 20:49 UTC (permalink / raw)
  To: Marek Behun
  Cc: Baruch Siach, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

> Baruch, if the cpu port is in 2500 base-x, remove the fixed-link and do
> this:
> 
> port@5 {
> 	reg = <5>;
> 	label = "cpu";
> 	ethernet = <&cp1_eth2>;
> 	phy-mode = "2500base-x";
> 	managed = "in-band-status";
> };
> 
> Andrew, if the dsa driver is expected to do that, the code certainly
> does not do so. For example in mv88e6xxx_port_set_cmode you have:
>  /* Default to a slow mode, so freeing up SERDES interfaces for
>   * other ports which might use them for SFPs.
>   */
>  if (mode == PHY_INTERFACE_MODE_NA)
>          mode = PHY_INTERFACE_MODE_1000BASEX;

Yah, Ports 9 and 10 of 6390X are a bit odd. They can do 10Gbps. But
only if you set the correct phy-mode. Then they will default to 10G.
If these ports are not doing 10Gbps they can lend there SERDES
interfaces to other ports. There are a few boards which want this
lending, connecting lots of SFPs using these SERDES interfaces. And
there are other boards which do use the ports at 10G as DSA links. DSA
links also default to the maximum speed of the port, and that does
work if you set the correct phy-mode.

So in general, if the port supports > 1Gbps, you need to set the
phy-mode for CPU and DSA ports. It will then default to the maximum
speed for that mode.

     Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-12 19:41                       ` Marek Behun
  2019-12-12 20:49                         ` Andrew Lunn
@ 2019-12-15 10:13                         ` Baruch Siach
  2019-12-15 11:10                           ` Baruch Siach
  2019-12-18 14:30                           ` Marek Behún
  1 sibling, 2 replies; 27+ messages in thread
From: Baruch Siach @ 2019-12-15 10:13 UTC (permalink / raw)
  To: Marek Behun
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Marek,

On Thu, Dec 12 2019, Marek Behun wrote:

> On Thu, 12 Dec 2019 20:31:29 +0100
> Andrew Lunn <andrew@lunn.ch> wrote:
>
>> > diff --git a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
>> > index bd881497b872..8f61cae9d3b0 100644
>> > --- a/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
>> > +++ b/arch/arm64/boot/dts/marvell/armada-8040-clearfog-gt-8k.dts
>> > @@ -408,6 +408,11 @@ port@5 {
>> >  				reg = <5>;
>> >  				label = "cpu";
>> >  				ethernet = <&cp1_eth2>;
>> > +
>> > +				fixed-link {
>> > +					speed = <2500>;
>> > +					full-duplex;
>> > +				};
>> >  			};
>> >  		};
>>
>> The DSA driver is expected to configure the CPU port at its maximum
>> speed. You should only add a fixed link if you need to slow it down.
>> I expect 2500 is the maximum speed of this port.
>
> Baruch, if the cpu port is in 2500 base-x, remove the fixed-link and do
> this:
>
> port@5 {
> 	reg = <5>;
> 	label = "cpu";
> 	ethernet = <&cp1_eth2>;
> 	phy-mode = "2500base-x";
> 	managed = "in-band-status";
> };

Thanks. That is enough to fix the phylink issue triggered by commit
7fb5a711545 ("net: dsa: mv88e6xxx: drop adjust_link to enabled
phylink").

The Clearfog GT-8K DT has also this on the cpu side:

&cp1_eth2 {
        status = "okay";
        phy-mode = "2500base-x";
        phys = <&cp1_comphy5 2>;
        fixed-link {
                speed = <2500>;
                full-duplex;
        };
};

Should I drop fixed-link here as well?

The call to mv88e6341_port_set_cmode() introduced in commit 7a3007d22e8
("net: dsa: mv88e6xxx: fully support SERDES on Topaz family") still
breaks port 5 (cpu) configuration. When called, its mode parameter is
set to PHY_INTERFACE_MODE_2500BASEX (19).

Any idea?

Thanks,
baruch

--
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-15 10:13                         ` Baruch Siach
@ 2019-12-15 11:10                           ` Baruch Siach
  2019-12-15 14:53                             ` Andrew Lunn
  2019-12-18 14:30                           ` Marek Behún
  1 sibling, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-15 11:10 UTC (permalink / raw)
  To: Marek Behun
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Marek,

On Sun, Dec 15 2019, Baruch Siach wrote:
> The call to mv88e6341_port_set_cmode() introduced in commit 7a3007d22e8
> ("net: dsa: mv88e6xxx: fully support SERDES on Topaz family") still
> breaks port 5 (cpu) configuration. When called, its mode parameter is
> set to PHY_INTERFACE_MODE_2500BASEX (19).
>
> Any idea?

I dug a little further here. It turns out that
mv88e6xxx_port_set_cmode() does not do any register write, because it
exits early here with cmode == 0xb:

        /* cmode doesn't change, nothing to do for us */
        if (cmode == chip->ports[port].cmode)
                return 0;

mv88e6341_port_set_cmode_writable() breaks the port configuration with
its call to mv88e6xxx_port_hidden_write(). Before this call
mv88e6352_port_get_cmode() sets cmode to 0xb. After this call cmode is
6. This breaks the assumption that the equality test above relies on.

This fixes cpu port configuration for me:

diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 7fe256c5739d..a6c320978bcf 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -427,10 +427,6 @@ static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
 		cmode = 0;
 	}
 
-	/* cmode doesn't change, nothing to do for us */
-	if (cmode == chip->ports[port].cmode)
-		return 0;
-
 	lane = mv88e6xxx_serdes_get_lane(chip, port);
 	if (lane) {
 		if (chip->ports[port].serdes_irq) {

Does that make sense?

Thanks,
baruch

--
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-15 11:10                           ` Baruch Siach
@ 2019-12-15 14:53                             ` Andrew Lunn
  2019-12-15 15:08                               ` Baruch Siach
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2019-12-15 14:53 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Marek Behun, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

> This fixes cpu port configuration for me:
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> index 7fe256c5739d..a6c320978bcf 100644
> --- a/drivers/net/dsa/mv88e6xxx/port.c
> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> @@ -427,10 +427,6 @@ static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
>  		cmode = 0;
>  	}
>  
> -	/* cmode doesn't change, nothing to do for us */
> -	if (cmode == chip->ports[port].cmode)
> -		return 0;
> -
>  	lane = mv88e6xxx_serdes_get_lane(chip, port);
>  	if (lane) {
>  		if (chip->ports[port].serdes_irq) {
> 
> Does that make sense?

This needs testing on a 6390, with a port 9 or 10 using fixed link. We
have had issues in the past where mac_config() has been called once
per second, and each time it reconfigured the MAC, causing the link to
go down/up. So we try to avoid doing work which is not requires and
which could upset the link.

      Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-15 14:53                             ` Andrew Lunn
@ 2019-12-15 15:08                               ` Baruch Siach
  2019-12-15 16:14                                 ` Andrew Lunn
  0 siblings, 1 reply; 27+ messages in thread
From: Baruch Siach @ 2019-12-15 15:08 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Behun, Vivien Didelot, netdev, Denis Odintsov,
	Hubert Feurstein, Heiner Kallweit

Hi Andrew,

On Sun, Dec 15 2019, Andrew Lunn wrote:
>> This fixes cpu port configuration for me:
>>
>> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
>> index 7fe256c5739d..a6c320978bcf 100644
>> --- a/drivers/net/dsa/mv88e6xxx/port.c
>> +++ b/drivers/net/dsa/mv88e6xxx/port.c
>> @@ -427,10 +427,6 @@ static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
>>  		cmode = 0;
>>  	}
>>
>> -	/* cmode doesn't change, nothing to do for us */
>> -	if (cmode == chip->ports[port].cmode)
>> -		return 0;
>> -
>>  	lane = mv88e6xxx_serdes_get_lane(chip, port);
>>  	if (lane) {
>>  		if (chip->ports[port].serdes_irq) {
>>
>> Does that make sense?
>
> This needs testing on a 6390, with a port 9 or 10 using fixed link. We
> have had issues in the past where mac_config() has been called once
> per second, and each time it reconfigured the MAC, causing the link to
> go down/up. So we try to avoid doing work which is not requires and
> which could upset the link.

You refer to ed8fe20205ac ("net: dsa: mv88e6xxx: prevent interrupt storm
caused by mv88e6390x_port_set_cmode") that introduced this code.

The alternative is to call ->port_get_cmode() to refresh the cmode cache
after mv88e6xxx_port_hidden_write().

What do you think?

Thanks,
baruch

--
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-15 15:08                               ` Baruch Siach
@ 2019-12-15 16:14                                 ` Andrew Lunn
  2019-12-17 18:26                                   ` Baruch Siach
  0 siblings, 1 reply; 27+ messages in thread
From: Andrew Lunn @ 2019-12-15 16:14 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Marek Behun, Vivien Didelot, netdev, Denis Odintsov,
	Hubert Feurstein, Heiner Kallweit

On Sun, Dec 15, 2019 at 05:08:01PM +0200, Baruch Siach wrote:
> Hi Andrew,
> 
> On Sun, Dec 15 2019, Andrew Lunn wrote:
> >> This fixes cpu port configuration for me:
> >>
> >> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> >> index 7fe256c5739d..a6c320978bcf 100644
> >> --- a/drivers/net/dsa/mv88e6xxx/port.c
> >> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> >> @@ -427,10 +427,6 @@ static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
> >>  		cmode = 0;
> >>  	}
> >>
> >> -	/* cmode doesn't change, nothing to do for us */
> >> -	if (cmode == chip->ports[port].cmode)
> >> -		return 0;
> >> -
> >>  	lane = mv88e6xxx_serdes_get_lane(chip, port);
> >>  	if (lane) {
> >>  		if (chip->ports[port].serdes_irq) {
> >>
> >> Does that make sense?
> >
> > This needs testing on a 6390, with a port 9 or 10 using fixed link. We
> > have had issues in the past where mac_config() has been called once
> > per second, and each time it reconfigured the MAC, causing the link to
> > go down/up. So we try to avoid doing work which is not requires and
> > which could upset the link.
> 
> You refer to ed8fe20205ac ("net: dsa: mv88e6xxx: prevent interrupt storm
> caused by mv88e6390x_port_set_cmode") that introduced this code.
> 
> The alternative is to call ->port_get_cmode() to refresh the cmode cache
> after mv88e6xxx_port_hidden_write().

Refreshing the cmode after mv88e6xxx_port_hidden_write() sounds like a
good idea. But please limit it to just switches which need to make
cmode writable. cmode is rather fragile and the 6390 family is easy to
break in this area.

      Thanks
		Andrew

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-15 16:14                                 ` Andrew Lunn
@ 2019-12-17 18:26                                   ` Baruch Siach
  0 siblings, 0 replies; 27+ messages in thread
From: Baruch Siach @ 2019-12-17 18:26 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Marek Behun, Vivien Didelot, netdev, Denis Odintsov,
	Hubert Feurstein, Heiner Kallweit

Hi Andrew,

On Sun, Dec 15, 2019 at 05:14:23PM +0100, Andrew Lunn wrote:
> On Sun, Dec 15, 2019 at 05:08:01PM +0200, Baruch Siach wrote:
> > On Sun, Dec 15 2019, Andrew Lunn wrote:
> > >> This fixes cpu port configuration for me:
> > >>
> > >> diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
> > >> index 7fe256c5739d..a6c320978bcf 100644
> > >> --- a/drivers/net/dsa/mv88e6xxx/port.c
> > >> +++ b/drivers/net/dsa/mv88e6xxx/port.c
> > >> @@ -427,10 +427,6 @@ static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
> > >>  		cmode = 0;
> > >>  	}
> > >>
> > >> -	/* cmode doesn't change, nothing to do for us */
> > >> -	if (cmode == chip->ports[port].cmode)
> > >> -		return 0;
> > >> -
> > >>  	lane = mv88e6xxx_serdes_get_lane(chip, port);
> > >>  	if (lane) {
> > >>  		if (chip->ports[port].serdes_irq) {
> > >>
> > >> Does that make sense?
> > >
> > > This needs testing on a 6390, with a port 9 or 10 using fixed link. We
> > > have had issues in the past where mac_config() has been called once
> > > per second, and each time it reconfigured the MAC, causing the link to
> > > go down/up. So we try to avoid doing work which is not requires and
> > > which could upset the link.
> > 
> > You refer to ed8fe20205ac ("net: dsa: mv88e6xxx: prevent interrupt storm
> > caused by mv88e6390x_port_set_cmode") that introduced this code.
> > 
> > The alternative is to call ->port_get_cmode() to refresh the cmode cache
> > after mv88e6xxx_port_hidden_write().
> 
> Refreshing the cmode after mv88e6xxx_port_hidden_write() sounds like a
> good idea. But please limit it to just switches which need to make
> cmode writable. cmode is rather fragile and the 6390 family is easy to
> break in this area.

This turned out to be much harder than expected. cmode update after 
mv88e6xxx_port_hidden_write() breaks the serdes configuration. The link change 
irq does not trigger on serdes interface up. The strange thing is that if I 
add 10ms delay after cmode read (or anywhere before the mv88e6xxx_port_write() 
call in mv88e6xxx_port_set_cmode()) it works again. I have no idea what is 
going on here. The cmode cache is used in many places, so maybe setting it to 
an invalid value (6) is not a good idea.

I ended up with cmode write force in mv88e6341_port_set_cmode_writable() like 
this:

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 8a8e38bfb161..70284f100d87 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -233,6 +233,7 @@ struct mv88e6xxx_port {
 	u64 vtu_member_violation;
 	u64 vtu_miss_violation;
 	u8 cmode;
+	bool force_cmode;
 	bool mirror_ingress;
 	bool mirror_egress;
 	unsigned int serdes_irq;
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 7fe256c5739d..8e8724eb5669 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -427,9 +427,10 @@ static int mv88e6xxx_port_set_cmode(struct mv88e6xxx_chip *chip, int port,
 		cmode = 0;
 	}
 
-	/* cmode doesn't change, nothing to do for us */
-	if (cmode == chip->ports[port].cmode)
+	/* cmode doesn't change, nothing to do for us unless forced */
+	if (cmode == chip->ports[port].cmode && !chip->ports[port].force_cmode)
 		return 0;
+	chip->ports[port].force_cmode = false;
 
 	lane = mv88e6xxx_serdes_get_lane(chip, port);
 	if (lane) {
@@ -516,6 +517,8 @@ static int mv88e6341_port_set_cmode_writable(struct mv88e6xxx_chip *chip,
 	if (port != 5)
 		return -EOPNOTSUPP;
 
+	chip->ports[port].force_cmode = true;
+
 	addr = chip->info->port_base_addr + port;
 
 	err = mv88e6xxx_port_hidden_read(chip, 0x7, addr, 0, &reg);

Does that look right?

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-15 10:13                         ` Baruch Siach
  2019-12-15 11:10                           ` Baruch Siach
@ 2019-12-18 14:30                           ` Marek Behún
  2019-12-18 16:04                             ` Baruch Siach
  2019-12-19  9:04                             ` Baruch Siach
  1 sibling, 2 replies; 27+ messages in thread
From: Marek Behún @ 2019-12-18 14:30 UTC (permalink / raw)
  To: Baruch Siach
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Baruch,

On Sun, 15 Dec 2019 12:13:25 +0200
Baruch Siach <baruch@tkos.co.il> wrote:

> Thanks. That is enough to fix the phylink issue triggered by commit
> 7fb5a711545 ("net: dsa: mv88e6xxx: drop adjust_link to enabled
> phylink").
> 
> The Clearfog GT-8K DT has also this on the cpu side:
> 
> &cp1_eth2 {
>         status = "okay";
>         phy-mode = "2500base-x";
>         phys = <&cp1_comphy5 2>;
>         fixed-link {
>                 speed = <2500>;
>                 full-duplex;
>         };
> };
> 
> Should I drop fixed-link here as well?

I would think yes. phy-mode = 2500base-x should already force 2500mbps,
the fixed-link should be irrelevant. Whether this is truly the case I
do not know, but on Turris Mox I do not use fixed-link with these.

> 
> The call to mv88e6341_port_set_cmode() introduced in commit
> 7a3007d22e8 ("net: dsa: mv88e6xxx: fully support SERDES on Topaz
> family") still breaks port 5 (cpu) configuration. When called, its
> mode parameter is set to PHY_INTERFACE_MODE_2500BASEX (19).
> 
> Any idea?

I shall look into this. On Turris Mox this works, so I will have to do
some experiments.

> Thanks,
> baruch
> 
> --
>      http://baruch.siach.name/blog/                  ~. .~   Tk Open
> Systems
> =}------------------------------------------------ooO--U--Ooo------------{=
>    - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il
> -


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-18 14:30                           ` Marek Behún
@ 2019-12-18 16:04                             ` Baruch Siach
  2019-12-19  9:04                             ` Baruch Siach
  1 sibling, 0 replies; 27+ messages in thread
From: Baruch Siach @ 2019-12-18 16:04 UTC (permalink / raw)
  To: Marek Behún
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Marek,

On Wed, Dec 18 2019, Marek Behún wrote:
> On Sun, 15 Dec 2019 12:13:25 +0200
> Baruch Siach <baruch@tkos.co.il> wrote:
>
>> Thanks. That is enough to fix the phylink issue triggered by commit
>> 7fb5a711545 ("net: dsa: mv88e6xxx: drop adjust_link to enabled
>> phylink").
>> 
>> The Clearfog GT-8K DT has also this on the cpu side:
>> 
>> &cp1_eth2 {
>>         status = "okay";
>>         phy-mode = "2500base-x";
>>         phys = <&cp1_comphy5 2>;
>>         fixed-link {
>>                 speed = <2500>;
>>                 full-duplex;
>>         };
>> };
>> 
>> Should I drop fixed-link here as well?
>
> I would think yes. phy-mode = 2500base-x should already force 2500mbps,
> the fixed-link should be irrelevant. Whether this is truly the case I
> do not know, but on Turris Mox I do not use fixed-link with these.

OK. I'll test that.

>> The call to mv88e6341_port_set_cmode() introduced in commit
>> 7a3007d22e8 ("net: dsa: mv88e6xxx: fully support SERDES on Topaz
>> family") still breaks port 5 (cpu) configuration. When called, its
>> mode parameter is set to PHY_INTERFACE_MODE_2500BASEX (19).
>> 
>> Any idea?
>
> I shall look into this. On Turris Mox this works, so I will have to do
> some experiments.

I have made some progress on this issue. See my other messages on this
thread. My latest suggestion is here:

  https://lore.kernel.org/netdev/20191217182643.augknhx57pafnelv@sapphire.tkos.co.il/

I now think that the force_cmode field is not necessary. We should just
add a 'force' parameter to mv88e6xxx_port_set_cmode(), and set it true
in mv88e6341_port_set_cmode().

baruch

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [BUG] mv88e6xxx: tx regression in v5.3
  2019-12-18 14:30                           ` Marek Behún
  2019-12-18 16:04                             ` Baruch Siach
@ 2019-12-19  9:04                             ` Baruch Siach
  1 sibling, 0 replies; 27+ messages in thread
From: Baruch Siach @ 2019-12-19  9:04 UTC (permalink / raw)
  To: Marek Behún
  Cc: Andrew Lunn, Vivien Didelot, netdev, Denis Odintsov, Hubert Feurstein

Hi Marek,

On Wed, Dec 18, 2019 at 03:30:35PM +0100, Marek Behún wrote:
> On Sun, 15 Dec 2019 12:13:25 +0200
> Baruch Siach <baruch@tkos.co.il> wrote:
> 
> > Thanks. That is enough to fix the phylink issue triggered by commit
> > 7fb5a711545 ("net: dsa: mv88e6xxx: drop adjust_link to enabled
> > phylink").
> > 
> > The Clearfog GT-8K DT has also this on the cpu side:
> > 
> > &cp1_eth2 {
> >         status = "okay";
> >         phy-mode = "2500base-x";
> >         phys = <&cp1_comphy5 2>;
> >         fixed-link {
> >                 speed = <2500>;
> >                 full-duplex;
> >         };
> > };
> > 
> > Should I drop fixed-link here as well?
> 
> I would think yes. phy-mode = 2500base-x should already force 2500mbps,
> the fixed-link should be irrelevant. Whether this is truly the case I
> do not know, but on Turris Mox I do not use fixed-link with these.

It doesn't work here without fixed-link. The link state remains down.

baruch

> > The call to mv88e6341_port_set_cmode() introduced in commit
> > 7a3007d22e8 ("net: dsa: mv88e6xxx: fully support SERDES on Topaz
> > family") still breaks port 5 (cpu) configuration. When called, its
> > mode parameter is set to PHY_INTERFACE_MODE_2500BASEX (19).
> > 
> > Any idea?
> 
> I shall look into this. On Turris Mox this works, so I will have to do
> some experiments.

-- 
     http://baruch.siach.name/blog/                  ~. .~   Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
   - baruch@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-12-19  9:04 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11  9:48 [BUG] mv88e6xxx: tx regression in v5.3 Baruch Siach
2019-12-11 13:11 ` Andrew Lunn
2019-12-11 17:07   ` Baruch Siach
2019-12-11 17:49     ` Andrew Lunn
2019-12-12  8:50       ` Baruch Siach
2019-12-12 13:14         ` Andrew Lunn
2019-12-12 14:19           ` Andreas Tobler
2019-12-12 15:08           ` Baruch Siach
2019-12-12 15:13             ` Andrew Lunn
2019-12-12 15:23               ` Baruch Siach
2019-12-12 18:17                 ` Baruch Siach
2019-12-12 18:32                   ` Marek Behun
2019-12-12 18:48                     ` Baruch Siach
2019-12-12 18:36                 ` Marek Behun
2019-12-12 19:06                   ` Baruch Siach
2019-12-12 19:31                     ` Andrew Lunn
2019-12-12 19:41                       ` Marek Behun
2019-12-12 20:49                         ` Andrew Lunn
2019-12-15 10:13                         ` Baruch Siach
2019-12-15 11:10                           ` Baruch Siach
2019-12-15 14:53                             ` Andrew Lunn
2019-12-15 15:08                               ` Baruch Siach
2019-12-15 16:14                                 ` Andrew Lunn
2019-12-17 18:26                                   ` Baruch Siach
2019-12-18 14:30                           ` Marek Behún
2019-12-18 16:04                             ` Baruch Siach
2019-12-19  9:04                             ` Baruch Siach

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.