All of lore.kernel.org
 help / color / mirror / Atom feed
* warnings from MTU setting on switch ports
@ 2020-11-30 14:30 Rasmus Villemoes
  2020-11-30 16:04 ` Vladimir Oltean
  0 siblings, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2020-11-30 14:30 UTC (permalink / raw)
  To: Network Development; +Cc: Vladimir Oltean, Florian Fainelli

Hi,

Updating our mpc8309 board to 5.9, we're starting to get

[    0.709832] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting
MTU on port 0
[    0.720721] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting
MTU on port 1
[    0.731002] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting
MTU on port 2
[    0.741333] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting
MTU on port 3
[    0.752220] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting
MTU on port 4
[    0.764231] eth1: mtu greater than device maximum
[    0.769022] ucc_geth e0102000.ethernet eth1: error -22 setting MTU to
include DSA overhead

So it does say "nonfatal", but do we have to live with those warnings on
every boot going forward, or is there something that we could do to
silence it?

It's a mv88e6250 switch with cpu port connected to a ucc_geth interface;
the ucc_geth driver indeed does not implement ndo_change_mtu and has
->max_mtu set to the default of 1500.

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: warnings from MTU setting on switch ports
  2020-11-30 14:30 warnings from MTU setting on switch ports Rasmus Villemoes
@ 2020-11-30 16:04 ` Vladimir Oltean
  2020-11-30 22:13   ` Rasmus Villemoes
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2020-11-30 16:04 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Network Development, Florian Fainelli

Hi Rasmus,

On Mon, Nov 30, 2020 at 03:30:50PM +0100, Rasmus Villemoes wrote:
> Hi,
>
> Updating our mpc8309 board to 5.9, we're starting to get
>
> [    0.709832] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 0
> [    0.720721] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 1
> [    0.731002] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 2
> [    0.741333] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 3
> [    0.752220] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 4
> [    0.764231] eth1: mtu greater than device maximum
> [    0.769022] ucc_geth e0102000.ethernet eth1: error -22 setting MTU to include DSA overhead
>
> So it does say "nonfatal", but do we have to live with those warnings on
> every boot going forward, or is there something that we could do to
> silence it?
>
> It's a mv88e6250 switch with cpu port connected to a ucc_geth interface;
> the ucc_geth driver indeed does not implement ndo_change_mtu and has
> ->max_mtu set to the default of 1500.

To suppress the warning:

commit 4349abdb409b04a5ed4ca4d2c1df7ef0cc16f6bd
Author: Vladimir Oltean <olteanv@gmail.com>
Date:   Tue Sep 8 02:25:56 2020 +0300

    net: dsa: don't print non-fatal MTU error if not supported

    Commit 72579e14a1d3 ("net: dsa: don't fail to probe if we couldn't set
    the MTU") changed, for some reason, the "err && err != -EOPNOTSUPP"
    check into a simple "err". This causes the MTU warning to be printed
    even for drivers that don't have the MTU operations implemented.
    Fix that.

    Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>

But you might also want to look into adding .ndo_change_mtu for
ucc_geth. If you are able to pass MTU-sized traffic through your
mv88e6085, then it is probably the case that the mpc8309 already
supports larger packets than 1500 bytes, and it is simply a matter of
letting the stack know about that. The warning is there to give people a
clue for the reason why MTU-sized traffic might not work over DSA.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: warnings from MTU setting on switch ports
  2020-11-30 16:04 ` Vladimir Oltean
@ 2020-11-30 22:13   ` Rasmus Villemoes
  2020-11-30 22:35     ` Vladimir Oltean
  2020-12-05 14:49     ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Rasmus Villemoes
  0 siblings, 2 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2020-11-30 22:13 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Network Development, Florian Fainelli

On 30/11/2020 17.04, Vladimir Oltean wrote:
> Hi Rasmus,
> 
> On Mon, Nov 30, 2020 at 03:30:50PM +0100, Rasmus Villemoes wrote:
>> Hi,
>>
>> Updating our mpc8309 board to 5.9, we're starting to get
>>
>> [    0.709832] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 0
>> [    0.720721] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 1
>> [    0.731002] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 2
>> [    0.741333] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 3
>> [    0.752220] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 4
>> [    0.764231] eth1: mtu greater than device maximum
>> [    0.769022] ucc_geth e0102000.ethernet eth1: error -22 setting MTU to include DSA overhead
>>
>> So it does say "nonfatal", but do we have to live with those warnings on
>> every boot going forward, or is there something that we could do to
>> silence it?
>>
>> It's a mv88e6250 switch with cpu port connected to a ucc_geth interface;
>> the ucc_geth driver indeed does not implement ndo_change_mtu and has
>> ->max_mtu set to the default of 1500.
> 
> To suppress the warning:
> 
> commit 4349abdb409b04a5ed4ca4d2c1df7ef0cc16f6bd
> Author: Vladimir Oltean <olteanv@gmail.com>
> Date:   Tue Sep 8 02:25:56 2020 +0300
> 
>     net: dsa: don't print non-fatal MTU error if not supported
>

Thanks, but I don't think that will change anything. -34 is -ERANGE.
> But you might also want to look into adding .ndo_change_mtu for
> ucc_geth. 

Well, that was what I first did, but I'm incapable of making sense of
the QE reference manual. Perhaps, given the domain of your email
address, you could poke someone that might know what would need to be done?

In any case, something else seems to be broken with 5.9; no network
traffic seems to be working (but the bridge creation etc. seems to work
just the same, link status works, and "ip link show" shows the same
things as in 4.19). So until I figure that out I can't play around with
modifying ucc_geth.

If you are able to pass MTU-sized traffic through your
> mv88e6085, then it is probably the case that the mpc8309 already
> supports larger packets than 1500 bytes, and it is simply a matter of
> letting the stack know about that. 

Perhaps, but I don't know how I should test that given that 1500
give-or-take is hardcoded. FWIW, on a 4.19 kernel, I can do 'ping -s X
-M do' for X up to 1472 for IPv4 and 1452 for IPv6, but I don't think
that tells me much about what the hardware could do.

A thought: Shouldn't the initialization of slave_dev->max_mtu in
dsa_slave_create() be capped by master->max_mtu minus tag overhead?

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: warnings from MTU setting on switch ports
  2020-11-30 22:13   ` Rasmus Villemoes
@ 2020-11-30 22:35     ` Vladimir Oltean
  2020-11-30 22:50       ` Rasmus Villemoes
  2020-11-30 23:11       ` Andrew Lunn
  2020-12-05 14:49     ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Rasmus Villemoes
  1 sibling, 2 replies; 15+ messages in thread
From: Vladimir Oltean @ 2020-11-30 22:35 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Network Development, Florian Fainelli, Andrew Lunn

On Mon, Nov 30, 2020 at 11:13:39PM +0100, Rasmus Villemoes wrote:
> On 30/11/2020 17.04, Vladimir Oltean wrote:
> > Hi Rasmus,
> >
> > On Mon, Nov 30, 2020 at 03:30:50PM +0100, Rasmus Villemoes wrote:
> >> Hi,
> >>
> >> Updating our mpc8309 board to 5.9, we're starting to get
> >>
> >> [    0.709832] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 0
> >> [    0.720721] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 1
> >> [    0.731002] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 2
> >> [    0.741333] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 3
> >> [    0.752220] mv88e6085 mdio@e0102120:10: nonfatal error -34 setting MTU on port 4
> >> [    0.764231] eth1: mtu greater than device maximum
> >> [    0.769022] ucc_geth e0102000.ethernet eth1: error -22 setting MTU to include DSA overhead
> >>
> >> So it does say "nonfatal", but do we have to live with those warnings on
> >> every boot going forward, or is there something that we could do to
> >> silence it?
> >>
> >> It's a mv88e6250 switch with cpu port connected to a ucc_geth interface;
> >> the ucc_geth driver indeed does not implement ndo_change_mtu and has
> >> ->max_mtu set to the default of 1500.
> >
> > To suppress the warning:
> >
> > commit 4349abdb409b04a5ed4ca4d2c1df7ef0cc16f6bd
> > Author: Vladimir Oltean <olteanv@gmail.com>
> > Date:   Tue Sep 8 02:25:56 2020 +0300
> >
> >     net: dsa: don't print non-fatal MTU error if not supported
> >
>
> Thanks, but I don't think that will change anything. -34 is -ERANGE.

Yup, I thought of that after responding.

> > But you might also want to look into adding .ndo_change_mtu for
> > ucc_geth.
>
> Well, that was what I first did, but I'm incapable of making sense of
> the QE reference manual. Perhaps, given the domain of your email
> address, you could poke someone that might know what would need to be done?

Well, you could poke Leo Li, the QUICC engine maintainer, yourself too.
What exactly from the QEIWRM.pdf are you incapable of making sense of?

The MFLR register appears to be at offset 0x4c within the Rx Global
Parameter RAM.

> > If you are able to pass MTU-sized traffic through your
> > mv88e6085, then it is probably the case that the mpc8309 already
> > supports larger packets than 1500 bytes, and it is simply a matter of
> > letting the stack know about that.
>
> Perhaps, but I don't know how I should test that given that 1500
> give-or-take is hardcoded.

Very simply, you could run iperf3 or any other TCP stream (ssh,
whatever). That will eventually increase the TCP segment size to the
MTU. If the TCP stream doesn't hang, then the QUICC engine can receive
packets over the MTU.

> FWIW, on a 4.19 kernel, I can do 'ping -s X -M do' for X up to 1472
> for IPv4 and 1452 for IPv6, but I don't think that tells me much about
> what the hardware could do.

IP will get fragmented by the stack to the interface's MTU.

> A thought: Shouldn't the initialization of slave_dev->max_mtu in
> dsa_slave_create() be capped by master->max_mtu minus tag overhead?

Talk to Andrew:
https://www.spinics.net/lists/netdev/msg645810.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: warnings from MTU setting on switch ports
  2020-11-30 22:35     ` Vladimir Oltean
@ 2020-11-30 22:50       ` Rasmus Villemoes
  2020-11-30 23:11       ` Andrew Lunn
  1 sibling, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2020-11-30 22:50 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Network Development, Florian Fainelli, Andrew Lunn

On 30/11/2020 23.35, Vladimir Oltean wrote:
> On Mon, Nov 30, 2020 at 11:13:39PM +0100, Rasmus Villemoes wrote:

>> FWIW, on a 4.19 kernel, I can do 'ping -s X -M do' for X up to 1472
>> for IPv4 and 1452 for IPv6, but I don't think that tells me much about
>> what the hardware could do.
> 
> IP will get fragmented by the stack to the interface's MTU.

Nope, that's what the '-M do' is for:

# ping -M do -s 1880 169.254.246.13
PING 169.254.246.13 (169.254.246.13) 1880(1908) bytes of data.
ping: local error: message too long, mtu=1500

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: warnings from MTU setting on switch ports
  2020-11-30 22:35     ` Vladimir Oltean
  2020-11-30 22:50       ` Rasmus Villemoes
@ 2020-11-30 23:11       ` Andrew Lunn
  1 sibling, 0 replies; 15+ messages in thread
From: Andrew Lunn @ 2020-11-30 23:11 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Rasmus Villemoes, Network Development, Florian Fainelli

> > A thought: Shouldn't the initialization of slave_dev->max_mtu in
> > dsa_slave_create() be capped by master->max_mtu minus tag overhead?
> 
> Talk to Andrew:
> https://www.spinics.net/lists/netdev/msg645810.html

Yes, this is historic. DSA started life with Marvell switches
connected to Marvell MACs. And Marvell MACs always allowed frames
bigger than the MTU to be sent/received. A few more MACs were paired
with Marvell switches, and they also happened to allow bigger frames
than the MTU to be used. So it all worked fine until a MAC/Switch pair
came along where the MAC did not pass frames bigger than the MTU. We
could not break backwards compatibility, so decided to ask the MAC to
up its MTU, but not error out if it failed. It was like that for a
long time, until Vladimir added jumbo support. Then you do need to
know if the MAC supports bigger MTU, so you can disallow jumbo. So
this warning got added.

     Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

* vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports)
  2020-11-30 22:13   ` Rasmus Villemoes
  2020-11-30 22:35     ` Vladimir Oltean
@ 2020-12-05 14:49     ` Rasmus Villemoes
  2020-12-05 15:45       ` Russell King - ARM Linux admin
  2020-12-05 19:03       ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Vladimir Oltean
  1 sibling, 2 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2020-12-05 14:49 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Network Development, Florian Fainelli, Russell King - ARM Linux

On 30/11/2020 23.13, Rasmus Villemoes wrote:
> On 30/11/2020 17.04, Vladimir Oltean wrote:

> Thanks, but I don't think that will change anything. -34 is -ERANGE.
>> But you might also want to look into adding .ndo_change_mtu for
>> ucc_geth. 
> 
> Well, that was what I first did, but I'm incapable of making sense of
> the QE reference manual. Perhaps, given the domain of your email
> address, you could poke someone that might know what would need to be done?
> 
> In any case, something else seems to be broken with 5.9; no network
> traffic seems to be working (but the bridge creation etc. seems to work
> just the same, link status works, and "ip link show" shows the same
> things as in 4.19). So until I figure that out I can't play around with
> modifying ucc_geth.

So, I found out that the problem disappers when I disable
vlan_filtering, and googling then led me to Russell's patches from
around March
(https://patchwork.ozlabs.org/project/netdev/cover/20200218114515.GL18808@shell.armlinux.org.uk/).

But, unlike from what I gather from Russell's description, the problem
is there whether or not the bridge is created with vlan_filtering
enabled from the outset or not. Also, cherry-picking 517648c8d1 to
5.9.12 doesn't help. The problem also exists on 5.4.80, and (somewhat
naively) backporting 54a0ed0df496 as well as 517648c8d1 doesn't change that.

So I'm out of ideas. I also tried booting a 5.3, but that had some
horrible UBI/nand failure, and since the mv88e6250 support only landed
in 5.3, it's rather difficult to try kernels further back.

Does anyone know what might have happened between 4.19 and 5.4 that
caused this change in behaviour for Marvell switches?

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports)
  2020-12-05 14:49     ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Rasmus Villemoes
@ 2020-12-05 15:45       ` Russell King - ARM Linux admin
  2020-12-05 16:02         ` vlan_filtering=1 breaks all traffic Rasmus Villemoes
  2020-12-05 19:03       ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Vladimir Oltean
  1 sibling, 1 reply; 15+ messages in thread
From: Russell King - ARM Linux admin @ 2020-12-05 15:45 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: Vladimir Oltean, Network Development, Florian Fainelli

On Sat, Dec 05, 2020 at 03:49:02PM +0100, Rasmus Villemoes wrote:
> So, I found out that the problem disappers when I disable
> vlan_filtering, and googling then led me to Russell's patches from
> around March
> (https://patchwork.ozlabs.org/project/netdev/cover/20200218114515.GL18808@shell.armlinux.org.uk/).
> 
> But, unlike from what I gather from Russell's description, the problem
> is there whether or not the bridge is created with vlan_filtering
> enabled from the outset or not.

No. My problem is where the bridge is created _without_ vlan_filtering
enabled, and is subsequently enabled. That caused traffic to stop as
soon as vlan_filtering was enabled.

Note that if the bridge were created with vlan_filtering enabled from
the start, there would be no problem.

> Also, cherry-picking 517648c8d1 to
> 5.9.12 doesn't help. The problem also exists on 5.4.80, and (somewhat
> naively) backporting 54a0ed0df496 as well as 517648c8d1 doesn't change that.

I'm not sure what 517648c8d1 is - this isn't a mainline or net-next
commit ID.

54a0ed0df496 is a modification by Vladimir Oltean of my original patch
("net: dsa: mv88e6xxx: fix vlan setup") that fixed it for Marvell DSA,
dropping the Marvell DSA bits of the patch, thereby neutering it for
its intended purpose, while still mostly maintaining the commit
description which implies that it fixes Marvell DSA.

I'm afraid that I've now given up dealing with anything that involves
interaction with the impossible Vladimir Oltean - the problems you are
experiencing are a direct result of Vladimir Oltean's attitudes towards
fellow kernel developers.

You will, however, find that the problem was subsequently fixed by
1fb74191988f on top of 54a0ed0df496, which adds the part that
Vladimir Oltean failed to carry forward. It's all an unnecessary
horrible mess.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic
  2020-12-05 15:45       ` Russell King - ARM Linux admin
@ 2020-12-05 16:02         ` Rasmus Villemoes
  0 siblings, 0 replies; 15+ messages in thread
From: Rasmus Villemoes @ 2020-12-05 16:02 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Vladimir Oltean, Network Development, Florian Fainelli

On 05/12/2020 16.45, Russell King - ARM Linux admin wrote:
> On Sat, Dec 05, 2020 at 03:49:02PM +0100, Rasmus Villemoes wrote:
>> So, I found out that the problem disappers when I disable
>> vlan_filtering, and googling then led me to Russell's patches from
>> around March
>> (https://patchwork.ozlabs.org/project/netdev/cover/20200218114515.GL18808@shell.armlinux.org.uk/).
>>
>> But, unlike from what I gather from Russell's description, the problem
>> is there whether or not the bridge is created with vlan_filtering
>> enabled from the outset or not.
> 
> No. My problem is where the bridge is created _without_ vlan_filtering
> enabled, and is subsequently enabled. That caused traffic to stop as
> soon as vlan_filtering was enabled.
> 
> Note that if the bridge were created with vlan_filtering enabled from
> the start, there would be no problem.

That was what I was trying to say: I see the problem whether or not
vlan_filtering is enabled from the beginning (hence the "unlike..."). So
since disabling vlan_filtering (or never enabling it in the first place)
makes traffic flow, the symptoms look similar to the problem you
described (and that was what led me to those patches).

>> Also, cherry-picking 517648c8d1 to
>> 5.9.12 doesn't help. The problem also exists on 5.4.80, and (somewhat
>> naively) backporting 54a0ed0df496 as well as 517648c8d1 doesn't change that.
> 
> I'm not sure what 517648c8d1 is - this isn't a mainline or net-next
> commit ID.

Sorry, I mistakenly took the sha1 from the 5.4 branch I cherry-picked it
to - I did indeed mean your subsequent 1fb74191988f setting the flag for
mv88e6xxxx.

> You will, however, find that the problem was subsequently fixed by
> 1fb74191988f on top of 54a0ed0df496, 

Unfortunately not. So either it's simply some completely different
underlying issue, or there's something else going on in my setup that
prevents the configure_vlan_while_not_filtering patches from fixing it.

Thanks,
Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports)
  2020-12-05 14:49     ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Rasmus Villemoes
  2020-12-05 15:45       ` Russell King - ARM Linux admin
@ 2020-12-05 19:03       ` Vladimir Oltean
  2020-12-05 20:13         ` vlan_filtering=1 breaks all traffic Rasmus Villemoes
  1 sibling, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2020-12-05 19:03 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Network Development, Florian Fainelli, Russell King - ARM Linux

Hi Rasmus,

On Sat, Dec 05, 2020 at 03:49:02PM +0100, Rasmus Villemoes wrote:
> So I'm out of ideas. I also tried booting a 5.3, but that had some
> horrible UBI/nand failure,

Test with a ramdisk maybe?

> and since the mv88e6250 support only landed
> in 5.3, it's rather difficult to try kernels further back.
> Does anyone know what might have happened between 4.19 and 5.4 that
> caused this change in behaviour for Marvell switches?

I think the most important question is: what commands are you running,
and how are you determining that "it doesn't work"? What type of traffic
are you sending, and how are you receiving it?

I don't own a 6250 switch, but the 6390 and 6190. On my switches, both
termination and forwarding work fine with VLAN filtering enabled -
tested just now. This is with the official v5.9 tag:

commit bbf5c979011a099af5dc76498918ed7df445635b (HEAD, tag: v5.9)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Oct 11 14:15:50 2020 -0700

    Linux 5.9

It's interesting that it is you who added VLAN support for the 6250 in
commit bec8e5725281 ("net: dsa: mv88e6xxx: implement vtu_getnext and
vtu_loadpurge for mv88e6250") which appeared around the v5.3 timeframe.
When you tested it then, did you apply the same test as you did now?

It is very confusing that you mention v4.19, since of course, it is the
v5.3 tag where the 6250 support for VLANs has appeared, just as you said.
As far as I can gather from your email, the only kernel where your
testing passes is a kernel that nobody else can test, am I right?

So it either is a problem specific to the 6250, or a problem that I did
not catch with the trivial setup I had below:

# uname -a
Linux buildroot 5.9.0-mox #1526 SMP PREEMPT Sat Dec 5 20:53:11 EET 2020 aarch64 GNU/Linux
# ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
# ip link set lan4 master br0
[   56.393743] br0: port 1(lan4) entered blocking state
[   56.396614] br0: port 1(lan4) entered disabled state
[   56.408327] device lan4 entered promiscuous mode
[   56.410255] device eth1 entered promiscuous mode
[   57.155280] br0: port 1(lan4) entered blocking state
[   57.157639] br0: port 1(lan4) entered forwarding state
#
# ip addr add 192.168.100.2/24 dev br0
# ping 192.168.100.1
[...]
--- 192.168.100.1 ping statistics ---
18 packets transmitted, 18 received, 0% packet loss, time 17034ms
rtt min/avg/max/mdev = 1.389/1.643/2.578/0.266 ms
# bridge vlan del dev lan4 vid 1
# bridge vlan add dev lan4 vid 100 pvid untagged && bridge vlan add dev br0 vid 100 pvid untagged self
# ping 192.168.100.1
[...]
^C--- 192.168.100.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 0.993/1.358/1.633/0.269 ms

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic
  2020-12-05 19:03       ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Vladimir Oltean
@ 2020-12-05 20:13         ` Rasmus Villemoes
  2020-12-06 19:45           ` Vladimir Oltean
  0 siblings, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2020-12-05 20:13 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Network Development, Florian Fainelli, Russell King - ARM Linux

On 05/12/2020 20.03, Vladimir Oltean wrote:
> Hi Rasmus,
> 
> On Sat, Dec 05, 2020 at 03:49:02PM +0100, Rasmus Villemoes wrote:
>> So I'm out of ideas. I also tried booting a 5.3, but that had some
>> horrible UBI/nand failure,
> 
> Test with a ramdisk maybe?
> 
>> and since the mv88e6250 support only landed
>> in 5.3, it's rather difficult to try kernels further back.
>> Does anyone know what might have happened between 4.19 and 5.4 that
>> caused this change in behaviour for Marvell switches?
> 
> I think the most important question is: what commands are you running,
> and how are you determining that "it doesn't work"? What type of traffic
> are you sending, and how are you receiving it?
> 

Ping, ssh, you-name-it. Testing with both IPv4 and IPv6LL.

> I don't own a 6250 switch, but the 6390 and 6190. On my switches, both
> termination and forwarding work fine with VLAN filtering enabled -
> tested just now. This is with the official v5.9 tag:
> 
> commit bbf5c979011a099af5dc76498918ed7df445635b (HEAD, tag: v5.9)
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sun Oct 11 14:15:50 2020 -0700
> 
>     Linux 5.9
> 
> It's interesting that it is you who added VLAN support for the 6250 

Not just VLAN, all of 6250 was added by me around that time.

in
> commit bec8e5725281 ("net: dsa: mv88e6xxx: implement vtu_getnext and
> vtu_loadpurge for mv88e6250") which appeared around the v5.3 timeframe.
> When you tested it then, did you apply the same test as you did now?
>
> It is very confusing that you mention v4.19, since of course, it is the
> v5.3 tag where the 6250 support for VLANs has appeared, just as you said.

We're running with the set of 6250 patches backported to 4.19, and the
testing was/is done on that. I'd love to be able to run Linus' master
branch at any time, and that's actually what I'm currently in the
process of getting closer to (apart from the switch issue, 5.9 seems to
work just fine for us with just about five out-of-tree patches, none
network-related).

> As far as I can gather from your email, the only kernel where your
> testing passes is a kernel that nobody else can test, am I right?

In a sense, yes, but nobody else has the hardware that I have either.

> So it either is a problem specific to the 6250,

That is certainly a possibility.

 or a problem that I did
> not catch with the trivial setup I had below:
> 
> # uname -a
> Linux buildroot 5.9.0-mox #1526 SMP PREEMPT Sat Dec 5 20:53:11 EET 2020 aarch64 GNU/Linux
> # ip link add br0 type bridge vlan_filtering 1 && ip link set br0 up
> # ip link set lan4 master br0
> [   56.393743] br0: port 1(lan4) entered blocking state
> [   56.396614] br0: port 1(lan4) entered disabled state
> [   56.408327] device lan4 entered promiscuous mode
> [   56.410255] device eth1 entered promiscuous mode
> [   57.155280] br0: port 1(lan4) entered blocking state
> [   57.157639] br0: port 1(lan4) entered forwarding state
> #
> # ip addr add 192.168.100.2/24 dev br0
> # ping 192.168.100.1
> [...]
> --- 192.168.100.1 ping statistics ---
> 18 packets transmitted, 18 received, 0% packet loss, time 17034ms
> rtt min/avg/max/mdev = 1.389/1.643/2.578/0.266 ms

Yup, that corresponds pretty much to what I do. Just for good measure, I
tried doing exactly the above (with only a change in IP address), and...
it worked. So, first thought was "perhaps it's because you bring up br0
before adding the ports". But no, bringing it up after still works.
Second thought: "portS - hm, only one port is added here", and indeed,
once I add two or more ports to the bridge, it stops working. Removing
all but the single port that has a cable plugged in makes it work again.
It doesn't seem to matter whether the other ports are up or down.

I should probably mention that wireshark says that ARP (ipv4) and
neighbor solicitation (ipv6ll) packets do reach my laptop when I attempt
the ping. If I start by doing a succesful ping (i.e., no other ports
added), then add another port, then do a ping, the ping packets do reach
the laptop (and of course get answered). So the problem does not appear
to be on egress.

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic
  2020-12-05 20:13         ` vlan_filtering=1 breaks all traffic Rasmus Villemoes
@ 2020-12-06 19:45           ` Vladimir Oltean
  2020-12-07 19:43             ` Rasmus Villemoes
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2020-12-06 19:45 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Network Development, Florian Fainelli, Russell King - ARM Linux

On Sat, Dec 05, 2020 at 09:13:37PM +0100, Rasmus Villemoes wrote:
> Yup, that corresponds pretty much to what I do. Just for good measure, I
> tried doing exactly the above (with only a change in IP address), and...
> it worked. So, first thought was "perhaps it's because you bring up br0
> before adding the ports". But no, bringing it up after still works.
> Second thought: "portS - hm, only one port is added here", and indeed,
> once I add two or more ports to the bridge, it stops working. Removing
> all but the single port that has a cable plugged in makes it work again.
> It doesn't seem to matter whether the other ports are up or down.
>
> I should probably mention that wireshark says that ARP (ipv4) and
> neighbor solicitation (ipv6ll) packets do reach my laptop when I attempt
> the ping. If I start by doing a succesful ping (i.e., no other ports
> added), then add another port, then do a ping, the ping packets do reach
> the laptop (and of course get answered). So the problem does not appear
> to be on egress.

It would be interesting to see what is the ingress drop reason, if that
could be deduced from the drop counters that are incrementing in ethtool -S.

I am not confident that it can be a VTU issue, given the fact that you
have not said that you see VTU violation warnings, which are fairly loud
on mv88e6xxx.

The procedure of joining a new port to the bridge does alter the VTU,
since that second port needs to be a part of the default_pvid of the
bridge as soon as it goes up (VID 1). However that is not the main thing
that bridging a new port does - instead, the in-chip Port VLAN map is
altered.

In theory it _would_ be possible (even if unlikely) for the VTU to get
overwritten by the second port join, in a way that removes the first
port from the bridge's VID 1. Remember that this issue only seems to be
observable on 8250, so it seems logical to search in 8250 specific code
first (therefore making the VTU a more likely suspect than the in-chip
Port VLAN map).

Since you've already made the effort to boot kernel 5.9, you could make
the extra leap to try out the 5.10 rc's and look at the VTU using
Andrew's devlink based tool:
https://github.com/lunn/mv88e6xxx_dump

# devlink dev
mdio_bus/d0032004.mdio-mii:11
mdio_bus/d0032004.mdio-mii:12
mdio_bus/d0032004.mdio-mii:10
# ./mv88e6xxx_dump --device mdio_bus/d0032004.mdio-mii:10 --vtu
VTU:
        V - a member, egress not modified
        U - a member, egress untagged
        T - a member, egress tagged
        X - not a member, Ingress frames with VID discarded
P  VID 0123456789a  FID  SID QPrio FPrio VidPolicy
# ip link add br0 type bridge vlan_filtering 1
# ip link set lan4 master br0
[   74.443547] br0: port 1(lan4) entered blocking state
[   74.446037] br0: port 1(lan4) entered disabled state
[   74.461416] device lan4 entered promiscuous mode
[   74.463564] device eth1 entered promiscuous mode

# ./mv88e6xxx_dump --device mdio_bus/d0032004.mdio-mii:10 --vtu
VTU:
        V - a member, egress not modified
        U - a member, egress untagged
        T - a member, egress tagged
        X - not a member, Ingress frames with VID discarded
P  VID 0123456789a  FID  SID QPrio FPrio VidPolicy
0    1 XXXXUXXXXVV    1    0     -     -     0

# ip link set lan5 master br0
[   84.533120] br0: port 2(lan5) entered blocking state
[   84.535563] br0: port 2(lan5) entered disabled state
[   84.552022] device lan5 entered promiscuous mode
#
# ./mv88e6xxx_dump --device mdio_bus/d0032004.mdio-mii:10 --vtu
VTU:
        V - a member, egress not modified
        U - a member, egress untagged
        T - a member, egress tagged
        X - not a member, Ingress frames with VID discarded
P  VID 0123456789a  FID  SID QPrio FPrio VidPolicy
0    1 XXXXUUXXXVV    1    0     -     -     0

You would expect to see two U's for your ports, and not something else
like X.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic
  2020-12-06 19:45           ` Vladimir Oltean
@ 2020-12-07 19:43             ` Rasmus Villemoes
  2020-12-07 20:15               ` Vladimir Oltean
  0 siblings, 1 reply; 15+ messages in thread
From: Rasmus Villemoes @ 2020-12-07 19:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Network Development, Florian Fainelli, Russell King - ARM Linux

On 06/12/2020 20.45, Vladimir Oltean wrote:

> It would be interesting to see what is the ingress drop reason, if that
> could be deduced from the drop counters that are incrementing in ethtool -S.

I don't see anything obvious from running ethtool -S before/during/after
ping.

> Since you've already made the effort to boot kernel 5.9, you could make
> the extra leap to try out the 5.10 rc's and look at the VTU using
> Andrew's devlink based tool:
> https://github.com/lunn/mv88e6xxx_dump
> 
> # devlink dev
> mdio_bus/d0032004.mdio-mii:11
> mdio_bus/d0032004.mdio-mii:12
> mdio_bus/d0032004.mdio-mii:10
> # ./mv88e6xxx_dump --device mdio_bus/d0032004.mdio-mii:10 --vtu
> VTU:

Thanks for the hint. Unfortunately:

# uname -a
Linux (none) 5.10.0-rc7-00035-g66d777e1729d #194 Mon Dec 7 16:00:30 CET
2020 ppc GNU/Linux
# devlink dev
mdio_bus/mdio@e0102120:10
# mv88e6xxx_dump --device mdio_bus/mdio@e0102120:10 --vtu
VTU:
Error: devlink: The requested region does not exist.
devlink answers: Invalid argument
Unable to snapshot vtu

--atu, --global1 and --global2 does work, but the latter two say
"Unknown mv88e6xxx chip 186a" (and 186a is 6250 in hex, so I think that
should have been printed in decimal to reduce confusion). Whether that
has anything to do with --vtu not working I don't know - the global1/2
registers to seem to get printed correctly.

Rasmus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic
  2020-12-07 19:43             ` Rasmus Villemoes
@ 2020-12-07 20:15               ` Vladimir Oltean
  2020-12-08  2:22                 ` Andrew Lunn
  0 siblings, 1 reply; 15+ messages in thread
From: Vladimir Oltean @ 2020-12-07 20:15 UTC (permalink / raw)
  To: Rasmus Villemoes, Andrew Lunn
  Cc: Network Development, Florian Fainelli, Russell King - ARM Linux

On Mon, Dec 07, 2020 at 08:43:18PM +0100, Rasmus Villemoes wrote:
> # uname -a
> Linux (none) 5.10.0-rc7-00035-g66d777e1729d #194 Mon Dec 7 16:00:30 CET
> 2020 ppc GNU/Linux
> # devlink dev
> mdio_bus/mdio@e0102120:10
> # mv88e6xxx_dump --device mdio_bus/mdio@e0102120:10 --vtu
> VTU:
> Error: devlink: The requested region does not exist.
> devlink answers: Invalid argument
> Unable to snapshot vtu
>
> --atu, --global1 and --global2 does work, but the latter two say
> "Unknown mv88e6xxx chip 186a" (and 186a is 6250 in hex, so I think that
> should have been printed in decimal to reduce confusion). Whether that
> has anything to do with --vtu not working I don't know - the global1/2
> registers to seem to get printed correctly.

Ahh, this is probably because Andrew didn't know how many bits is the
FID field width for 6250. The code has the 6250 treated the same as the
"default" case, which is to error out:

cmd_vtu:
	switch (ctx->chip) {
	case MV88E6190:
	case MV88E6191:
	case MV88E6290:
	case MV88E6390:
		return vtu_mv88e6xxx(ctx, 0x7ff);
	case MV88E6171:
	case MV88E6175:
	case MV88E6350:
	case MV88E6351:
	case MV88E6172:
	case MV88E6176:
	case MV88E6240:
	case MV88E6352:
		return vtu_mv88e6xxx(ctx, 0x7ff);
	case MV88E6141:
	case MV88E6341:
		return vtu_mv88e6xxx(ctx, 0xff);
	case MV88E6320:
	case MV88E6321:
		return vtu_mv88e6xxx(ctx, 0x7ff);
	case MV88E6220:
	case MV88E6250:
	case MV88E6131:
	case MV88E6185:
	case MV88E6123:
	case MV88E6161:
	case MV88E6165:
	default:
		printf("Unknown mv88e6xxx chip %x\n", ctx->chip);
	}

You can probably figure out the width of that field based on your
datasheet. You can probably just put a value in there and get some
output regardless of whether it's actually correct or not.

The kernel takes care of issuing the VTU GetNext operation and performs
a raw read of the VTU Data registers, which are passed to user space
(and where they get interpreted).

Nonetheless, I'm adding Andrew as well.

Andrew, this is the thread, for context, in case you haven't been
following. The threading got broken multiple times, it seems.
https://lore.kernel.org/netdev/6424c14e-bd25-2a06-cf0b-f1a07f9a3604@prevas.dk/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: vlan_filtering=1 breaks all traffic
  2020-12-07 20:15               ` Vladimir Oltean
@ 2020-12-08  2:22                 ` Andrew Lunn
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Lunn @ 2020-12-08  2:22 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Rasmus Villemoes, Network Development, Florian Fainelli,
	Russell King - ARM Linux

On Mon, Dec 07, 2020 at 08:15:28PM +0000, Vladimir Oltean wrote:
> On Mon, Dec 07, 2020 at 08:43:18PM +0100, Rasmus Villemoes wrote:
> > # uname -a
> > Linux (none) 5.10.0-rc7-00035-g66d777e1729d #194 Mon Dec 7 16:00:30 CET
> > 2020 ppc GNU/Linux
> > # devlink dev
> > mdio_bus/mdio@e0102120:10
> > # mv88e6xxx_dump --device mdio_bus/mdio@e0102120:10 --vtu
> > VTU:
> > Error: devlink: The requested region does not exist.
> > devlink answers: Invalid argument
> > Unable to snapshot vtu

VTU dumping is new. It is in net-next, but not 5.10-rc7.  atu, global1
and global2 are part for 5.10-rc7.

     Andrew

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-12-08  2:23 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-30 14:30 warnings from MTU setting on switch ports Rasmus Villemoes
2020-11-30 16:04 ` Vladimir Oltean
2020-11-30 22:13   ` Rasmus Villemoes
2020-11-30 22:35     ` Vladimir Oltean
2020-11-30 22:50       ` Rasmus Villemoes
2020-11-30 23:11       ` Andrew Lunn
2020-12-05 14:49     ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Rasmus Villemoes
2020-12-05 15:45       ` Russell King - ARM Linux admin
2020-12-05 16:02         ` vlan_filtering=1 breaks all traffic Rasmus Villemoes
2020-12-05 19:03       ` vlan_filtering=1 breaks all traffic (was: Re: warnings from MTU setting on switch ports) Vladimir Oltean
2020-12-05 20:13         ` vlan_filtering=1 breaks all traffic Rasmus Villemoes
2020-12-06 19:45           ` Vladimir Oltean
2020-12-07 19:43             ` Rasmus Villemoes
2020-12-07 20:15               ` Vladimir Oltean
2020-12-08  2:22                 ` Andrew Lunn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.