netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
@ 2021-01-14 13:49 Rasmus Villemoes
  2021-01-16  1:42 ` Tobias Waldekranz
  0 siblings, 1 reply; 11+ messages in thread
From: Rasmus Villemoes @ 2021-01-14 13:49 UTC (permalink / raw)
  To: Andrew Lunn, Network Development, Vivien Didelot; +Cc: Horatiu Vultur

Hi

I've noticed something rather odd with my mv88e6250, which led me to the
commit in the subject.

First, the MAC address of the master device never seems to get learned
(at least according to "mv88e6xxx_dump --atu"), so all packets destined
for the machine gets flooded out all ports - which I can verify with
wireshark. That is, I have three machines

A --- B --- C

with B being the board with an embedded mv88e6250, A pinging B, and C
running wireshark - and it shows lots of "ping request (no response
found)". Same if B pings A; the responses from A also get to C.

But this is somewhat to be expected; automatic learning has been
disabled by commit 4c7ea3c0791e (later commits have change the logic
around there somewhat, but the end result is the same: the PAV for the
cpu port being clear), and I can't find anywhere in the code which would
manually add the master device's address to the ATU.

However: Even when I do

-	if (dsa_is_cpu_port(ds, port))
+	if (dsa_is_cpu_port(ds, port) && 0)

and verify with "mv88e6xxx_dump --ports" that the CPU port now has the
expected value in port offset 0x0b:

0b 0001 0002 0004 0008 0010 0020 0040

(it's port 5, i.e. the 0020 value), I still see the above behaviour -
the master device's address doesn't get learned (nor does some garbage
address appear in the ATU), and the unicast packets are still forwarded
out all ports. So I must be missing something else.

Finally, I'm wondering how the tagging could get in the way of learning
the right address, given that the tag is inserted after the DA and SA.

====

But this is all just some odd observations; the traffic does seem to
work, though sending all unicast traffic to all neighbours seems to be a
waste of bandwidth.

What I'm _really_ trying to do is to get my mv88e6250 to participate in
an MRP ring, which AFAICT will require that the master device's MAC gets
added as a static entry in the ATU: Otherwise, when the ring goes from
open to closed, I've seen the switch wrongly learn the node's own mac
address as being in the direction of one of the normal ports, which
obviously breaks all traffic. So if the topology is

   M
 /   \
C1 *** C2

with the link between C1 and C2 being broken, both M-C1 and M-C2 links
are in forwarding (hence learning) state, so when the C1-C2 link gets
reestablished, it will take at least one received test packet for M to
decide to put one of the ports in blocking state - by which time the
damage is done, and the ATU now has a broken entry for M's own mac address.

Rasmus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-14 13:49 commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports) Rasmus Villemoes
@ 2021-01-16  1:42 ` Tobias Waldekranz
  2021-01-18 15:41   ` Rasmus Villemoes
  2021-01-18 21:19   ` Vladimir Oltean
  0 siblings, 2 replies; 11+ messages in thread
From: Tobias Waldekranz @ 2021-01-16  1:42 UTC (permalink / raw)
  To: Rasmus Villemoes, Andrew Lunn, Network Development, Vivien Didelot
  Cc: Horatiu Vultur, Vladimir Oltean

On Thu, Jan 14, 2021 at 14:49, Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> Hi
>
> I've noticed something rather odd with my mv88e6250, which led me to the
> commit in the subject.
>
> First, the MAC address of the master device never seems to get learned
> (at least according to "mv88e6xxx_dump --atu"), so all packets destined
> for the machine gets flooded out all ports - which I can verify with
> wireshark. That is, I have three machines
>
> A --- B --- C
>
> with B being the board with an embedded mv88e6250, A pinging B, and C
> running wireshark - and it shows lots of "ping request (no response
> found)". Same if B pings A; the responses from A also get to C.
>
> But this is somewhat to be expected; automatic learning has been
> disabled by commit 4c7ea3c0791e (later commits have change the logic
> around there somewhat, but the end result is the same: the PAV for the
> cpu port being clear), and I can't find anywhere in the code which would
> manually add the master device's address to the ATU.
>
> However: Even when I do
>
> -	if (dsa_is_cpu_port(ds, port))
> +	if (dsa_is_cpu_port(ds, port) && 0)
>
> and verify with "mv88e6xxx_dump --ports" that the CPU port now has the
> expected value in port offset 0x0b:
>
> 0b 0001 0002 0004 0008 0010 0020 0040
>
> (it's port 5, i.e. the 0020 value), I still see the above behaviour -
> the master device's address doesn't get learned (nor does some garbage
> address appear in the ATU), and the unicast packets are still forwarded
> out all ports. So I must be missing something else.

The thing you are missing is that all packets from the CPU are sent with
FROM_CPU tags. SA learning is not performed on these as it intended for
control traffic.

Ideally, bulk traffic would be sent with a FORWARD tag. But there is
currently no way for the DSA tagger to discriminate the bulk data from
control traffic. And changing that is no small task.

In the mean time we could extend Vladimir's (added to CC) work on
assisted CPU port learning to include the local bridge addresses. You
pushed me to take a first stab at this :) Please have a look at this
series:

https://lore.kernel.org/netdev/20210116012515.3152-1-tobias@waldekranz.com/

> Finally, I'm wondering how the tagging could get in the way of learning
> the right address, given that the tag is inserted after the DA and SA.

Yes, but the CPU port is configured in DSA mode, so the switch will use
the tag command (FROM_CPU) to determine if learning should be done or
not.

> ====
>
> But this is all just some odd observations; the traffic does seem to
> work, though sending all unicast traffic to all neighbours seems to be a
> waste of bandwidth.
>
> What I'm _really_ trying to do is to get my mv88e6250 to participate in
> an MRP ring, which AFAICT will require that the master device's MAC gets
> added as a static entry in the ATU: Otherwise, when the ring goes from
> open to closed, I've seen the switch wrongly learn the node's own mac
> address as being in the direction of one of the normal ports, which
> obviously breaks all traffic. So if the topology is
>
>    M
>  /   \
> C1 *** C2
>
> with the link between C1 and C2 being broken, both M-C1 and M-C2 links
> are in forwarding (hence learning) state, so when the C1-C2 link gets
> reestablished, it will take at least one received test packet for M to
> decide to put one of the ports in blocking state - by which time the
> damage is done, and the ATU now has a broken entry for M's own mac address.

Well the static entry for the bridge MAC should be installed with the
aforementioned series applied. So that should not be an issue.

My guess is that MRP will still not work though, as you will probably
need the ability to trap certain groups to the CPU (management
entries). I.e. some MRP PDUs must be allowed to ingress on blocked
ports, no?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-16  1:42 ` Tobias Waldekranz
@ 2021-01-18 15:41   ` Rasmus Villemoes
  2021-01-18 17:50     ` Andrew Lunn
  2021-01-18 21:19   ` Vladimir Oltean
  1 sibling, 1 reply; 11+ messages in thread
From: Rasmus Villemoes @ 2021-01-18 15:41 UTC (permalink / raw)
  To: Tobias Waldekranz, Andrew Lunn, Network Development, Vivien Didelot
  Cc: Horatiu Vultur, Vladimir Oltean

On 16/01/2021 02.42, Tobias Waldekranz wrote:
> On Thu, Jan 14, 2021 at 14:49, Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
>> Hi
>>
>> I've noticed something rather odd with my mv88e6250, which led me to the
>> commit in the subject.
>>
>> First, the MAC address of the master device never seems to get learned
>> (at least according to "mv88e6xxx_dump --atu"), so all packets destined
>> for the machine gets flooded out all ports 

[snip]

>> the master device's address doesn't get learned (nor does some garbage
>> address appear in the ATU), and the unicast packets are still forwarded
>> out all ports. So I must be missing something else.
> 
> The thing you are missing is that all packets from the CPU are sent with
> FROM_CPU tags. SA learning is not performed on these as it intended for
> control traffic.

Ah, yes, I do remember stumbling on the tagger using FROM_CPU and
wondering about that at some point. And I didn't recall the somewhat
subtle detail of those being treated as MGMT and thus not participating
in SA learning.

> Ideally, bulk traffic would be sent with a FORWARD tag. But there is
> currently no way for the DSA tagger to discriminate the bulk data from
> control traffic. And changing that is no small task.

Indeed.

> In the mean time we could extend Vladimir's (added to CC) work on
> assisted CPU port learning to include the local bridge addresses. You
> pushed me to take a first stab at this :) Please have a look at this
> series:
> 
> https://lore.kernel.org/netdev/20210116012515.3152-1-tobias@waldekranz.com/

I'll try these out, thanks. FWIW, in an earlier BSP there were some
horrible hacks to go behind the kernel's back and add the CPU's address
as a static entry in the switch, which is why I haven't seen this
before. And this was done for much the same reason as we will have to it
now (it implemented ERPS, another ring redundancy protocol).

>> Finally, I'm wondering how the tagging could get in the way of learning
>> the right address, given that the tag is inserted after the DA and SA.
> 
> Yes, but the CPU port is configured in DSA mode, so the switch will use
> the tag command (FROM_CPU) to determine if learning should be done or
> not.

Right, but that comment was directed at commit 4c7ea3c0791e; even if SA
learning did happen, bytes 6-11 of the frame are the same with or
without the tag added, so I don't understand how _corrupted_ addresses
could get learned.

>> What I'm _really_ trying to do is to get my mv88e6250 to participate in
>> an MRP ring, which AFAICT will require that the master device's MAC gets
>> added as a static entry in the ATU: Otherwise, when the ring goes from
>> open to closed, I've seen the switch wrongly learn the node's own mac
>> address as being in the direction of one of the normal ports, which
>> obviously breaks all traffic.
> 
> Well the static entry for the bridge MAC should be installed with the
> aforementioned series applied. So that should not be an issue.

Yes, so there are several good reasons for adding that address
statically to the hardware's database.

> My guess is that MRP will still not work though, as you will probably
> need the ability to trap certain groups to the CPU (management
> entries). I.e. some MRP PDUs must be allowed to ingress on blocked
> ports, no?

Indeed, and I'm currently just using a not-for-mainline patch that
hardcodes the two multicast addresses in the ATU.

--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1911,6 +1911,24 @@ static int mv88e6xxx_broadcast_setup(struct
mv88e6xxx_chip *chip, u16 vid)
                err = mv88e6xxx_port_add_broadcast(chip, port, vid);
                if (err)
                        return err;
+
+               if (port != dsa_upstream_port(chip->ds, port))
+                       continue;
+               if (IS_ENABLED(CONFIG_BRIDGE_MRP)) {
+                       static const u8 mrp_dmac[][ETH_ALEN] = {
+                               { 0x1, 0x15, 0x4e, 0x0, 0x0, 0x1 },
+                               { 0x1, 0x15, 0x4e, 0x0, 0x0, 0x3 },
+                       };
+                       const u8 state =
MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT;
+                       int i;
+
+                       for (i = 0; i < ARRAY_SIZE(mrp_dmac); ++i) {
+                               err = mv88e6xxx_port_db_load_purge(chip,
port,
+
mrp_dmac[i], vid, state);
+                               if (err)
+                                       return err;
+                       }
+               }
        }

        return 0;

because yes, one needs to prevent those frames from being flooded out
all ports automatically.

I suppose the real solution is having userspace do some "bridge mdb add"
yoga, but since no code currently uses
MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT, I don't think there's any
way to actually achieve this. And I have no idea how to represent the
requirement that "frames with this multicast DA are only to be directed
at the CPU" in a hardware-agnostic way.

Rasmus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 15:41   ` Rasmus Villemoes
@ 2021-01-18 17:50     ` Andrew Lunn
  2021-01-18 18:30       ` Tobias Waldekranz
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2021-01-18 17:50 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Tobias Waldekranz, Network Development, Vivien Didelot,
	Horatiu Vultur, Vladimir Oltean

> I suppose the real solution is having userspace do some "bridge mdb add"
> yoga, but since no code currently uses
> MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT, I don't think there's any
> way to actually achieve this. And I have no idea how to represent the
> requirement that "frames with this multicast DA are only to be directed
> at the CPU" in a hardware-agnostic way.

The switchdev interface for this exists, because there can be
multicast listeners on the bridge. When they join a group, they ask
the switch to put in a HOST MDB, which should cause the traffic for
the group to be sent to the host. What you don't have is the
exclusivity. If there is an IGMP report for the DA received on another
port, IGMP snooping will add an MDB entry to forward traffic out that
port.

	Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 17:50     ` Andrew Lunn
@ 2021-01-18 18:30       ` Tobias Waldekranz
  2021-01-18 19:01         ` Andrew Lunn
  0 siblings, 1 reply; 11+ messages in thread
From: Tobias Waldekranz @ 2021-01-18 18:30 UTC (permalink / raw)
  To: Andrew Lunn, Rasmus Villemoes
  Cc: Network Development, Vivien Didelot, Horatiu Vultur, Vladimir Oltean

On Mon, Jan 18, 2021 at 18:50, Andrew Lunn <andrew@lunn.ch> wrote:
>> I suppose the real solution is having userspace do some "bridge mdb add"
>> yoga, but since no code currently uses
>> MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT, I don't think there's any
>> way to actually achieve this. And I have no idea how to represent the
>> requirement that "frames with this multicast DA are only to be directed
>> at the CPU" in a hardware-agnostic way.
>
> The switchdev interface for this exists, because there can be
> multicast listeners on the bridge. When they join a group, they ask
> the switch to put in a HOST MDB, which should cause the traffic for

That is not quite the same thing as "management" though. Adding the
group to the host MDB will not allow it to pass through blocked (in the
STP sense) ports for example. With a management entry, the switch will
trap the packet with a TO_CPU tag, which means no ingress policy can get
in the way of it reaching the CPU.

> the group to be sent to the host. What you don't have is the
> exclusivity. If there is an IGMP report for the DA received on another
> port, IGMP snooping will add an MDB entry to forward traffic out that
> port.

Exclusivity should not be an issue as these protocols use groups outside
of the ranges allocated for IGMP/MLD.

I see a few ways forward:

1. Extending the MRP-support in the bridge to support other protocols
   (e.g. ERP). If DSA gets a callback saying that a ring has been setup
   with a certain role, we know which groups to add. Good for defined
   protocols - does not help anyone who needs to be able to configure
   custom groups.

2. Adding yet another flag to MDB entries to mark them as "trap"
   entries. I do not see thing going anywhere since the big fish have
   already solved this with...

3. ... TC, through the "trap" action. We could detect filters which only
   match on DA, with the action "trap"; and convert those to MGMT
   entries in the ATU. I think this could be the way forward for user
   defined entries.

4. Devlink trap configuration. Maybe? I have not managed to wrap my head
   around this yet.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 18:30       ` Tobias Waldekranz
@ 2021-01-18 19:01         ` Andrew Lunn
  2021-01-18 19:10           ` Tobias Waldekranz
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2021-01-18 19:01 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Rasmus Villemoes, Network Development, Vivien Didelot,
	Horatiu Vultur, Vladimir Oltean

On Mon, Jan 18, 2021 at 07:30:49PM +0100, Tobias Waldekranz wrote:
> On Mon, Jan 18, 2021 at 18:50, Andrew Lunn <andrew@lunn.ch> wrote:
> >> I suppose the real solution is having userspace do some "bridge mdb add"
> >> yoga, but since no code currently uses
> >> MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT, I don't think there's any
> >> way to actually achieve this. And I have no idea how to represent the
> >> requirement that "frames with this multicast DA are only to be directed
> >> at the CPU" in a hardware-agnostic way.
> >
> > The switchdev interface for this exists, because there can be
> > multicast listeners on the bridge. When they join a group, they ask
> > the switch to put in a HOST MDB, which should cause the traffic for
> 
> That is not quite the same thing as "management" though. Adding the
> group to the host MDB will not allow it to pass through blocked (in the
> STP sense) ports for example. With a management entry, the switch will
> trap the packet with a TO_CPU tag, which means no ingress policy can get
> in the way of it reaching the CPU.

Ah, yes. I don't suppose the DA is part of the special group which the
switch will recognise as management and pass it on?

01:80:c2:00:00:00 - 01:80:c2:00:00:07
01:80:c2:00:00:08 - 01:80:c2:00:00:0f
01:80:c2:00:00:20 - 01:80:c2:00:00:27
01:80:c2:00:00:28 - 01:80:c2:00:00:2f

	Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 19:01         ` Andrew Lunn
@ 2021-01-18 19:10           ` Tobias Waldekranz
  0 siblings, 0 replies; 11+ messages in thread
From: Tobias Waldekranz @ 2021-01-18 19:10 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rasmus Villemoes, Network Development, Vivien Didelot,
	Horatiu Vultur, Vladimir Oltean

On Mon, Jan 18, 2021 at 20:01, Andrew Lunn <andrew@lunn.ch> wrote:
> On Mon, Jan 18, 2021 at 07:30:49PM +0100, Tobias Waldekranz wrote:
>> On Mon, Jan 18, 2021 at 18:50, Andrew Lunn <andrew@lunn.ch> wrote:
>> >> I suppose the real solution is having userspace do some "bridge mdb add"
>> >> yoga, but since no code currently uses
>> >> MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC_DA_MGMT, I don't think there's any
>> >> way to actually achieve this. And I have no idea how to represent the
>> >> requirement that "frames with this multicast DA are only to be directed
>> >> at the CPU" in a hardware-agnostic way.
>> >
>> > The switchdev interface for this exists, because there can be
>> > multicast listeners on the bridge. When they join a group, they ask
>> > the switch to put in a HOST MDB, which should cause the traffic for
>> 
>> That is not quite the same thing as "management" though. Adding the
>> group to the host MDB will not allow it to pass through blocked (in the
>> STP sense) ports for example. With a management entry, the switch will
>> trap the packet with a TO_CPU tag, which means no ingress policy can get
>> in the way of it reaching the CPU.
>
> Ah, yes. I don't suppose the DA is part of the special group which the
> switch will recognise as management and pass it on?
>
> 01:80:c2:00:00:00 - 01:80:c2:00:00:07
> 01:80:c2:00:00:08 - 01:80:c2:00:00:0f
> 01:80:c2:00:00:20 - 01:80:c2:00:00:27
> 01:80:c2:00:00:28 - 01:80:c2:00:00:2f

Unfortunately there are many protocols that live outside of the IEEE
range. ERP(S) which Rasmus was talking about uses a range assigned to
ITU IIRC. MRP a third one I believe.

The Reserved2CPU functionality has an additional deficiency vs. separate
management entries: they all use the same priority. This means that you
must assign LLDP frames to the same queue as STP for example, which is
typically not what you want. With a management entry (with priority
override) you can set them individually.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-16  1:42 ` Tobias Waldekranz
  2021-01-18 15:41   ` Rasmus Villemoes
@ 2021-01-18 21:19   ` Vladimir Oltean
  2021-01-18 22:07     ` Rasmus Villemoes
  1 sibling, 1 reply; 11+ messages in thread
From: Vladimir Oltean @ 2021-01-18 21:19 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Rasmus Villemoes, Andrew Lunn, Network Development,
	Vivien Didelot, Horatiu Vultur

On Sat, Jan 16, 2021 at 02:42:12AM +0100, Tobias Waldekranz wrote:
> > What I'm _really_ trying to do is to get my mv88e6250 to participate in
> > an MRP ring, which AFAICT will require that the master device's MAC gets
> > added as a static entry in the ATU: Otherwise, when the ring goes from
> > open to closed, I've seen the switch wrongly learn the node's own mac
> > address as being in the direction of one of the normal ports, which
> > obviously breaks all traffic. So if the topology is
> >
> >    M
> >  /   \
> > C1 *** C2
> >
> > with the link between C1 and C2 being broken, both M-C1 and M-C2 links
> > are in forwarding (hence learning) state, so when the C1-C2 link gets
> > reestablished, it will take at least one received test packet for M to
> > decide to put one of the ports in blocking state - by which time the
> > damage is done, and the ATU now has a broken entry for M's own mac address.

What hardware offload features do you need to use for MRP on mv88e6xxx?
If none, then considering that Tobias's bridge series may stall, I think
by far the easiest approach would be for DSA to detect that it can't
offload the bridge+MRP configuration, and keep all ports as standalone.
When in standalone mode, the ports don't offload any bridge flags, i.e.
they don't do address learning, and the only forwarding destination
allowed is the CPU. The only disadvantage is that this is software-based
forwarding.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 21:19   ` Vladimir Oltean
@ 2021-01-18 22:07     ` Rasmus Villemoes
  2021-01-18 23:08       ` Tobias Waldekranz
  2021-01-19  9:15       ` Horatiu Vultur
  0 siblings, 2 replies; 11+ messages in thread
From: Rasmus Villemoes @ 2021-01-18 22:07 UTC (permalink / raw)
  To: Vladimir Oltean, Tobias Waldekranz
  Cc: Andrew Lunn, Network Development, Vivien Didelot, Horatiu Vultur

On 18/01/2021 22.19, Vladimir Oltean wrote:
> On Sat, Jan 16, 2021 at 02:42:12AM +0100, Tobias Waldekranz wrote:
>>> What I'm _really_ trying to do is to get my mv88e6250 to participate in
>>> an MRP ring, which AFAICT will require that the master device's MAC gets
>>> added as a static entry in the ATU: Otherwise, when the ring goes from
>>> open to closed, I've seen the switch wrongly learn the node's own mac
>>> address as being in the direction of one of the normal ports, which
>>> obviously breaks all traffic. So if the topology is
>>>
>>>    M
>>>  /   \
>>> C1 *** C2
>>>
>>> with the link between C1 and C2 being broken, both M-C1 and M-C2 links
>>> are in forwarding (hence learning) state, so when the C1-C2 link gets
>>> reestablished, it will take at least one received test packet for M to
>>> decide to put one of the ports in blocking state - by which time the
>>> damage is done, and the ATU now has a broken entry for M's own mac address.
> 
> What hardware offload features do you need to use for MRP on mv88e6xxx?
> If none, then considering that Tobias's bridge series may stall, I think
> by far the easiest approach would be for DSA to detect that it can't
> offload the bridge+MRP configuration, and keep all ports as standalone.
> When in standalone mode, the ports don't offload any bridge flags, i.e.
> they don't do address learning, and the only forwarding destination
> allowed is the CPU. The only disadvantage is that this is software-based
> forwarding.

Which would be an unacceptable regression for my customer's use case. We
really need some ring redundancy protocol, while also having the switch
act as, well, a switch and do most forwarding in hardware. We used to
use ERPS with some gross out-of-tree patches to set up the switch as
required (much of the same stuff we're discussing here).

Then when MRP got added to the kernel, and apparently some switches with
hardware support for that are in the pipeline somewhere, we decided to
try to switch to that - newer revisions of the hardware might include an
MRP-capable switch, but the existing hardware with the marvell switches
would (with a kernel and userspace upgrade) be able to coexist with that
newer hardware.

I took it for granted that MRP had been tested with existing
switches/switchdev/DSA, but AFAICT (Horatiu, correct me if I'm wrong),
currently MRP only works with a software bridge and with some
out-of-tree driver for some not-yet-released hardware? I think I've
identified what is needed to make it work with mv88e6xxx (and likely
also other switchdev switches):

(1) the port state as set on the software bridge must be
offloaded/synchronized to the switch.

(2) the bridge's hardware address must be made a static entry in the
switch's database to avoid the switch accidentally learning a wrong port
for that when the ring becomes closed.

(3) the cpu must be made the only recipient of frames with an MRP
multicast DA, 01:15:e4:...

For (1), I think the only thing we need is to agree on where in the
stack we translate from MRP to STP, because the like-named states in the
two protocols really do behave exactly the same, AFAICT. So it can be
done all the way up in MRP, perhaps even by getting completely rid of
the distinction, or anywhere down the notifier stack, towards the actual
switch driver.

For (2), I still have to see how far Tobias' patches will get me, but at
least there's some reason independent of MRP to do that.

Rasmus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 22:07     ` Rasmus Villemoes
@ 2021-01-18 23:08       ` Tobias Waldekranz
  2021-01-19  9:15       ` Horatiu Vultur
  1 sibling, 0 replies; 11+ messages in thread
From: Tobias Waldekranz @ 2021-01-18 23:08 UTC (permalink / raw)
  To: Rasmus Villemoes, Vladimir Oltean
  Cc: Andrew Lunn, Network Development, Vivien Didelot, Horatiu Vultur

On Mon, Jan 18, 2021 at 23:07, Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> On 18/01/2021 22.19, Vladimir Oltean wrote:
>> On Sat, Jan 16, 2021 at 02:42:12AM +0100, Tobias Waldekranz wrote:
>>>> What I'm _really_ trying to do is to get my mv88e6250 to participate in
>>>> an MRP ring, which AFAICT will require that the master device's MAC gets
>>>> added as a static entry in the ATU: Otherwise, when the ring goes from
>>>> open to closed, I've seen the switch wrongly learn the node's own mac
>>>> address as being in the direction of one of the normal ports, which
>>>> obviously breaks all traffic. So if the topology is
>>>>
>>>>    M
>>>>  /   \
>>>> C1 *** C2
>>>>
>>>> with the link between C1 and C2 being broken, both M-C1 and M-C2 links
>>>> are in forwarding (hence learning) state, so when the C1-C2 link gets
>>>> reestablished, it will take at least one received test packet for M to
>>>> decide to put one of the ports in blocking state - by which time the
>>>> damage is done, and the ATU now has a broken entry for M's own mac address.
>> 
>> What hardware offload features do you need to use for MRP on mv88e6xxx?
>> If none, then considering that Tobias's bridge series may stall, I think
>> by far the easiest approach would be for DSA to detect that it can't
>> offload the bridge+MRP configuration, and keep all ports as standalone.
>> When in standalone mode, the ports don't offload any bridge flags, i.e.
>> they don't do address learning, and the only forwarding destination
>> allowed is the CPU. The only disadvantage is that this is software-based
>> forwarding.

Just put some context around how these protocols are typically deployed:
The ring is the backbone of the whole network and can span hundreds of
switches. Applications range from low-bandwidth process automation to
high-definition IPTV. But even when you can meet the throughput demands
with a CPU, your latency will be off the charts, far beyond what is
acceptable in most cases.

> Which would be an unacceptable regression for my customer's use case. We
> really need some ring redundancy protocol, while also having the switch
> act as, well, a switch and do most forwarding in hardware. We used to
> use ERPS with some gross out-of-tree patches to set up the switch as
> required (much of the same stuff we're discussing here).
>
> Then when MRP got added to the kernel, and apparently some switches with
> hardware support for that are in the pipeline somewhere, we decided to
> try to switch to that - newer revisions of the hardware might include an
> MRP-capable switch, but the existing hardware with the marvell switches
> would (with a kernel and userspace upgrade) be able to coexist with that
> newer hardware.
>
> I took it for granted that MRP had been tested with existing
> switches/switchdev/DSA, but AFAICT (Horatiu, correct me if I'm wrong),
> currently MRP only works with a software bridge and with some
> out-of-tree driver for some not-yet-released hardware? I think I've
> identified what is needed to make it work with mv88e6xxx (and likely
> also other switchdev switches):
>
> (1) the port state as set on the software bridge must be
> offloaded/synchronized to the switch.
>
> (2) the bridge's hardware address must be made a static entry in the
> switch's database to avoid the switch accidentally learning a wrong port
> for that when the ring becomes closed.

I do not know MRP well enough, but that sounds reasonable if the same SA
is used for control packets sent through both ports.

> (3) the cpu must be made the only recipient of frames with an MRP
> multicast DA, 01:15:e4:...

I would possibly add (4): MRP comes in different "profiles". Some of
them require sending test packets at ridiculously high frequencies (more
than 1kHz IIRC). I would guess that Microchip has a programmable packet
generator that they can use for such things. We could potentially solve
that on newer 6xxx chips with the built-in IMP, but that is perhaps a
bit pie in the sky :)

> For (1), I think the only thing we need is to agree on where in the
> stack we translate from MRP to STP, because the like-named states in the
> two protocols really do behave exactly the same, AFAICT. So it can be
> done all the way up in MRP, perhaps even by getting completely rid of
> the distinction, or anywhere down the notifier stack, towards the actual
> switch driver.

You should search the archives. I distinctly remember somebody bringing
up this point before MRP was merged. So there ought to be some reason
for the existence of SWITCHDEV_ATTR_ID_MRP_PORT_STATE.

> For (2), I still have to see how far Tobias' patches will get me, but at
> least there's some reason independent of MRP to do that.

Like Vladimir said, do not count on that implementation making
it. Though something similar hopefully will.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports)
  2021-01-18 22:07     ` Rasmus Villemoes
  2021-01-18 23:08       ` Tobias Waldekranz
@ 2021-01-19  9:15       ` Horatiu Vultur
  1 sibling, 0 replies; 11+ messages in thread
From: Horatiu Vultur @ 2021-01-19  9:15 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Vladimir Oltean, Tobias Waldekranz, Andrew Lunn,
	Network Development, Vivien Didelot

The 01/18/2021 23:07, Rasmus Villemoes wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 18/01/2021 22.19, Vladimir Oltean wrote:
> > On Sat, Jan 16, 2021 at 02:42:12AM +0100, Tobias Waldekranz wrote:
> >>> What I'm _really_ trying to do is to get my mv88e6250 to participate in
> >>> an MRP ring, which AFAICT will require that the master device's MAC gets
> >>> added as a static entry in the ATU: Otherwise, when the ring goes from
> >>> open to closed, I've seen the switch wrongly learn the node's own mac
> >>> address as being in the direction of one of the normal ports, which
> >>> obviously breaks all traffic. So if the topology is
> >>>
> >>>    M
> >>>  /   \
> >>> C1 *** C2
> >>>
> >>> with the link between C1 and C2 being broken, both M-C1 and M-C2 links
> >>> are in forwarding (hence learning) state, so when the C1-C2 link gets
> >>> reestablished, it will take at least one received test packet for M to
> >>> decide to put one of the ports in blocking state - by which time the
> >>> damage is done, and the ATU now has a broken entry for M's own mac address.
> >
> > What hardware offload features do you need to use for MRP on mv88e6xxx?
> > If none, then considering that Tobias's bridge series may stall, I think
> > by far the easiest approach would be for DSA to detect that it can't
> > offload the bridge+MRP configuration, and keep all ports as standalone.
> > When in standalone mode, the ports don't offload any bridge flags, i.e.
> > they don't do address learning, and the only forwarding destination
> > allowed is the CPU. The only disadvantage is that this is software-based
> > forwarding.
> 
> Which would be an unacceptable regression for my customer's use case. We
> really need some ring redundancy protocol, while also having the switch
> act as, well, a switch and do most forwarding in hardware. We used to
> use ERPS with some gross out-of-tree patches to set up the switch as
> required (much of the same stuff we're discussing here).

I think we can do better than just do the forwarding in software.

> 
> Then when MRP got added to the kernel, and apparently some switches with
> hardware support for that are in the pipeline somewhere, we decided to
> try to switch to that - newer revisions of the hardware might include an
> MRP-capable switch, but the existing hardware with the marvell switches
> would (with a kernel and userspace upgrade) be able to coexist with that
> newer hardware.
> 
> I took it for granted that MRP had been tested with existing
> switches/switchdev/DSA, but AFAICT (Horatiu, correct me if I'm wrong),
> currently MRP only works with a software bridge and with some
> out-of-tree driver for some not-yet-released hardware?

Well, it works out of box with software bridge. But if you do the
forwarding in HW(which is expected) then you need to extend your driver
to implement MRP callbacks. I have tested with a pure software bridge
and I have tested it with a switchdev driver that has special HW to
process MRP frames and a switchdev driver, that doesn't have special HW
to process MRP frames but in this case I have tested MRC role. And I
have not tested it with a DSA driver.

The reason why the driver needs to be extended is that, if a MRP frame
with DMAC 01:15:e4:00:00:01 comes to the HW, it would just be flooded
which is the wrong behaviour.

> I think I've
> identified what is needed to make it work with mv88e6xxx (and likely
> also other switchdev switches):
> 
> (1) the port state as set on the software bridge must be
> offloaded/synchronized to the switch.
> 
> (2) the bridge's hardware address must be made a static entry in the
> switch's database to avoid the switch accidentally learning a wrong port
> for that when the ring becomes closed.
> 
> (3) the cpu must be made the only recipient of frames with an MRP
> multicast DA, 01:15:e4:...
> 
> For (1), I think the only thing we need is to agree on where in the
> stack we translate from MRP to STP, because the like-named states in the
> two protocols really do behave exactly the same, AFAICT. So it can be
> done all the way up in MRP, perhaps even by getting completely rid of
> the distinction, or anywhere down the notifier stack, towards the actual
> switch driver.

I agree with you, we might just remove and use the STP callback or
implement it in your driver. Lets see what the maintainers think. I am
OK with any of these.

> 
> For (2), I still have to see how far Tobias' patches will get me, but at
> least there's some reason independent of MRP to do that.
> 
> Rasmus

-- 
/Horatiu

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-01-19 13:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-14 13:49 commit 4c7ea3c0791e (net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports) Rasmus Villemoes
2021-01-16  1:42 ` Tobias Waldekranz
2021-01-18 15:41   ` Rasmus Villemoes
2021-01-18 17:50     ` Andrew Lunn
2021-01-18 18:30       ` Tobias Waldekranz
2021-01-18 19:01         ` Andrew Lunn
2021-01-18 19:10           ` Tobias Waldekranz
2021-01-18 21:19   ` Vladimir Oltean
2021-01-18 22:07     ` Rasmus Villemoes
2021-01-18 23:08       ` Tobias Waldekranz
2021-01-19  9:15       ` Horatiu Vultur

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).