Re: [PATCH net-next] net: mscc: ocelot: Workaround to allow traffic to CPU in standalone mode

From: Vladimir Oltean <olteanv@gmail.com>
To: "Allan W. Nielsen" <allan.nielsen@microchip.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Horatiu Vultur <horatiu.vultur@microchip.com>,
	Alexandre Belloni <alexandre.belloni@bootlin.com>,
	Andrew Lunn <andrew@lunn.ch>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Vivien Didelot <vivien.didelot@gmail.com>,
	Joergen Andreasen <joergen.andreasen@microchip.com>,
	Claudiu Manoil <claudiu.manoil@nxp.com>,
	netdev <netdev@vger.kernel.org>,
	Microchip Linux Driver Support <UNGLinuxDriver@microchip.com>
Subject: Re: [PATCH net-next] net: mscc: ocelot: Workaround to allow traffic to CPU in standalone mode
Date: Wed, 19 Feb 2020 15:55:28 +0200	[thread overview]
Message-ID: <CA+h21hqMPb3qV_Rusa8yADPxKhubXPjX7hHLhxuuS6g96btVHw@mail.gmail.com> (raw)
In-Reply-To: <20200219101149.dq7jwhs6aypv43kf@lx-anielsen.microsemi.net>

Hi Allan,

On Wed, 19 Feb 2020 at 12:11, Allan W. Nielsen
<allan.nielsen@microchip.com> wrote:
>
> On 18.02.2020 16:02, Vladimir Oltean wrote:
> >The problem is on RX.
> >
> >> Is it with the broadcast ARP, or is it the following unicast packet?
> >For the unicast packet.

My bad, it is not just the unicast packet that is dropped, the
broadcast ARP packet is dropped too. All get dropped with the
"drop_local" counter on the front-panel port, the "no valid
destinations" counter.

> When you have it working (in your setup, with your patch applied). Does
> the ping reply packet have an IFH (DSA-tag)?
>

FYI any packet sent through a swpN net device (an Ocelot/Felix net
device, that is) will have an Injection Frame Header.

> Or is it a frame on the NPI port without an IFH.
>

No, I'm not trying to ping over an IP set on the DSA master interface,
if that's what you're asking.

> This is important as this will tell us of the frame was copied to CPU
> and then redirected to the NPI port, or if it was plain forwarded.
>

IFH is "injection" (host -> switch).
I suppose you meant Extraction Frame Header?
Nothing is seen by tcpdumping on the DSA master interface either.
Moreover, the "drop_local" counter on the front-panel swp0 increments
proportionally with the number of packets that the CPU should be
seeing.

By the way, could you please clarify the mechanisms by which the
switch will add an EFH on some but not all packets sent towards the
NPI port? The fact that this is possible is news to me.

> I need to understand the problem better before trying to solve it.
>
> >> >But if I do this:
> >> >ip link add dev br0 type bridge
> >> >ip link set dev swp0 master br0
> >> >ip link set dev swp0 nomaster
> >> >ping 192.168.1.2
> >> >Then it works, because the code path from ocelot_bridge_stp_state_set
> >> >that puts the CPU port in the forwarding mask of the other ports gets
> >> >executed on the "bridge leave" action.
> >> >The whole point is to have the same behavior at probe time as after
> >> >removing the ports from the bridge.
> >> This does sound like a bug, but I still do not agree in the solution.
> >>
> >> >The code with ocelot_mact_learn towards PGID_CPU for the MAC addresses
> >> >of the switch port netdevices is all bypassed in Felix DSA. Even if it
> >> >weren't, it isn't the best solution.
> >> >On your switch, this test would probably work exactly because of that
> >> >ocelot_mact_learn.
> >> So I guess it is the reception of the unicast packet which is causing
> >> problems.
> >>
> >> >But try to receive packets sent at any other unicast DMAC immediately
> >> >after probe time, and you should see them in tcpdump but won't.
> >> That is true - this is because we have no way of implementing promisc
> >> mode, which still allow us to HW offload of the switching. We discussed
> >> this before.
> >>
> >> Long story short, it sounds like you have an issue because the
> >> Felix/DSA driver behave differently than the Ocelot. Could you try to do
> >> your fix such that it only impact Felix and does not change the Ocelot
> >> behavioral.
> >
> >It looks like you disagree with having BIT(ocelot->cpu) in PGID_SRC +
> >p (the forwarding matrix) and just want to rely on whitelisting
> >towards PGID_CPU*?
> Yes.
>
> When the port is not member of the bridge, it should act as a normal NIC
> interface.
>
> With this change frames are being forwarded even when the port is not
> member of the bridge. This may be what you want in a DSA (or may not -
> not sure), but it is not ideal in the Ocelot/switchdev solution as we
> want to use the MAC-table to do the RX filtering.
>
> >But you already have that logic present in your driver, it's just not
> >called from a useful place for Felix.
> >So it logically follows that we should remove these lines from
> >ocelot_bridge_stp_state_set, no?
> >
> >            } else {
> >                    /* Only the CPU port, this is compatible with link
> >                     * aggregation.
> >                     */
> >                    ocelot_write_rix(ocelot,
> >                                     BIT(ocelot->cpu),
> >                                     ANA_PGID_PGID, PGID_SRC + p);
> This should not be removed. When the port is member of the bridge this
> bit must be set. When it is removed it must be cleared again.

Yes, but why keep BIT(ocelot->cpu) in the "valid destinations for this
source port p" mask (PGID_SRC + p) at all, if you just explained above
that you don't need it, because the MAC table takes care of getting
frames copied to the CPU, and there are dubious loopholes in the
hardware implementation that make this possible even if the third PGID
lookup (for source masks) denies it?

>
> >*I admit that I have no idea why it works for you, and why the frames
> >learned towards PGID_CPU are forwarded to the CPU _despite_
> >BIT(ocelot->cpu) not being present in PGID_SRC + p.
> I believe this is because we have the MAC address in the MAC table.
>

Could you please confirm this using stronger language than "I believe"?
Just for everybody to follow along here, in Ocelot/Felix, Port Group
IDs (PGIDs) are masks of destination ports. The switch performs 3
lookups in the PGID table for each frame, and forwards the frame to
the ports that are present in the logical AND of all 3 PGIDs (for the
most part, see below).

The first PGID lookup is for the destination masks and the PGID table
is indexed by the DEST_IDX field from the MAC table (FDB).
The PGID can be an unicast set: PGIDs 0-11 are the per-port PGIDs, and
by convention PGID i has only BIT(i) set, aka only this port is set in
the destination mask.
Or the PGID can be a multicast set: PGIDs 12-63 can (again, still by
convention) hold a richer destination mask comprised of multiple
ports.

I'll be ignoring the second PGID lookup, for aggregation, here, since
it doesn't interfere (it doesn't restrict the valid destinations mask
in any way).

The third PGID lookup is for source masks: PGID entries 80-91 answer
the question: is port i allowed to forward traffic to port j? If yes,
then BIT(j) of PGID 80+i will be found set. _These_ are what we're
talking about here (PGID_SRC + i).

What is interesting about the CPU port in this whole story is that, in
the way the driver sets up the PGIDs, its bit isn't set in any source
mask PGID of any other port (therefore, the third lookup would always
decide to exclude the CPU port from this list). So frames are never
_forwarded_ to the CPU.

There is a loophole in this PGID mechanism which is described in the
VSC7514 manual:

        If an entry is found in the MAC table entry of ENTRY_TYPE 0 or 1
        and the CPU port is set in the PGID pointed to by the MAC table
        entry, CPU extraction queue PGID.DST_PGID is added to the CPUQ.

In other words, the CPU port is special, and frames are "copied" to
the CPU, disregarding the source masks (third PGID lookup), if
BIT(cpu) is found to be set in the destination masks (first PGID
lookup).

So my question was: is the "copy to CPU" action special-cased in the
Ocelot hardware to bypass the L2 forwarding matrix? Because if it is,
that isn't how any of the other DSA switches works.

> It seems that you want to use learning to forward frames to the CPU,
> also in the case when the port is not a member of the bridge. I'm not
> too keen on this, mainly because I'm not sure how well it will work. If
> you are certain this is what you want for Felix then lets try find a way
> to make it happend for Felix without chancing the behaivural for Ocelot.
>
> An alternative solution would be to use the MAC-table for white listing
> of unicast packets. But as I understand the thread this is not so easy
> to do with DSA. Sorry, I do not know DSA very well, and was not able to
> fully understand why. But this is as far as I know the only way to get
> the proper RX filtering.

With VLAN filtering enabled, keeping the MAC of the interface
installed in the MAC table in all VLANs that the port is a member of
will be a nightmare. Also think about reference counting (in DSA all
ports have the same MAC address by default. When should you remove the
MAC table entry corresponding to a port?) Also consider that I'd have
to change a lot of core DSA stuff for one switch.

>
> An other solution, is to skip the RX filtering, and when a port is not
> member of a beidge set the 'ANA:PORT[0-11]:CPU_FWD_CFG.CPU_SRC_COPY_ENA'
> bit. This will cause all fraems to be copied to the CPU. Again, we need
> to find a way to do this which does not affect Ocelot.

But I don't want to do this, do I? I want a forwarding path towards
the CPU, not to spam the CPU.

>
> /Allan
>

Regards,
-Vladimir