From mboxrd@z Thu Jan 1 00:00:00 1970 From: Florian Fainelli Subject: Re: [PATCH 4/4] cpsw: add switchdev support Date: Tue, 5 Jun 2018 14:37:41 -0700 Message-ID: References: <1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org> <1527144984-31236-5-git-send-email-ilias.apalodimas@linaro.org> <20180524131229.GC24557@lunn.ch> <20180524133234.GA15703@apalos> <20180524163904.GH5128@lunn.ch> <20180525045647.GA1758@apalos> <20180602103432.GA948@apalos> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Cc: Andrew Lunn , netdev@vger.kernel.org, ivan.khoronzhuk@linaro.org, nsekhar@ti.com, jiri@resnulli.us, ivecera@redhat.com, francois.ozog@linaro.org, yogeshs@ti.com, spatton@ti.com To: Grygorii Strashko , Ilias Apalodimas Return-path: Received: from mail-qt0-f179.google.com ([209.85.216.179]:34010 "EHLO mail-qt0-f179.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751600AbeFEVhu (ORCPT ); Tue, 5 Jun 2018 17:37:50 -0400 Received: by mail-qt0-f179.google.com with SMTP id d3-v6so4207575qto.1 for ; Tue, 05 Jun 2018 14:37:50 -0700 (PDT) In-Reply-To: Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 06/05/2018 02:03 PM, Grygorii Strashko wrote: > > > On 06/02/2018 05:34 AM, Ilias Apalodimas wrote: >> Hi Florian, >> >> Thanks for taking time to look into this >> >> On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote: >>> >>> >>> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote: >>>> On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote: >>>>> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote: >>>>>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote: >>>>>>> Device tree is supposed to describe the hardware. Using that hardware >>>>>>> in different ways is not something you should describe in DT. >>>>>>> >>>>>> The new switchdev mode is applied with a .config option in the kernel. What you >>>>>> see is pre-existing code, so i am not sure if i should change it in this >>>>>> patchset. >>>>> >>>>> If you break the code up into a library and two drivers, it becomes a >>>>> moot point. >>>> Agree >>>> >>>>> >>>>> But what i don't like here is that the device tree says to do dual >>>>> mac. But you ignore that and do sometime else. I would prefer that if >>>>> DT says dual mac, and switchdev is compiled in, the probe fails with >>>>> EINVAL. Rather than ignore something, make it clear it is invalid. >>>> The switch has 3 modes of operation as is. >>>> 1. switch mode, to enable that you don't need to add anything on >>>> the DTS and linux registers a single netdev interface. >>>> 2. dual mac mode, this is when you need to add dual_emac; on the DTS. >>>> 3. switchdev mode which is controlled by a .config option, since as you >>>> pointed out DTS was not made for controlling config options. >>>> >>>> I agree that this is far from beautiful. If the driver remains as in though, >>>> i'd prefer either keeping what's there or making "switchdev" a DTS option, >>>> following the pre-existing erroneous usage rather than making the device >>>> unusable. If we end up returning some error and refuse to initialize, users >>>> that remote upgrade their equipment, without taking a good look at changelog, >>>> will loose access to their devices with no means of remotely fixing that. >>> >>> It seems to me that the mistake here is seeing multiple modes of >>> operations for the cpsw. There are not actually many, there is one >>> usage, and then there is what you can and cannot offload. > >> CPSW has in fact 2 modes of operation, different FIFO usage/lookup entry(it's >> called ALE in the current driver) by-pass(which is used in dual emac for >> example) and other features. Again Grygorii is better suited to answer the >> exact differences. > > dual_mac is HW enabled mode of operation, so having DT option is pretty > reasonable as for me. > 1) when enabled it configures internal FIFOs in special way so both > external Ports became equal in the direction toward to Host port 0. > > TRM "The intention of this mode is to allow packets from both ethernet ports > to be written into the FIFO without one port starving the other port." > > 2) ALE, out of the box, does not support this mode and, as result, two > default vlan have to be created to direct traffic P1->P0 (vlan1) and > P2-P0 (vlan2), but any packets P0->P1 and P0->P2 have to be directed > (and will bypass ALE). This way traffic separated on cpsw egress towards to P0, > >>> The basic >>> premise with switchdev and DSA (which uses switchdev) is that each >>> user-facing port of your switch needs to work as if it were a normal >>> Ethernet NIC, that is what you call dual-MAC I believe. Then, when you >>> create a bridge and you enslave those ports into the bridge, you need to >>> have forwarding done in hardware between these two ports when the >>> SMAC/DMAC are not for the host/CPU/management interface and you must >>> simultaneously still have the host have the ability to send/receive >>> traffic through the bridge device. > > TRM > "When operating in dual mac mode the intention is to transfer packets between ports 0 and 1 and ports 0 > and 2, but not between ports 1 and 2. Each CPGMAC_SL appears as a single MAC with no bridging > between MAC’s. Each CPGMAC_SL has at least one unique (not the same) mac address." > > So, we cant't talk about forwarding in dual_mac mode. Interfaces expected to be > completely independent without any packet leaking between interfaces. > > !! BUT !! CPSW ALE (switch core) still expected to be used, but for packet filtering > - only registered vlans > - only registered mcast/bcast > - ingress mcast/bcast rate limiting (it's actually more like coalescing - > limits number of mcast/bcast packets per sec. > > And all offloading ALE (val/mdb) entries should always contain two ports in masks: > p1&p0 or p2&p0. Never ever all three ports. > >> Yes dual emac does that. But dual emac configures the port facing VLAN to the >> CPU port as well. So dual emac splits and uses 2 interfaces. VLAN 1 is >> configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU port >> That's exactly what the current RFC does as well, with the addition of >> registering a sw0p0 (i'll explain why later on this mail) >> A little more detail on the issue we are having. On my description >> sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2. sw0p1/sw0p2 are the ports >> that have PHYs attached. >> >> When we start in the new switchdev mode all interfaces are added to VLAN 0 >> so CPU port + port1 + port2 are all in the same VLAN group. In that case sw0p1 >> and sw0p2 are working as you describe. So those 2 interfaces can send/receive >> traffic normally which matches the switchdev case. >> >> When we add them on a bridge(let's say br0), VLAN1(or any default bridge VLAN) >> is now configured on sw0p1 and sw0p2 but *not* on the CPU port. >> From this point on the whole fuunctionality just collapses. The switch will >> work and offload traffic between sw0p1/sw0p2 but the bridge won't be able to >> get an ip address (since VLAN1 is not a member of the CPU port and the packet >> gets dropped). >> IGMPv2/V3 messages will never reach the br_multicast.c code to trigger >> switchdev and configure the MDBs on the ports. i am prety sure there are other >> fail scenarios which i haven't discovered already, but those 2 are the most >> basic ones. If we add VLAN1 on the CPU port, everything works as intended of >> course. >> >> That's the reason we registered sw0p0 as the CPU port. It can't do any "real" >> traffic, but you can configure the CPU port independantly and not be forced to >> do an OR on every VLAN add/delete grouping the CPU port with your port command. >> The TL;DR version of this is that the switch is working exactly as switchdev is >> expecting offloading traffic to the hardware when possible as long as the CPU >> port is member of the proper VLANs >> >> Petr's patch solves this for us (9c86ce2c1ae337fc10568a12aea812ed03de8319). >> We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and decide >> when to add the CPU port or not. >> >> There are still a couple of cases that are not covered though, if we don't >> register the CPU port. We cant decide when to forward multicast >> traffic on the CPU port if a join hasn't been sent from br0. >> So let's say you got 2 hosts doing multicast and for whatever reason the host >> wants to see that traffic. >> With the CPU port present you can do a >> "bridge mdb add dev br0 port sw0p0 grp 239.1.1.1 permanent" which will offload >> the traffic to the CPU port and thus the host. If this goes away we are forced >> to send a join. >> >>> It seems to me like this is entirely doable given that the dual MAC use >>> case is supported already. >>> >>> switchdev is just a stateless framework to get notified from the >>> networking stack about what you can possibly offload in hardware, so >>> having a DTS option gate that is unfortunately wrong because it is >>> really implementing a SW policy in DTS which is not what it is meant for. >> The DTS option for configuration pre-existed, i don't like that either switchdev >> mode is activated by a .config option not DTS(it just overrides whatever config >> you have on the DTS). Far from pretty though fair point, i am open to >> suggestions. > > Again this is option describing HW mode which not expected to be changed on the fly. > I'm honestly, propose descope dual_mac and concentrate on switch mode (switchdev) - > right now I don't see how dual_mac can be supported with switchdev as per above. > The same way as I do not see how we can re-use switchdev with 50% of devices which > have "only one" user-facing external port (P1 or P2). This is still not appropriate for Device Tree, because this is completely orthogonal from one another. Also, I think you tend to conflate what the switch can do, with in which mode does it start by default, which are, again, two different things. -- Florian