From mboxrd@z Thu Jan 1 00:00:00 1970 From: Florian Fainelli Subject: Re: [PATCH 4/4] cpsw: add switchdev support Date: Sat, 02 Jun 2018 09:10:08 -0700 Message-ID: References: <1527144984-31236-1-git-send-email-ilias.apalodimas@linaro.org> <1527144984-31236-5-git-send-email-ilias.apalodimas@linaro.org> <20180524131229.GC24557@lunn.ch> <20180524133234.GA15703@apalos> <20180524163904.GH5128@lunn.ch> <20180525045647.GA1758@apalos> <20180602103432.GA948@apalos> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Cc: Andrew Lunn , netdev@vger.kernel.org, grygorii.strashko@ti.com, ivan.khoronzhuk@linaro.org, nsekhar@ti.com, jiri@resnulli.us, ivecera@redhat.com, francois.ozog@linaro.org, yogeshs@ti.com, spatton@ti.com To: Ilias Apalodimas Return-path: Received: from mail-pf0-f193.google.com ([209.85.192.193]:43092 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750964AbeFBQKO (ORCPT ); Sat, 2 Jun 2018 12:10:14 -0400 Received: by mail-pf0-f193.google.com with SMTP id j20-v6so13896089pff.10 for ; Sat, 02 Jun 2018 09:10:14 -0700 (PDT) In-Reply-To: <20180602103432.GA948@apalos> Sender: netdev-owner@vger.kernel.org List-ID: On June 2, 2018 3:34:32 AM MST, Ilias Apalodimas wrote: >Hi Florian,=20 > >Thanks for taking time to look into this > >On Fri, Jun 01, 2018 at 02:48:48PM -0700, Florian Fainelli wrote: >>=20 >>=20 >> On 05/24/2018 09:56 PM, Ilias Apalodimas wrote: >> > On Thu, May 24, 2018 at 06:39:04PM +0200, Andrew Lunn wrote: >> >> On Thu, May 24, 2018 at 04:32:34PM +0300, Ilias Apalodimas wrote: >> >>> On Thu, May 24, 2018 at 03:12:29PM +0200, Andrew Lunn wrote: >> >>>> Device tree is supposed to describe the hardware=2E Using that >hardware >> >>>> in different ways is not something you should describe in DT=2E >> >>>> >> >>> The new switchdev mode is applied with a =2Econfig option in the >kernel=2E What you >> >>> see is pre-existing code, so i am not sure if i should change it >in this >> >>> patchset=2E >> >> >> >> If you break the code up into a library and two drivers, it >becomes a >> >> moot point=2E >> > Agree >> >=20 >> >> >> >> But what i don't like here is that the device tree says to do dual >> >> mac=2E But you ignore that and do sometime else=2E I would prefer th= at >if >> >> DT says dual mac, and switchdev is compiled in, the probe fails >with >> >> EINVAL=2E Rather than ignore something, make it clear it is invalid= =2E >> > The switch has 3 modes of operation as is=2E >> > 1=2E switch mode, to enable that you don't need to add anything on >> > the DTS and linux registers a single netdev interface=2E >> > 2=2E dual mac mode, this is when you need to add dual_emac; on the >DTS=2E >> > 3=2E switchdev mode which is controlled by a =2Econfig option, since = as >you=20 >> > pointed out DTS was not made for controlling config options=2E=20 >> >=20 >> > I agree that this is far from beautiful=2E If the driver remains as >in though, >> > i'd prefer either keeping what's there or making "switchdev" a DTS >option,=20 >> > following the pre-existing erroneous usage rather than making the >device=20 >> > unusable=2E If we end up returning some error and refuse to >initialize, users=20 >> > that remote upgrade their equipment, without taking a good look at >changelog, >> > will loose access to their devices with no means of remotely fixing >that=2E >>=20 >> It seems to me that the mistake here is seeing multiple modes of >> operations for the cpsw=2E There are not actually many, there is one >> usage, and then there is what you can and cannot offload=2E=20 >CPSW has in fact 2 modes of operation, different FIFO usage/lookup >entry(it's >called ALE in the current driver) by-pass(which is used in dual emac >for=20 >example) and other features=2E Again Grygorii is better suited to answer >the=20 >exact differences=2E >> The basic >> premise with switchdev and DSA (which uses switchdev) is that each >> user-facing port of your switch needs to work as if it were a normal >> Ethernet NIC, that is what you call dual-MAC I believe=2E Then, when >you >> create a bridge and you enslave those ports into the bridge, you need >to >> have forwarding done in hardware between these two ports when the >> SMAC/DMAC are not for the host/CPU/management interface and you must >> simultaneously still have the host have the ability to send/receive >> traffic through the bridge device=2E >Yes dual emac does that=2E But dual emac configures the port facing VLAN >to the >CPU port as well=2E So dual emac splits and uses 2 interfaces=2E VLAN 1 i= s >configured on port1 + CPU port and VLAN 2 is confired on port 2 + CPU >port >That's exactly what the current RFC does as well, with the addition of >registering a sw0p0 (i'll explain why later on this mail) >A little more detail on the issue we are having=2E On my description=20 >sw0p0 -> CPU port, sw0p1 -> port 1 sw0p2 -> port 2=2E sw0p1/sw0p2 are the >ports >that have PHYs attached=2E=20 > >When we start in the new switchdev mode all interfaces are added to >VLAN 0 >so CPU port + port1 + port2 are all in the same VLAN group=2E In that >case sw0p1 >and sw0p2 are working as you describe=2E So those 2 interfaces can >send/receive >traffic normally which matches the switchdev case=2E > >When we add them on a bridge(let's say br0), VLAN1(or any default >bridge VLAN) >is now configured on sw0p1 and sw0p2 but *not* on the CPU port=2E=20 >>From this point on the whole fuunctionality just collapses=2E The switch >will=20 >work and offload traffic between sw0p1/sw0p2 but the bridge won't be >able to=20 >get an ip address (since VLAN1 is not a member of the CPU port and the >packet=20 >gets dropped)=2E=20 >IGMPv2/V3 messages will never reach the br_multicast=2Ec code to trigger= =20 >switchdev and configure the MDBs on the ports=2E i am prety sure there >are other >fail scenarios which i haven't discovered already, but those 2 are the >most=20 >basic ones=2E If we add VLAN1 on the CPU port, everything works as >intended of=20 >course=2E > >That's the reason we registered sw0p0 as the CPU port=2E It can't do any >"real" >traffic, but you can configure the CPU port independantly and not be >forced to >do an OR on every VLAN add/delete grouping the CPU port with your port >command=2E >The TL;DR version of this is that the switch is working exactly as >switchdev is >expecting offloading traffic to the hardware when possible as long as >the CPU >port is member of the proper VLANs > >Petr's patch solves this for us >(9c86ce2c1ae337fc10568a12aea812ed03de8319)=2E >We can now do "bridge vlan add dev br0 vid 100 pvid untagged self" and >decide >when to add the CPU port or not=2E=20 > >There are still a couple of cases that are not covered though, if we >don't=20 >register the CPU port=2E We cant decide when to forward multicast >traffic on the CPU port if a join hasn't been sent from br0=2E >So let's say you got 2 hosts doing multicast and for whatever reason >the host >wants to see that traffic=2E=20 >With the CPU port present you can do a=20 >"bridge mdb add dev br0 port sw0p0 grp 239=2E1=2E1=2E1 permanent" which w= ill >offload >the traffic to the CPU port and thus the host=2E If this goes away we are >forced >to send a join=2E Thanks for the detailed explanation=2E Somehow I was under the impression = that cpsw had the ability, through specific DMA descriptor bits to direct t= raffic towards one external port or another and conversely, have that infor= mation from the HW when receiving packets=2E What you describe is exactly t= he same problem we have in DSA when the switch advertises DSA_TAG_PROTO_NON= E where only VLAN tags could help differentiate traffic from external ports= =2E At some point there was a discuss of making DSA_TAG_PROTO_NONE automati= cally create one VLAN per port but this is a good source for other problems= =2E=2E=2E Looking forward to your follow-up patch series! --=20 Florian