All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marek Behun <marek.behun@nic.cz>
To: Tobias Waldekranz <tobias@waldekranz.com>
Cc: "Vladimir Oltean" <olteanv@gmail.com>,
	"Ansuel Smith" <ansuelsmth@gmail.com>,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Andrew Lunn" <andrew@lunn.ch>,
	"Vivien Didelot" <vivien.didelot@gmail.com>,
	"Florian Fainelli" <f.fainelli@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Wei Wang" <weiwan@google.com>,
	"Cong Wang" <cong.wang@bytedance.com>,
	"Taehee Yoo" <ap420073@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"zhang kai" <zhangkaiheb@126.com>,
	"Weilong Chen" <chenweilong@huawei.com>,
	"Roopa Prabhu" <roopa@cumulusnetworks.com>,
	"Di Zhu" <zhudi21@huawei.com>,
	"Francis Laniel" <laniel_francis@privacyrequired.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
Date: Mon, 12 Apr 2021 23:56:57 +0200	[thread overview]
Message-ID: <20210412235657.4cc201b5@thinkpad> (raw)
In-Reply-To: <87zgy3jhr1.fsf@waldekranz.com>

On Mon, 12 Apr 2021 23:49:22 +0200
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> On Tue, Apr 13, 2021 at 00:34, Vladimir Oltean <olteanv@gmail.com> wrote:
> > On Mon, Apr 12, 2021 at 11:22:45PM +0200, Tobias Waldekranz wrote:  
> >> On Mon, Apr 12, 2021 at 21:30, Marek Behun <marek.behun@nic.cz> wrote:  
> >> > On Mon, 12 Apr 2021 14:46:11 +0200
> >> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
> >> >  
> >> >> I agree. Unless you only have a few really wideband flows, a LAG will
> >> >> typically do a great job with balancing. This will happen without the
> >> >> user having to do any configuration at all. It would also perform well
> >> >> in "router-on-a-stick"-setups where the incoming and outgoing port is
> >> >> the same.  
> >> >
> >> > TLDR: The problem with LAGs how they are currently implemented is that
> >> > for Turris Omnia, basically in 1/16 of configurations the traffic would
> >> > go via one CPU port anyway.
> >> >
> >> >
> >> >
> >> > One potencial problem that I see with using LAGs for aggregating CPU
> >> > ports on mv88e6xxx is how these switches determine the port for a
> >> > packet: only the src and dst MAC address is used for the hash that
> >> > chooses the port.
> >> >
> >> > The most common scenario for Turris Omnia, for example, where we have 2
> >> > CPU ports and 5 user ports, is that into these 5 user ports the user
> >> > plugs 5 simple devices (no switches, so only one peer MAC address for
> >> > port). So we have only 5 pairs of src + dst MAC addresses. If we simply
> >> > fill the LAG table as it is done now, then there is 2 * 0.5^5 = 1/16
> >> > chance that all packets would go through one CPU port.
> >> >
> >> > In order to have real load balancing in this scenario, we would either
> >> > have to recompute the LAG mask table depending on the MAC addresses, or
> >> > rewrite the LAG mask table somewhat randomly periodically. (This could
> >> > be in theory offloaded onto the Z80 internal CPU for some of the
> >> > switches of the mv88e6xxx family, but not for Omnia.)  
> >> 
> >> I thought that the option to associate each port netdev with a DSA
> >> master would only be used on transmit. Are you saying that there is a
> >> way to configure an mv88e6xxx chip to steer packets to different CPU
> >> ports depending on the incoming port?
> >> 
> >> The reason that the traffic is directed towards the CPU is that some
> >> kind of entry in the ATU says so, and the destination of that entry will
> >> either be a port vector or a LAG. Of those two, only the LAG will offer
> >> any kind of balancing. What am I missing?
> >> 
> >> Transmit is easy; you are already in the CPU, so you can use an
> >> arbitrarily fancy hashing algo/ebpf classifier/whatever to load balance
> >> in that case.  
> >
> > Say a user port receives a broadcast frame. Based on your understanding
> > where user-to-CPU port assignments are used only for TX, which CPU port
> > should be selected by the switch for this broadcast packet, and by which
> > mechanism?  
> 
> AFAIK, the only option available to you (again, if there is no LAG set
> up) is to statically choose one CPU port per entry. But hopefully Marek
> can teach me some new tricks!
> 
> So for any known (since the broadcast address is loaded in the ATU it is
> known) destination (b/m/u-cast), you can only "load balance" based on
> the DA. You would also have to make sure that unknown unicast and
> unknown multicast is only allowed to egress one of the CPU ports.
> 
> If you have a LAG OTOH, you could include all CPU ports in the port
> vectors of those same entries. The LAG mask would then do the final
> filtering so that you only send a single copy to the CPU.

The problem is that when the mv88e6xxx switch chooses the LAG entry, it
takes into account only hash(src MAC | dst MAC). There is no other
option, Marvell switches are unable to take more information into this
decision (for example hash of the IP + TCP/UDP header).

And in many usecases, there are only a couple of this (src,dst) MAC
pairs. On Turris Omnia in most cases there are only 5 such pairs,
because the user plugs into the router only 5 devices.

So for each of these 5 (src,dst) MAC pairs, there is probability 1/2
that the packet will be sent via CPU port 0. So 1/32 probability that
all packets will be sent via CPU port 0, and the same probability that
all packets will be sent via CPU port 1.

This means that in 1/16 of cases the LAG is useless in this scenario.

  reply	other threads:[~2021-04-12 21:57 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-10 13:34 [PATCH RFC net-next 0/3] Multi-CPU DSA support Ansuel Smith
2021-04-10 13:34 ` [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports Ansuel Smith
2021-04-12  3:35   ` DENG Qingfang
2021-04-12  4:41     ` Ansuel Smith
2021-04-12 15:30       ` DENG Qingfang
2021-04-12 16:17         ` Frank Wunderlich
2021-04-10 13:34 ` [PATCH RFC net-next 2/3] net: add ndo for setting the iflink property Ansuel Smith
2021-04-10 13:34 ` [PATCH RFC net-next 3/3] net: dsa: implement ndo_set_netlink for chaning port's CPU port Ansuel Smith
2021-04-10 13:34 ` [PATCH RFC iproute2-next] iplink: allow to change iplink value Ansuel Smith
2021-04-11 17:04   ` Stephen Hemminger
2021-04-11 17:09     ` Vladimir Oltean
2021-04-11 18:01 ` [PATCH RFC net-next 0/3] Multi-CPU DSA support Marek Behun
2021-04-11 18:08   ` Ansuel Smith
2021-04-11 18:39   ` Andrew Lunn
2021-04-12  2:07     ` Florian Fainelli
2021-04-12  4:53     ` Ansuel Smith
2021-04-11 18:50   ` Vladimir Oltean
2021-04-11 23:53     ` Vladimir Oltean
2021-04-12  2:10       ` Florian Fainelli
2021-04-12  5:04     ` Ansuel Smith
2021-04-12 12:46     ` Tobias Waldekranz
2021-04-12 14:35       ` Vladimir Oltean
2021-04-12 21:06         ` Tobias Waldekranz
2021-04-12 19:30       ` Marek Behun
2021-04-12 21:22         ` Tobias Waldekranz
2021-04-12 21:34           ` Vladimir Oltean
2021-04-12 21:49             ` Tobias Waldekranz
2021-04-12 21:56               ` Marek Behun [this message]
2021-04-12 22:06               ` Vladimir Oltean
2021-04-12 22:26                 ` Tobias Waldekranz
2021-04-12 22:48                   ` Vladimir Oltean
2021-04-12 23:04                     ` Marek Behun
2021-04-12 21:50           ` Marek Behun
2021-04-12 22:05             ` Tobias Waldekranz
2021-04-12 22:55               ` Marek Behun
2021-04-12 23:09                 ` Tobias Waldekranz
2021-04-12 23:13                   ` Tobias Waldekranz
2021-04-12 23:54                     ` Marek Behun
2021-04-13  0:27                       ` Marek Behun
2021-04-13  0:31                         ` Marek Behun
2021-04-13 14:46                         ` Tobias Waldekranz
2021-04-13 15:14                           ` Marek Behun
2021-04-13 18:16                             ` Tobias Waldekranz
2021-04-14 15:14                               ` Marek Behun
2021-04-14 18:39                                 ` Tobias Waldekranz
2021-04-14 23:39                                   ` Vladimir Oltean
2021-04-15  9:20                                     ` Tobias Waldekranz
2021-04-13 14:40                       ` Tobias Waldekranz
2021-04-12 15:00     ` DENG Qingfang
2021-04-12 16:32       ` Vladimir Oltean
2021-04-12 22:04         ` Marek Behun
2021-04-12 22:17           ` Vladimir Oltean
2021-04-12 22:47             ` Marek Behun
  -- strict thread matches above, loose matches on Subject: below --
2019-08-24  2:42 Marek Behún
2019-08-24 15:24 ` Andrew Lunn
2019-08-24 17:45   ` Marek Behun
2019-08-24 17:54     ` Andrew Lunn
2019-08-25  4:19   ` Marek Behun
2019-08-24 15:40 ` Vladimir Oltean
2019-08-24 15:44   ` Vladimir Oltean
2019-08-24 17:55     ` Marek Behun
2019-08-24 15:56   ` Andrew Lunn
2019-08-24 17:58     ` Marek Behun
2019-08-24 20:04 ` Florian Fainelli
2019-08-24 21:01   ` Marek Behun
2019-08-25  4:08     ` Marek Behun
2019-08-25  7:13   ` Marek Behun
2019-08-25 15:00     ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210412235657.4cc201b5@thinkpad \
    --to=marek.behun@nic.cz \
    --cc=andrew@lunn.ch \
    --cc=andriin@fb.com \
    --cc=ansuelsmth@gmail.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=chenweilong@huawei.com \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=f.fainelli@gmail.com \
    --cc=kuba@kernel.org \
    --cc=laniel_francis@privacyrequired.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=roopa@cumulusnetworks.com \
    --cc=tobias@waldekranz.com \
    --cc=vivien.didelot@gmail.com \
    --cc=weiwan@google.com \
    --cc=zhangkaiheb@126.com \
    --cc=zhudi21@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.