netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tobias Waldekranz <tobias@waldekranz.com>
To: Vladimir Oltean <olteanv@gmail.com>
Cc: "Marek Behun" <marek.behun@nic.cz>,
	"Ansuel Smith" <ansuelsmth@gmail.com>,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Andrew Lunn" <andrew@lunn.ch>,
	"Vivien Didelot" <vivien.didelot@gmail.com>,
	"Florian Fainelli" <f.fainelli@gmail.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andriin@fb.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Wei Wang" <weiwan@google.com>,
	"Cong Wang" <cong.wang@bytedance.com>,
	"Taehee Yoo" <ap420073@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"zhang kai" <zhangkaiheb@126.com>,
	"Weilong Chen" <chenweilong@huawei.com>,
	"Roopa Prabhu" <roopa@cumulusnetworks.com>,
	"Di Zhu" <zhudi21@huawei.com>,
	"Francis Laniel" <laniel_francis@privacyrequired.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
Date: Tue, 13 Apr 2021 00:26:52 +0200	[thread overview]
Message-ID: <87tuobjg0j.fsf@waldekranz.com> (raw)
In-Reply-To: <20210412220655.27rsiyxlf3f3tydm@skbuf>

On Tue, Apr 13, 2021 at 01:06, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Mon, Apr 12, 2021 at 11:49:22PM +0200, Tobias Waldekranz wrote:
>> On Tue, Apr 13, 2021 at 00:34, Vladimir Oltean <olteanv@gmail.com> wrote:
>> > On Mon, Apr 12, 2021 at 11:22:45PM +0200, Tobias Waldekranz wrote:
>> >> On Mon, Apr 12, 2021 at 21:30, Marek Behun <marek.behun@nic.cz> wrote:
>> >> > On Mon, 12 Apr 2021 14:46:11 +0200
>> >> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
>> >> >
>> >> >> I agree. Unless you only have a few really wideband flows, a LAG will
>> >> >> typically do a great job with balancing. This will happen without the
>> >> >> user having to do any configuration at all. It would also perform well
>> >> >> in "router-on-a-stick"-setups where the incoming and outgoing port is
>> >> >> the same.
>> >> >
>> >> > TLDR: The problem with LAGs how they are currently implemented is that
>> >> > for Turris Omnia, basically in 1/16 of configurations the traffic would
>> >> > go via one CPU port anyway.
>> >> >
>> >> >
>> >> >
>> >> > One potencial problem that I see with using LAGs for aggregating CPU
>> >> > ports on mv88e6xxx is how these switches determine the port for a
>> >> > packet: only the src and dst MAC address is used for the hash that
>> >> > chooses the port.
>> >> >
>> >> > The most common scenario for Turris Omnia, for example, where we have 2
>> >> > CPU ports and 5 user ports, is that into these 5 user ports the user
>> >> > plugs 5 simple devices (no switches, so only one peer MAC address for
>> >> > port). So we have only 5 pairs of src + dst MAC addresses. If we simply
>> >> > fill the LAG table as it is done now, then there is 2 * 0.5^5 = 1/16
>> >> > chance that all packets would go through one CPU port.
>> >> >
>> >> > In order to have real load balancing in this scenario, we would either
>> >> > have to recompute the LAG mask table depending on the MAC addresses, or
>> >> > rewrite the LAG mask table somewhat randomly periodically. (This could
>> >> > be in theory offloaded onto the Z80 internal CPU for some of the
>> >> > switches of the mv88e6xxx family, but not for Omnia.)
>> >> 
>> >> I thought that the option to associate each port netdev with a DSA
>> >> master would only be used on transmit. Are you saying that there is a
>> >> way to configure an mv88e6xxx chip to steer packets to different CPU
>> >> ports depending on the incoming port?
>> >> 
>> >> The reason that the traffic is directed towards the CPU is that some
>> >> kind of entry in the ATU says so, and the destination of that entry will
>> >> either be a port vector or a LAG. Of those two, only the LAG will offer
>> >> any kind of balancing. What am I missing?
>> >> 
>> >> Transmit is easy; you are already in the CPU, so you can use an
>> >> arbitrarily fancy hashing algo/ebpf classifier/whatever to load balance
>> >> in that case.
>> >
>> > Say a user port receives a broadcast frame. Based on your understanding
>> > where user-to-CPU port assignments are used only for TX, which CPU port
>> > should be selected by the switch for this broadcast packet, and by which
>> > mechanism?
>> 
>> AFAIK, the only option available to you (again, if there is no LAG set
>> up) is to statically choose one CPU port per entry. But hopefully Marek
>> can teach me some new tricks!
>> 
>> So for any known (since the broadcast address is loaded in the ATU it is
>> known) destination (b/m/u-cast), you can only "load balance" based on
>> the DA. You would also have to make sure that unknown unicast and
>> unknown multicast is only allowed to egress one of the CPU ports.
>> 
>> If you have a LAG OTOH, you could include all CPU ports in the port
>> vectors of those same entries. The LAG mask would then do the final
>> filtering so that you only send a single copy to the CPU.
>
> I forgot that mv88e6xxx keeps the broadcast address in the ATU. I wanted
> to know what is done in the flooding case, therefore I should have asked
> about unknown destination traffic. It is sent to one CPU but not the
> other based on what information?
>
> And for destinations loaded into the ATU, how is user port isolation
> performed? Say lan0 and lan1 have the same MAC address of 00:01:02:03:04:05,
> but lan0 goes to the eth0 DSA master and lan1 goes to eth1. How many ATU
> entries would there be for host addresses, and towards which port would
> they point? Are they isolated by a port private VLAN or something along
> those lines?

This is what I do not understand. This is what I hope that Marek can
tell me. To my knowledge, there is no way to per-port load balancing
from the switch to the CPU. In any given FID, there can be only one
entry per address, and that entry can only point to either a vector or a
LAG.

So my theory is that the only way of getting any load balancing, however
flawed, on receive (from switch to CPU) is by setting up a
LAG. Hopefully there is some trick that I do not know about which means
we have another option available to us.

  reply	other threads:[~2021-04-12 22:26 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-10 13:34 [PATCH RFC net-next 0/3] Multi-CPU DSA support Ansuel Smith
2021-04-10 13:34 ` [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports Ansuel Smith
2021-04-12  3:35   ` DENG Qingfang
2021-04-12  4:41     ` Ansuel Smith
2021-04-12 15:30       ` DENG Qingfang
2021-04-12 16:17         ` Frank Wunderlich
2021-04-10 13:34 ` [PATCH RFC net-next 2/3] net: add ndo for setting the iflink property Ansuel Smith
2021-04-10 13:34 ` [PATCH RFC net-next 3/3] net: dsa: implement ndo_set_netlink for chaning port's CPU port Ansuel Smith
2021-04-10 13:34 ` [PATCH RFC iproute2-next] iplink: allow to change iplink value Ansuel Smith
2021-04-11 17:04   ` Stephen Hemminger
2021-04-11 17:09     ` Vladimir Oltean
2021-04-11 18:01 ` [PATCH RFC net-next 0/3] Multi-CPU DSA support Marek Behun
2021-04-11 18:08   ` Ansuel Smith
2021-04-11 18:39   ` Andrew Lunn
2021-04-12  2:07     ` Florian Fainelli
2021-04-12  4:53     ` Ansuel Smith
2021-04-11 18:50   ` Vladimir Oltean
2021-04-11 23:53     ` Vladimir Oltean
2021-04-12  2:10       ` Florian Fainelli
2021-04-12  5:04     ` Ansuel Smith
2021-04-12 12:46     ` Tobias Waldekranz
2021-04-12 14:35       ` Vladimir Oltean
2021-04-12 21:06         ` Tobias Waldekranz
2021-04-12 19:30       ` Marek Behun
2021-04-12 21:22         ` Tobias Waldekranz
2021-04-12 21:34           ` Vladimir Oltean
2021-04-12 21:49             ` Tobias Waldekranz
2021-04-12 21:56               ` Marek Behun
2021-04-12 22:06               ` Vladimir Oltean
2021-04-12 22:26                 ` Tobias Waldekranz [this message]
2021-04-12 22:48                   ` Vladimir Oltean
2021-04-12 23:04                     ` Marek Behun
2021-04-12 21:50           ` Marek Behun
2021-04-12 22:05             ` Tobias Waldekranz
2021-04-12 22:55               ` Marek Behun
2021-04-12 23:09                 ` Tobias Waldekranz
2021-04-12 23:13                   ` Tobias Waldekranz
2021-04-12 23:54                     ` Marek Behun
2021-04-13  0:27                       ` Marek Behun
2021-04-13  0:31                         ` Marek Behun
2021-04-13 14:46                         ` Tobias Waldekranz
2021-04-13 15:14                           ` Marek Behun
2021-04-13 18:16                             ` Tobias Waldekranz
2021-04-14 15:14                               ` Marek Behun
2021-04-14 18:39                                 ` Tobias Waldekranz
2021-04-14 23:39                                   ` Vladimir Oltean
2021-04-15  9:20                                     ` Tobias Waldekranz
2021-04-13 14:40                       ` Tobias Waldekranz
2021-04-12 15:00     ` DENG Qingfang
2021-04-12 16:32       ` Vladimir Oltean
2021-04-12 22:04         ` Marek Behun
2021-04-12 22:17           ` Vladimir Oltean
2021-04-12 22:47             ` Marek Behun
  -- strict thread matches above, loose matches on Subject: below --
2019-08-24  2:42 Marek Behún
2019-08-24 15:24 ` Andrew Lunn
2019-08-24 17:45   ` Marek Behun
2019-08-24 17:54     ` Andrew Lunn
2019-08-25  4:19   ` Marek Behun
2019-08-24 15:40 ` Vladimir Oltean
2019-08-24 15:44   ` Vladimir Oltean
2019-08-24 17:55     ` Marek Behun
2019-08-24 15:56   ` Andrew Lunn
2019-08-24 17:58     ` Marek Behun
2019-08-24 20:04 ` Florian Fainelli
2019-08-24 21:01   ` Marek Behun
2019-08-25  4:08     ` Marek Behun
2019-08-25  7:13   ` Marek Behun
2019-08-25 15:00     ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87tuobjg0j.fsf@waldekranz.com \
    --to=tobias@waldekranz.com \
    --cc=andrew@lunn.ch \
    --cc=andriin@fb.com \
    --cc=ansuelsmth@gmail.com \
    --cc=ap420073@gmail.com \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=chenweilong@huawei.com \
    --cc=cong.wang@bytedance.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=f.fainelli@gmail.com \
    --cc=kuba@kernel.org \
    --cc=laniel_francis@privacyrequired.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marek.behun@nic.cz \
    --cc=netdev@vger.kernel.org \
    --cc=olteanv@gmail.com \
    --cc=roopa@cumulusnetworks.com \
    --cc=vivien.didelot@gmail.com \
    --cc=weiwan@google.com \
    --cc=zhangkaiheb@126.com \
    --cc=zhudi21@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).