From: Marek Behun <marek.behun@nic.cz>
To: Tobias Waldekranz <tobias@waldekranz.com>
Cc: "Vladimir Oltean" <olteanv@gmail.com>,
"Ansuel Smith" <ansuelsmth@gmail.com>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
"Jakub Kicinski" <kuba@kernel.org>,
"Andrew Lunn" <andrew@lunn.ch>,
"Vivien Didelot" <vivien.didelot@gmail.com>,
"Florian Fainelli" <f.fainelli@gmail.com>,
"Alexei Starovoitov" <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Andrii Nakryiko" <andriin@fb.com>,
"Eric Dumazet" <edumazet@google.com>,
"Wei Wang" <weiwan@google.com>,
"Cong Wang" <cong.wang@bytedance.com>,
"Taehee Yoo" <ap420073@gmail.com>,
"Björn Töpel" <bjorn@kernel.org>,
"zhang kai" <zhangkaiheb@126.com>,
"Weilong Chen" <chenweilong@huawei.com>,
"Roopa Prabhu" <roopa@cumulusnetworks.com>,
"Di Zhu" <zhudi21@huawei.com>,
"Francis Laniel" <laniel_francis@privacyrequired.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
Date: Tue, 13 Apr 2021 00:55:18 +0200
Message-ID: <20210413005518.2f9b9cef@thinkpad>
In-Reply-To: <87wnt7jgzk.fsf@waldekranz.com>

On Tue, 13 Apr 2021 00:05:51 +0200
Tobias Waldekranz <tobias@waldekranz.com> wrote:
> On Mon, Apr 12, 2021 at 23:50, Marek Behun <marek.behun@nic.cz> wrote:
> > On Mon, 12 Apr 2021 23:22:45 +0200
> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
> >
> >> On Mon, Apr 12, 2021 at 21:30, Marek Behun <marek.behun@nic.cz> wrote:
> >> > On Mon, 12 Apr 2021 14:46:11 +0200
> >> > Tobias Waldekranz <tobias@waldekranz.com> wrote:
> >> >
> >> >> I agree. Unless you only have a few really wideband flows, a LAG will
> >> >> typically do a great job with balancing. This will happen without the
> >> >> user having to do any configuration at all. It would also perform well
> >> >> in "router-on-a-stick"-setups where the incoming and outgoing port is
> >> >> the same.
> >> >
> >> > TLDR: The problem with LAGs as they are currently implemented is that
> >> > for Turris Omnia, in basically 1/16 of configurations the traffic would
> >> > go via one CPU port anyway.
> >> >
> >> >
> >> >
> >> > One potential problem that I see with using LAGs for aggregating CPU
> >> > ports on mv88e6xxx is how these switches determine the port for a
> >> > packet: only the src and dst MAC address is used for the hash that
> >> > chooses the port.
> >> >
> >> > The most common scenario for Turris Omnia, for example, where we have 2
> >> > CPU ports and 5 user ports, is that into these 5 user ports the user
> >> > plugs 5 simple devices (no switches, so only one peer MAC address per
> >> > port). So we have only 5 pairs of src + dst MAC addresses. If we simply
> >> > fill the LAG table as it is done now, then there is a 2 * 0.5^5 = 1/16
> >> > chance that all packets would go through one CPU port.
> >> >
> >> > In order to have real load balancing in this scenario, we would either
> >> > have to recompute the LAG mask table depending on the MAC addresses, or
> >> > rewrite the LAG mask table somewhat randomly periodically. (This could
> >> > be in theory offloaded onto the Z80 internal CPU for some of the
> >> > switches of the mv88e6xxx family, but not for Omnia.)
> >>
> >> I thought that the option to associate each port netdev with a DSA
> >> master would only be used on transmit. Are you saying that there is a
> >> way to configure an mv88e6xxx chip to steer packets to different CPU
> >> ports depending on the incoming port?
> >>
> >> The reason that the traffic is directed towards the CPU is that some
> >> kind of entry in the ATU says so, and the destination of that entry will
> >> either be a port vector or a LAG. Of those two, only the LAG will offer
> >> any kind of balancing. What am I missing?
> >
> > Via port vectors you can "load balance" by ports only, i.e. input port X
> > -> transmit via CPU port Y.
>
> How is this done? In a case where there is no bridging between the
> ports, then I understand. Each port could have its own FID. But if you
> have this setup...
>
>      br0     wan
>     /   \
> lan0     lan1
>
> lan0 and lan1 would use the same FID. So how could you say that frames
> from lan0 should go to cpu0 and frames from lan1 should go to cpu1 if
> the DA is the same? What would be the content of the ATU in a setup
> like that?
>
> > When using LAGs, you are load balancing via hash(src MAC | dst MAC)
> > only. This is better in some ways. But what I am saying is that if the
> > LAG mask table is static, as it is now implemented in mv88e6xxx code,
> > then for many scenarios there is a big probability of no load balancing
> > at all. For Turris Omnia for example there is 6.25% probability that
> > the switch chip will send all traffic to the CPU via one CPU port.
> > This is because the switch chooses the LAG port only from the hash of
> > dst+src MAC address. (By the 1/16 = 6.25% probability I mean that for
> > roughly 1 in 16 customers, the switch would only use one port when
> > sending data to the CPU).
> >
> > The round robin solution here is therefore better in this case.
>
> I agree that it would be better in that case. I just do not get how you
> get the switch to do it for you.
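
As a sanity check on the 1/16 figure quoted above, here is a small Python
sketch. It assumes that each of the 5 src+dst MAC pairs is hashed
independently and uniformly onto one of the 2 CPU ports, which idealises
what the switch's hash really does. The function name is made up for
illustration.

```python
from fractions import Fraction
from itertools import product


def prob_all_on_one_port(flows: int, cpu_ports: int) -> Fraction:
    """Probability that every flow lands on the same CPU port.

    Assumes each flow is hashed independently and uniformly onto one of
    `cpu_ports` ports (an idealisation of the real LAG hash).
    """
    # Enumerate every possible assignment of flows to CPU ports.
    outcomes = list(product(range(cpu_ports), repeat=flows))
    # Count the assignments in which only a single port is ever used.
    same = sum(1 for outcome in outcomes if len(set(outcome)) == 1)
    return Fraction(same, len(outcomes))


p = prob_all_on_one_port(flows=5, cpu_ports=2)
print(p, float(p))  # 1/16 0.0625
```

The enumeration gives 2 of 32 assignments, which agrees with 2 * 0.5^5.
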
I thought that this was configured in the mv88e6xxx_port_vlan() function:
for each port, you specify via which ports its data can egress.
So for ports 0, 2, 4 you can enable CPU port 0, and for ports 1 and 3
CPU port 1.
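
To make the idea concrete, here is a rough Python sketch of the per-port
egress masks I have in mind. This is not driver code: the port numbers 5
and 6 for the two CPU ports and the helper name are assumptions made only
for illustration, and all user ports are assumed to be bridged together.

```python
USER_PORTS = [0, 1, 2, 3, 4]
CPU_PORTS = [5, 6]  # assumed physical port numbers, for illustration only


def egress_masks(user_ports, cpu_ports):
    """Return {user_port: bitmask of ports it is allowed to egress to}.

    Each user port may reach every other user port plus exactly one CPU
    port, chosen round-robin over the list of CPU ports.
    """
    masks = {}
    for index, port in enumerate(user_ports):
        # Alternate between the CPU ports: first, second, first, ...
        cpu = cpu_ports[index % len(cpu_ports)]
        mask = 1 << cpu
        # Allow forwarding to all other user ports in the same bridge.
        for other in user_ports:
            if other != port:
                mask |= 1 << other
        masks[port] = mask
    return masks


for port, mask in egress_masks(USER_PORTS, CPU_PORTS).items():
    print(f"port {port}: egress mask {mask:07b}")
```

With these inputs, ports 0, 2 and 4 get the first CPU port in their mask
and ports 1 and 3 get the second. The sketch only shows which ports a
frame from a given port may leave through; whether this is enough to
steer frames when the ports share a FID is exactly the open question.
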

Am I wrong? I confess that I have not understood this down to the finest
details, so it is entirely possible that I am missing something
important and am completely wrong. Maybe this cannot be done.

Marek