From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_2 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97E74C433ED for ; Mon, 12 Apr 2021 21:57:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 76C446135C for ; Mon, 12 Apr 2021 21:57:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343566AbhDLV5V (ORCPT ); Mon, 12 Apr 2021 17:57:21 -0400 Received: from lists.nic.cz ([217.31.204.67]:58218 "EHLO mail.nic.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238235AbhDLV5U (ORCPT ); Mon, 12 Apr 2021 17:57:20 -0400 Received: from thinkpad (unknown [IPv6:2a0e:b107:ae1:0:3e97:eff:fe61:c680]) by mail.nic.cz (Postfix) with ESMTPSA id 370F11407CA; Mon, 12 Apr 2021 23:56:58 +0200 (CEST) Date: Mon, 12 Apr 2021 23:56:57 +0200 From: Marek Behun To: Tobias Waldekranz Cc: Vladimir Oltean , Ansuel Smith , netdev@vger.kernel.org, "David S. Miller" , Jakub Kicinski , Andrew Lunn , Vivien Didelot , Florian Fainelli , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Eric Dumazet , Wei Wang , Cong Wang , Taehee Yoo , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , zhang kai , Weilong Chen , Roopa Prabhu , Di Zhu , Francis Laniel , linux-kernel@vger.kernel.org Subject: Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support Message-ID: <20210412235657.4cc201b5@thinkpad> In-Reply-To: <87zgy3jhr1.fsf@waldekranz.com> References: <20210410133454.4768-1-ansuelsmth@gmail.com> <20210411200135.35fb5985@thinkpad> <20210411185017.3xf7kxzzq2vefpwu@skbuf> <878s5nllgs.fsf@waldekranz.com> <20210412213045.4277a598@thinkpad> <8735vvkxju.fsf@waldekranz.com> <20210412213402.vwvon2fdtzf4hnrt@skbuf> <87zgy3jhr1.fsf@waldekranz.com> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: clamav-milter 0.102.2 at mail X-Virus-Status: Clean Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 12 Apr 2021 23:49:22 +0200 Tobias Waldekranz wrote: > On Tue, Apr 13, 2021 at 00:34, Vladimir Oltean wrote: > > On Mon, Apr 12, 2021 at 11:22:45PM +0200, Tobias Waldekranz wrote: > >> On Mon, Apr 12, 2021 at 21:30, Marek Behun wrote: > >> > On Mon, 12 Apr 2021 14:46:11 +0200 > >> > Tobias Waldekranz wrote: > >> > > >> >> I agree. Unless you only have a few really wideband flows, a LAG will > >> >> typically do a great job with balancing. This will happen without the > >> >> user having to do any configuration at all. It would also perform well > >> >> in "router-on-a-stick"-setups where the incoming and outgoing port is > >> >> the same. > >> > > >> > TLDR: The problem with LAGs how they are currently implemented is that > >> > for Turris Omnia, basically in 1/16 of configurations the traffic would > >> > go via one CPU port anyway. > >> > > >> > > >> > > >> > One potencial problem that I see with using LAGs for aggregating CPU > >> > ports on mv88e6xxx is how these switches determine the port for a > >> > packet: only the src and dst MAC address is used for the hash that > >> > chooses the port. > >> > > >> > The most common scenario for Turris Omnia, for example, where we have 2 > >> > CPU ports and 5 user ports, is that into these 5 user ports the user > >> > plugs 5 simple devices (no switches, so only one peer MAC address for > >> > port). So we have only 5 pairs of src + dst MAC addresses. If we simply > >> > fill the LAG table as it is done now, then there is 2 * 0.5^5 = 1/16 > >> > chance that all packets would go through one CPU port. > >> > > >> > In order to have real load balancing in this scenario, we would either > >> > have to recompute the LAG mask table depending on the MAC addresses, or > >> > rewrite the LAG mask table somewhat randomly periodically. (This could > >> > be in theory offloaded onto the Z80 internal CPU for some of the > >> > switches of the mv88e6xxx family, but not for Omnia.) > >> > >> I thought that the option to associate each port netdev with a DSA > >> master would only be used on transmit. Are you saying that there is a > >> way to configure an mv88e6xxx chip to steer packets to different CPU > >> ports depending on the incoming port? > >> > >> The reason that the traffic is directed towards the CPU is that some > >> kind of entry in the ATU says so, and the destination of that entry will > >> either be a port vector or a LAG. Of those two, only the LAG will offer > >> any kind of balancing. What am I missing? > >> > >> Transmit is easy; you are already in the CPU, so you can use an > >> arbitrarily fancy hashing algo/ebpf classifier/whatever to load balance > >> in that case. > > > > Say a user port receives a broadcast frame. Based on your understanding > > where user-to-CPU port assignments are used only for TX, which CPU port > > should be selected by the switch for this broadcast packet, and by which > > mechanism? > > AFAIK, the only option available to you (again, if there is no LAG set > up) is to statically choose one CPU port per entry. But hopefully Marek > can teach me some new tricks! > > So for any known (since the broadcast address is loaded in the ATU it is > known) destination (b/m/u-cast), you can only "load balance" based on > the DA. You would also have to make sure that unknown unicast and > unknown multicast is only allowed to egress one of the CPU ports. > > If you have a LAG OTOH, you could include all CPU ports in the port > vectors of those same entries. The LAG mask would then do the final > filtering so that you only send a single copy to the CPU. The problem is that when the mv88e6xxx switch chooses the LAG entry, it takes into account only hash(src MAC | dst MAC). There is no other option, Marvell switches are unable to take more information into this decision (for example hash of the IP + TCP/UDP header). And in many usecases, there are only a couple of this (src,dst) MAC pairs. On Turris Omnia in most cases there are only 5 such pairs, because the user plugs into the router only 5 devices. So for each of these 5 (src,dst) MAC pairs, there is probability 1/2 that the packet will be sent via CPU port 0. So 1/32 probability that all packets will be sent via CPU port 0, and the same probability that all packets will be sent via CPU port 1. This means that in 1/16 of cases the LAG is useless in this scenario.