Date: Thu, 26 Mar 2020 13:54:35 +0200
From: Ido Schimmel
To: Vladimir Oltean
Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, "David S.
 Miller", Jakub Kicinski, murali.policharla@broadcom.com, Stephen Hemminger, Jiri Pirko, Jakub Kicinski, Nikolay Aleksandrov, netdev
Subject: Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
Message-ID: <20200326115435.GA1385597@splinter>
References: <20200325152209.3428-1-olteanv@gmail.com> <20200325152209.3428-11-olteanv@gmail.com> <20200326101752.GA1362955@splinter> <20200326113542.GA1383155@splinter>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On Thu, Mar 26, 2020 at 01:44:51PM +0200, Vladimir Oltean wrote:
> On Thu, 26 Mar 2020 at 13:35, Ido Schimmel wrote:
> >
> > On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> > > Hi Ido,
> > >
> > > On Thu, 26 Mar 2020 at 12:17, Ido Schimmel wrote:
> > > >
> > > > Hi Vladimir,
> > > >
> > > > On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > > > > From: Vladimir Oltean
> > > > >
> > > > > In the initial attempt to add MTU configuration for DSA:
> > > > >
> > > > > https://patchwork.ozlabs.org/cover/1199868/
> > > > >
> > > > > Florian raised a concern about the bridge MTU normalization logic (when
> > > > > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > > > > expectation was that the bridge would automatically change the MTU of
> > > > > all its slave ports to the minimum MTU, if those slaves are part of the
> > > > > same hardware bridge. However, it doesn't do that, and for good reason,
> > > > > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > > > > bridge net device itself, and not that of any slave port.
> > > > > If it were to modify the MTU of the slave ports, the effect would be
> > > > > that the user wouldn't be able to increase the MTU of any bridge slave
> > > > > port as long as it was part of the bridge, which would be a bit
> > > > > annoying to say the least.
> > > > >
> > > > > The idea behind this behavior is that normal termination from Linux over
> > > > > the L2 forwarding domain described by DSA should happen over the bridge
> > > > > net device, which _is_ properly limited by the minimum MTU. And
> > > > > termination over individual slave device is possible even if those are
> > > > > bridged. But that is not "forwarding", so there's no reason to do
> > > > > normalization there, since only a single interface sees that packet.
> > > > >
> > > > > The real problem is with the offloaded data path, where of course, the
> > > > > bridge net device MTU is ignored. So a packet received on an interface
> > > > > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > > > > And that is exactly what this patch is trying to prevent from happening.
> > > >
> > > > How is that different from the software data path where the CPU needs to
> > > > forward the packet between port A with MTU X and port B with MTU X/2 ?
> > > >
> > > > I don't really understand what problem you are trying to solve here. It
> > > > seems like the user did some misconfiguration and now you're introducing
> > > > a policy to mitigate it? If so, it should be something the user can
> > > > disable. It also seems like something that can be easily handled by a
> > > > user space application. You get netlink notifications for all these
> > > > operations.
> > > > >
> > > Actually I think the problem can be better understood if I explain
> > > what the switches I'm dealing with look like.
> > > None of them really has a 'MTU' register. They perform length-based
> > > admission control on RX.
> >
> > IIUC, by that you mean that these switches only perform length-based
> > filtering on RX, but not on TX?
> >
>
> Yes.
>
> > > At this moment in time I don't think anybody wants to introduce an MRU
> > > knob in iproute2, so we're adjusting that maximum ingress length
> > > through the MTU. But it becomes an inverted problem, since the 'MTU'
> > > needs to be controlled for all possible sources of traffic that are
> > > going to egress on this port, in order for the real MTU on the port
> > > itself to be observed.
> >
> > Looking at your example from the changelog:
> >
> >   ip link set dev sw0p0 master br0
> >   ip link set dev sw0p1 mtu 1400
> >   ip link set dev sw0p1 master br0
> >
> > Without your patch, after these commands sw0p0 has an MTU of 1500 and
> > sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> > 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> > should be forwarded there)?
> >
>
> Yes.
>
> > If so, then I think I understand the problem. However, I don't think
> > such code belongs in the bridge driver as this restriction does not
> > apply to all switches.
>
> How do Mellanox switches deal with this?

Frames will be discarded on the egress of sw0p1.

>
> > Also, I think that having the kernel change MTU
> > of port A following MTU change of port B is a bit surprising and not
> > intuitive.
> >
>
> It already changes the MTU of br0, this just goes along the same path.

Yea, but this is an established behavior already. And it applies
regardless if the data path is offloaded or not, unlike this change.

> > I think you should be more explicit about it. Did you consider listening
> > to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> > unsupported configurations with an appropriate extack message? If you
> > can't veto (in order not to break user space), you can still emit an
> > extack message.
>
> I suppose that is an alternative approach. This would be done from the
> DSA core then?
> But instead of veto, just do the normalization thing.

Not really my call, but I think the veto is better because you are being
explicit about it and informing the user with an appropriate message.

>
> Thanks,
> -Vladimir
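
For illustration only, the sw0p0/sw0p1 scenario from the thread can be modeled as a toy simulation. This is a sketch in Python, not kernel code. Every name in it (`Port`, `forwards_rx_only`, `forwards_rx_and_tx`, `veto_mtu_change`) is hypothetical. It compares frame length directly against the MTU and ignores L2 header overhead. The veto rule used here (reject any MTU that differs from the other bridged ports) is an assumption, since which configurations count as unsupported would be decided per driver.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    """Hypothetical stand-in for a switch port; not a kernel structure."""
    name: str
    mtu: int


def forwards_rx_only(ingress: Port, egress: Port, frame_len: int) -> bool:
    """Switch with length-based admission control on RX only.

    The egress port's MTU is never consulted, so an oversized frame
    can still leave through a port with a smaller MTU.
    """
    return frame_len <= ingress.mtu


def forwards_rx_and_tx(ingress: Port, egress: Port, frame_len: int) -> bool:
    """Switch that also discards oversized frames on egress."""
    return frame_len <= ingress.mtu and frame_len <= egress.mtu


def veto_mtu_change(bridge_ports: List[Port], port: Port,
                    new_mtu: int) -> Optional[str]:
    """Return an extack-style message if the change is vetoed, else None.

    Assumed policy: all ports of the same bridge must share one MTU.
    """
    for other in bridge_ports:
        if other is not port and other.mtu != new_mtu:
            return (f"{port.name}: MTU {new_mtu} differs from bridged port "
                    f"{other.name} (MTU {other.mtu})")
    return None
```

With sw0p0 at MTU 1500 and sw0p1 at MTU 1400, a 1450-byte frame received on sw0p0 is forwarded by the RX-only model and dropped by the RX-and-TX model. Under the assumed veto rule, setting sw0p1 to 1400 while it is bridged with sw0p0 yields a message instead of silently changing another port's MTU.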