netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Harry Coin <hcoin@quietfountain.com>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: andrew@lunn.ch, netdev@vger.kernel.org
Subject: Re: llc needs namespace awareness asap, was Re: Patch fixing STP if bridge in non-default namespace.
Date: Tue, 11 Jul 2023 17:44:20 -0500	[thread overview]
Message-ID: <0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com> (raw)
In-Reply-To: <20230711215132.77594-1-kuniyu@amazon.com>


On 7/11/23 16:51, Kuniyuki Iwashima wrote:
> From: Harry Coin<hcoin@quietfountain.com>
> Date: Tue, 11 Jul 2023 16:40:03 -0500
>> On 7/11/23 15:44, Andrew Lunn wrote:
>>>>>>>> The current llc_rcv.c around line 166 in net/llc/llc_input.c  has
>>>>>>>>
>>>>>>>>             if (!net_eq(dev_net(dev), &init_net))
>>>>>>>>                     goto drop;
>>>>>>>>
>>>> Thank you!  When you offer your patches, and you hear worries about being
>>>> 'invasive', it's worth asking 'compared to what' -- since the 'status quo'
>>>> is every bridge with STP in a non default namespace with a loop in it
>>>> somewhere will freeze every connected system more solid than ice in
>>>> Antarctica.
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>>
>>> say:
>>>
>>> o It must be obviously correct and tested.
>>> o It cannot be bigger than 100 lines, with context.
>>> o It must fix only one thing.
>>> o It must fix a real bug that bothers people (not a, "This could be a problem..." type thing).
>>>
>>> git blame shows:
>>>
>>> commit 721499e8931c5732202481ae24f2dfbf9910f129
>>> Author: YOSHIFUJI Hideaki<yoshfuji@linux-ipv6.org>
>>> Date:   Sat Jul 19 22:34:43 2008 -0700
>>>
>>>       netns: Use net_eq() to compare net-namespaces for optimization.
>>>
>>> diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
>>> index 1c45f172991e..57ad974e4d94 100644
>>> --- a/net/llc/llc_input.c
>>> +++ b/net/llc/llc_input.c
>>> @@ -150,7 +150,7 @@ int llc_rcv(struct sk_buff *skb, struct net_device *dev,
>>>           int (*rcv)(struct sk_buff *, struct net_device *,
>>>                      struct packet_type *, struct net_device *);
>>>    
>>> -       if (dev_net(dev) != &init_net)
>>> +       if (!net_eq(dev_net(dev), &init_net))
>>>                   goto drop;
>>>    
>>>           /*
>>>
>>> So this is just an optimization.
>>>
>>> The test itself was added in
>>>
>>> ommit e730c15519d09ea528b4d2f1103681fa5937c0e6
>>> Author: Eric W. Biederman<ebiederm@xmission.com>
>>> Date:   Mon Sep 17 11:53:39 2007 -0700
>>>
>>>       [NET]: Make packet reception network namespace safe
>>>       
>>>       This patch modifies every packet receive function
>>>       registered with dev_add_pack() to drop packets if they
>>>       are not from the initial network namespace.
>>>       
>>>       This should ensure that the various network stacks do
>>>       not receive packets in a anything but the initial network
>>>       namespace until the code has been converted and is ready
>>>       for them.
>>>       
>>>       Signed-off-by: Eric W. Biederman<ebiederm@xmission.com>
>>>       Signed-off-by: David S. Miller<davem@davemloft.net>
>>>
>>> So that was over 15 years ago.
>>>
>>> It appears it has not bothered people for over 15 years.
>>>
>>> Lets wait until we get to see the actual fix. We can then decide how
>>> simple/hard it is to back port to stable, if it fulfils the stable
>>> rules or not.
>>>
>>>         Andrew
>> Andrew, fair enough.  In the time until it's fixed, the kernel folks
>> should publish an advisory and block any attempt to set bridge stp state
>> to other than 0 if in a non-default namespace. The alternative is a
>> packet flood at whatever the top line speed is should there be a loop
>> somewhere in even one connected link.
> Like this ?  Someone who didn't notice the issue might complain about
> it as regression.
>
> ---8<---
> diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
> index 75204d36d7f9..a807996ac56b 100644
> --- a/net/bridge/br_stp_if.c
> +++ b/net/bridge/br_stp_if.c
> @@ -201,6 +201,11 @@ int br_stp_set_enabled(struct net_bridge *br, unsigned long val,
>   {
>   	ASSERT_RTNL();
>   
> +	if (!net_eq(dev_net(br->dev), &init_net)) {
> +		NL_SET_ERR_MSG_MOD(extack, "STP can't be enabled in non-root netns");
> +		return -EINVAL;
> +	}
> +
>   	if (br_mrp_enabled(br)) {
>   		NL_SET_ERR_MSG_MOD(extack,
>   				   "STP can't be enabled if MRP is already enabled");
> ---8<---

Something like that, but to your point about regression -- It a 
statistically good bet there are many bridges with STP enabled in 
non-default name spaces that simply have no loops in L2 BUT these are 
'buried'  inside docker images or prepackaged setups that work 'just 
fine standalone' and also 'in lab namespaces (that just don't have L2 
loops...) and so that don't hit the bug.  These users are one "cable 
click to an open port already connected somewhere they can't see" away 
from bringing down every computer on their entire link (like me, been 
there, it's not a happy week for anyone...), they just don't know it.  
And their vendors 'trust STP, so that can't be it' --- but it is.

If the patch above gets installed-- then packagers downstream will have 
to respond with effort to add code to turn off STP if finding themselves 
in a namespace, and not if not.   They will be displeased at having to 
accommodate then de-accommodate when the fix lands.   As a 'usually 
downstream of the kernel' developer, I'd rather be warned than blocked.

Looking at those dates... wow!  I expect other os kernels and standalone 
switch vendors would see fixing this one as a removing a reliability 
advantage they've had for a long time.

Perhaps a broadcast advisory "Until this is fixed, your site will have a 
packet flood worse than an internal DDOS attack if there's a loop in a 
link layer and if even one docker image or prepackaged project uses a 
net bridge with STP enabled and is deployed in a non-default netns / net 
namespace.   Check with your package vendors if you're not sure.  You'll 
avoid this problem if your link layer layout chart is tree-and-branch 
without even one crosslink."   Yup, that'll be somewhat less than 
popular.  But better warned and awaiting a fix than blocked.

How hard can the fix be?  Instead of dropping the packet if in the 
non-default namespace, as each device is in a namespace it should be 
fine to pass the packet only to listeners in the same namespace as the 
device that received the packet.  Back in the day this code was written, 
it was probably 'hard to know' among the multicast subscribers what 
namespace they were in.

  I suspect the impact of this fix on existing code will be minor since 
the only effect will be packets appearing where they were expected 
before but not received.

Harry



-- 

  reply	other threads:[~2023-07-11 22:44 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-10 13:35 Patch fixing STP if bridge in non-default namespace Harry Coin
2023-07-11  3:22 ` Kuniyuki Iwashima
2023-07-11 17:08   ` llc needs namespace awareness asap, was " Harry Coin
2023-07-11 18:32     ` Kuniyuki Iwashima
2023-07-11 20:22       ` Harry Coin
2023-07-11 20:44         ` Andrew Lunn
2023-07-11 21:40           ` Harry Coin
2023-07-11 21:51             ` Kuniyuki Iwashima
2023-07-11 22:44               ` Harry Coin [this message]
2023-07-11 22:56                 ` Kuniyuki Iwashima
     [not found]               ` <b01e5af6-e397-486d-3428-6fa30a919042@quietfountain.com>
2023-07-12  0:55                 ` Andrew Lunn
2023-07-12  3:06                   ` Harry Coin
2023-07-13 22:37                     ` Stephen Hemminger
2023-07-12  9:44               ` Petr Machata
2023-07-12  0:49           ` Jakub Kicinski
2023-08-02  3:45 Hasenbosch, Samuel J
2023-08-02  3:50 ` Kuniyuki Iwashima

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0f531295-e289-022d-5add-5ceffa0df9bc@quietfountain.com \
    --to=hcoin@quietfountain.com \
    --cc=andrew@lunn.ch \
    --cc=kuniyu@amazon.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).