[PATCH net] dev_forward_skb: do not scrub skb mark within the same name space

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space
@ 2021-06-24  8:05 Nicolas Dichtel
  2021-06-24 12:16 ` Eyal Birger
  2021-06-25 18:20 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 6+ messages in thread
From: Nicolas Dichtel @ 2021-06-24  8:05 UTC (permalink / raw)
  To: davem, kuba; +Cc: netdev, Nicolas Dichtel

The goal is to keep the mark during a bpf_redirect(), like it is done for
legacy encapsulation / decapsulation, when there is no x-netns.
This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
mark within the same name space").

When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
second argument (xnet) was set to true to force a call to skb_orphan(). At
this time, the mark was always cleanned up by skb_scrub_packet(), whatever
xnet value was.
This call to skb_orphan() was removed later in commit
9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
But this 'true' stayed here without any real reason.

Let's correctly set xnet in ____dev_forward_skb(), this function has access
to the previous interface and to the new interface.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/linux/netdevice.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5cbc950b34df..5ab2d1917ca1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4114,7 +4114,7 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
 		return NET_RX_DROP;
 	}

-	skb_scrub_packet(skb, true);
+	skb_scrub_packet(skb, !net_eq(dev_net(dev), dev_net(skb->dev)));
 	skb->priority = 0;
 	return 0;
 }
-- 
2.30.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space
  2021-06-24  8:05 [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space Nicolas Dichtel
@ 2021-06-24 12:16 ` Eyal Birger
  2021-06-24 15:26   ` Nicolas Dichtel
  2021-06-25 18:20 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 6+ messages in thread
From: Eyal Birger @ 2021-06-24 12:16 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netdev, davem, kuba, liran.alon, shmulik.ladkani, daniel

Hi Nicholas,

On 24/06/2021 11:05, Nicolas Dichtel wrote:
> The goal is to keep the mark during a bpf_redirect(), like it is done for
> legacy encapsulation / decapsulation, when there is no x-netns.
> This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
> mark within the same name space").
> 
> When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
> 8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
> second argument (xnet) was set to true to force a call to skb_orphan(). At
> this time, the mark was always cleanned up by skb_scrub_packet(), whatever
> xnet value was.
> This call to skb_orphan() was removed later in commit
> 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
> But this 'true' stayed here without any real reason.
> 
> Let's correctly set xnet in ____dev_forward_skb(), this function has access
> to the previous interface and to the new interface.

This change was suggested in the past [1] and I think one of the main 
concerns was breaking existing callers which assume the mark would be 
cleared [2].

Personally, I think the suggestion made in [3] adding a flag to 
bpf_redirect() makes a lot of sense for this use case.

Eyal.

[1] 
https://lore.kernel.org/netdev/1520953642-8145-1-git-send-email-liran.alon@oracle.com/

[2] https://lore.kernel.org/netdev/20180315112150.58586758@halley/

[3] 
https://lore.kernel.org/netdev/cd0b73e3-2cde-1442-4312-566c69571e8a@iogearbox.net/


> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
>   include/linux/netdevice.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 5cbc950b34df..5ab2d1917ca1 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4114,7 +4114,7 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
>   		return NET_RX_DROP;
>   	}
>   
> -	skb_scrub_packet(skb, true);
> +	skb_scrub_packet(skb, !net_eq(dev_net(dev), dev_net(skb->dev)));
>   	skb->priority = 0;
>   	return 0;
>   }
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space
  2021-06-24 12:16 ` Eyal Birger
@ 2021-06-24 15:26   ` Nicolas Dichtel
  2021-06-25  8:45     ` Eyal Birger
  0 siblings, 1 reply; 6+ messages in thread
From: Nicolas Dichtel @ 2021-06-24 15:26 UTC (permalink / raw)
  To: Eyal Birger; +Cc: netdev, davem, kuba, liran.alon, shmulik.ladkani, daniel

Hi Eyal,

Le 24/06/2021 à 14:16, Eyal Birger a écrit :
> Hi Nicholas,
> 
> On 24/06/2021 11:05, Nicolas Dichtel wrote:
>> The goal is to keep the mark during a bpf_redirect(), like it is done for
>> legacy encapsulation / decapsulation, when there is no x-netns.
>> This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
>> mark within the same name space").
>>
>> When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
>> 8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
>> second argument (xnet) was set to true to force a call to skb_orphan(). At
>> this time, the mark was always cleanned up by skb_scrub_packet(), whatever
>> xnet value was.
>> This call to skb_orphan() was removed later in commit
>> 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
>> But this 'true' stayed here without any real reason.
>>
>> Let's correctly set xnet in ____dev_forward_skb(), this function has access
>> to the previous interface and to the new interface.
> 
> This change was suggested in the past [1] and I think one of the main concerns
> was breaking existing callers which assume the mark would be cleared [2].
Thank you for the pointers!

> 
> Personally, I think the suggestion made in [3] adding a flag to bpf_redirect()
> makes a lot of sense for this use case.
I began with this approach, but actually, as I tried to explain in the commit
log, this looks more like a bug. This function is called almost everywhere in
the kernel (except for openvswitch and wireguard) with the xnet argument
reflecting a netns change. In other words, the behavior is different between
legacy encapsulation and the one made with ebpf :/


Regards,
Nicolas

> 
> Eyal.
> 
> [1]
> https://lore.kernel.org/netdev/1520953642-8145-1-git-send-email-liran.alon@oracle.com/
> 
> 
> [2] https://lore.kernel.org/netdev/20180315112150.58586758@halley/
> 
> [3]
> https://lore.kernel.org/netdev/cd0b73e3-2cde-1442-4312-566c69571e8a@iogearbox.net/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space
  2021-06-24 15:26   ` Nicolas Dichtel
@ 2021-06-25  8:45     ` Eyal Birger
  2021-06-25 14:36       ` Nicolas Dichtel
  0 siblings, 1 reply; 6+ messages in thread
From: Eyal Birger @ 2021-06-25  8:45 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, davem, kuba, liran.alon, shmulik.ladkani, daniel



On 24/06/2021 18:26, Nicolas Dichtel wrote:
> Hi Eyal,
> 
> Le 24/06/2021 à 14:16, Eyal Birger a écrit :
>> Hi Nicholas,
>>
>> On 24/06/2021 11:05, Nicolas Dichtel wrote:
>>> The goal is to keep the mark during a bpf_redirect(), like it is done for
>>> legacy encapsulation / decapsulation, when there is no x-netns.
>>> This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
>>> mark within the same name space").
>>>
>>> When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
>>> 8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
>>> second argument (xnet) was set to true to force a call to skb_orphan(). At
>>> this time, the mark was always cleanned up by skb_scrub_packet(), whatever
>>> xnet value was.
>>> This call to skb_orphan() was removed later in commit
>>> 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
>>> But this 'true' stayed here without any real reason.
>>>
>>> Let's correctly set xnet in ____dev_forward_skb(), this function has access
>>> to the previous interface and to the new interface.
>>
>> This change was suggested in the past [1] and I think one of the main concerns
>> was breaking existing callers which assume the mark would be cleared [2].
> Thank you for the pointers!
> 
>>
>> Personally, I think the suggestion made in [3] adding a flag to bpf_redirect()
>> makes a lot of sense for this use case.
> I began with this approach, but actually, as I tried to explain in the commit
> log, this looks more like a bug. This function is called almost everywhere in
> the kernel (except for openvswitch and wireguard) with the xnet argument
> reflecting a netns change. In other words, the behavior is different between
> legacy encapsulation and the one made with ebpf :/
> 

I agree, and was also surprised that ebpf redirection scrubs the mark in 
the same ns - and only on ingress! - so I think keeping the mark should 
have been the default behavior. As noted in the thread though it is not 
clear whether this would break existing deployments both for ebpf and 
veth pairs on the same ns.

Eyal.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space
  2021-06-25  8:45     ` Eyal Birger
@ 2021-06-25 14:36       ` Nicolas Dichtel
  0 siblings, 0 replies; 6+ messages in thread
From: Nicolas Dichtel @ 2021-06-25 14:36 UTC (permalink / raw)
  To: Eyal Birger; +Cc: netdev, davem, kuba, liran.alon, shmulik.ladkani, daniel

Le 25/06/2021 à 10:45, Eyal Birger a écrit :
> 
> 
> On 24/06/2021 18:26, Nicolas Dichtel wrote:
>> Hi Eyal,
>>
>> Le 24/06/2021 à 14:16, Eyal Birger a écrit :
>>> Hi Nicholas,
>>>
>>> On 24/06/2021 11:05, Nicolas Dichtel wrote:
>>>> The goal is to keep the mark during a bpf_redirect(), like it is done for
>>>> legacy encapsulation / decapsulation, when there is no x-netns.
>>>> This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
>>>> mark within the same name space").
>>>>
>>>> When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
>>>> 8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
>>>> second argument (xnet) was set to true to force a call to skb_orphan(). At
>>>> this time, the mark was always cleanned up by skb_scrub_packet(), whatever
>>>> xnet value was.
>>>> This call to skb_orphan() was removed later in commit
>>>> 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
>>>> But this 'true' stayed here without any real reason.
>>>>
>>>> Let's correctly set xnet in ____dev_forward_skb(), this function has access
>>>> to the previous interface and to the new interface.
>>>
>>> This change was suggested in the past [1] and I think one of the main concerns
>>> was breaking existing callers which assume the mark would be cleared [2].
>> Thank you for the pointers!
>>
>>>
>>> Personally, I think the suggestion made in [3] adding a flag to bpf_redirect()
>>> makes a lot of sense for this use case.
>> I began with this approach, but actually, as I tried to explain in the commit
>> log, this looks more like a bug. This function is called almost everywhere in
>> the kernel (except for openvswitch and wireguard) with the xnet argument
>> reflecting a netns change. In other words, the behavior is different between
>> legacy encapsulation and the one made with ebpf :/
>>
> 
> I agree, and was also surprised that ebpf redirection scrubs the mark in the
> same ns - and only on ingress! - so I think keeping the mark should have been
Yes, there is also this asymmetry :/

> the default behavior. As noted in the thread though it is not clear whether this
> would break existing deployments both for ebpf and veth pairs on the same ns.
To summarize:
 - it's not consistent between legacy tunnels and ebpf;
 - it's not consistent between ingress and egress;
 - it was already fixed in the past for legacy tunnel (commit 213dd74aee76);
 - this kind of patch has already been done in the past, see commit
   + 963a88b31ddb ("tunnels: harmonize cleanup done on skb on xmit path");
   + ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path");
 - nobody stated (on netdev at least) that he relied on this behavior.

For all those reasons, I'm inclined to think that this patch is acceptable.
Obviously, it's not my decision ;-)

Note also that clearing the mark is possible directly in the ebpf prog,
preserving it not. It means that, in the worst case of someone relying on this,
the bug can be fixed without patching the kernel. A trade-off could be to reset
the mark in the veth case only.

Side note: the mark was not cleared (for veth) between 2.6.34 and 3.4, see
commit 59b9997baba5 ("Revert "net: maintain namespace isolation between vlan and
real device"") and the revert was not done due to this ;-)

Comments are welcomed.

Regards,
Nicolas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space
  2021-06-24  8:05 [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space Nicolas Dichtel
  2021-06-24 12:16 ` Eyal Birger
@ 2021-06-25 18:20 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-06-25 18:20 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: davem, kuba, netdev

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Thu, 24 Jun 2021 10:05:05 +0200 you wrote:
> The goal is to keep the mark during a bpf_redirect(), like it is done for
> legacy encapsulation / decapsulation, when there is no x-netns.
> This was initially done in commit 213dd74aee76 ("skbuff: Do not scrub skb
> mark within the same name space").
> 
> When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
> 8b27f27797ca ("skb: allow skb_scrub_packet() to be used by tunnels")), the
> second argument (xnet) was set to true to force a call to skb_orphan(). At
> this time, the mark was always cleanned up by skb_scrub_packet(), whatever
> xnet value was.
> This call to skb_orphan() was removed later in commit
> 9c4c325252c5 ("skbuff: preserve sock reference when scrubbing the skb.").
> But this 'true' stayed here without any real reason.
> 
> [...]

Here is the summary with links:
  - [net] dev_forward_skb: do not scrub skb mark within the same name space
    https://git.kernel.org/netdev/net/c/ff70202b2d1a

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-06-25 18:20 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24  8:05 [PATCH net] dev_forward_skb: do not scrub skb mark within the same name space Nicolas Dichtel
2021-06-24 12:16 ` Eyal Birger
2021-06-24 15:26   ` Nicolas Dichtel
2021-06-25  8:45     ` Eyal Birger
2021-06-25 14:36       ` Nicolas Dichtel
2021-06-25 18:20 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.