All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-15 16:35 Liran Alon
  2018-03-15 16:50 ` Shmulik Ladkani
  0 siblings, 1 reply; 31+ messages in thread
From: Liran Alon @ 2018-03-15 16:35 UTC (permalink / raw)
  To: shmulik.ladkani
  Cc: netdev, daniel, mrv, davem, linux-kernel, yuval.shaia, idan.brown


----- shmulik.ladkani@gmail.com wrote:

> On Thu, 15 Mar 2018 08:01:03 -0700 (PDT) Liran Alon
> <liran.alon@oracle.com> wrote:
> > 
> > I still think that default behavior should be to zero skb->mark only
> when skb
> > cross netdevs in different netns.
> 
> But the previous default was scrub the mark in *both* xnet and
> non-xnet
> situations.
> 
> Therefore, there might be users which RELY on this (strange) default
> behavior in their same-netns-veth-pair setups.
> Meaning, changing the default behavior might break their apps relying
> on
> the former default behavior.
> 
> This is why the "disable mark scrubbing in non-xnet case" should be
> opt-in.

We think the same.
The only difference is that I think this for now should be controllable
by a global /proc/sys/net/core file instead of giving a flexible per-netdev control.
Because that is a larger change that could be done later.

> 
> Regards,
> Shmulik

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-15 17:14 Liran Alon
  2018-03-20 16:24 ` Eric W. Biederman
  0 siblings, 1 reply; 31+ messages in thread
From: Liran Alon @ 2018-03-15 17:14 UTC (permalink / raw)
  To: shmulik.ladkani
  Cc: netdev, mrv, daniel, davem, linux-kernel, yuval.shaia, idan.brown


----- shmulik.ladkani@gmail.com wrote:

> On Thu, 15 Mar 2018 09:35:51 -0700 (PDT) Liran Alon
> <liran.alon@oracle.com> wrote:
> > ----- shmulik.ladkani@gmail.com wrote:
> > 
> > > On Thu, 15 Mar 2018 08:01:03 -0700 (PDT) Liran Alon
> > > <liran.alon@oracle.com> wrote:  
> > > > 
> > > > I still think that default behavior should be to zero skb->mark
> only  
> > > when skb  
> > > > cross netdevs in different netns.  
> > > 
> > > But the previous default was scrub the mark in *both* xnet and
> > > non-xnet
> > > situations.
> > > 
> > > Therefore, there might be users which RELY on this (strange)
> default
> > > behavior in their same-netns-veth-pair setups.
> > > Meaning, changing the default behavior might break their apps
> relying
> > > on
> > > the former default behavior.
> > > 
> > > This is why the "disable mark scrubbing in non-xnet case" should
> be
> > > opt-in.  
> > 
> > We think the same.
> > The only difference is that I think this for now should be
> controllable
> > by a global /proc/sys/net/core file instead of giving a flexible
> per-netdev
> > control.
> > Because that is a larger change that could be done later.
> 
> A flags attribute to veth newlink is a very scoped change.
> User controls this per veth creation.
> This is way more neat than /proc/sys/net and provides the desired
> granular
> control.
> 
> Also, scoping this to veth has the advantage of not affecting the many
> other
> dev_forward_skb callers.

Agreed. But isn't this an issue also for the
many others (& future) callers of dev_forward_skb()?
This seems problematic to me.

This will kinda leave a kernel interface with broken default behavior
for backwards comparability.

A flag to netdev or /proc/sys/net/core to "fix" default behavior
will avoid this.

> 
> Regards,
> Shmulik

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-15 15:05 Liran Alon
  0 siblings, 0 replies; 31+ messages in thread
From: Liran Alon @ 2018-03-15 15:05 UTC (permalink / raw)
  To: daniel
  Cc: netdev, shmulik.ladkani, mrv, davem, linux-kernel, yuval.shaia,
	idan.brown


----- daniel@iogearbox.net wrote:

> On 03/15/2018 03:35 PM, Roman Mashak wrote:
> > Liran Alon <liran.alon@oracle.com> writes:
> > [...]
> >>> Overall I think it might be nice to not need scrubbing skb in
> such
> >>> cases,
> >>> although my concern would be that this has potential to break
> >>> existing
> >>> setups when they would expect mark being zero on other veth peer
> in
> >>> any
> >>> case since it's the behavior for a long time already. The safer
> >>> option
> >>> would be to have some sort of explicit opt-in e.g. on link
> creation to
> >>> let
> >>> the skb->mark pass through unscrubbed. This would definitely be a
> >>> useful
> >>> option e.g. when mark is set in the netns facing veth via
> >>> clsact/egress
> >>> on xmit and when the container is unprivileged anyway.
> >>>
> >>> Thanks,
> >>> Daniel
> >>
> >> I see your point in regards to backwards comparability.
> >> However, not scrubbing skb when it cross netns via some kernel
> functions compared to
> >> others is basically a bug which could easily break with a little
> bit of more refactoring.
> >> Therefore, it seems a bit weird to me to from now on, we will
> force
> >> every user on link creation to consider that once there was a bug
> leading
> >> to this weird behavior on specific netdevs.
> 
> Why bug specifically? It could well be that for some unpriv
> containers
> it would be fine to do e.g. in cases where orchestrator sets up
> clsact/
> egress on veth/ipvlan/etc in the container to set the mark and where
> app
> cannot mess with this while for others you need to act out of host
> facing
> veth; thus, explicit opt-in per dev could provide such more fine
> grained
> control.
> 
> > One valid use case could be preserving a source namespace nsid in
> > skb->mark when a packet crosses netns.
> 
> Right, was thinking about something similar.

I agree with all the above but this behavior was not supported both
before and after this commit. skb->mark is always zeroed when crossing netns.
This commit only changes behavior for skb crossing netdevs on *same* netns
via dev_forward_skb().

Therefore, I believe we should discuss here what we want default behavior to be
and how it should be controlled for backwards comparability.
Only after we should discuss about adding an extra feature of controlling skb scrub
per netdev or something similar.

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-15 15:01 Liran Alon
  2018-03-15 16:11 ` Shmulik Ladkani
  0 siblings, 1 reply; 31+ messages in thread
From: Liran Alon @ 2018-03-15 15:01 UTC (permalink / raw)
  To: mrv
  Cc: netdev, shmulik.ladkani, daniel, davem, linux-kernel,
	yuval.shaia, idan.brown


----- mrv@mojatatu.com wrote:

> Liran Alon <liran.alon@oracle.com> writes:
> 
> 
> [...]
> 
> >> Overall I think it might be nice to not need scrubbing skb in such
> >> cases,
> >> although my concern would be that this has potential to break
> >> existing
> >> setups when they would expect mark being zero on other veth peer
> in
> >> any
> >> case since it's the behavior for a long time already. The safer
> >> option
> >> would be to have some sort of explicit opt-in e.g. on link creation
> to
> >> let
> >> the skb->mark pass through unscrubbed. This would definitely be a
> >> useful
> >> option e.g. when mark is set in the netns facing veth via
> >> clsact/egress
> >> on xmit and when the container is unprivileged anyway.
> >> 
> >> Thanks,
> >> Daniel
> >
> > I see your point in regards to backwards comparability.
> > However, not scrubbing skb when it cross netns via some kernel
> functions compared to
> > others is basically a bug which could easily break with a little bit
> of more refactoring.
> > Therefore, it seems a bit weird to me to from now on, we will force
> > every user on link creation to consider that once there was a bug
> leading
> > to this weird behavior on specific netdevs.
> > Thus, I suggest to maybe control this via a global /proc/sys/net
> file instead.
> 
> One valid use case could be preserving a source namespace nsid in
> skb->mark when a packet crosses netns.

Before and after this commit, veth peers that crosses netns zero skb->mark.
Therefore, what you suggest is a new feature unrelated to the
issue fixed by this commit.

I still think that default behavior should be to zero skb->mark only when skb
cross netdevs in different netns. And for backwards comparability, we can
consider adding a /proc/sys/net/core file to let dev_forward_skb() to always
scrub packet regardless if it crosses netdevs or not.

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-15 12:23 Liran Alon
  2018-03-15 14:35 ` Roman Mashak
  0 siblings, 1 reply; 31+ messages in thread
From: Liran Alon @ 2018-03-15 12:23 UTC (permalink / raw)
  To: daniel
  Cc: netdev, shmulik.ladkani, davem, linux-kernel, yuval.shaia, idan.brown


----- daniel@iogearbox.net wrote:

> On 03/15/2018 10:21 AM, Shmulik Ladkani wrote:
> > Regarding the premise of this commit, this "reduces" the
> > ipvs/orphan/mark scrubbing in the following *non* xnet situations:
> > 
> >  1. mac2vlan port xmit to other macvlan ports in Bridge Mode
> >  2. similarly for ipvlan
> >  3. veth xmit
> >  4. l2tp_eth_dev_recv
> >  5. bpf redirect/clone_redirect ingress actions
> > 
> > Regarding l2tp recv, this commit seems to align the srubbing
> behavior
> > with ip tunnels (full scrub only if crossing netns, see
> ip_tunnel_rcv).
> > 
> > Regarding veth xmit, it does makes sense to preserve the fields if
> not
> > crossing netns. This is also the case when one uses tc mirred.
> > 
> > Regarding bpf redirect, well, it depends on the expectations of each
> bpf
> > program.
> > I'd argue that preserving the fields (at least the mark field) in
> the
> > *non* xnet makes sense and provides more information and therefore
> more
> > capabilities; Alas this might change behavior already being relied
> on.
> > 
> > Maybe Daniel can comment on the matter.
> 
> Overall I think it might be nice to not need scrubbing skb in such
> cases,
> although my concern would be that this has potential to break
> existing
> setups when they would expect mark being zero on other veth peer in
> any
> case since it's the behavior for a long time already. The safer
> option
> would be to have some sort of explicit opt-in e.g. on link creation to
> let
> the skb->mark pass through unscrubbed. This would definitely be a
> useful
> option e.g. when mark is set in the netns facing veth via
> clsact/egress
> on xmit and when the container is unprivileged anyway.
> 
> Thanks,
> Daniel

I see your point in regards to backwards comparability.
However, not scrubbing skb when it cross netns via some kernel functions compared to
others is basically a bug which could easily break with a little bit of more refactoring.
Therefore, it seems a bit weird to me to from now on, we will force
every user on link creation to consider that once there was a bug leading
to this weird behavior on specific netdevs.
Thus, I suggest to maybe control this via a global /proc/sys/net file instead.

-Liran

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-15 12:14 Liran Alon
  0 siblings, 0 replies; 31+ messages in thread
From: Liran Alon @ 2018-03-15 12:14 UTC (permalink / raw)
  To: shmulik.ladkani
  Cc: netdev, daniel, davem, linux-kernel, yuval.shaia, idan.brown


----- shmulik.ladkani@gmail.com wrote:

> Hi,
> 
> On Tue, 13 Mar 2018 17:07:22 +0200 Liran Alon <liran.alon@oracle.com>
> wrote:
> > Before this commit, dev_forward_skb() always cleared packet's
> > per-network-namespace info. Even if the packet doesn't cross
> > network namespaces.
> > 
> > The comment above dev_forward_skb() describes that this is done
> > because the receiving device may be in another network namespace.
> > However, this case can easily be tested for and therefore we can
> > scrub packet's per-network-namespace info only when receiving
> device
> > is indeed in another network namespace.
> > 
> > Therefore, this commit changes ____dev_forward_skb() to tell
> > skb_scrub_packet() that skb has crossed network-namespace only in
> case
> > transmitting device (skb->dev) network namespace is different then
> > receiving device (dev) network namespace.
> 
> Assuming the premise of this commit is correct, note it may not act
> as
> intended for xnet situation in ipvlan_process_multicast, snip:
> 
> 		nskb->dev = ipvlan->dev;
> 		if (tx_pkt)
> 			ret = dev_forward_skb(ipvlan->dev, nskb);
> 		else
> 			ret = netif_rx(nskb);
> 
> as 'dev' gets already assigned to nskb prior dev_forward_skb (hence
> in
> ____dev_forward_skb both dev and skb->dev are the same).
> Fortunately every ipvlan_multicast_enqueue call is preceded by a
> forced
> scrub; It would be future-proof to not assign nskb->dev in the
> dev_forward_skb case (assign it only in the netif_rx case).

I agree. Nice catch.
skb->dev will later get assigned in eth_type_trans() called from __dev_forward_skb().
I will do this small change in a separate patch before this patch.
(In v2 of this series)

> 
> 
> Regarding the premise of this commit, this "reduces" the
> ipvs/orphan/mark scrubbing in the following *non* xnet situations:
> 
>  1. mac2vlan port xmit to other macvlan ports in Bridge Mode
>  2. similarly for ipvlan
>  3. veth xmit
>  4. l2tp_eth_dev_recv
>  5. bpf redirect/clone_redirect ingress actions
> 
> Regarding l2tp recv, this commit seems to align the srubbing behavior
> with ip tunnels (full scrub only if crossing netns, see
> ip_tunnel_rcv).
> 
> Regarding veth xmit, it does makes sense to preserve the fields if
> not
> crossing netns. This is also the case when one uses tc mirred.
> 
> Regarding bpf redirect, well, it depends on the expectations of each
> bpf
> program.
> I'd argue that preserving the fields (at least the mark field) in the
> *non* xnet makes sense and provides more information and therefore
> more
> capabilities; Alas this might change behavior already being relied
> on.

I now noticed that a similar change was done in the past in
commit 8a83a00b0735 ("net: maintain namespace isolation
between vlan and real device"). Commit changed dev_forward_skb() from always
setting skb->mark=0 to do this only in case we cross netns.

However, a later commit: 59b9997baba5 ("Revert "net: maintain namespace
isolation between vlan and real device") seems to later reverted that change.
But I think that the regression issue mentioned in the revert isn't
related to the change proposed by this commit.
Please correct me if I'm missing something.

> 
> Maybe Daniel can comment on the matter.
> 
> Regards,
> Shmulik

^ permalink raw reply	[flat|nested] 31+ messages in thread
* [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns
@ 2018-03-13 15:07 Liran Alon
  2018-03-13 16:13 ` Yuval Shaia
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Liran Alon @ 2018-03-13 15:07 UTC (permalink / raw)
  To: davem, netdev, linux-kernel; +Cc: idan.brown, Liran Alon, Yuval Shaia

Before this commit, dev_forward_skb() always cleared packet's
per-network-namespace info. Even if the packet doesn't cross
network namespaces.

The comment above dev_forward_skb() describes that this is done
because the receiving device may be in another network namespace.
However, this case can easily be tested for and therefore we can
scrub packet's per-network-namespace info only when receiving device
is indeed in another network namespace.

Therefore, this commit changes ____dev_forward_skb() to tell
skb_scrub_packet() that skb has crossed network-namespace only in case
transmitting device (skb->dev) network namespace is different then
receiving device (dev) network namespace.

An example of a netdev that use skb_forward_skb() is veth.
Thus, before this commit a packet transmitted from one veth peer to
another when both veth peers are on same network namespace will lose
it's skb->mark. The bug could easily be demonstrated by the following:

ip netns add test
ip netns exec test bash
ip link add veth-a type veth peer name veth-b
ip link set veth-a up
ip link set veth-b up
ip addr add dev veth-a 12.0.0.1/24
tc qdisc add dev veth-a root handle 1 prio
tc qdisc add dev veth-b ingress
tc filter add dev veth-a parent 1: u32 match u32 0 0 action skbedit mark 1337
tc filter add dev veth-b parent ffff: basic match 'meta(nf_mark eq 1337)' action simple "skb->mark 1337!"
dmesg -C
ping 12.0.0.2
dmesg

Before this change, the above will print nothing to dmesg.
After this change, "skb->mark 1337!" will be printed as necessary.

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
---
 include/linux/netdevice.h | 2 +-
 net/core/dev.c            | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5eef6c8e2741..5908f1e31ee2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3371,7 +3371,7 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
 		return NET_RX_DROP;
 	}
 
-	skb_scrub_packet(skb, true);
+	skb_scrub_packet(skb, !net_eq(dev_net(dev), dev_net(skb->dev)));
 	skb->priority = 0;
 	return 0;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 2cedf520cb28..087787dd0a50 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1877,9 +1877,9 @@ int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
  * start_xmit function of one device into the receive queue
  * of another device.
  *
- * The receiving device may be in another namespace, so
- * we have to clear all information in the skb that could
- * impact namespace isolation.
+ * The receiving device may be in another namespace.
+ * In that case, we have to clear all information in the
+ * skb that could impact namespace isolation.
  */
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-03-20 21:13 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-15 16:35 [PATCH] net: dev_forward_skb(): Scrub packet's per-netns info only when crossing netns Liran Alon
2018-03-15 16:50 ` Shmulik Ladkani
  -- strict thread matches above, loose matches on Subject: below --
2018-03-15 17:14 Liran Alon
2018-03-20 16:24 ` Eric W. Biederman
2018-03-20 16:44   ` Liran Alon
2018-03-20 17:07     ` Ben Greear
2018-03-20 18:35       ` Eric W. Biederman
2018-03-15 15:05 Liran Alon
2018-03-15 15:01 Liran Alon
2018-03-15 16:11 ` Shmulik Ladkani
2018-03-15 12:23 Liran Alon
2018-03-15 14:35 ` Roman Mashak
2018-03-15 14:53   ` Daniel Borkmann
2018-03-15 12:14 Liran Alon
2018-03-13 15:07 Liran Alon
2018-03-13 16:13 ` Yuval Shaia
2018-03-14 12:03   ` Yuval Shaia
2018-03-15  9:21 ` Shmulik Ladkani
2018-03-15 11:56   ` Daniel Borkmann
2018-03-15 12:50     ` Shmulik Ladkani
2018-03-15 15:13       ` Daniel Borkmann
2018-03-15 15:54         ` Shmulik Ladkani
2018-03-15 17:48           ` Daniel Borkmann
2018-03-20 14:47 ` David Miller
2018-03-20 15:34   ` Liran Alon
2018-03-20 16:00     ` David Miller
2018-03-20 16:11       ` Liran Alon
2018-03-20 16:34         ` David Miller
2018-03-20 16:39           ` Liran Alon
2018-03-20 18:51             ` valdis.kletnieks
2018-03-20 21:12               ` Liran Alon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.