netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags
@ 2019-02-19 23:15 Daniel Borkmann
  2019-02-20 18:40 ` Mahesh Bandewar (महेश बंडेवार)
  2019-02-22 19:28 ` David Miller
  0 siblings, 2 replies; 3+ messages in thread
From: Daniel Borkmann @ 2019-02-19 23:15 UTC (permalink / raw)
  To: davem; +Cc: maheshb, m, netdev, Daniel Borkmann

When running Docker with userns isolation e.g. --userns-remap="default"
and spawning up some containers with CAP_NET_ADMIN under this realm, I
noticed that link changes on ipvlan slave device inside that container
can affect all devices from this ipvlan group which are in other net
namespaces where the container should have no permission to make changes
to, such as the init netns, for example.

This effectively allows to undo ipvlan private mode and switch globally to
bridge mode where slaves can communicate directly without going through
hostns, or it allows to switch between global operation mode (l2/l3/l3s)
for everyone bound to the given ipvlan master device. libnetwork plugin
here is creating an ipvlan master and ipvlan slave in hostns and a slave
each that is moved into the container's netns upon creation event.

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
     link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
     ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
     inet 10.41.0.1/32 scope link cilium_host
       valid_lft forever preferred_lft forever
  [...]

* Spawn container & change ipvlan mode setting inside of it:

  # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
  9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

  # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns (mode switched to l2):

  # ip -d a
  [...]
  8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

Same l3 -> l2 switch would also happen by creating another slave inside
the container's network namespace when specifying the existing cilium0
link to derive the actual (bond0) master:

  # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2

  # docker exec -ti client ip -d a
  [...]
  2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
  10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
         valid_lft forever preferred_lft forever

* In hostns:

  # ip -d a
  [...]
  8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
      ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
      inet 10.41.0.1/32 scope link cilium_host
         valid_lft forever preferred_lft forever
  [...]

One way to mitigate it is to check CAP_NET_ADMIN permissions of
the ipvlan master device's ns, and only then allow to change
mode or flags for all devices bound to it. Above two cases are
then disallowed after the patch.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 drivers/net/ipvlan/ipvlan_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 7cdac77..07e41c4 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -499,6 +499,8 @@ static int ipvlan_nl_changelink(struct net_device *dev,
 
 	if (!data)
 		return 0;
+	if (!ns_capable(dev_net(ipvlan->phy_dev)->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
 
 	if (data[IFLA_IPVLAN_MODE]) {
 		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
@@ -601,6 +603,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 		struct ipvl_dev *tmp = netdev_priv(phy_dev);
 
 		phy_dev = tmp->phy_dev;
+		if (!ns_capable(dev_net(phy_dev)->user_ns, CAP_NET_ADMIN))
+			return -EPERM;
 	} else if (!netif_is_ipvlan_port(phy_dev)) {
 		/* Exit early if the underlying link is invalid or busy */
 		if (phy_dev->type != ARPHRD_ETHER ||
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags
  2019-02-19 23:15 [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags Daniel Borkmann
@ 2019-02-20 18:40 ` Mahesh Bandewar (महेश बंडेवार)
  2019-02-22 19:28 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2019-02-20 18:40 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: David Miller, m, linux-netdev

On Tue, Feb 19, 2019 at 3:38 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> When running Docker with userns isolation e.g. --userns-remap="default"
> and spawning up some containers with CAP_NET_ADMIN under this realm, I
> noticed that link changes on ipvlan slave device inside that container
> can affect all devices from this ipvlan group which are in other net
> namespaces where the container should have no permission to make changes
> to, such as the init netns, for example.
>
> This effectively allows to undo ipvlan private mode and switch globally to
> bridge mode where slaves can communicate directly without going through
> hostns, or it allows to switch between global operation mode (l2/l3/l3s)
> for everyone bound to the given ipvlan master device. libnetwork plugin
> here is creating an ipvlan master and ipvlan slave in hostns and a slave
> each that is moved into the container's netns upon creation event.
>
> * In hostns:
>
>   # ip -d a
>   [...]
>   8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>      link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>      ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>      inet 10.41.0.1/32 scope link cilium_host
>        valid_lft forever preferred_lft forever
>   [...]
>
> * Spawn container & change ipvlan mode setting inside of it:
>
>   # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
>   9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
>
>   # docker exec -ti client ip -d a
>   [...]
>   10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
>          valid_lft forever preferred_lft forever
>
>   # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
>
>   # docker exec -ti client ip -d a
>   [...]
>   10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
>          valid_lft forever preferred_lft forever
>
> * In hostns (mode switched to l2):
>
>   # ip -d a
>   [...]
>   8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.0.1/32 scope link cilium_host
>          valid_lft forever preferred_lft forever
>   [...]
>
> Same l3 -> l2 switch would also happen by creating another slave inside
> the container's network namespace when specifying the existing cilium0
> link to derive the actual (bond0) master:
>
>   # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
>
>   # docker exec -ti client ip -d a
>   [...]
>   2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>   10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
>          valid_lft forever preferred_lft forever
>
> * In hostns:
>
>   # ip -d a
>   [...]
>   8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.0.1/32 scope link cilium_host
>          valid_lft forever preferred_lft forever
>   [...]
>
> One way to mitigate it is to check CAP_NET_ADMIN permissions of
> the ipvlan master device's ns, and only then allow to change
> mode or flags for all devices bound to it. Above two cases are
> then disallowed after the patch.
thanks for the fix Daniel.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Mahesh Bandewar <maheshb@google.com>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 7cdac77..07e41c4 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -499,6 +499,8 @@ static int ipvlan_nl_changelink(struct net_device *dev,
>
>         if (!data)
>                 return 0;
> +       if (!ns_capable(dev_net(ipvlan->phy_dev)->user_ns, CAP_NET_ADMIN))
> +               return -EPERM;
>
>         if (data[IFLA_IPVLAN_MODE]) {
>                 u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
> @@ -601,6 +603,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>                 struct ipvl_dev *tmp = netdev_priv(phy_dev);
>
>                 phy_dev = tmp->phy_dev;
> +               if (!ns_capable(dev_net(phy_dev)->user_ns, CAP_NET_ADMIN))
> +                       return -EPERM;
>         } else if (!netif_is_ipvlan_port(phy_dev)) {
>                 /* Exit early if the underlying link is invalid or busy */
>                 if (phy_dev->type != ARPHRD_ETHER ||
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags
  2019-02-19 23:15 [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags Daniel Borkmann
  2019-02-20 18:40 ` Mahesh Bandewar (महेश बंडेवार)
@ 2019-02-22 19:28 ` David Miller
  1 sibling, 0 replies; 3+ messages in thread
From: David Miller @ 2019-02-22 19:28 UTC (permalink / raw)
  To: daniel; +Cc: maheshb, m, netdev

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 20 Feb 2019 00:15:30 +0100

> When running Docker with userns isolation e.g. --userns-remap="default"
> and spawning up some containers with CAP_NET_ADMIN under this realm, I
> noticed that link changes on ipvlan slave device inside that container
> can affect all devices from this ipvlan group which are in other net
> namespaces where the container should have no permission to make changes
> to, such as the init netns, for example.
> 
> This effectively allows to undo ipvlan private mode and switch globally to
> bridge mode where slaves can communicate directly without going through
> hostns, or it allows to switch between global operation mode (l2/l3/l3s)
> for everyone bound to the given ipvlan master device. libnetwork plugin
> here is creating an ipvlan master and ipvlan slave in hostns and a slave
> each that is moved into the container's netns upon creation event.
 ...
> One way to mitigate it is to check CAP_NET_ADMIN permissions of
> the ipvlan master device's ns, and only then allow to change
> mode or flags for all devices bound to it. Above two cases are
> then disallowed after the patch.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Applied and queued up for -stable.

Thanks.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-02-22 19:28 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-19 23:15 [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags Daniel Borkmann
2019-02-20 18:40 ` Mahesh Bandewar (महेश बंडेवार)
2019-02-22 19:28 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).