References: <20190219231530.11306-1-daniel@iogearbox.net>
In-Reply-To: <20190219231530.11306-1-daniel@iogearbox.net>
From: Mahesh Bandewar (महेश बंडेवार)
Date: Wed, 20 Feb 2019 10:40:15 -0800
Subject: Re: [PATCH net] ipvlan: disallow userns cap_net_admin to change global mode/flags
To: Daniel Borkmann
Cc: David Miller, m@lambda.lt, linux-netdev
X-Mailing-List: netdev@vger.kernel.org

On Tue, Feb 19, 2019 at 3:38 PM Daniel Borkmann wrote:
>
> When running Docker with userns isolation e.g. --userns-remap="default"
> and spawning up some containers with CAP_NET_ADMIN under this realm, I
> noticed that link changes on ipvlan slave device inside that container
> can affect all devices from this ipvlan group which are in other net
> namespaces where the container should have no permission to make changes
> to, such as the init netns, for example.
>
> This effectively allows to undo ipvlan private mode and switch globally to
> bridge mode where slaves can communicate directly without going through
> hostns, or it allows to switch between global operation mode (l2/l3/l3s)
> for everyone bound to the given ipvlan master device. libnetwork plugin
> here is creating an ipvlan master and ipvlan slave in hostns and a slave
> each that is moved into the container's netns upon creation event.
>
> * In hostns:
>
>   # ip -d a
>   [...]
>   8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.0.1/32 scope link cilium_host
>          valid_lft forever preferred_lft forever
>   [...]
>
> * Spawn container & change ipvlan mode setting inside of it:
>
>   # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
>   9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
>
>   # docker exec -ti client ip -d a
>   [...]
>   10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
>          valid_lft forever preferred_lft forever
>
>   # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
>
>   # docker exec -ti client ip -d a
>   [...]
>   10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
>          valid_lft forever preferred_lft forever
>
> * In hostns (mode switched to l2):
>
>   # ip -d a
>   [...]
>   8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.0.1/32 scope link cilium_host
>          valid_lft forever preferred_lft forever
>   [...]
>
> Same l3 -> l2 switch would also happen by creating another slave inside
> the container's network namespace when specifying the existing cilium0
> link to derive the actual (bond0) master:
>
>   # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
>
>   # docker exec -ti client ip -d a
>   [...]
>   2: cilium1@if4: mtu 1500 qdisc noop state DOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>   10: cilium0@if4: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
>          valid_lft forever preferred_lft forever
>
> * In hostns:
>
>   # ip -d a
>   [...]
>   8: cilium_host@bond0: mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
>       link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
>       ipvlan mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>       inet 10.41.0.1/32 scope link cilium_host
>          valid_lft forever preferred_lft forever
>   [...]
>
> One way to mitigate it is to check CAP_NET_ADMIN permissions of
> the ipvlan master device's ns, and only then allow to change
> mode or flags for all devices bound to it. Above two cases are
> then disallowed after the patch.

Thanks for the fix, Daniel.

>
> Signed-off-by: Daniel Borkmann

Acked-by: Mahesh Bandewar

> ---
>  drivers/net/ipvlan/ipvlan_main.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 7cdac77..07e41c4 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -499,6 +499,8 @@ static int ipvlan_nl_changelink(struct net_device *dev,
>
>  	if (!data)
>  		return 0;
> +	if (!ns_capable(dev_net(ipvlan->phy_dev)->user_ns, CAP_NET_ADMIN))
> +		return -EPERM;
>
>  	if (data[IFLA_IPVLAN_MODE]) {
>  		u16 nmode = nla_get_u16(data[IFLA_IPVLAN_MODE]);
> @@ -601,6 +603,8 @@ int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>  		struct ipvl_dev *tmp = netdev_priv(phy_dev);
>
>  		phy_dev = tmp->phy_dev;
> +		if (!ns_capable(dev_net(phy_dev)->user_ns, CAP_NET_ADMIN))
> +			return -EPERM;
>  	} else if (!netif_is_ipvlan_port(phy_dev)) {
>  		/* Exit early if the underlying link is invalid or busy */
>  		if (phy_dev->type != ARPHRD_ETHER ||
> --
> 2.7.4
>
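
For anyone following the thread, the effect of the added check can be
modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct user_ns, struct net, struct net_device,
struct cred_model, ns_capable_model() and changelink_model() are
simplified, hypothetical stand-ins and not the kernel's definitions,
and the model ignores the namespace-owner shortcut that the real
capability check also applies. It only illustrates that the capability
is evaluated against the user namespace owning the master's netns
rather than the slave's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel objects involved. */
struct user_ns { const struct user_ns *parent; };
struct net { const struct user_ns *user_ns; };
struct net_device {
	const struct net *net;             /* netns the device lives in */
	const struct net_device *phy_dev;  /* ipvlan master, NULL for the master itself */
};
struct cred_model {
	const struct user_ns *user_ns;     /* user namespace of the caller */
	bool cap_net_admin;                /* CAP_NET_ADMIN raised in that namespace */
};

/*
 * Model of a namespaced capability check: the caller is capable in
 * "target" only if its own user namespace is "target" or one of its
 * ancestors, and the capability is raised there.
 */
static bool ns_capable_model(const struct cred_model *cred,
			     const struct user_ns *target)
{
	for (const struct user_ns *ns = target; ns; ns = ns->parent)
		if (ns == cred->user_ns)
			return cred->cap_net_admin;
	return false;
}

/*
 * Model of the patched changelink path: the check is made against the
 * user namespace of the master's netns, not the slave's.
 */
static int changelink_model(const struct cred_model *cred,
			    const struct net_device *slave)
{
	assert(slave->phy_dev != NULL);
	if (!ns_capable_model(cred, slave->phy_dev->net->user_ns))
		return -EPERM;
	return 0;
}
```

In this model, a caller holding CAP_NET_ADMIN only in a child
(container) user namespace gets -EPERM when the master sits in a netns
owned by the initial user namespace, while a caller with CAP_NET_ADMIN
in the initial user namespace gets 0.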