From mboxrd@z Thu Jan 1 00:00:00 1970 From: Mike Manning Subject: [PATCH net-next 1/5] net: allow binding socket in a VRF when there's an unbound socket Date: Thu, 20 Sep 2018 09:58:44 +0100 Message-ID: <20180920085848.17721-2-mmanning@vyatta.att-mail.com> References: <20180920085848.17721-1-mmanning@vyatta.att-mail.com> Cc: Robert Shearman To: netdev@vger.kernel.org Return-path: Received: from mx0b-00191d01.pphosted.com ([67.231.157.136]:58414 "EHLO mx0a-00191d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1731480AbeITOli (ORCPT ); Thu, 20 Sep 2018 10:41:38 -0400 Received: from pps.filterd (m0049458.ppops.net [127.0.0.1]) by m0049458.ppops.net-00191d01. (8.16.0.22/8.16.0.22) with SMTP id w8K8tBEf017431 for ; Thu, 20 Sep 2018 04:59:11 -0400 Received: from tlpd255.enaf.dadc.sbc.com (sbcsmtp3.sbc.com [144.160.112.28]) by m0049458.ppops.net-00191d01. with ESMTP id 2mm1rf0ag3-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT) for ; Thu, 20 Sep 2018 04:59:11 -0400 Received: from enaf.dadc.sbc.com (localhost [127.0.0.1]) by tlpd255.enaf.dadc.sbc.com (8.14.5/8.14.5) with ESMTP id w8K8x9F6007045 for ; Thu, 20 Sep 2018 03:59:10 -0500 Received: from zlp30499.vci.att.com (zlp30499.vci.att.com [135.46.181.149]) by tlpd255.enaf.dadc.sbc.com (8.14.5/8.14.5) with ESMTP id w8K8x6RI006889 for ; Thu, 20 Sep 2018 03:59:06 -0500 Received: from zlp30499.vci.att.com (zlp30499.vci.att.com [127.0.0.1]) by zlp30499.vci.att.com (Service) with ESMTP id 970FD4013B2F for ; Thu, 20 Sep 2018 08:59:06 +0000 (GMT) Received: from tlpd252.dadc.sbc.com (unknown [135.31.184.157]) by zlp30499.vci.att.com (Service) with ESMTP id 6F3E34013B2E for ; Thu, 20 Sep 2018 08:59:06 +0000 (GMT) Received: from dadc.sbc.com (localhost [127.0.0.1]) by tlpd252.dadc.sbc.com (8.14.5/8.14.5) with ESMTP id w8K8x530106402 for ; Thu, 20 Sep 2018 03:59:06 -0500 Received: from mail.eng.vyatta.net (mail.eng.vyatta.net [10.156.50.82]) by tlpd252.dadc.sbc.com (8.14.5/8.14.5) with ESMTP id w8K8wvai105904 for ; Thu, 20 Sep 2018 03:58:57 -0500 In-Reply-To: <20180920085848.17721-1-mmanning@vyatta.att-mail.com> Sender: netdev-owner@vger.kernel.org List-ID: From: Robert Shearman There is no easy way currently for applications that want to receive packets in the default VRF to be isolated from packets arriving in VRFs, which makes using VRF-unaware applications in a VRF-aware system a potential security risk. So change the inet socket lookup to avoid packets arriving on a device enslaved to an l3mdev from matching unbound sockets by removing the wildcard for non sk_bound_dev_if and instead relying on check against the secondary device index, which will be 0 when the input device is not enslaved to an l3mdev and so match against an unbound socket and not match when the input device is enslaved. The existing net.ipv4.tcp_l3mdev_accept & net.ipv4.udp_l3mdev_accept sysctls, which are documented as allowing the working across all VRF domains, can be used to also work in the default VRF by causing unbound sockets to match against packets arriving on a device enslaved to an l3mdev. Change the socket binding to take the l3mdev into account to allow an unbound socket to not conflict sockets bound to an l3mdev given the datapath isolation now guaranteed. Signed-off-by: Robert Shearman Signed-off-by: Mike Manning --- Documentation/networking/vrf.txt | 9 +++++---- include/net/inet6_hashtables.h | 5 ++--- include/net/inet_hashtables.h | 21 ++++++++++++++------- include/net/inet_sock.h | 13 +++++++++++++ net/core/sock.c | 2 ++ net/ipv4/inet_connection_sock.c | 13 ++++++++++--- net/ipv4/inet_hashtables.c | 34 +++++++++++++++++++++------------- net/ipv4/ip_sockglue.c | 3 +++ net/ipv4/raw.c | 4 ++-- net/ipv4/udp.c | 15 ++++++--------- net/ipv6/datagram.c | 5 ++++- net/ipv6/inet6_hashtables.c | 14 ++++++-------- net/ipv6/ipv6_sockglue.c | 3 +++ net/ipv6/raw.c | 6 +++--- net/ipv6/udp.c | 14 +++++--------- 15 files changed, 99 insertions(+), 62 deletions(-) diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt index 8ff7b4c8f91b..d4b129402d57 100644 --- a/Documentation/networking/vrf.txt +++ b/Documentation/networking/vrf.txt @@ -103,6 +103,11 @@ VRF device: or to specify the output device using cmsg and IP_PKTINFO. +By default the scope of the port bindings for unbound sockets is +limited to the default VRF. That is, it will not be matched by packets +arriving on interfaces enslaved to an l3mdev and processes may bind to +the same port if they bind to an l3mdev. + TCP & UDP services running in the default VRF context (ie., not bound to any VRF device) can work across all VRF domains by enabling the tcp_l3mdev_accept and udp_l3mdev_accept sysctl options: @@ -112,10 +117,6 @@ tcp_l3mdev_accept and udp_l3mdev_accept sysctl options: netfilter rules on the VRF device can be used to limit access to services running in the default VRF context as well. -The default VRF does not have limited scope with respect to port bindings. -That is, if a process does a wildcard bind to a port in the default VRF it -owns the port across all VRF domains within the network namespace. - ################################################################################ Using iproute2 for VRFs diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h index 6e91e38a31da..9db98af46985 100644 --- a/include/net/inet6_hashtables.h +++ b/include/net/inet6_hashtables.h @@ -115,9 +115,8 @@ int inet6_hash(struct sock *sk); ((__sk)->sk_family == AF_INET6) && \ ipv6_addr_equal(&(__sk)->sk_v6_daddr, (__saddr)) && \ ipv6_addr_equal(&(__sk)->sk_v6_rcv_saddr, (__daddr)) && \ - (!(__sk)->sk_bound_dev_if || \ - ((__sk)->sk_bound_dev_if == (__dif)) || \ - ((__sk)->sk_bound_dev_if == (__sdif))) && \ + (((__sk)->sk_bound_dev_if == (__dif)) || \ + ((__sk)->sk_bound_dev_if == (__sdif))) && \ net_eq(sock_net(__sk), (__net))) #endif /* _INET6_HASHTABLES_H */ diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 9141e95529e7..ec279bcd0958 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -79,6 +79,7 @@ struct inet_ehash_bucket { struct inet_bind_bucket { possible_net_t ib_net; + int l3mdev; unsigned short port; signed char fastreuse; signed char fastreuseport; @@ -188,10 +189,18 @@ static inline void inet_ehash_locks_free(struct inet_hashinfo *hashinfo) hashinfo->ehash_locks = NULL; } +static inline bool inet_sk_bound_dev_eq(struct net *net, int bound_dev_if, + int dif, int sdif) +{ + if (!bound_dev_if) + return !sdif || net->ipv4.sysctl_tcp_l3mdev_accept; + return bound_dev_if == dif || bound_dev_if == sdif; +} + struct inet_bind_bucket * inet_bind_bucket_create(struct kmem_cache *cachep, struct net *net, struct inet_bind_hashbucket *head, - const unsigned short snum); + const unsigned short snum, int l3mdev); void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket *tb); @@ -282,9 +291,8 @@ static inline struct sock *inet_lookup_listener(struct net *net, #define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif, __sdif) \ (((__sk)->sk_portpair == (__ports)) && \ ((__sk)->sk_addrpair == (__cookie)) && \ - (!(__sk)->sk_bound_dev_if || \ - ((__sk)->sk_bound_dev_if == (__dif)) || \ - ((__sk)->sk_bound_dev_if == (__sdif))) && \ + (((__sk)->sk_bound_dev_if == (__dif)) || \ + ((__sk)->sk_bound_dev_if == (__sdif))) && \ net_eq(sock_net(__sk), (__net))) #else /* 32-bit arch */ #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \ @@ -294,9 +302,8 @@ static inline struct sock *inet_lookup_listener(struct net *net, (((__sk)->sk_portpair == (__ports)) && \ ((__sk)->sk_daddr == (__saddr)) && \ ((__sk)->sk_rcv_saddr == (__daddr)) && \ - (!(__sk)->sk_bound_dev_if || \ - ((__sk)->sk_bound_dev_if == (__dif)) || \ - ((__sk)->sk_bound_dev_if == (__sdif))) && \ + (((__sk)->sk_bound_dev_if == (__dif)) || \ + ((__sk)->sk_bound_dev_if == (__sdif))) && \ net_eq(sock_net(__sk), (__net))) #endif /* 64-bit arch */ diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index e03b93360f33..92e0aa3958f6 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -130,6 +130,19 @@ static inline int inet_request_bound_dev_if(const struct sock *sk, return sk->sk_bound_dev_if; } +static inline int inet_sk_bound_l3mdev(const struct sock *sk) +{ +#ifdef CONFIG_NET_L3_MASTER_DEV + struct net *net = sock_net(sk); + + if (!net->ipv4.sysctl_tcp_l3mdev_accept) + return l3mdev_master_ifindex_by_index(net, + sk->sk_bound_dev_if); +#endif + + return 0; +} + static inline struct ip_options_rcu *ireq_opt_deref(const struct inet_request_sock *ireq) { return rcu_dereference_check(ireq->ireq_opt, diff --git a/net/core/sock.c b/net/core/sock.c index 3730eb855095..da1cbb88a6bf 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -567,6 +567,8 @@ static int sock_setbindtodevice(struct sock *sk, char __user *optval, lock_sock(sk); sk->sk_bound_dev_if = index; + if (sk->sk_prot->rehash) + sk->sk_prot->rehash(sk); sk_dst_reset(sk); release_sock(sk); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index dfd5009f96ef..97bba5b3d69f 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -183,7 +183,9 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int * int i, low, high, attempt_half; struct inet_bind_bucket *tb; u32 remaining, offset; + int l3mdev; + l3mdev = inet_sk_bound_l3mdev(sk); attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0; other_half_scan: inet_get_local_port_range(net, &low, &high); @@ -219,7 +221,8 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int * hinfo->bhash_size)]; spin_lock_bh(&head->lock); inet_bind_bucket_for_each(tb, &head->chain) - if (net_eq(ib_net(tb), net) && tb->port == port) { + if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev && + tb->port == port) { if (!inet_csk_bind_conflict(sk, tb, false, false)) goto success; goto next_port; @@ -293,6 +296,9 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) struct net *net = sock_net(sk); struct inet_bind_bucket *tb = NULL; kuid_t uid = sock_i_uid(sk); + int l3mdev; + + l3mdev = inet_sk_bound_l3mdev(sk); if (!port) { head = inet_csk_find_open_port(sk, &tb, &port); @@ -306,11 +312,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) hinfo->bhash_size)]; spin_lock_bh(&head->lock); inet_bind_bucket_for_each(tb, &head->chain) - if (net_eq(ib_net(tb), net) && tb->port == port) + if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev && + tb->port == port) goto tb_found; tb_not_found: tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep, - net, head, port); + net, head, port, l3mdev); if (!tb) goto fail_unlock; tb_found: diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c index f5c9ef2586de..2ec684057ebd 100644 --- a/net/ipv4/inet_hashtables.c +++ b/net/ipv4/inet_hashtables.c @@ -65,12 +65,14 @@ static u32 sk_ehashfn(const struct sock *sk) struct inet_bind_bucket *inet_bind_bucket_create(struct kmem_cache *cachep, struct net *net, struct inet_bind_hashbucket *head, - const unsigned short snum) + const unsigned short snum, + int l3mdev) { struct inet_bind_bucket *tb = kmem_cache_alloc(cachep, GFP_ATOMIC); if (tb) { write_pnet(&tb->ib_net, net); + tb->l3mdev = l3mdev; tb->port = snum; tb->fastreuse = 0; tb->fastreuseport = 0; @@ -135,6 +137,7 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child) table->bhash_size); struct inet_bind_hashbucket *head = &table->bhash[bhash]; struct inet_bind_bucket *tb; + int l3mdev; spin_lock(&head->lock); tb = inet_csk(sk)->icsk_bind_hash; @@ -143,6 +146,8 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child) return -ENOENT; } if (tb->port != port) { + l3mdev = inet_sk_bound_l3mdev(sk); + /* NOTE: using tproxy and redirecting skbs to a proxy * on a different listener port breaks the assumption * that the listener socket's icsk_bind_hash is the same @@ -150,12 +155,13 @@ int __inet_inherit_port(const struct sock *sk, struct sock *child) * create a new bind bucket for the child here. */ inet_bind_bucket_for_each(tb, &head->chain) { if (net_eq(ib_net(tb), sock_net(sk)) && - tb->port == port) + tb->l3mdev == l3mdev && tb->port == port) break; } if (!tb) { tb = inet_bind_bucket_create(table->bind_bucket_cachep, - sock_net(sk), head, port); + sock_net(sk), head, port, + l3mdev); if (!tb) { spin_unlock(&head->lock); return -ENOMEM; @@ -229,6 +235,7 @@ static inline int compute_score(struct sock *sk, struct net *net, { int score = -1; struct inet_sock *inet = inet_sk(sk); + bool dev_match; if (net_eq(sock_net(sk), net) && inet->inet_num == hnum && !ipv6_only_sock(sk)) { @@ -239,15 +246,12 @@ static inline int compute_score(struct sock *sk, struct net *net, return -1; score += 4; } - if (sk->sk_bound_dev_if || exact_dif) { - bool dev_match = (sk->sk_bound_dev_if == dif || - sk->sk_bound_dev_if == sdif); + dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, + dif, sdif); + if (!dev_match) + return -1; + score += 4; - if (!dev_match) - return -1; - if (sk->sk_bound_dev_if) - score += 4; - } if (sk->sk_incoming_cpu == raw_smp_processor_id()) score++; } @@ -675,6 +679,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, u32 remaining, offset; int ret, i, low, high; static u32 hint; + int l3mdev; if (port) { head = &hinfo->bhash[inet_bhashfn(net, port, @@ -693,6 +698,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, return ret; } + l3mdev = inet_sk_bound_l3mdev(sk); + inet_get_local_port_range(net, &low, &high); high++; /* [32768, 60999] -> [32768, 61000[ */ remaining = high - low; @@ -719,7 +726,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, * the established check is already unique enough. */ inet_bind_bucket_for_each(tb, &head->chain) { - if (net_eq(ib_net(tb), net) && tb->port == port) { + if (net_eq(ib_net(tb), net) && tb->l3mdev == l3mdev && + tb->port == port) { if (tb->fastreuse >= 0 || tb->fastreuseport >= 0) goto next_port; @@ -732,7 +740,7 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row, } tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep, - net, head, port); + net, head, port, l3mdev); if (!tb) { spin_unlock_bh(&head->lock); return -ENOMEM; diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index c0fe5ad996f2..026971314c43 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -892,6 +892,9 @@ static int do_ip_setsockopt(struct sock *sk, int level, dev_put(dev); err = -EINVAL; + if (!sk->sk_bound_dev_if && midx) + break; + if (sk->sk_bound_dev_if && mreq.imr_ifindex != sk->sk_bound_dev_if && (!midx || midx != sk->sk_bound_dev_if)) diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 33df4d76db2d..8a0d568d7aec 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -70,6 +70,7 @@ #include #include #include +#include #include #include #include @@ -131,8 +132,7 @@ struct sock *__raw_v4_lookup(struct net *net, struct sock *sk, if (net_eq(sock_net(sk), net) && inet->inet_num == num && !(inet->inet_daddr && inet->inet_daddr != raddr) && !(inet->inet_rcv_saddr && inet->inet_rcv_saddr != laddr) && - !(sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif && - sk->sk_bound_dev_if != sdif)) + inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif)) goto found; /* gotcha */ } sk = NULL; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index f4e35b2ff8b8..3d59ab47a85d 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -371,6 +371,7 @@ static int compute_score(struct sock *sk, struct net *net, { int score; struct inet_sock *inet; + bool dev_match; if (!net_eq(sock_net(sk), net) || udp_sk(sk)->udp_port_hash != hnum || @@ -398,15 +399,11 @@ static int compute_score(struct sock *sk, struct net *net, score += 4; } - if (sk->sk_bound_dev_if || exact_dif) { - bool dev_match = (sk->sk_bound_dev_if == dif || - sk->sk_bound_dev_if == sdif); - - if (!dev_match) - return -1; - if (sk->sk_bound_dev_if) - score += 4; - } + dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, + dif, sdif); + if (!dev_match) + return -1; + score += 4; if (sk->sk_incoming_cpu == raw_smp_processor_id()) score++; diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 1ede7a16a0be..4813293d4fad 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -782,7 +782,10 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk, if (src_info->ipi6_ifindex) { if (fl6->flowi6_oif && - src_info->ipi6_ifindex != fl6->flowi6_oif) + src_info->ipi6_ifindex != fl6->flowi6_oif && + (sk->sk_bound_dev_if != fl6->flowi6_oif || + !sk_dev_equal_l3scope( + sk, src_info->ipi6_ifindex))) return -EINVAL; fl6->flowi6_oif = src_info->ipi6_ifindex; } diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c index 3d7c7460a0c5..5eeeba7181a1 100644 --- a/net/ipv6/inet6_hashtables.c +++ b/net/ipv6/inet6_hashtables.c @@ -99,6 +99,7 @@ static inline int compute_score(struct sock *sk, struct net *net, const int dif, const int sdif, bool exact_dif) { int score = -1; + bool dev_match; if (net_eq(sock_net(sk), net) && inet_sk(sk)->inet_num == hnum && sk->sk_family == PF_INET6) { @@ -109,15 +110,12 @@ static inline int compute_score(struct sock *sk, struct net *net, return -1; score++; } - if (sk->sk_bound_dev_if || exact_dif) { - bool dev_match = (sk->sk_bound_dev_if == dif || - sk->sk_bound_dev_if == sdif); + dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, + dif, sdif); + if (!dev_match) + return -1; + score++; - if (!dev_match) - return -1; - if (sk->sk_bound_dev_if) - score++; - } if (sk->sk_incoming_cpu == raw_smp_processor_id()) score++; } diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index c0cac9cc3a28..7dfbc797b130 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -626,6 +626,9 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, rcu_read_unlock(); + if (!sk->sk_bound_dev_if && midx) + goto e_inval; + if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != val && (!midx || midx != sk->sk_bound_dev_if)) diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 413d98bf24f4..2f61a9f1a2b2 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -49,6 +49,7 @@ #include #include #include +#include #include #if IS_ENABLED(CONFIG_IPV6_MIP6) #include @@ -86,9 +87,8 @@ struct sock *__raw_v6_lookup(struct net *net, struct sock *sk, !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) continue; - if (sk->sk_bound_dev_if && - sk->sk_bound_dev_if != dif && - sk->sk_bound_dev_if != sdif) + if (!inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, + dif, sdif)) continue; if (!ipv6_addr_any(&sk->sk_v6_rcv_saddr)) { diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 83f4c77c79d8..e22b7dd78c9b 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -117,6 +117,7 @@ static int compute_score(struct sock *sk, struct net *net, { int score; struct inet_sock *inet; + bool dev_match; if (!net_eq(sock_net(sk), net) || udp_sk(sk)->udp_port_hash != hnum || @@ -144,15 +145,10 @@ static int compute_score(struct sock *sk, struct net *net, score++; } - if (sk->sk_bound_dev_if || exact_dif) { - bool dev_match = (sk->sk_bound_dev_if == dif || - sk->sk_bound_dev_if == sdif); - - if (!dev_match) - return -1; - if (sk->sk_bound_dev_if) - score++; - } + dev_match = inet_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif); + if (!dev_match) + return -1; + score++; if (sk->sk_incoming_cpu == raw_smp_processor_id()) score++; -- 2.11.0