* [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default @ 2020-05-08 23:42 Maciej Żenczykowski 2020-05-09 19:15 ` Ido Schimmel 0 siblings, 1 reply; 10+ messages in thread From: Maciej Żenczykowski @ 2020-05-08 23:42 UTC (permalink / raw) To: Maciej Żenczykowski, David S . Miller Cc: Linux Network Development Mailing List From: Maciej Żenczykowski <maze@google.com> This makes 'ping' 'ping6' and icmp based traceroute no longer require any suid or file capabilities. These sockets have baked long enough that the restriction to make them unusable by default is no longer necessary. The concerns were around exploits. However there are now major distros that default to enabling this. This is already the default on Fedora 31: [root@f31vm ~]# cat /proc/sys/net/ipv4/ping_group_range 0 2147483647 [root@f31vm ~]# cat /usr/lib/sysctl.d/50-default.conf | egrep -B6 ping_group_range # ping(8) without CAP_NET_ADMIN and CAP_NET_RAW # The upper limit is set to 2^31-1. Values greater than that get rejected by # the kernel because of this definition in linux/include/net/ping.h: # #define GID_T_MAX (((gid_t)~0U) >> 1) # That's not so bad because values between 2^31 and 2^32-1 are reserved on # systemd-based systems anyway: https://systemd.io/UIDS-GIDS.html#summary -net.ipv4.ping_group_range = 0 2147483647 And in general is super useful for any network namespace container based setup. See for example: https://docs.docker.com/engine/security/rootless/ This is one less thing you need to configure when you creare a new network namespace. Before: vm:~# unshare -n vm:~# cat /proc/sys/net/ipv4/ping_group_range 1 0 After: vm:~# unshare -n vm:~# cat /proc/sys/net/ipv4/ping_group_range 0 2147483647 Signed-off-by: Maciej Żenczykowski <maze@google.com> --- net/ipv4/af_inet.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index cf58e29cf746..1a8cb6f3ee38 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1819,12 +1819,8 @@ static __net_init int inet_init_net(struct net *net) net->ipv4.ip_local_ports.range[1] = 60999; seqlock_init(&net->ipv4.ping_group_range.lock); - /* - * Sane defaults - nobody may create ping sockets. - * Boot scripts should set this to distro-specific group. - */ - net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1); - net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0); + net->ipv4.ping_group_range.range[0] = GLOBAL_ROOT_GID; + net->ipv4.ping_group_range.range[1] = KGIDT_INIT(0x7FFFFFFF); /* Default values for sysctl-controlled parameters. * We set them here, in case sysctl is not compiled. -- 2.26.2.645.ge9eca65c58-goog ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-08 23:42 [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default Maciej Żenczykowski @ 2020-05-09 19:15 ` Ido Schimmel 2020-05-09 19:17 ` Maciej Żenczykowski 2020-05-09 21:20 ` David Ahern 0 siblings, 2 replies; 10+ messages in thread From: Ido Schimmel @ 2020-05-09 19:15 UTC (permalink / raw) To: Maciej Żenczykowski, dsahern Cc: Maciej Żenczykowski, David S . Miller, Linux Network Development Mailing List On Fri, May 08, 2020 at 04:42:23PM -0700, Maciej Żenczykowski wrote: > From: Maciej Żenczykowski <maze@google.com> > > This makes 'ping' 'ping6' and icmp based traceroute no longer > require any suid or file capabilities. > > These sockets have baked long enough that the restriction > to make them unusable by default is no longer necessary. > > The concerns were around exploits. However there are now > major distros that default to enabling this. > > This is already the default on Fedora 31: > [root@f31vm ~]# cat /proc/sys/net/ipv4/ping_group_range > 0 2147483647 > [root@f31vm ~]# cat /usr/lib/sysctl.d/50-default.conf | egrep -B6 ping_group_range > # ping(8) without CAP_NET_ADMIN and CAP_NET_RAW > # The upper limit is set to 2^31-1. Values greater than that get rejected by > # the kernel because of this definition in linux/include/net/ping.h: > # #define GID_T_MAX (((gid_t)~0U) >> 1) > # That's not so bad because values between 2^31 and 2^32-1 are reserved on > # systemd-based systems anyway: https://systemd.io/UIDS-GIDS.html#summary > -net.ipv4.ping_group_range = 0 2147483647 > > And in general is super useful for any network namespace container > based setup. See for example: https://docs.docker.com/engine/security/rootless/ > > This is one less thing you need to configure when you creare a new network > namespace. > > Before: > vm:~# unshare -n > vm:~# cat /proc/sys/net/ipv4/ping_group_range > 1 0 > > After: > vm:~# unshare -n > vm:~# cat /proc/sys/net/ipv4/ping_group_range > 0 2147483647 > > Signed-off-by: Maciej Żenczykowski <maze@google.com> This will unfortunately cause regressions with VRFs because they don't work correctly with ping sockets. Simple example: ip link add name vrf-red up type vrf table 10 ip link add name vrf-blue up type vrf table 20 ip rule add pref 32765 table local ip rule del pref 0 ip link add name veth-red type veth peer name veth-blue ip link set dev veth-red up master vrf-red ip link set dev veth-blue up master vrf-red ip address add 192.0.2.1/24 dev veth-red ip address add 192.0.2.2/24 dev veth-blue ip vrf exec vrf-red ping -I 192.0.2.1 192.0.2.2 -c 1 -w 1 Before the patch: PING 192.0.2.2 (192.0.2.2) from 192.0.2.1 : 56(84) bytes of data. 64 bytes from 192.0.2.2: icmp_seq=1 ttl=64 time=0.053 ms After the patch: bind: Cannot assign requested address This specific test case is fixed by following patch, but at least a similar change is needed for IPv6. Other changes might also be required. Sorry for not providing a complete solution, but I'm busy with other tasks at the moment. :/ diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 535427292194..8463b0e9e811 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -297,6 +297,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, struct net *net = sock_net(sk); if (sk->sk_family == AF_INET) { struct sockaddr_in *addr = (struct sockaddr_in *) uaddr; + u32 tb_id = RT_TABLE_LOCAL; int chk_addr_ret; if (addr_len < sizeof(*addr)) @@ -310,7 +311,15 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n", sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port)); - chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr); + if (sk->sk_bound_dev_if) { + tb_id = l3mdev_fib_table_by_index(net, + sk->sk_bound_dev_if); + if (!tb_id) + tb_id = RT_TABLE_LOCAL; + } + + chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, + tb_id); if (addr->sin_addr.s_addr == htonl(INADDR_ANY)) chk_addr_ret = RTN_LOCAL; > --- > net/ipv4/af_inet.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index cf58e29cf746..1a8cb6f3ee38 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1819,12 +1819,8 @@ static __net_init int inet_init_net(struct net *net) > net->ipv4.ip_local_ports.range[1] = 60999; > > seqlock_init(&net->ipv4.ping_group_range.lock); > - /* > - * Sane defaults - nobody may create ping sockets. > - * Boot scripts should set this to distro-specific group. > - */ > - net->ipv4.ping_group_range.range[0] = make_kgid(&init_user_ns, 1); > - net->ipv4.ping_group_range.range[1] = make_kgid(&init_user_ns, 0); > + net->ipv4.ping_group_range.range[0] = GLOBAL_ROOT_GID; > + net->ipv4.ping_group_range.range[1] = KGIDT_INIT(0x7FFFFFFF); > > /* Default values for sysctl-controlled parameters. > * We set them here, in case sysctl is not compiled. > -- > 2.26.2.645.ge9eca65c58-goog > ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-09 19:15 ` Ido Schimmel @ 2020-05-09 19:17 ` Maciej Żenczykowski 2020-05-09 19:32 ` Ido Schimmel 2020-05-09 21:20 ` David Ahern 1 sibling, 1 reply; 10+ messages in thread From: Maciej Żenczykowski @ 2020-05-09 19:17 UTC (permalink / raw) To: Ido Schimmel Cc: David Ahern, David S . Miller, Linux Network Development Mailing List Argh. I've never understood the faintest thing about VRF's. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-09 19:17 ` Maciej Żenczykowski @ 2020-05-09 19:32 ` Ido Schimmel 2020-05-09 21:09 ` David Ahern 0 siblings, 1 reply; 10+ messages in thread From: Ido Schimmel @ 2020-05-09 19:32 UTC (permalink / raw) To: Maciej Żenczykowski Cc: David Ahern, David S . Miller, Linux Network Development Mailing List On Sat, May 09, 2020 at 12:17:47PM -0700, Maciej Żenczykowski wrote: > Argh. I've never understood the faintest thing about VRF's. :) There are many resources that David created over the years. For example: https://www.kernel.org/doc/Documentation/networking/vrf.txt https://netdevconf.info/1.1/proceedings/slides/ahern-vrf-tutorial.pdf https://netdevconf.info/1.2/session.html?david-ahern-talk BTW, in the example it should be: ip link set dev veth-blue up master vrf-blue Instead of: ip link set dev veth-blue up master vrf-red (Obviously) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-09 19:32 ` Ido Schimmel @ 2020-05-09 21:09 ` David Ahern 0 siblings, 0 replies; 10+ messages in thread From: David Ahern @ 2020-05-09 21:09 UTC (permalink / raw) To: Ido Schimmel, Maciej Żenczykowski Cc: David S . Miller, Linux Network Development Mailing List On 5/9/20 1:32 PM, Ido Schimmel wrote: > On Sat, May 09, 2020 at 12:17:47PM -0700, Maciej Żenczykowski wrote: >> Argh. I've never understood the faintest thing about VRF's. > > :) > > There are many resources that David created over the years. For example: > > https://www.kernel.org/doc/Documentation/networking/vrf.txt > https://netdevconf.info/1.1/proceedings/slides/ahern-vrf-tutorial.pdf > https://netdevconf.info/1.2/session.html?david-ahern-talk best one is from OSS N.A. 2017: http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf Maciej: Basically, VRF narrows the scope to an L3 domain - devices enslaved to the VRF device, their addresses and FIB entries in the VRF table. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-09 19:15 ` Ido Schimmel 2020-05-09 19:17 ` Maciej Żenczykowski @ 2020-05-09 21:20 ` David Ahern 2020-05-09 21:35 ` Maciej Żenczykowski 1 sibling, 1 reply; 10+ messages in thread From: David Ahern @ 2020-05-09 21:20 UTC (permalink / raw) To: Ido Schimmel, Maciej Żenczykowski Cc: Maciej Żenczykowski, David S . Miller, Linux Network Development Mailing List On 5/9/20 1:15 PM, Ido Schimmel wrote: > > This will unfortunately cause regressions with VRFs because they don't > work correctly with ping sockets. Simple example: Thanks for catching this, Ido. > > diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c > index 535427292194..8463b0e9e811 100644 > --- a/net/ipv4/ping.c > +++ b/net/ipv4/ping.c > @@ -297,6 +297,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, > struct net *net = sock_net(sk); > if (sk->sk_family == AF_INET) { > struct sockaddr_in *addr = (struct sockaddr_in *) uaddr; > + u32 tb_id = RT_TABLE_LOCAL; > int chk_addr_ret; > > if (addr_len < sizeof(*addr)) > @@ -310,7 +311,15 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk, > pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n", > sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port)); > > - chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr); > + if (sk->sk_bound_dev_if) { that's the key point - checking sk->sk_bound_dev_if. > + tb_id = l3mdev_fib_table_by_index(net, > + sk->sk_bound_dev_if); > + if (!tb_id) > + tb_id = RT_TABLE_LOCAL; > + } > + > + chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, > + tb_id); > > if (addr->sin_addr.s_addr == htonl(INADDR_ANY)) > chk_addr_ret = RTN_LOCAL; > From a scan of the ipv4/ping.c code, ping_v4_sendmsg also does not acknowledge sk->sk_bound_dev_if. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-09 21:20 ` David Ahern @ 2020-05-09 21:35 ` Maciej Żenczykowski 2020-05-10 1:24 ` David Ahern 0 siblings, 1 reply; 10+ messages in thread From: Maciej Żenczykowski @ 2020-05-09 21:35 UTC (permalink / raw) To: David Ahern Cc: Ido Schimmel, David S . Miller, Linux Network Development Mailing List Do we have some sort of beginner's introduction to Linux VRF somewhere? What they are? How to use them? Currently the concept simply doesn't fit into my mental model of networking... We've actually talked about maybe possibly using VRF's in Android (for our multi network support)... but no-one on our team has the faintest idea about how they work... (and there's rumours that they don't work with ipv6 link local) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-09 21:35 ` Maciej Żenczykowski @ 2020-05-10 1:24 ` David Ahern 2020-05-10 5:15 ` Maciej Żenczykowski 0 siblings, 1 reply; 10+ messages in thread From: David Ahern @ 2020-05-10 1:24 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Ido Schimmel, David S . Miller, Linux Network Development Mailing List On 5/9/20 3:35 PM, Maciej Żenczykowski wrote: > Do we have some sort of beginner's introduction to Linux VRF somewhere? > What they are? How to use them? Ido's response gave introductory commands which can also be found here: https://www.kernel.org/doc/Documentation/networking/vrf.txt This should answer most questions about more advanced topics: http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf Lately, I am putting blogs on https://people.kernel.org/dsahern for recurring questions. > > Currently the concept simply doesn't fit into my mental model of networking... network namespaces = device level separation and up VRF = Layer 3 and up separation > > We've actually talked about maybe possibly using VRF's in Android (for > our multi network support)... > but no-one on our team has the faintest idea about how they work... > (and there's rumours that they don't work with ipv6 link local) > Rumors are ugly. If in doubt, ask. LLA with VRF is a primary requirement from the beginning. With 5.3 and up, you can have IPv4 routes with IPv6 LLA gateways with and without VRFs. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-10 1:24 ` David Ahern @ 2020-05-10 5:15 ` Maciej Żenczykowski 2020-05-10 15:29 ` David Ahern 0 siblings, 1 reply; 10+ messages in thread From: Maciej Żenczykowski @ 2020-05-10 5:15 UTC (permalink / raw) To: David Ahern Cc: Ido Schimmel, David S . Miller, Linux Network Development Mailing List > Ido's response gave introductory commands which can also be found here: > https://www.kernel.org/doc/Documentation/networking/vrf.txt > > This should answer most questions about more advanced topics: > http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf > > Lately, I am putting blogs on https://people.kernel.org/dsahern for > recurring questions. Thanks for that - I'll give it a look. > Rumors are ugly. If in doubt, ask. LLA with VRF is a primary requirement > from the beginning. LLA? Link Level Aggregation? Not sure what that has to do with VRF though... that's just bonding/teaming?? and seems to work fine without VRF at a (at least to me conceptually) even lower layer. > With 5.3 and up, you can have IPv4 routes with IPv6 LLA gateways with > and without VRFs. Ah, see... the latest phone hardware that I still don't have access to (mostly because of covid), is only 4.19 based (as such doing dev work on 4.14 or VMs, but getting wifi/cell emulation and ipv6 working right in VMs is very very hard, though we're making slow progress). 5.4 hardware is probably 10 months out (assuming things return to normal). We're always ~2 years behind, and playing catchup... :-( And I recently switched some of our servers (those were stuck on 4.3 until very recently) to use IPv4 egress routing via IPv6 ND, that's a great improvement (and came at just the right time), but again I guess I don't see how VRF fits in to the picture - this seems to be just a use NDv6 for foo instead of ARP for bar to figure out dst mac type of thing... Obviously I haven't read your links, so perhaps all my questions will be answered. (I'm rambling, mostly writing this email to thank you for the links) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default 2020-05-10 5:15 ` Maciej Żenczykowski @ 2020-05-10 15:29 ` David Ahern 0 siblings, 0 replies; 10+ messages in thread From: David Ahern @ 2020-05-10 15:29 UTC (permalink / raw) To: Maciej Żenczykowski Cc: Ido Schimmel, David S . Miller, Linux Network Development Mailing List On 5/9/20 11:15 PM, Maciej Żenczykowski wrote: >> Rumors are ugly. If in doubt, ask. LLA with VRF is a primary requirement >> from the beginning. > > LLA? Link Level Aggregation? Link Local Address ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2020-05-10 15:29 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-05-08 23:42 [PATCH] net-icmp: make icmp{,v6} (ping) sockets available to all by default Maciej Żenczykowski 2020-05-09 19:15 ` Ido Schimmel 2020-05-09 19:17 ` Maciej Żenczykowski 2020-05-09 19:32 ` Ido Schimmel 2020-05-09 21:09 ` David Ahern 2020-05-09 21:20 ` David Ahern 2020-05-09 21:35 ` Maciej Żenczykowski 2020-05-10 1:24 ` David Ahern 2020-05-10 5:15 ` Maciej Żenczykowski 2020-05-10 15:29 ` David Ahern
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.