All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer
@ 2020-02-10 14:14 Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context Jason A. Donenfeld
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 14:14 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, netfilter-devel

The ICMP routines use the source address for two reasons:

1. Rate-limiting ICMP transmissions based on source address, so
   that one source address cannot provoke a flood of replies. If
   the source address is wrong, the rate limiting will be
   incorrectly applied.

2. Choosing the interface and hence new source address of the
   generated ICMP packet. If the original packet source address
   is wrong, ICMP replies will be sent from the wrong source
   address, resulting in either a misdelivery, infoleak, or just
   general network admin confusion.

Most of the time, the icmp_send and icmpv6_send routines can just reach
down into the skb's IP header to determine the saddr. However, if
icmp_send or icmpv6_send is being called from a network device driver --
there are a few in the tree -- then it's possible that by the time
icmp_send or icmpv6_send looks at the packet, the packet's source
address has already been transformed by SNAT or MASQUERADE or some other
transformation that CONNTRACK knows about. In this case, the packet's
source address is most certainly the *wrong* source address to be used
for the purpose of ICMP replies.

Rather, the source address we want to use for ICMP replies is the
original one, from before the transformation occurred.

Fortunately, it's very easy to just ask CONNTRACK if it knows about this
packet, and if so, how to fix it up. The saddr is the only field in the
header we need to fix up, for the purposes of the subsequent processing
in the icmp_send and icmpv6_send functions, so we do the lookup very
early on, so that the rest of the ICMP machinery can progress as usual.

Changes v1->v2:
- icmpv6 takes subtly different types than icmpv4, like u32 instead of be32,
  u8 instead of int.
- Since we're technically writing to the skb, we need to make sure it's not
  a shared one [Dave, 2017].
- Restore the original skb data after icmp_send returns. All current users
  are freeing the packet right after, so it doesn't matter, but future users
  might not.
- Remove superfluous route lookup in sunvnet [Dave].
- Use NF_NAT instead of NF_CONNTRACK for condition [Florian].
- Include this cover letter [Dave].

Cc: netdev@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org

Jason A. Donenfeld (5):
  icmp: introduce helper for NAT'd source address in network device
    context
  gtp: use icmp_ndo_send helper
  sunvnet: use icmp_ndo_send helper
  wireguard: use icmp_ndo_send helper
  xfrm: interface: use icmp_ndo_send helper

 drivers/net/ethernet/sun/sunvnet_common.c | 23 +++--------------
 drivers/net/gtp.c                         |  4 +--
 drivers/net/wireguard/device.c            |  4 +--
 include/linux/icmpv6.h                    |  6 +++++
 include/net/icmp.h                        |  6 +++++
 net/ipv4/icmp.c                           | 29 ++++++++++++++++++++++
 net/ipv6/ip6_icmp.c                       | 30 +++++++++++++++++++++++
 net/xfrm/xfrm_interface.c                 |  6 ++---
 8 files changed, 82 insertions(+), 26 deletions(-)

-- 
2.25.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
@ 2020-02-10 14:14 ` Jason A. Donenfeld
  2020-02-10 19:48   ` Jason A. Donenfeld
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 2/5] gtp: use icmp_ndo_send helper Jason A. Donenfeld
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 14:14 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Florian Westphal

This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. The transformation happens only on a non-shared skb,
and the skb is fixed back up to its original state after, in case the
calling code continues to use it.

We don't want to pollute the non-driver path, though, so we introduce
this as a helper to be called by places that actually make use of
this, as suggested by Florian.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
---
 include/linux/icmpv6.h |  6 ++++++
 include/net/icmp.h     |  6 ++++++
 net/ipv4/icmp.c        | 29 +++++++++++++++++++++++++++++
 net/ipv6/ip6_icmp.c    | 30 ++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+)

diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h
index ef1cbb5f454f..93338fd54af8 100644
--- a/include/linux/icmpv6.h
+++ b/include/linux/icmpv6.h
@@ -31,6 +31,12 @@ static inline void icmpv6_send(struct sk_buff *skb,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_NF_NAT)
+void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info);
+#else
+#define icmpv6_ndo_send icmpv6_send
+#endif
+
 extern int				icmpv6_init(void);
 extern int				icmpv6_err_convert(u8 type, u8 code,
 							   int *err);
diff --git a/include/net/icmp.h b/include/net/icmp.h
index 5d4bfdba9adf..9ac2d2672a93 100644
--- a/include/net/icmp.h
+++ b/include/net/icmp.h
@@ -43,6 +43,12 @@ static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32
 	__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
 }
 
+#if IS_ENABLED(CONFIG_NF_NAT)
+void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info);
+#else
+#define icmp_ndo_send icmp_send
+#endif
+
 int icmp_rcv(struct sk_buff *skb);
 int icmp_err(struct sk_buff *skb, u32 info);
 int icmp_init(void);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 18068ed42f25..5ca36181d4f4 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -748,6 +748,35 @@ out:;
 }
 EXPORT_SYMBOL(__icmp_send);
 
+#if IS_ENABLED(CONFIG_NF_NAT)
+#include <net/netfilter/nf_conntrack.h>
+void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
+{
+	struct sk_buff *cloned_skb = NULL;
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	__be32 orig_ip;
+
+	ct = nf_ct_get(skb_in, &ctinfo);
+	if (ct) {
+		if (skb_shared(skb_in)) {
+			skb_in = cloned_skb = skb_clone(skb_in, GFP_ATOMIC);
+			if (unlikely(!skb_in))
+				return;
+		}
+		orig_ip = ip_hdr(skb_in)->saddr;
+		ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+	}
+	icmp_send(skb_in, type, code, info);
+	if (ct) {
+		if (cloned_skb)
+			consume_skb(cloned_skb);
+		else
+			ip_hdr(skb_in)->saddr = orig_ip;
+	}
+}
+EXPORT_SYMBOL(icmp_ndo_send);
+#endif
 
 static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
 {
diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c
index 02045494c24c..ee364d61b789 100644
--- a/net/ipv6/ip6_icmp.c
+++ b/net/ipv6/ip6_icmp.c
@@ -45,4 +45,34 @@ void icmpv6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(icmpv6_send);
+
+#if IS_ENABLED(CONFIG_NF_NAT)
+#include <net/netfilter/nf_conntrack.h>
+void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
+{
+	struct sk_buff *cloned_skb = NULL;
+	enum ip_conntrack_info ctinfo;
+	struct in6_addr orig_ip;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb_in, &ctinfo);
+	if (ct) {
+		if (skb_shared(skb_in)) {
+			skb_in = cloned_skb = skb_clone(skb_in, GFP_ATOMIC);
+			if (unlikely(!skb_in))
+				return;
+		}
+		orig_ip = ipv6_hdr(skb_in)->saddr;
+		ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6;
+	}
+	icmpv6_send(skb_in, type, code, info);
+	if (ct) {
+		if (cloned_skb)
+			consume_skb(cloned_skb);
+		else
+			ipv6_hdr(skb_in)->saddr = orig_ip;
+	}
+}
+EXPORT_SYMBOL(icmpv6_ndo_send);
+#endif
 #endif
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 net 2/5] gtp: use icmp_ndo_send helper
  2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context Jason A. Donenfeld
@ 2020-02-10 14:14 ` Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 3/5] sunvnet: " Jason A. Donenfeld
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 14:14 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Harald Welte

Because gtp is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Harald Welte <laforge@gnumonks.org>
---
 drivers/net/gtp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index af07ea760b35..672cd2caf2fb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -546,8 +546,8 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 	    mtu < ntohs(iph->tot_len)) {
 		netdev_dbg(dev, "packet too big, fragmentation needed\n");
 		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-			  htonl(mtu));
+		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+			      htonl(mtu));
 		goto err_rt;
 	}
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 net 3/5] sunvnet: use icmp_ndo_send helper
  2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 2/5] gtp: use icmp_ndo_send helper Jason A. Donenfeld
@ 2020-02-10 14:14 ` Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 4/5] wireguard: " Jason A. Donenfeld
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 14:14 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Shannon Nelson

Because sunvnet is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly. While we're
at it, doing the additional route lookup before calling icmp_ndo_send is
superfluous, since this is the job of the icmp code in the first place.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Shannon Nelson <shannon.nelson@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet_common.c | 23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c b/drivers/net/ethernet/sun/sunvnet_common.c
index c23ce838ff63..8dc6c9ff22e1 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -1350,27 +1350,12 @@ sunvnet_start_xmit_common(struct sk_buff *skb, struct net_device *dev,
 		if (vio_version_after_eq(&port->vio, 1, 3))
 			localmtu -= VLAN_HLEN;
 
-		if (skb->protocol == htons(ETH_P_IP)) {
-			struct flowi4 fl4;
-			struct rtable *rt = NULL;
-
-			memset(&fl4, 0, sizeof(fl4));
-			fl4.flowi4_oif = dev->ifindex;
-			fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
-			fl4.daddr = ip_hdr(skb)->daddr;
-			fl4.saddr = ip_hdr(skb)->saddr;
-
-			rt = ip_route_output_key(dev_net(dev), &fl4);
-			if (!IS_ERR(rt)) {
-				skb_dst_set(skb, &rt->dst);
-				icmp_send(skb, ICMP_DEST_UNREACH,
-					  ICMP_FRAG_NEEDED,
-					  htonl(localmtu));
-			}
-		}
+		if (skb->protocol == htons(ETH_P_IP))
+			icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				      htonl(localmtu));
 #if IS_ENABLED(CONFIG_IPV6)
 		else if (skb->protocol == htons(ETH_P_IPV6))
-			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, localmtu);
+			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, localmtu);
 #endif
 		goto out_dropped;
 	}
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 net 4/5] wireguard: use icmp_ndo_send helper
  2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                   ` (2 preceding siblings ...)
  2020-02-10 14:14 ` [PATCH v2 net 3/5] sunvnet: " Jason A. Donenfeld
@ 2020-02-10 14:14 ` Jason A. Donenfeld
  2020-02-10 14:14 ` [PATCH v2 net 5/5] xfrm: interface: " Jason A. Donenfeld
  2020-02-10 19:30 ` [PATCH v2 net 6/5] wireguard: selftests: ensure that icmp src address is correct with NAT Jason A. Donenfeld
  5 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 14:14 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld

Because wireguard is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 drivers/net/wireguard/device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 16b19824b9ad..43db442b1373 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -203,9 +203,9 @@ static netdev_tx_t wg_xmit(struct sk_buff *skb, struct net_device *dev)
 err:
 	++dev->stats.tx_errors;
 	if (skb->protocol == htons(ETH_P_IP))
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
+		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
+		icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
 	kfree_skb(skb);
 	return ret;
 }
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 net 5/5] xfrm: interface: use icmp_ndo_send helper
  2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                   ` (3 preceding siblings ...)
  2020-02-10 14:14 ` [PATCH v2 net 4/5] wireguard: " Jason A. Donenfeld
@ 2020-02-10 14:14 ` Jason A. Donenfeld
  2020-02-10 19:30 ` [PATCH v2 net 6/5] wireguard: selftests: ensure that icmp src address is correct with NAT Jason A. Donenfeld
  5 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 14:14 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Nicolas Dichtel, Steffen Klassert

Because xfrmi is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_interface.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index dc651a628dcf..3361e3ac5714 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -300,10 +300,10 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 			if (mtu < IPV6_MIN_MTU)
 				mtu = IPV6_MIN_MTU;
 
-			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		} else {
-			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-				  htonl(mtu));
+			icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				      htonl(mtu));
 		}
 
 		dst_release(dst);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 net 6/5] wireguard: selftests: ensure that icmp src address is correct with NAT
  2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                   ` (4 preceding siblings ...)
  2020-02-10 14:14 ` [PATCH v2 net 5/5] xfrm: interface: " Jason A. Donenfeld
@ 2020-02-10 19:30 ` Jason A. Donenfeld
  5 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 19:30 UTC (permalink / raw)
  To: davem, netdev; +Cc: Jason A. Donenfeld

This is a small test to ensure that icmp_ndo_send is actually doing the
right with with regards to the source address. It measure this by
ensuring that there are a sufficient number of non-errors returned in a
row, which should be impossible with proper rate limiting.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Here's a test for the WireGuard path of the series I submitted earlier
today. This test correctly fails when using the old code, and succeeds
when using the new code. If the "6/5" stupidity disrupts patchwork, no
need to respond, and I'll just resubmit this later.

 tools/testing/selftests/wireguard/netns.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
index f5ab1cda8bb5..4e31d5b1bf7f 100755
--- a/tools/testing/selftests/wireguard/netns.sh
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -297,7 +297,17 @@ ip1 -4 rule add table main suppress_prefixlength 0
 n1 ping -W 1 -c 100 -f 192.168.99.7
 n1 ping -W 1 -c 100 -f abab::1111
 
+# Have ns2 NAT into wg0 packets from ns0, but return an icmp error along the right route.
+n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2
+n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit.
+n2 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward'
+ip0 -4 route add 192.168.241.1 via 10.0.0.100
+n2 wg set wg0 peer "$pub1" remove
+[[ $(! n0 ping -W 1 -c 1 192.168.241.1 || false) == *"From 10.0.0.100 icmp_seq=1 Destination Host Unreachable"* ]]
+
 n0 iptables -t nat -F
+n0 iptables -t filter -F
+n2 iptables -t nat -F
 ip0 link del vethrc
 ip0 link del vethrs
 ip1 link del wg0
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 14:14 ` [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context Jason A. Donenfeld
@ 2020-02-10 19:48   ` Jason A. Donenfeld
  2020-02-10 21:32     ` Florian Westphal
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
  1 sibling, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 19:48 UTC (permalink / raw)
  To: Netdev, David Miller; +Cc: Florian Westphal

On Mon, Feb 10, 2020 at 3:15 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> +               ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
> +       }
> +       icmp_send(skb_in, type, code, info);

According to the comments in icmp_send, access to
ip_hdr(skb_in)->saddr requires first checking for `if
(skb_network_header(skb_in) < skb_in->head ||
(skb_network_header(skb_in) + sizeof(struct iphdr)) >
skb_tail_pointer(skb_in))` first to be safe. So, I'll fix this up for
v3, but will wait some time in case there are additional comments.

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 19:48   ` Jason A. Donenfeld
@ 2020-02-10 21:32     ` Florian Westphal
  2020-02-10 21:59       ` Jason A. Donenfeld
  2020-02-11 12:25       ` Jason A. Donenfeld
  0 siblings, 2 replies; 27+ messages in thread
From: Florian Westphal @ 2020-02-10 21:32 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, David Miller, Florian Westphal

Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> On Mon, Feb 10, 2020 at 3:15 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > +               ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
> > +       }
> > +       icmp_send(skb_in, type, code, info);
> 
> According to the comments in icmp_send, access to
> ip_hdr(skb_in)->saddr requires first checking for `if
> (skb_network_header(skb_in) < skb_in->head ||
> (skb_network_header(skb_in) + sizeof(struct iphdr)) >
> skb_tail_pointer(skb_in))` first to be safe.

You will probably also need skb_ensure_writable() to handle cloned skbs.

I also suggest to check "ct->status & IPS_NAT_MASK", nat is only done if
those bits are set.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 21:32     ` Florian Westphal
@ 2020-02-10 21:59       ` Jason A. Donenfeld
  2020-02-10 22:26         ` Florian Westphal
  2020-02-11 12:25       ` Jason A. Donenfeld
  1 sibling, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-10 21:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Netdev, David Miller

Hi Florian,

On Mon, Feb 10, 2020 at 10:33 PM Florian Westphal <fw@strlen.de> wrote:
>
> Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > On Mon, Feb 10, 2020 at 3:15 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > > +               ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
> > > +       }
> > > +       icmp_send(skb_in, type, code, info);
> >
> > According to the comments in icmp_send, access to
> > ip_hdr(skb_in)->saddr requires first checking for `if
> > (skb_network_header(skb_in) < skb_in->head ||
> > (skb_network_header(skb_in) + sizeof(struct iphdr)) >
> > skb_tail_pointer(skb_in))` first to be safe.
>
> You will probably also need skb_ensure_writable() to handle cloned skbs.
>
> I also suggest to check "ct->status & IPS_NAT_MASK", nat is only done if
> those bits are set.

Thanks for the suggestions. I've made these changes and they're queued
up for a v3, currently staged in wireguard-linux.git's stable branch:
https://git.zx2c4.com/wireguard-linux/log/?h=stable

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 21:59       ` Jason A. Donenfeld
@ 2020-02-10 22:26         ` Florian Westphal
  2020-02-11  9:59           ` Jason A. Donenfeld
  0 siblings, 1 reply; 27+ messages in thread
From: Florian Westphal @ 2020-02-10 22:26 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Florian Westphal, Netdev, David Miller

Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Florian,
> 
> On Mon, Feb 10, 2020 at 10:33 PM Florian Westphal <fw@strlen.de> wrote:
> >
> > Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > > On Mon, Feb 10, 2020 at 3:15 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > > > +               ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
> > > > +       }
> > > > +       icmp_send(skb_in, type, code, info);
> > >
> > > According to the comments in icmp_send, access to
> > > ip_hdr(skb_in)->saddr requires first checking for `if
> > > (skb_network_header(skb_in) < skb_in->head ||
> > > (skb_network_header(skb_in) + sizeof(struct iphdr)) >
> > > skb_tail_pointer(skb_in))` first to be safe.
> >
> > You will probably also need skb_ensure_writable() to handle cloned skbs.
> >
> > I also suggest to check "ct->status & IPS_NAT_MASK", nat is only done if
> > those bits are set.
> 
> Thanks for the suggestions. I've made these changes and they're queued
> up for a v3, currently staged in wireguard-linux.git's stable branch:
> https://git.zx2c4.com/wireguard-linux/log/?h=stable

I think this is a bit too conservative, f.e. i don't see how
ndo-called skbs could be shared (tx path needs to be able to change skb
list pointers)?

If its needed it looks ok.

Otherwise, I would suggest something like this:

void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
{
   enum ip_conntrack_info ctinfo;
   struct nf_conn *ct;
   __be32 orig_ip;

   ct = nf_ct_get(skb_in, &ctinfo);
   if (!ct || ((ct->status & IPS_NAT_MASK) == 0) {
      icmp_send(skb_in, type, code, info);
      return;
    }

   /* avoid reallocations */
   if (skb_network_header(skb_in) < skb_in->head ||
       (skb_network_header(skb_in) + sizeof(struct iphdr)) >
        skb_tail_pointer(skb_in))
       return;

    /* handle clones. NB: if reallocations are to be avoided, then
     * if (skb_cloned(skb_in) &&
     * !skb_clone_writable(skb_in, skb_network_offset(skb_in) + iphlen))
     *
     * ... should be placed here instead:
     */
    if (unlikely(skb_ensure_writable(skb_in,
                 skb_network_offset(skb_in) + sizeof(struct iphdr))))
        return;

    orig_ip = ip_hdr(skb_in)->saddr;
    ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
    icmp_send(skb_in, type, code, info);
    ip_hdr(skb_in)->saddr = orig_ip;
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 22:26         ` Florian Westphal
@ 2020-02-11  9:59           ` Jason A. Donenfeld
  0 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11  9:59 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Netdev, David Miller

On Mon, Feb 10, 2020 at 11:33 PM Florian Westphal <fw@strlen.de> wrote:
> I think this is a bit too conservative, f.e. i don't see how
> ndo-called skbs could be shared (tx path needs to be able to change skb
> list pointers)?

Are you *sure* about that? Dave - do you know?

I'm asking with *asterisks* because I see a few drivers in tree that
do call skb_share_check from their ndo_start_xmit function. If this
makes no sense and isn't needed, then I'll send a cleanup series for
these. And, I'll remove it from icmp_ndo_send for v3, replacing it
with a `if (WARN_ON(skb_shared(skb_in))) return;`.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context
  2020-02-10 21:32     ` Florian Westphal
  2020-02-10 21:59       ` Jason A. Donenfeld
@ 2020-02-11 12:25       ` Jason A. Donenfeld
  1 sibling, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 12:25 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Netdev, David Miller

On Mon, Feb 10, 2020 at 10:33 PM Florian Westphal <fw@strlen.de> wrote:
> I also suggest to check "ct->status & IPS_NAT_MASK", nat is only done if
> those bits are set.

We can optimize even further, because we only care about the IPS_SRC_NAT case.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer
  2020-02-10 14:14 ` [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context Jason A. Donenfeld
  2020-02-10 19:48   ` Jason A. Donenfeld
@ 2020-02-11 15:00   ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 1/9] icmp: introduce helper for nat'd source address in network device context Jason A. Donenfeld
                       ` (8 more replies)
  1 sibling, 9 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, netfilter-devel

The ICMP routines use the source address for two reasons:

1. Rate-limiting ICMP transmissions based on source address, so
   that one source address cannot provoke a flood of replies. If
   the source address is wrong, the rate limiting will be
   incorrectly applied.

2. Choosing the interface and hence new source address of the
   generated ICMP packet. If the original packet source address
   is wrong, ICMP replies will be sent from the wrong source
   address, resulting in either a misdelivery, infoleak, or just
   general network admin confusion.

Most of the time, the icmp_send and icmpv6_send routines can just reach
down into the skb's IP header to determine the saddr. However, if
icmp_send or icmpv6_send is being called from a network device driver --
there are a few in the tree -- then it's possible that by the time
icmp_send or icmpv6_send looks at the packet, the packet's source
address has already been transformed by SNAT or MASQUERADE or some other
transformation that CONNTRACK knows about. In this case, the packet's
source address is most certainly the *wrong* source address to be used
for the purpose of ICMP replies.

Rather, the source address we want to use for ICMP replies is the
original one, from before the transformation occurred.

Fortunately, it's very easy to just ask CONNTRACK if it knows about this
packet, and if so, how to fix it up. The saddr is the only field in the
header we need to fix up, for the purposes of the subsequent processing
in the icmp_send and icmpv6_send functions, so we do the lookup very
early on, so that the rest of the ICMP machinery can progress as usual.

Changes v2->v3:
- Add selftest to ensure this actually does what we want and never regresses.
- Check the size of the skb header before operating on it.
- Use skb_ensure_writable to ensure we can modify the cloned skb [Florian].
- Conditionalize this on IPS_SRC_NAT so we don't do anything unnecessarily
  [Florian].
- It turns out that since we're calling these from the xmit path,
  skb_share_check isn't required, so remove that [Florian]. This simplifes the
  code a bit too. **The supposition here is that skbs passed to ndo_start_xmit
  are _never_ shared. If this is not correct NOW IS THE TIME TO PIPE UP, for
  doom awaits us later.**
- While investigating the shared skb business, several drivers appeared to be
  calling it incorrectly in the xmit path, so this series also removes those
  unnecessary calls, based on the supposition mentioned in the previous point.

Changes v1->v2:
- icmpv6 takes subtly different types than icmpv4, like u32 instead of be32,
  u8 instead of int.
- Since we're technically writing to the skb, we need to make sure it's not
  a shared one [Dave, 2017].
- Restore the original skb data after icmp_send returns. All current users
  are freeing the packet right after, so it doesn't matter, but future users
  might not.
- Remove superfluous route lookup in sunvnet [Dave].
- Use NF_NAT instead of NF_CONNTRACK for condition [Florian].
- Include this cover letter [Dave].

Cc: netdev@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org

Jason A. Donenfeld (9):
  icmp: introduce helper for nat'd source address in network device
    context
  gtp: use icmp_ndo_send helper
  sunvnet: use icmp_ndo_send helper
  xfrm: interface: use icmp_ndo_send helper
  wireguard: device: use icmp_ndo_send helper and remove skb_share_check
  firewire: remove skb_share_check from xmit path
  ipvlan: remove skb_share_check from xmit path
  fm10k: remove skb_share_check from xmit path
  benet: remove skb_share_check from xmit path

 drivers/firewire/net.c                        |  4 ---
 drivers/net/ethernet/emulex/benet/be_main.c   |  4 ---
 .../net/ethernet/intel/fm10k/fm10k_netdev.c   |  5 ----
 drivers/net/ethernet/sun/sunvnet_common.c     | 23 +++------------
 drivers/net/gtp.c                             |  4 +--
 drivers/net/ipvlan/ipvlan_core.c              |  3 --
 drivers/net/wireguard/device.c                |  8 ++---
 include/linux/icmpv6.h                        |  6 ++++
 include/net/icmp.h                            |  6 ++++
 net/ipv4/icmp.c                               | 28 ++++++++++++++++++
 net/ipv6/ip6_icmp.c                           | 29 +++++++++++++++++++
 net/xfrm/xfrm_interface.c                     |  6 ++--
 tools/testing/selftests/wireguard/netns.sh    | 10 +++++++
 13 files changed, 90 insertions(+), 46 deletions(-)

-- 
2.25.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 net 1/9] icmp: introduce helper for nat'd source address in network device context
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 2/9] gtp: use icmp_ndo_send helper Jason A. Donenfeld
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Florian Westphal

This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. The transformation happens only from a non-shared skb
context, and the skb is fixed back up to its original state after, in
case the calling code continues to use it.

We don't want to pollute the non-driver path, though, so we introduce
this as a helper to be called by places that actually make use of
this, as suggested by Florian.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Florian Westphal <fw@strlen.de>
---
 include/linux/icmpv6.h |  6 ++++++
 include/net/icmp.h     |  6 ++++++
 net/ipv4/icmp.c        | 28 ++++++++++++++++++++++++++++
 net/ipv6/ip6_icmp.c    | 29 +++++++++++++++++++++++++++++
 4 files changed, 69 insertions(+)

diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h
index ef1cbb5f454f..93338fd54af8 100644
--- a/include/linux/icmpv6.h
+++ b/include/linux/icmpv6.h
@@ -31,6 +31,12 @@ static inline void icmpv6_send(struct sk_buff *skb,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_NF_NAT)
+void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info);
+#else
+#define icmpv6_ndo_send icmpv6_send
+#endif
+
 extern int				icmpv6_init(void);
 extern int				icmpv6_err_convert(u8 type, u8 code,
 							   int *err);
diff --git a/include/net/icmp.h b/include/net/icmp.h
index 5d4bfdba9adf..9ac2d2672a93 100644
--- a/include/net/icmp.h
+++ b/include/net/icmp.h
@@ -43,6 +43,12 @@ static inline void icmp_send(struct sk_buff *skb_in, int type, int code, __be32
 	__icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
 }
 
+#if IS_ENABLED(CONFIG_NF_NAT)
+void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info);
+#else
+#define icmp_ndo_send icmp_send
+#endif
+
 int icmp_rcv(struct sk_buff *skb);
 int icmp_err(struct sk_buff *skb, u32 info);
 int icmp_init(void);
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 18068ed42f25..0cd23df4ad5c 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -748,6 +748,34 @@ out:;
 }
 EXPORT_SYMBOL(__icmp_send);
 
+#if IS_ENABLED(CONFIG_NF_NAT)
+#include <net/netfilter/nf_conntrack.h>
+void icmp_ndo_send(struct sk_buff *skb_in, int type, int code, __be32 info)
+{
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+	__be32 orig_ip;
+
+	ct = nf_ct_get(skb_in, &ctinfo);
+	if (!ct || !(ct->status & IPS_SRC_NAT)) {
+		icmp_send(skb_in, type, code, info);
+		return;
+	}
+
+	if (unlikely(WARN_ON(skb_shared(skb_in)) ||
+	    skb_network_header(skb_in) < skb_in->head ||
+	    (skb_network_header(skb_in) + sizeof(struct iphdr)) >
+	    skb_tail_pointer(skb_in) || skb_ensure_writable(skb_in,
+	    skb_network_offset(skb_in) + sizeof(struct iphdr))))
+		return;
+
+	orig_ip = ip_hdr(skb_in)->saddr;
+	ip_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.ip;
+	icmp_send(skb_in, type, code, info);
+	ip_hdr(skb_in)->saddr = orig_ip;
+}
+EXPORT_SYMBOL(icmp_ndo_send);
+#endif
 
 static void icmp_socket_deliver(struct sk_buff *skb, u32 info)
 {
diff --git a/net/ipv6/ip6_icmp.c b/net/ipv6/ip6_icmp.c
index 02045494c24c..00e94e1dffae 100644
--- a/net/ipv6/ip6_icmp.c
+++ b/net/ipv6/ip6_icmp.c
@@ -45,4 +45,33 @@ void icmpv6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info)
 	rcu_read_unlock();
 }
 EXPORT_SYMBOL(icmpv6_send);
+
+#if IS_ENABLED(CONFIG_NF_NAT)
+#include <net/netfilter/nf_conntrack.h>
+void icmpv6_ndo_send(struct sk_buff *skb_in, u8 type, u8 code, __u32 info)
+{
+	enum ip_conntrack_info ctinfo;
+	struct in6_addr orig_ip;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb_in, &ctinfo);
+	if (!ct || !(ct->status & IPS_SRC_NAT)) {
+		icmpv6_send(skb_in, type, code, info);
+		return;
+	}
+
+	if (unlikely(WARN_ON(skb_shared(skb_in)) ||
+	    skb_network_header(skb_in) < skb_in->head ||
+	    (skb_network_header(skb_in) + sizeof(struct ipv6hdr)) >
+	    skb_tail_pointer(skb_in) || skb_ensure_writable(skb_in,
+	    skb_network_offset(skb_in) + sizeof(struct ipv6hdr))))
+		return;
+
+	orig_ip = ipv6_hdr(skb_in)->saddr;
+	ipv6_hdr(skb_in)->saddr = ct->tuplehash[0].tuple.src.u3.in6;
+	icmpv6_send(skb_in, type, code, info);
+	ipv6_hdr(skb_in)->saddr = orig_ip;
+}
+EXPORT_SYMBOL(icmpv6_ndo_send);
+#endif
 #endif
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 2/9] gtp: use icmp_ndo_send helper
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 1/9] icmp: introduce helper for nat'd source address in network device context Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 3/9] sunvnet: " Jason A. Donenfeld
                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Harald Welte

Because gtp is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Harald Welte <laforge@gnumonks.org>
---
 drivers/net/gtp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index af07ea760b35..672cd2caf2fb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -546,8 +546,8 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
 	    mtu < ntohs(iph->tot_len)) {
 		netdev_dbg(dev, "packet too big, fragmentation needed\n");
 		memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-			  htonl(mtu));
+		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+			      htonl(mtu));
 		goto err_rt;
 	}
 
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 3/9] sunvnet: use icmp_ndo_send helper
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 1/9] icmp: introduce helper for nat'd source address in network device context Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 2/9] gtp: use icmp_ndo_send helper Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 4/9] xfrm: interface: " Jason A. Donenfeld
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Shannon Nelson

Because sunvnet is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly. While we're
at it, doing the additional route lookup before calling icmp_ndo_send is
superfluous, since this is the job of the icmp code in the first place.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Shannon Nelson <shannon.nelson@oracle.com>
---
 drivers/net/ethernet/sun/sunvnet_common.c | 23 ++++-------------------
 1 file changed, 4 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunvnet_common.c b/drivers/net/ethernet/sun/sunvnet_common.c
index c23ce838ff63..8dc6c9ff22e1 100644
--- a/drivers/net/ethernet/sun/sunvnet_common.c
+++ b/drivers/net/ethernet/sun/sunvnet_common.c
@@ -1350,27 +1350,12 @@ sunvnet_start_xmit_common(struct sk_buff *skb, struct net_device *dev,
 		if (vio_version_after_eq(&port->vio, 1, 3))
 			localmtu -= VLAN_HLEN;
 
-		if (skb->protocol == htons(ETH_P_IP)) {
-			struct flowi4 fl4;
-			struct rtable *rt = NULL;
-
-			memset(&fl4, 0, sizeof(fl4));
-			fl4.flowi4_oif = dev->ifindex;
-			fl4.flowi4_tos = RT_TOS(ip_hdr(skb)->tos);
-			fl4.daddr = ip_hdr(skb)->daddr;
-			fl4.saddr = ip_hdr(skb)->saddr;
-
-			rt = ip_route_output_key(dev_net(dev), &fl4);
-			if (!IS_ERR(rt)) {
-				skb_dst_set(skb, &rt->dst);
-				icmp_send(skb, ICMP_DEST_UNREACH,
-					  ICMP_FRAG_NEEDED,
-					  htonl(localmtu));
-			}
-		}
+		if (skb->protocol == htons(ETH_P_IP))
+			icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				      htonl(localmtu));
 #if IS_ENABLED(CONFIG_IPV6)
 		else if (skb->protocol == htons(ETH_P_IPV6))
-			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, localmtu);
+			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, localmtu);
 #endif
 		goto out_dropped;
 	}
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 4/9] xfrm: interface: use icmp_ndo_send helper
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                       ` (2 preceding siblings ...)
  2020-02-11 15:00     ` [PATCH v3 net 3/9] sunvnet: " Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 5/9] wireguard: device: use icmp_ndo_send helper and remove skb_share_check Jason A. Donenfeld
                       ` (4 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Nicolas Dichtel, Steffen Klassert

Because xfrmi is calling icmp from network device context, it should use
the ndo helper so that the rate limiting applies correctly.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_interface.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
index dc651a628dcf..3361e3ac5714 100644
--- a/net/xfrm/xfrm_interface.c
+++ b/net/xfrm/xfrm_interface.c
@@ -300,10 +300,10 @@ xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
 			if (mtu < IPV6_MIN_MTU)
 				mtu = IPV6_MIN_MTU;
 
-			icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+			icmpv6_ndo_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
 		} else {
-			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
-				  htonl(mtu));
+			icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+				      htonl(mtu));
 		}
 
 		dst_release(dst);
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 5/9] wireguard: device: use icmp_ndo_send helper and remove skb_share_check
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                       ` (3 preceding siblings ...)
  2020-02-11 15:00     ` [PATCH v3 net 4/9] xfrm: interface: " Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 6/9] firewire: remove skb_share_check from xmit path Jason A. Donenfeld
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld

Because wireguard is calling icmp from network device context, it should
use the ndo helper so that the rate limiting applies correctly. Also,
skb_share_check in the xmit path is an impossible condition to reach; an
skb in ndo_start_xmit won't be shared by definition.

This commit adds a small test to the wireguard test suite to ensure that
the new functions continue doing the right thing in the context of
wireguard. It does this by setting up a condition that will definately
evoke an icmp error message from the driver, but along a nat'd path.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Link: https://lore.kernel.org/netdev/CAHmME9pk8HEFRq_mBeatNbwXTx7UEfiQ_HG_+Lyz7E+80GmbSA@mail.gmail.com/
---
 drivers/net/wireguard/device.c             |  8 ++------
 tools/testing/selftests/wireguard/netns.sh | 10 ++++++++++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c
index 16b19824b9ad..62cb7b106c52 100644
--- a/drivers/net/wireguard/device.c
+++ b/drivers/net/wireguard/device.c
@@ -167,10 +167,6 @@ static netdev_tx_t wg_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_list_walk_safe(skb, skb, next) {
 		skb_mark_not_on_list(skb);
 
-		skb = skb_share_check(skb, GFP_ATOMIC);
-		if (unlikely(!skb))
-			continue;
-
 		/* We only need to keep the original dst around for icmp,
 		 * so at this point we're in a position to drop it.
 		 */
@@ -203,9 +199,9 @@ static netdev_tx_t wg_xmit(struct sk_buff *skb, struct net_device *dev)
 err:
 	++dev->stats.tx_errors;
 	if (skb->protocol == htons(ETH_P_IP))
-		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
+		icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
 	else if (skb->protocol == htons(ETH_P_IPV6))
-		icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
+		icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0);
 	kfree_skb(skb);
 	return ret;
 }
diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh
index f5ab1cda8bb5..4e31d5b1bf7f 100755
--- a/tools/testing/selftests/wireguard/netns.sh
+++ b/tools/testing/selftests/wireguard/netns.sh
@@ -297,7 +297,17 @@ ip1 -4 rule add table main suppress_prefixlength 0
 n1 ping -W 1 -c 100 -f 192.168.99.7
 n1 ping -W 1 -c 100 -f abab::1111
 
+# Have ns2 NAT into wg0 packets from ns0, but return an icmp error along the right route.
+n2 iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 192.168.241.0/24 -j SNAT --to 192.168.241.2
+n0 iptables -t filter -A INPUT \! -s 10.0.0.0/24 -i vethrs -j DROP # Manual rpfilter just to be explicit.
+n2 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward'
+ip0 -4 route add 192.168.241.1 via 10.0.0.100
+n2 wg set wg0 peer "$pub1" remove
+[[ $(! n0 ping -W 1 -c 1 192.168.241.1 || false) == *"From 10.0.0.100 icmp_seq=1 Destination Host Unreachable"* ]]
+
 n0 iptables -t nat -F
+n0 iptables -t filter -F
+n2 iptables -t nat -F
 ip0 link del vethrc
 ip0 link del vethrs
 ip1 link del wg0
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 6/9] firewire: remove skb_share_check from xmit path
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                       ` (4 preceding siblings ...)
  2020-02-11 15:00     ` [PATCH v3 net 5/9] wireguard: device: use icmp_ndo_send helper and remove skb_share_check Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 7/9] ipvlan: " Jason A. Donenfeld
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, stefanr

This is an impossible condition to reach; an skb in ndo_start_xmit won't
be shared by definition.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: stefanr@s5r6.in-berlin.de
Link: https://lore.kernel.org/netdev/CAHmME9pk8HEFRq_mBeatNbwXTx7UEfiQ_HG_+Lyz7E+80GmbSA@mail.gmail.com/
---
 drivers/firewire/net.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c
index 715e491dfbc3..c48eae922566 100644
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -1259,10 +1259,6 @@ static netdev_tx_t fwnet_tx(struct sk_buff *skb, struct net_device *net)
 	if (ptask == NULL)
 		goto fail;
 
-	skb = skb_share_check(skb, GFP_ATOMIC);
-	if (!skb)
-		goto fail;
-
 	/*
 	 * Make a copy of the driver-specific header.
 	 * We might need to rebuild the header on tx failure.
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 7/9] ipvlan: remove skb_share_check from xmit path
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                       ` (5 preceding siblings ...)
  2020-02-11 15:00     ` [PATCH v3 net 6/9] firewire: remove skb_share_check from xmit path Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 17:39       ` Eric Dumazet
  2020-02-11 15:00     ` [PATCH v3 net 8/9] fm10k: " Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 9/9] benet: " Jason A. Donenfeld
  8 siblings, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Mahesh Bandewar

This is an impossible condition to reach; an skb in ndo_start_xmit won't
be shared by definition.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Link: https://lore.kernel.org/netdev/CAHmME9pk8HEFRq_mBeatNbwXTx7UEfiQ_HG_+Lyz7E+80GmbSA@mail.gmail.com/
---
 drivers/net/ipvlan/ipvlan_core.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 30cd0c4f0be0..da40723065f2 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -605,9 +605,6 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 				return ipvlan_rcv_frame(addr, &skb, true);
 			}
 		}
-		skb = skb_share_check(skb, GFP_ATOMIC);
-		if (!skb)
-			return NET_XMIT_DROP;
 
 		/* Packet definitely does not belong to any of the
 		 * virtual devices, but the dest is local. So forward
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 8/9] fm10k: remove skb_share_check from xmit path
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                       ` (6 preceding siblings ...)
  2020-02-11 15:00     ` [PATCH v3 net 7/9] ipvlan: " Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  2020-02-11 15:00     ` [PATCH v3 net 9/9] benet: " Jason A. Donenfeld
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem; +Cc: Jason A. Donenfeld, Jeff Kirsher

This is an impossible condition to reach; an skb in ndo_start_xmit won't
be shared by definition.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Link: https://lore.kernel.org/netdev/CAHmME9pk8HEFRq_mBeatNbwXTx7UEfiQ_HG_+Lyz7E+80GmbSA@mail.gmail.com/
---
 drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 0637ccadee79..b6cd7ab6efbe 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -641,11 +641,6 @@ static netdev_tx_t fm10k_xmit_frame(struct sk_buff *skb, struct net_device *dev)
 		struct vlan_hdr *vhdr;
 		__be16 proto;
 
-		/* make sure skb is not shared */
-		skb = skb_share_check(skb, GFP_ATOMIC);
-		if (!skb)
-			return NETDEV_TX_OK;
-
 		/* make sure there is enough room to move the ethernet header */
 		if (unlikely(!pskb_may_pull(skb, VLAN_ETH_HLEN)))
 			return NETDEV_TX_OK;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 net 9/9] benet: remove skb_share_check from xmit path
  2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
                       ` (7 preceding siblings ...)
  2020-02-11 15:00     ` [PATCH v3 net 8/9] fm10k: " Jason A. Donenfeld
@ 2020-02-11 15:00     ` Jason A. Donenfeld
  8 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 15:00 UTC (permalink / raw)
  To: netdev, davem
  Cc: Jason A. Donenfeld, Sathya Perla, Ajit Khaparde,
	Sriharsha Basavapatna, Somnath Kotur

This is an impossible condition to reach; an skb in ndo_start_xmit won't
be shared by definition.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://lore.kernel.org/netdev/CAHmME9pk8HEFRq_mBeatNbwXTx7UEfiQ_HG_+Lyz7E+80GmbSA@mail.gmail.com/
---
 drivers/net/ethernet/emulex/benet/be_main.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 56f59db6ebf2..8b797adb3065 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1048,10 +1048,6 @@ static struct sk_buff *be_insert_vlan_in_pkt(struct be_adapter *adapter,
 	bool insert_vlan = false;
 	u16 vlan_tag = 0;
 
-	skb = skb_share_check(skb, GFP_ATOMIC);
-	if (unlikely(!skb))
-		return skb;
-
 	if (skb_vlan_tag_present(skb)) {
 		vlan_tag = be_get_tx_vlan_tag(adapter, skb);
 		insert_vlan = true;
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 net 7/9] ipvlan: remove skb_share_check from xmit path
  2020-02-11 15:00     ` [PATCH v3 net 7/9] ipvlan: " Jason A. Donenfeld
@ 2020-02-11 17:39       ` Eric Dumazet
  2020-02-11 17:44         ` Jason A. Donenfeld
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2020-02-11 17:39 UTC (permalink / raw)
  To: Jason A. Donenfeld, netdev, davem; +Cc: Mahesh Bandewar



On 2/11/20 7:00 AM, Jason A. Donenfeld wrote:
> This is an impossible condition to reach; an skb in ndo_start_xmit won't
> be shared by definition.
> 

Yes, maybe, but can you elaborate in this changelog ?

AFAIK net/core/pktgen.c can definitely provide shared skbs.

     refcount_inc(&pkt_dev->skb->users);
     ret = dev_queue_xmit(pkt_dev->skb);

We might have to change pktgen to make sure we do not make skb shared
just because it was convenient.

Please do not give a link to some web page that might disappear in the future.

Having to follow an old thread to understand the reasoning is not appealing
for us having to fix bugs in the following years.

Thanks.

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Link: https://lore.kernel.org/netdev/CAHmME9pk8HEFRq_mBeatNbwXTx7UEfiQ_HG_+Lyz7E+80GmbSA@mail.gmail.com/
> ---
>  drivers/net/ipvlan/ipvlan_core.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> index 30cd0c4f0be0..da40723065f2 100644
> --- a/drivers/net/ipvlan/ipvlan_core.c
> +++ b/drivers/net/ipvlan/ipvlan_core.c
> @@ -605,9 +605,6 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>  				return ipvlan_rcv_frame(addr, &skb, true);
>  			}
>  		}
> -		skb = skb_share_check(skb, GFP_ATOMIC);
> -		if (!skb)
> -			return NET_XMIT_DROP;
>  
>  		/* Packet definitely does not belong to any of the
>  		 * virtual devices, but the dest is local. So forward
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 net 7/9] ipvlan: remove skb_share_check from xmit path
  2020-02-11 17:39       ` Eric Dumazet
@ 2020-02-11 17:44         ` Jason A. Donenfeld
  2020-02-11 18:50           ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 17:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, David Miller, Mahesh Bandewar, Florian Westphal

On Tue, Feb 11, 2020 at 6:39 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Yes, maybe, but can you elaborate in this changelog ?
>
> AFAIK net/core/pktgen.c can definitely provide shared skbs.
>
>      refcount_inc(&pkt_dev->skb->users);
>      ret = dev_queue_xmit(pkt_dev->skb);
>
> We might have to change pktgen to make sure we do not make skb shared
> just because it was convenient.
>
> Please do not give a link to some web page that might disappear in the future.
>
> Having to follow an old thread to understand the reasoning is not appealing
> for us having to fix bugs in the following years.

Well, I don't know really.

Florian said I should remove skb_share_check() from a function I was
adding, because according to him, the ndo_start_xmit path cannot
contain shared skbs. (See the 0/9 cover letter.) If this claim is
true, then this series is correct. If this claim is not true, then the
series needs to be adjusted.

I tried to trace these and couldn't totally make up my mind, hence the
ALL CAPS mention in the 0/9.

Do you know if this is a safe presumption to make? It sounds like your
opinion is "no" in light of pktgen.c? Should that simply be adjusted
instead?

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 net 7/9] ipvlan: remove skb_share_check from xmit path
  2020-02-11 17:44         ` Jason A. Donenfeld
@ 2020-02-11 18:50           ` Eric Dumazet
  2020-02-11 18:54             ` Jason A. Donenfeld
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2020-02-11 18:50 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, David Miller, Mahesh Bandewar, Florian Westphal



On 2/11/20 9:44 AM, Jason A. Donenfeld wrote:
> On Tue, Feb 11, 2020 at 6:39 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Yes, maybe, but can you elaborate in this changelog ?
>>
>> AFAIK net/core/pktgen.c can definitely provide shared skbs.
>>
>>      refcount_inc(&pkt_dev->skb->users);
>>      ret = dev_queue_xmit(pkt_dev->skb);
>>
>> We might have to change pktgen to make sure we do not make skb shared
>> just because it was convenient.
>>
>> Please do not give a link to some web page that might disappear in the future.
>>
>> Having to follow an old thread to understand the reasoning is not appealing
>> for us having to fix bugs in the following years.
> 
> Well, I don't know really.
> 
> Florian said I should remove skb_share_check() from a function I was
> adding, because according to him, the ndo_start_xmit path cannot
> contain shared skbs. (See the 0/9 cover letter.) If this claim is
> true, then this series is correct. If this claim is not true, then the
> series needs to be adjusted.

The claim might be true for a particular driver, but not others.

ipvlan has a way to forward packets from TX to RX, and RX to TX,
I would rather not touch it unless you really can make good arguments,
and possibly some tests :)

I am worried about a missing skb_share_check() if for some
reason pskb_expand_head() has to be called later

      BUG_ON(skb_shared(skb));

> 
> I tried to trace these and couldn't totally make up my mind, hence the
> ALL CAPS mention in the 0/9.
> 
> Do you know if this is a safe presumption to make? It sounds like your
> opinion is "no" in light of pktgen.c? Should that simply be adjusted
> instead?

The key here is IFF_TX_SKB_SHARING, but really this is the intent.
I am not sure if all paths have been correctly audited/tested.

I am not saying your patch is wrong, I am unsure about it.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 net 7/9] ipvlan: remove skb_share_check from xmit path
  2020-02-11 18:50           ` Eric Dumazet
@ 2020-02-11 18:54             ` Jason A. Donenfeld
  0 siblings, 0 replies; 27+ messages in thread
From: Jason A. Donenfeld @ 2020-02-11 18:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, David Miller, Mahesh Bandewar, Florian Westphal

On Tue, Feb 11, 2020 at 7:50 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 2/11/20 9:44 AM, Jason A. Donenfeld wrote:
> > On Tue, Feb 11, 2020 at 6:39 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> Yes, maybe, but can you elaborate in this changelog ?
> >>
> >> AFAIK net/core/pktgen.c can definitely provide shared skbs.
> >>
> >>      refcount_inc(&pkt_dev->skb->users);
> >>      ret = dev_queue_xmit(pkt_dev->skb);
> >>
> >> We might have to change pktgen to make sure we do not make skb shared
> >> just because it was convenient.
> >>
> >> Please do not give a link to some web page that might disappear in the future.
> >>
> >> Having to follow an old thread to understand the reasoning is not appealing
> >> for us having to fix bugs in the following years.
> >
> > Well, I don't know really.
> >
> > Florian said I should remove skb_share_check() from a function I was
> > adding, because according to him, the ndo_start_xmit path cannot
> > contain shared skbs. (See the 0/9 cover letter.) If this claim is
> > true, then this series is correct. If this claim is not true, then the
> > series needs to be adjusted.
>
> The claim might be true for a particular driver, but not others.
>
> ipvlan has a way to forward packets from TX to RX, and RX to TX,
> I would rather not touch it unless you really can make good arguments,
> and possibly some tests :)
>
> I am worried about a missing skb_share_check() if for some
> reason pskb_expand_head() has to be called later
>
>       BUG_ON(skb_shared(skb));
>
> >
> > I tried to trace these and couldn't totally make up my mind, hence the
> > ALL CAPS mention in the 0/9.
> >
> > Do you know if this is a safe presumption to make? It sounds like your
> > opinion is "no" in light of pktgen.c? Should that simply be adjusted
> > instead?
>
> The key here is IFF_TX_SKB_SHARING, but really this is the intent.
> I am not sure if all paths have been correctly audited/tested.
>
> I am not saying your patch is wrong, I am unsure about it.

Thanks for the analysis here. It seems like removal of skb_share_check
in a blanket manner is a project in its own. I'll rework v4 of this
patch set to take the conservative set of choices I had originally, of
assuming this can happen and so the helper function needs to be robust
for it. Later we can revisit skb sharing on the tx path as a whole
situation probably best suited for a net-next series.

Jason

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-02-11 18:54 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10 14:14 [PATCH v2 net 0/5] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
2020-02-10 14:14 ` [PATCH v2 net 1/5] icmp: introduce helper for NAT'd source address in network device context Jason A. Donenfeld
2020-02-10 19:48   ` Jason A. Donenfeld
2020-02-10 21:32     ` Florian Westphal
2020-02-10 21:59       ` Jason A. Donenfeld
2020-02-10 22:26         ` Florian Westphal
2020-02-11  9:59           ` Jason A. Donenfeld
2020-02-11 12:25       ` Jason A. Donenfeld
2020-02-11 15:00   ` [PATCH v3 net 0/9] icmp: account for NAT when sending icmps from ndo layer Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 1/9] icmp: introduce helper for nat'd source address in network device context Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 2/9] gtp: use icmp_ndo_send helper Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 3/9] sunvnet: " Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 4/9] xfrm: interface: " Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 5/9] wireguard: device: use icmp_ndo_send helper and remove skb_share_check Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 6/9] firewire: remove skb_share_check from xmit path Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 7/9] ipvlan: " Jason A. Donenfeld
2020-02-11 17:39       ` Eric Dumazet
2020-02-11 17:44         ` Jason A. Donenfeld
2020-02-11 18:50           ` Eric Dumazet
2020-02-11 18:54             ` Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 8/9] fm10k: " Jason A. Donenfeld
2020-02-11 15:00     ` [PATCH v3 net 9/9] benet: " Jason A. Donenfeld
2020-02-10 14:14 ` [PATCH v2 net 2/5] gtp: use icmp_ndo_send helper Jason A. Donenfeld
2020-02-10 14:14 ` [PATCH v2 net 3/5] sunvnet: " Jason A. Donenfeld
2020-02-10 14:14 ` [PATCH v2 net 4/5] wireguard: " Jason A. Donenfeld
2020-02-10 14:14 ` [PATCH v2 net 5/5] xfrm: interface: " Jason A. Donenfeld
2020-02-10 19:30 ` [PATCH v2 net 6/5] wireguard: selftests: ensure that icmp src address is correct with NAT Jason A. Donenfeld

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.