netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] IP: cleanup LSRR option processing
@ 2017-08-03 16:07 Paolo Abeni
  2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

The __ip_options_echo() function expect a valid dst entry in skb->dst;
as result we sometimes need to preserve the dst entry for the whole IP
RX path.

The current usage of skb->dst looks more a relic from ancient past that
a real functional constraint. This patchset tries to remove such usage,
and than drops some hacks currently in place in the IP code to keep
skb->dst around.

__ip_options_echo() uses of skb->dst for two different purposes: retrieving
the netns assicated with the skb, and modify the ingress packet LSRR address
list. 

The first patch removes the code modifying the ingress packet, and the second
one provides an explicit netns argument to __ip_options_echo(). The following
patches cleanup the current code keeping arund skb->dst for __ip_options_echo's
sake.

Updating the __ip_options_echo() function has been previously discussed here:

http://marc.info/?l=linux-netdev&m=150064533516348&w=2

Paolo Abeni (4):
  IP: do not modify ingress packet IP option in ip_options_echo()
  ip/options: explicitly provide net ns to __ip_options_echo()
  Revert "ipv4: keep skb->dst around in presence of IP options"
  udp: no need to preserve skb->dst

 include/net/ip.h       |  9 +++++----
 include/net/tcp.h      |  5 +++--
 net/ipv4/icmp.c        |  4 ++--
 net/ipv4/ip_options.c  |  9 +++------
 net/ipv4/ip_output.c   |  2 +-
 net/ipv4/ip_sockglue.c | 16 +++++-----------
 net/ipv4/syncookies.c  |  2 +-
 net/ipv4/tcp_ipv4.c    |  2 +-
 net/ipv4/udp.c         | 13 +++++--------
 9 files changed, 26 insertions(+), 36 deletions(-)

-- 
2.13.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo()
  2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
@ 2017-08-03 16:07 ` Paolo Abeni
  2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

While computing the response option set for LSRR, ip_options_echo()
also changes the ingress packet LSRR addresses list, setting
the last one to the dst specific address for the ingress packet
- via memset(start[ ...
The only visible effect of such change - beyond possibly damaging
shared/cloned skbs - is modifying the data carried by ICMP replies
changing the header information for reported the ingress packet,
which violates RFC1122 3.2.2.6.
All the others call sites just ignore the ingress packet IP options
after calling ip_options_echo()
Note that the last element in the LSRR option address list for the
reply packet will be properly set later in the ip output path
via ip_options_build().
This buggy memset() predates git history and apparently was present
into the initial ip_options_echo() implementation in linux 1.3.30 but
still looks wrong.

The removal of the fib_compute_spec_dst() call will help
completely dropping the skb->dst usage by __ip_options_echo() with a
later patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/ip_options.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 93157f2f4758..fdda97308c0b 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -174,9 +174,6 @@ int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb,
 				doffset -= 4;
 		}
 		if (doffset > 3) {
-			__be32 daddr = fib_compute_spec_dst(skb);
-
-			memcpy(&start[doffset-1], &daddr, 4);
 			dopt->faddr = faddr;
 			dptr[0] = start[0];
 			dptr[1] = doffset+3;
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo()
  2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
  2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
@ 2017-08-03 16:07 ` Paolo Abeni
  2017-09-05 17:18   ` Eric Dumazet
  2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

__ip_options_echo() uses the current network namespace, and
currently retrives it via skb->dst->dev.

This commit adds an explicit 'net' argument to __ip_options_echo()
and update all the call sites to provide it, usually via a simpler
sock_net().

After this change, __ip_options_echo() no more needs to access
skb->dst and we can drop a couple of hack to preserve such
info in the rx path.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/ip.h       | 9 +++++----
 include/net/tcp.h      | 5 +++--
 net/ipv4/icmp.c        | 4 ++--
 net/ipv4/ip_options.c  | 6 +++---
 net/ipv4/ip_output.c   | 2 +-
 net/ipv4/ip_sockglue.c | 7 ++++---
 net/ipv4/syncookies.c  | 2 +-
 net/ipv4/tcp_ipv4.c    | 2 +-
 8 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 821cedcc8e73..9e59dcf1787a 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -567,11 +567,12 @@ int ip_forward(struct sk_buff *skb);
 void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
 		      __be32 daddr, struct rtable *rt, int is_frag);
 
-int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb,
-		      const struct ip_options *sopt);
-static inline int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb)
+int __ip_options_echo(struct net *net, struct ip_options *dopt,
+		      struct sk_buff *skb, const struct ip_options *sopt);
+static inline int ip_options_echo(struct net *net, struct ip_options *dopt,
+				  struct sk_buff *skb)
 {
-	return __ip_options_echo(dopt, skb, &IPCB(skb)->opt);
+	return __ip_options_echo(net, dopt, skb, &IPCB(skb)->opt);
 }
 
 void ip_options_fragment(struct sk_buff *skb);
diff --git a/include/net/tcp.h b/include/net/tcp.h
index bb1881b4ce48..5173fecde495 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1885,7 +1885,8 @@ extern void tcp_rack_reo_timeout(struct sock *sk);
 /*
  * Save and compile IPv4 options, return a pointer to it
  */
-static inline struct ip_options_rcu *tcp_v4_save_options(struct sk_buff *skb)
+static inline struct ip_options_rcu *tcp_v4_save_options(struct net *net,
+							 struct sk_buff *skb)
 {
 	const struct ip_options *opt = &TCP_SKB_CB(skb)->header.h4.opt;
 	struct ip_options_rcu *dopt = NULL;
@@ -1894,7 +1895,7 @@ static inline struct ip_options_rcu *tcp_v4_save_options(struct sk_buff *skb)
 		int opt_size = sizeof(*dopt) + opt->optlen;
 
 		dopt = kmalloc(opt_size, GFP_ATOMIC);
-		if (dopt && __ip_options_echo(&dopt->opt, skb, opt)) {
+		if (dopt && __ip_options_echo(net, &dopt->opt, skb, opt)) {
 			kfree(dopt);
 			dopt = NULL;
 		}
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index c2be26b98b5f..681e33998e03 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -412,7 +412,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 	int type = icmp_param->data.icmph.type;
 	int code = icmp_param->data.icmph.code;
 
-	if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
+	if (ip_options_echo(net, &icmp_param->replyopts.opt.opt, skb))
 		return;
 
 	/* Needed by both icmp_global_allow and icmp_xmit_lock */
@@ -694,7 +694,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 					  iph->tos;
 	mark = IP4_REPLY_MARK(net, skb_in->mark);
 
-	if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
+	if (ip_options_echo(net, &icmp_param.replyopts.opt.opt, skb_in))
 		goto out_unlock;
 
 
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index fdda97308c0b..525ae88d1e58 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -86,8 +86,8 @@ void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
  * NOTE: dopt cannot point to skb.
  */
 
-int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb,
-		      const struct ip_options *sopt)
+int __ip_options_echo(struct net *net, struct ip_options *dopt,
+		      struct sk_buff *skb, const struct ip_options *sopt)
 {
 	unsigned char *sptr, *dptr;
 	int soffset, doffset;
@@ -140,7 +140,7 @@ int __ip_options_echo(struct ip_options *dopt, struct sk_buff *skb,
 						__be32 addr;
 
 						memcpy(&addr, dptr+soffset-1, 4);
-						if (inet_addr_type(dev_net(skb_dst(skb)->dev), addr) != RTN_UNICAST) {
+						if (inet_addr_type(net, addr) != RTN_UNICAST) {
 							dopt->ts_needtime = 1;
 							soffset += 8;
 						}
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b631ec685d77..73b0b15245b6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1525,7 +1525,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 	int err;
 	int oif;
 
-	if (__ip_options_echo(&replyopts.opt.opt, skb, sopt))
+	if (__ip_options_echo(net, &replyopts.opt.opt, skb, sopt))
 		return;
 
 	ipc.addr = daddr;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index ecc4b4a2413e..1c3354d028a4 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -80,7 +80,8 @@ static void ip_cmsg_recv_opts(struct msghdr *msg, struct sk_buff *skb)
 }
 
 
-static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb)
+static void ip_cmsg_recv_retopts(struct net *net, struct msghdr *msg,
+				 struct sk_buff *skb)
 {
 	unsigned char optbuf[sizeof(struct ip_options) + 40];
 	struct ip_options *opt = (struct ip_options *)optbuf;
@@ -88,7 +89,7 @@ static void ip_cmsg_recv_retopts(struct msghdr *msg, struct sk_buff *skb)
 	if (IPCB(skb)->opt.optlen == 0)
 		return;
 
-	if (ip_options_echo(opt, skb)) {
+	if (ip_options_echo(net, opt, skb)) {
 		msg->msg_flags |= MSG_CTRUNC;
 		return;
 	}
@@ -204,7 +205,7 @@ void ip_cmsg_recv_offset(struct msghdr *msg, struct sock *sk,
 	}
 
 	if (flags & IP_CMSG_RETOPTS) {
-		ip_cmsg_recv_retopts(msg, skb);
+		ip_cmsg_recv_retopts(sock_net(sk), msg, skb);
 
 		flags &= ~IP_CMSG_RETOPTS;
 		if (!flags)
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 03ad8778c395..b1bb1b3a1082 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -355,7 +355,7 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb)
 	/* We throwed the options of the initial SYN away, so we hope
 	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
 	 */
-	ireq->opt = tcp_v4_save_options(skb);
+	ireq->opt = tcp_v4_save_options(sock_net(sk), skb);
 
 	if (security_inet_conn_request(sk, skb, req)) {
 		reqsk_free(req);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9b51663cd5a4..5f708c85110e 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1267,7 +1267,7 @@ static void tcp_v4_init_req(struct request_sock *req,
 
 	sk_rcv_saddr_set(req_to_sk(req), ip_hdr(skb)->daddr);
 	sk_daddr_set(req_to_sk(req), ip_hdr(skb)->saddr);
-	ireq->opt = tcp_v4_save_options(skb);
+	ireq->opt = tcp_v4_save_options(sock_net(sk_listener), skb);
 }
 
 static struct dst_entry *tcp_v4_route_req(const struct sock *sk,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options"
  2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
  2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
  2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni
@ 2017-08-03 16:07 ` Paolo Abeni
  2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni
  2017-08-07  3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

ip_options_echo() does not use anymore the skb->dst and don't
need to keep the dst around for options's sake only.
This reverts commit 34b2cef20f19c87999fff3da4071e66937db9644.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/ip_sockglue.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 1c3354d028a4..dd68a9ed5e40 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1228,14 +1228,7 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct sk_buff *skb)
 		pktinfo->ipi_ifindex = 0;
 		pktinfo->ipi_spec_dst.s_addr = 0;
 	}
-	/* We need to keep the dst for __ip_options_echo()
-	 * We could restrict the test to opt.ts_needtime || opt.srr,
-	 * but the following is good enough as IP options are not often used.
-	 */
-	if (unlikely(IPCB(skb)->opt.optlen))
-		skb_dst_force(skb);
-	else
-		skb_dst_drop(skb);
+	skb_dst_drop(skb);
 }
 
 int ip_setsockopt(struct sock *sk, int level,
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net-next 4/4] udp: no need to preserve skb->dst
  2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
                   ` (2 preceding siblings ...)
  2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni
@ 2017-08-03 16:07 ` Paolo Abeni
  2017-08-07  3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller
  4 siblings, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-08-03 16:07 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

__ip_options_echo() does not need anymore skb->dst, so we can
avoid explicitly preserving it for its own sake.

This is almost a revert of commit 0ddf3fb2c43d ("udp: preserve
skb->dst if required for IP options processing") plus some
lifting to fit later changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e6276fa3750b..38bca2c4897d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1176,7 +1176,11 @@ static void udp_set_dev_scratch(struct sk_buff *skb)
 	scratch->csum_unnecessary = !!skb_csum_unnecessary(skb);
 	scratch->is_linear = !skb_is_nonlinear(skb);
 #endif
-	if (likely(!skb->_skb_refdst))
+	/* all head states execept sp (dst, sk, nf) are always cleared by
+	 * udp_rcv() and we need to preserve secpath, if present, to eventually
+	 * process IP_CMSG_PASSSEC at recvmsg() time
+	 */
+	if (likely(!skb_sec_path(skb)))
 		scratch->_tsize_state |= UDP_SKB_IS_STATELESS;
 }
 
@@ -1782,13 +1786,6 @@ static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 		sk_mark_napi_id_once(sk, skb);
 	}
 
-	/* At recvmsg() time we may access skb->dst or skb->sp depending on
-	 * the IP options and the cmsg flags, elsewhere can we clear all
-	 * pending head states while they are hot in the cache
-	 */
-	if (likely(IPCB(skb)->opt.optlen == 0 && !skb_sec_path(skb)))
-		skb_release_head_state(skb);
-
 	rc = __udp_enqueue_schedule_skb(sk, skb);
 	if (rc < 0) {
 		int is_udplite = IS_UDPLITE(sk);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 0/4] IP: cleanup LSRR option processing
  2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
                   ` (3 preceding siblings ...)
  2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni
@ 2017-08-07  3:51 ` David Miller
  4 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2017-08-07  3:51 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, edumazet, hannes

From: Paolo Abeni <pabeni@redhat.com>
Date: Thu,  3 Aug 2017 18:07:04 +0200

> The __ip_options_echo() function expect a valid dst entry in skb->dst;
> as result we sometimes need to preserve the dst entry for the whole IP
> RX path.
> 
> The current usage of skb->dst looks more a relic from ancient past that
> a real functional constraint. This patchset tries to remove such usage,
> and than drops some hacks currently in place in the IP code to keep
> skb->dst around.
> 
> __ip_options_echo() uses of skb->dst for two different purposes: retrieving
> the netns assicated with the skb, and modify the ingress packet LSRR address
> list. 
> 
> The first patch removes the code modifying the ingress packet, and the second
> one provides an explicit netns argument to __ip_options_echo(). The following
> patches cleanup the current code keeping arund skb->dst for __ip_options_echo's
> sake.
> 
> Updating the __ip_options_echo() function has been previously discussed here:
> 
> http://marc.info/?l=linux-netdev&m=150064533516348&w=2

Series applied, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo()
  2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni
@ 2017-09-05 17:18   ` Eric Dumazet
  2017-09-05 21:03     ` Paolo Abeni
  2017-09-06 12:44     ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2017-09-05 17:18 UTC (permalink / raw)
  To: Paolo Abeni, David S. Miller; +Cc: netdev, Eric Dumazet, Hannes Frederic Sowa

On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote:
> __ip_options_echo() uses the current network namespace, and
> currently retrives it via skb->dst->dev.
> 
> This commit adds an explicit 'net' argument to __ip_options_echo()
> and update all the call sites to provide it, usually via a simpler
> sock_net().
> 
> After this change, __ip_options_echo() no more needs to access
> skb->dst and we can drop a couple of hack to preserve such
> info in the rx path.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---

David, Paolo

This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options:
explicitly provide net ns to __ip_options_echo()")

needs to be backported to linux-4.13 stable version to avoid these kind
of crashes [1]

This is because of MSG_PEEK operation, hitting skb_consume_udp() while
skb is still in receive queue.

Next read() finding again the skb then can see a NULL skb->dst

Thanks !

[1]
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
task: ffff8801cd0a4380 task.stack: ffff8801cc498000
RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143
RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246
RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28
R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918
R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018
FS:  0000000000979880(0000) GS:ffff8801db200000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip_options_echo include/net/ip.h:574 [inline]
 ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline]
 ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207
 udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641
 inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
 sock_recvmsg_nosec net/socket.c:792 [inline]
 sock_recvmsg+0xc9/0x110 net/socket.c:799
 SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788
 SyS_recvfrom+0x40/0x50 net/socket.c:1760
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x444c89
RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d
RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89
RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004
RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390
R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000
Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8
48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c
02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc 
RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP:
ffff8801cc49f628
---[ end trace b30d95b284222843 ]---
Kernel panic - not syncing: Fatal exception

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo()
  2017-09-05 17:18   ` Eric Dumazet
@ 2017-09-05 21:03     ` Paolo Abeni
  2017-09-06 12:44     ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Abeni @ 2017-09-05 21:03 UTC (permalink / raw)
  To: Eric Dumazet, David S. Miller; +Cc: netdev, Eric Dumazet, Hannes Frederic Sowa

On Tue, 2017-09-05 at 10:18 -0700, Eric Dumazet wrote:
> On Thu, 2017-08-03 at 18:07 +0200, Paolo Abeni wrote:
> > __ip_options_echo() uses the current network namespace, and
> > currently retrives it via skb->dst->dev.
> > 
> > This commit adds an explicit 'net' argument to __ip_options_echo()
> > and update all the call sites to provide it, usually via a simpler
> > sock_net().
> > 
> > After this change, __ip_options_echo() no more needs to access
> > skb->dst and we can drop a couple of hack to preserve such
> > info in the rx path.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> 
> David, Paolo
> 
> This commit (91ed1e666a4ea2e260452a7d7d311ac5ae852cba "ip/options:
> explicitly provide net ns to __ip_options_echo()")
> 
> needs to be backported to linux-4.13 stable version to avoid these kind
> of crashes [1]
> 
> This is because of MSG_PEEK operation, hitting skb_consume_udp() while
> skb is still in receive queue.
> 
> Next read() finding again the skb then can see a NULL skb->dst
> 
> Thanks !
> 
> [1]
> general protection fault: 0000 [#1] SMP KASAN
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 3017 Comm: syzkaller446772 Not tainted 4.13.0+ #68
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> task: ffff8801cd0a4380 task.stack: ffff8801cc498000
> RIP: 0010:__ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143
> RSP: 0018:ffff8801cc49f628 EFLAGS: 00010246
> RAX: dffffc0000000000 RBX: ffff8801cc49f928 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000004
> RBP: ffff8801cc49f6b8 R08: ffff8801cc49f936 R09: ffffed0039893f28
> R10: 0000000000000003 R11: ffffed0039893f27 R12: ffff8801cc49f918
> R13: ffff8801ccbcf36c R14: 000000000000000f R15: 0000000000000018
> FS:  0000000000979880(0000) GS:ffff8801db200000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000200c0ff0 CR3: 00000001cc4ed000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  ip_options_echo include/net/ip.h:574 [inline]
>  ip_cmsg_recv_retopts net/ipv4/ip_sockglue.c:91 [inline]
>  ip_cmsg_recv_offset+0xa17/0x1280 net/ipv4/ip_sockglue.c:207
>  udp_recvmsg+0xe0b/0x1260 net/ipv4/udp.c:1641
>  inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793
>  sock_recvmsg_nosec net/socket.c:792 [inline]
>  sock_recvmsg+0xc9/0x110 net/socket.c:799
>  SYSC_recvfrom+0x2dc/0x570 net/socket.c:1788
>  SyS_recvfrom+0x40/0x50 net/socket.c:1760
>  entry_SYSCALL_64_fastpath+0x1f/0xbe
> RIP: 0033:0x444c89
> RSP: 002b:00007ffd80c788e8 EFLAGS: 00000286 ORIG_RAX: 000000000000002d
> RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000444c89
> RDX: 0000000000000000 RSI: 0000000020bc0000 RDI: 0000000000000004
> RBP: 0000000000000082 R08: 00000000200c0ff0 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000286 R12: 0000000000402390
> R13: 0000000000402420 R14: 0000000000000000 R15: 0000000000000000
> Code: f6 c1 01 0f 85 a5 01 00 00 48 89 4d b8 e8 31 e9 6b fd 48 8b 4d b8
> 48 b8 00 00 00 00 00 fc ff df 48 83 e1 fe 48 89 ca 48 c1 ea 03 <80> 3c
> 02 00 0f 85 41 02 00 00 48 8b 09 48 b8 00 00 00 00 00 fc 
> RIP: __ip_options_echo+0xea8/0x1430 net/ipv4/ip_options.c:143 RSP:
> ffff8801cc49f628
> ---[ end trace b30d95b284222843 ]---
> Kernel panic - not syncing: Fatal exception

Thank you Eric for the report! 

Darn me, I seriously messed-up with the stateless consume.

I think we can have similar issues pith ipsec/secpath and MSG_PEEK,
even if with less catastropthic outcome.

What about the following, which should cover both cases? (only compile
tested, I'll test it tomorrow morning my time)
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d67a8182e5eb..63df75ae70ee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
 void consume_skb(struct sk_buff *skb);
-void consume_stateless_skb(struct sk_buff *skb);
+void __consume_stateless_skb(struct sk_buff *skb);
 void  __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e07556606284..f2411a8744d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb);
  *	consume_stateless_skb - free an skbuff, assuming it is stateless
  *	@skb: buffer to free
  *
- *	Works like consume_skb(), but this variant assumes that all the head
- *	states have been already dropped.
+ *	Alike consume_skb(), but this variant assumes that all the head
+ *	states have been already dropped and usage count is one
  */
-void consume_stateless_skb(struct sk_buff *skb)
+void __consume_stateless_skb(struct sk_buff *skb)
 {
-	if (!skb_unref(skb))
-		return;
-
 	trace_consume_skb(skb);
 	if (likely(skb->head))
 		skb_release_data(skb);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 62344804baae..979e4d8526ba 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 		unlock_sock_fast(sk, slow);
 	}
 
+	if (!skb_unref(skb))
+		return;
+
 	/* In the more common cases we cleared the head states previously,
 	 * see __udp_queue_rcv_skb().
 	 */
 	if (unlikely(udp_skb_has_head_state(skb)))
 		skb_release_head_state(skb);
-	consume_stateless_skb(skb);
+	__consume_stateless_skb(skb);
 }
 EXPORT_SYMBOL_GPL(skb_consume_udp);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net] udp: drop head states only when all skb references are gone
  2017-09-05 17:18   ` Eric Dumazet
  2017-09-05 21:03     ` Paolo Abeni
@ 2017-09-06 12:44     ` Paolo Abeni
  2017-09-08  3:03       ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Paolo Abeni @ 2017-09-06 12:44 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Eric Dumazet, Hannes Frederic Sowa

After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required
for IP options processing") we clear the skb head state as soon
as the skb carrying them is first processed.

Since the same skb can be processed several times when MSG_PEEK
is used, we can end up lacking the required head states, and
eventually oopsing.

Fix this clearing the skb head state only when processing the
last skb reference.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/linux/skbuff.h | 2 +-
 net/core/skbuff.c      | 9 +++------
 net/ipv4/udp.c         | 5 ++++-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d67a8182e5eb..63df75ae70ee 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb);
 void kfree_skb_list(struct sk_buff *segs);
 void skb_tx_error(struct sk_buff *skb);
 void consume_skb(struct sk_buff *skb);
-void consume_stateless_skb(struct sk_buff *skb);
+void __consume_stateless_skb(struct sk_buff *skb);
 void  __kfree_skb(struct sk_buff *skb);
 extern struct kmem_cache *skbuff_head_cache;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e07556606284..72eb23d2426f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb);
  *	consume_stateless_skb - free an skbuff, assuming it is stateless
  *	@skb: buffer to free
  *
- *	Works like consume_skb(), but this variant assumes that all the head
- *	states have been already dropped.
+ *	Alike consume_skb(), but this variant assumes that this is the last
+ *	skb reference and all the head states have been already dropped
  */
-void consume_stateless_skb(struct sk_buff *skb)
+void __consume_stateless_skb(struct sk_buff *skb)
 {
-	if (!skb_unref(skb))
-		return;
-
 	trace_consume_skb(skb);
 	if (likely(skb->head))
 		skb_release_data(skb);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 62344804baae..979e4d8526ba 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, struct sk_buff *skb, int len)
 		unlock_sock_fast(sk, slow);
 	}
 
+	if (!skb_unref(skb))
+		return;
+
 	/* In the more common cases we cleared the head states previously,
 	 * see __udp_queue_rcv_skb().
 	 */
 	if (unlikely(udp_skb_has_head_state(skb)))
 		skb_release_head_state(skb);
-	consume_stateless_skb(skb);
+	__consume_stateless_skb(skb);
 }
 EXPORT_SYMBOL_GPL(skb_consume_udp);
 
-- 
2.13.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net] udp: drop head states only when all skb references are gone
  2017-09-06 12:44     ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni
@ 2017-09-08  3:03       ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2017-09-08  3:03 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, edumazet, hannes

From: Paolo Abeni <pabeni@redhat.com>
Date: Wed,  6 Sep 2017 14:44:36 +0200

> After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required
> for IP options processing") we clear the skb head state as soon
> as the skb carrying them is first processed.
> 
> Since the same skb can be processed several times when MSG_PEEK
> is used, we can end up lacking the required head states, and
> eventually oopsing.
> 
> Fix this clearing the skb head state only when processing the
> last skb reference.
> 
> Reported-by: Eric Dumazet <edumazet@google.com>
> Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-09-08  3:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-03 16:07 [PATCH net-next 0/4] IP: cleanup LSRR option processing Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 1/4] IP: do not modify ingress packet IP option in ip_options_echo() Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 2/4] ip/options: explicitly provide net ns to __ip_options_echo() Paolo Abeni
2017-09-05 17:18   ` Eric Dumazet
2017-09-05 21:03     ` Paolo Abeni
2017-09-06 12:44     ` [PATCH net] udp: drop head states only when all skb references are gone Paolo Abeni
2017-09-08  3:03       ` David Miller
2017-08-03 16:07 ` [PATCH net-next 3/4] Revert "ipv4: keep skb->dst around in presence of IP options" Paolo Abeni
2017-08-03 16:07 ` [PATCH net-next 4/4] udp: no need to preserve skb->dst Paolo Abeni
2017-08-07  3:51 ` [PATCH net-next 0/4] IP: cleanup LSRR option processing David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).