All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules
@ 2021-12-20 14:33 Eric Dumazet
  2021-12-21  3:10 ` patchwork-bot+netdevbpf
  2022-10-20 21:05 ` Carlos Llamas
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2021-12-20 14:33 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

syzbot reported various issues around early demux,
one being included in this changelog [1]

sk->sk_rx_dst is using RCU protection without clearly
documenting it.

And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
are not following standard RCU rules.

[a]    dst_release(dst);
[b]    sk->sk_rx_dst = NULL;

They look wrong because a delete operation of RCU protected
pointer is supposed to clear the pointer before
the call_rcu()/synchronize_rcu() guarding actual memory freeing.

In some cases indeed, dst could be freed before [b] is done.

We could cheat by clearing sk_rx_dst before calling
dst_release(), but this seems the right time to stick
to standard RCU annotations and debugging facilities.

[1]
BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204

CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
 __kasan_report mm/kasan/report.c:433 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
 dst_check include/net/dst.h:470 [inline]
 tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
 ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
 ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
 __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
 __netif_receive_skb_list net/core/dev.c:5608 [inline]
 netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
 gro_normal_list net/core/dev.c:5853 [inline]
 gro_normal_list net/core/dev.c:5849 [inline]
 napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
 __napi_poll+0xaf/0x440 net/core/dev.c:7023
 napi_poll net/core/dev.c:7090 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7177
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
 common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
 asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
RIP: 0033:0x7f5e972bfd57
Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
RSP: 002b:00007fff8a413210 EFLAGS: 00000283
RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
 </TASK>

Allocated by task 13:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
 kasan_slab_alloc include/linux/kasan.h:259 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3234 [inline]
 slab_alloc mm/slub.c:3242 [inline]
 kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
 ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
 ip_route_input_rcu net/ipv4/route.c:2470 [inline]
 ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
 ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
 ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
 ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
 ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
 __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
 __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
 __netif_receive_skb_list net/core/dev.c:5608 [inline]
 netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
 gro_normal_list net/core/dev.c:5853 [inline]
 gro_normal_list net/core/dev.c:5849 [inline]
 napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
 virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
 virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
 __napi_poll+0xaf/0x440 net/core/dev.c:7023
 napi_poll net/core/dev.c:7090 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7177
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Freed by task 13:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 kasan_set_track+0x21/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:1723 [inline]
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
 slab_free mm/slub.c:3513 [inline]
 kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
 dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
 rcu_do_batch kernel/rcu/tree.c:2506 [inline]
 rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Last potentially related work creation:
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
 __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
 __call_rcu kernel/rcu/tree.c:2985 [inline]
 call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
 dst_release net/core/dst.c:177 [inline]
 dst_release+0x79/0xe0 net/core/dst.c:167
 tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
 sk_backlog_rcv include/net/sock.h:1030 [inline]
 __release_sock+0x134/0x3b0 net/core/sock.c:2768
 release_sock+0x54/0x1b0 net/core/sock.c:3300
 tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
 inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 sock_write_iter+0x289/0x3c0 net/socket.c:1057
 call_write_iter include/linux/fs.h:2162 [inline]
 new_sync_write+0x429/0x660 fs/read_write.c:503
 vfs_write+0x7cd/0xae0 fs/read_write.c:590
 ksys_write+0x1ee/0x250 fs/read_write.c:643
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88807f1cb700
 which belongs to the cache ip_dst_cache of size 176
The buggy address is located 58 bytes inside of
 176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
The buggy address belongs to the page:
page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
 prep_new_page mm/page_alloc.c:2418 [inline]
 get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
 alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
 alloc_slab_page mm/slub.c:1793 [inline]
 allocate_slab mm/slub.c:1930 [inline]
 new_slab+0x32d/0x4a0 mm/slub.c:1993
 ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
 __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
 slab_alloc_node mm/slub.c:3200 [inline]
 slab_alloc mm/slub.c:3242 [inline]
 kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
 dst_alloc+0x146/0x1f0 net/core/dst.c:92
 rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
 __mkroute_output net/ipv4/route.c:2564 [inline]
 ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
 ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
 __ip_route_output_key include/net/route.h:126 [inline]
 ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
 ip_route_output_key include/net/route.h:142 [inline]
 geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
 geneve_xmit_skb drivers/net/geneve.c:899 [inline]
 geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
 __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
 netdev_start_xmit include/linux/netdevice.h:5008 [inline]
 xmit_one net/core/dev.c:3590 [inline]
 dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
 __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1338 [inline]
 free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
 free_unref_page_prepare mm/page_alloc.c:3309 [inline]
 free_unref_page+0x19/0x690 mm/page_alloc.c:3388
 qlink_free mm/kasan/quarantine.c:146 [inline]
 qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
 kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
 __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
 kasan_slab_alloc include/linux/kasan.h:259 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3234 [inline]
 kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
 __alloc_skb+0x215/0x340 net/core/skbuff.c:414
 alloc_skb include/linux/skbuff.h:1126 [inline]
 alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
 sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
 mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
 add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
 add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
 mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
 mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
 mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445

Memory state around the buggy address:
 ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                        ^
 ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
 ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h   |  2 +-
 net/ipv4/af_inet.c   |  2 +-
 net/ipv4/tcp.c       |  3 +--
 net/ipv4/tcp_input.c |  2 +-
 net/ipv4/tcp_ipv4.c  | 11 +++++++----
 net/ipv4/udp.c       |  6 +++---
 net/ipv6/tcp_ipv6.c  | 11 +++++++----
 net/ipv6/udp.c       |  4 ++--
 8 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index bea21ff70e74d906216f4eaa2d5a712d12551216..d47e9658da28545c1f6afd9db0cf136b3e13d7b6 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -431,7 +431,7 @@ struct sock {
 #ifdef CONFIG_XFRM
 	struct xfrm_policy __rcu *sk_policy[2];
 #endif
-	struct dst_entry	*sk_rx_dst;
+	struct dst_entry __rcu	*sk_rx_dst;
 	int			sk_rx_dst_ifindex;
 	u32			sk_rx_dst_cookie;
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 0189e3cd4a7df2dc2ea7121182ee290e0164df90..6b5956500436187d5e9b801ebb6f8f52ba0db090 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -154,7 +154,7 @@ void inet_sock_destruct(struct sock *sk)
 
 	kfree(rcu_dereference_protected(inet->inet_opt, 1));
 	dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1));
-	dst_release(sk->sk_rx_dst);
+	dst_release(rcu_dereference_protected(sk->sk_rx_dst, 1));
 	sk_refcnt_debug_dec(sk);
 }
 EXPORT_SYMBOL(inet_sock_destruct);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bbb3d39c69afc2d5a42c6ace8d473657861da61f..2bb28bfd83bf621b5d9f3fb7ce5695195697b43a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3012,8 +3012,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	icsk->icsk_ack.rcv_mss = TCP_MIN_MSS;
 	memset(&tp->rx_opt, 0, sizeof(tp->rx_opt));
 	__sk_dst_reset(sk);
-	dst_release(sk->sk_rx_dst);
-	sk->sk_rx_dst = NULL;
+	dst_release(xchg((__force struct dst_entry **)&sk->sk_rx_dst, NULL));
 	tcp_saved_syn_free(tp);
 	tp->compressed_ack = 0;
 	tp->segs_in = 0;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 246ab7b5e857eb9e802c4805075e89c98cf00636..0ce46849ec3d4595699dd54229919b2d66b70257 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5787,7 +5787,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 	trace_tcp_probe(sk, skb);
 
 	tcp_mstamp_refresh(tp);
-	if (unlikely(!sk->sk_rx_dst))
+	if (unlikely(!rcu_access_pointer(sk->sk_rx_dst)))
 		inet_csk(sk)->icsk_af_ops->sk_rx_dst_set(sk, skb);
 	/*
 	 *	Header prediction.
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13d868c43284584ee0c58ddfd411bb52c8b0c830..084df223b5dff8089a615a4a8d392b620fc0a28a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1701,7 +1701,10 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 	struct sock *rsk;
 
 	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
-		struct dst_entry *dst = sk->sk_rx_dst;
+		struct dst_entry *dst;
+
+		dst = rcu_dereference_protected(sk->sk_rx_dst,
+						lockdep_sock_is_held(sk));
 
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
@@ -1709,8 +1712,8 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
 			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
 			    !INDIRECT_CALL_1(dst->ops->check, ipv4_dst_check,
 					     dst, 0)) {
+				RCU_INIT_POINTER(sk->sk_rx_dst, NULL);
 				dst_release(dst);
-				sk->sk_rx_dst = NULL;
 			}
 		}
 		tcp_rcv_established(sk, skb);
@@ -1786,7 +1789,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
 		skb->sk = sk;
 		skb->destructor = sock_edemux;
 		if (sk_fullsock(sk)) {
-			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
+			struct dst_entry *dst = rcu_dereference(sk->sk_rx_dst);
 
 			if (dst)
 				dst = dst_check(dst, 0);
@@ -2201,7 +2204,7 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	struct dst_entry *dst = skb_dst(skb);
 
 	if (dst && dst_hold_safe(dst)) {
-		sk->sk_rx_dst = dst;
+		rcu_assign_pointer(sk->sk_rx_dst, dst);
 		sk->sk_rx_dst_ifindex = skb->skb_iif;
 	}
 }
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 23b05e28490b0a0a690a837027f26167e353f8ce..15c6b450b8dba44d5f344a554eea6b991c1ea5f1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2250,7 +2250,7 @@ bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 	struct dst_entry *old;
 
 	if (dst_hold_safe(dst)) {
-		old = xchg(&sk->sk_rx_dst, dst);
+		old = xchg((__force struct dst_entry **)&sk->sk_rx_dst, dst);
 		dst_release(old);
 		return old != dst;
 	}
@@ -2440,7 +2440,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst != dst))
+		if (unlikely(rcu_dereference(sk->sk_rx_dst) != dst))
 			udp_sk_rx_dst_set(sk, dst);
 
 		ret = udp_unicast_rcv_skb(sk, skb, uh);
@@ -2599,7 +2599,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
 
 	skb->sk = sk;
 	skb->destructor = sock_efree;
-	dst = READ_ONCE(sk->sk_rx_dst);
+	dst = rcu_dereference(sk->sk_rx_dst);
 
 	if (dst)
 		dst = dst_check(dst, 0);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 551fce49841d7f53a111b0435855634cece2b40a..680e6481b9672040ccb41fd08ab80166575bef50 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -107,7 +107,7 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
 	if (dst && dst_hold_safe(dst)) {
 		const struct rt6_info *rt = (const struct rt6_info *)dst;
 
-		sk->sk_rx_dst = dst;
+		rcu_assign_pointer(sk->sk_rx_dst, dst);
 		sk->sk_rx_dst_ifindex = skb->skb_iif;
 		sk->sk_rx_dst_cookie = rt6_get_cookie(rt);
 	}
@@ -1505,7 +1505,10 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 		opt_skb = skb_clone(skb, sk_gfp_mask(sk, GFP_ATOMIC));
 
 	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
-		struct dst_entry *dst = sk->sk_rx_dst;
+		struct dst_entry *dst;
+
+		dst = rcu_dereference_protected(sk->sk_rx_dst,
+						lockdep_sock_is_held(sk));
 
 		sock_rps_save_rxhash(sk, skb);
 		sk_mark_napi_id(sk, skb);
@@ -1513,8 +1516,8 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
 			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
 			    INDIRECT_CALL_1(dst->ops->check, ip6_dst_check,
 					    dst, sk->sk_rx_dst_cookie) == NULL) {
+				RCU_INIT_POINTER(sk->sk_rx_dst, NULL);
 				dst_release(dst);
-				sk->sk_rx_dst = NULL;
 			}
 		}
 
@@ -1874,7 +1877,7 @@ INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb)
 		skb->sk = sk;
 		skb->destructor = sock_edemux;
 		if (sk_fullsock(sk)) {
-			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
+			struct dst_entry *dst = rcu_dereference(sk->sk_rx_dst);
 
 			if (dst)
 				dst = dst_check(dst, sk->sk_rx_dst_cookie);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e43b31d25fb61c7875f3bb8a93eb74da244d912a..a2caca6ccf114546f9e4ea854ad67208b2f3873e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -956,7 +956,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 		struct dst_entry *dst = skb_dst(skb);
 		int ret;
 
-		if (unlikely(sk->sk_rx_dst != dst))
+		if (unlikely(rcu_dereference(sk->sk_rx_dst) != dst))
 			udp6_sk_rx_dst_set(sk, dst);
 
 		if (!uh->check && !udp_sk(sk)->no_check6_rx) {
@@ -1070,7 +1070,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
 
 	skb->sk = sk;
 	skb->destructor = sock_efree;
-	dst = READ_ONCE(sk->sk_rx_dst);
+	dst = rcu_dereference(sk->sk_rx_dst);
 
 	if (dst)
 		dst = dst_check(dst, sk->sk_rx_dst_cookie);
-- 
2.34.1.173.g76aa8bc2d0-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules
  2021-12-20 14:33 [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules Eric Dumazet
@ 2021-12-21  3:10 ` patchwork-bot+netdevbpf
  2022-10-20 21:05 ` Carlos Llamas
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-12-21  3:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, netdev, edumazet

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 20 Dec 2021 06:33:30 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> syzbot reported various issues around early demux,
> one being included in this changelog [1]
> 
> sk->sk_rx_dst is using RCU protection without clearly
> documenting it.
> 
> [...]

Here is the summary with links:
  - [net] inet: fully convert sk->sk_rx_dst to RCU rules
    https://git.kernel.org/netdev/net/c/8f905c0e7354

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules
  2021-12-20 14:33 [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules Eric Dumazet
  2021-12-21  3:10 ` patchwork-bot+netdevbpf
@ 2022-10-20 21:05 ` Carlos Llamas
  2022-10-20 21:53   ` Eric Dumazet
  1 sibling, 1 reply; 5+ messages in thread
From: Carlos Llamas @ 2022-10-20 21:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet, liuwei.a

On Mon, Dec 20, 2021 at 06:33:30AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> syzbot reported various issues around early demux,
> one being included in this changelog [1]
> 
> sk->sk_rx_dst is using RCU protection without clearly
> documenting it.
> 
> And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
> are not following standard RCU rules.
> 
> [a]    dst_release(dst);
> [b]    sk->sk_rx_dst = NULL;
> 
> They look wrong because a delete operation of RCU protected
> pointer is supposed to clear the pointer before
> the call_rcu()/synchronize_rcu() guarding actual memory freeing.
> 
> In some cases indeed, dst could be freed before [b] is done.
> 
> We could cheat by clearing sk_rx_dst before calling
> dst_release(), but this seems the right time to stick
> to standard RCU annotations and debugging facilities.
> 
> [1]
> BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
> BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
> Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
> 
> CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
>  __kasan_report mm/kasan/report.c:433 [inline]
>  kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
>  dst_check include/net/dst.h:470 [inline]
>  tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
>  ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
>  ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
>  ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
>  ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
>  __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
>  __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
>  __netif_receive_skb_list net/core/dev.c:5608 [inline]
>  netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
>  gro_normal_list net/core/dev.c:5853 [inline]
>  gro_normal_list net/core/dev.c:5849 [inline]
>  napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
>  virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
>  virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
>  __napi_poll+0xaf/0x440 net/core/dev.c:7023
>  napi_poll net/core/dev.c:7090 [inline]
>  net_rx_action+0x801/0xb40 net/core/dev.c:7177
>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>  invoke_softirq kernel/softirq.c:432 [inline]
>  __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
>  irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
>  common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
>  asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
> RIP: 0033:0x7f5e972bfd57
> Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
> RSP: 002b:00007fff8a413210 EFLAGS: 00000283
> RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
> RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
> RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
> R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
> R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
>  </TASK>
> 
> Allocated by task 13:
>  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
>  kasan_set_track mm/kasan/common.c:46 [inline]
>  set_alloc_info mm/kasan/common.c:434 [inline]
>  __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
>  kasan_slab_alloc include/linux/kasan.h:259 [inline]
>  slab_post_alloc_hook mm/slab.h:519 [inline]
>  slab_alloc_node mm/slub.c:3234 [inline]
>  slab_alloc mm/slub.c:3242 [inline]
>  kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
>  dst_alloc+0x146/0x1f0 net/core/dst.c:92
>  rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
>  ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
>  ip_route_input_rcu net/ipv4/route.c:2470 [inline]
>  ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
>  ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
>  ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
>  ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
>  ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
>  __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
>  __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
>  __netif_receive_skb_list net/core/dev.c:5608 [inline]
>  netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
>  gro_normal_list net/core/dev.c:5853 [inline]
>  gro_normal_list net/core/dev.c:5849 [inline]
>  napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
>  virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
>  virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
>  __napi_poll+0xaf/0x440 net/core/dev.c:7023
>  napi_poll net/core/dev.c:7090 [inline]
>  net_rx_action+0x801/0xb40 net/core/dev.c:7177
>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
> 
> Freed by task 13:
>  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
>  kasan_set_track+0x21/0x30 mm/kasan/common.c:46
>  kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
>  ____kasan_slab_free mm/kasan/common.c:366 [inline]
>  ____kasan_slab_free mm/kasan/common.c:328 [inline]
>  __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
>  kasan_slab_free include/linux/kasan.h:235 [inline]
>  slab_free_hook mm/slub.c:1723 [inline]
>  slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
>  slab_free mm/slub.c:3513 [inline]
>  kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
>  dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
>  rcu_do_batch kernel/rcu/tree.c:2506 [inline]
>  rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
>  __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
> 
> Last potentially related work creation:
>  kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
>  __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
>  __call_rcu kernel/rcu/tree.c:2985 [inline]
>  call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
>  dst_release net/core/dst.c:177 [inline]
>  dst_release+0x79/0xe0 net/core/dst.c:167
>  tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
>  sk_backlog_rcv include/net/sock.h:1030 [inline]
>  __release_sock+0x134/0x3b0 net/core/sock.c:2768
>  release_sock+0x54/0x1b0 net/core/sock.c:3300
>  tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
>  inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
>  sock_sendmsg_nosec net/socket.c:704 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:724
>  sock_write_iter+0x289/0x3c0 net/socket.c:1057
>  call_write_iter include/linux/fs.h:2162 [inline]
>  new_sync_write+0x429/0x660 fs/read_write.c:503
>  vfs_write+0x7cd/0xae0 fs/read_write.c:590
>  ksys_write+0x1ee/0x250 fs/read_write.c:643
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> The buggy address belongs to the object at ffff88807f1cb700
>  which belongs to the cache ip_dst_cache of size 176
> The buggy address is located 58 bytes inside of
>  176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
> The buggy address belongs to the page:
> page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
> flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
> raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> page_owner tracks the page as allocated
> page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
>  prep_new_page mm/page_alloc.c:2418 [inline]
>  get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
>  __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
>  alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
>  alloc_slab_page mm/slub.c:1793 [inline]
>  allocate_slab mm/slub.c:1930 [inline]
>  new_slab+0x32d/0x4a0 mm/slub.c:1993
>  ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
>  __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
>  slab_alloc_node mm/slub.c:3200 [inline]
>  slab_alloc mm/slub.c:3242 [inline]
>  kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
>  dst_alloc+0x146/0x1f0 net/core/dst.c:92
>  rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
>  __mkroute_output net/ipv4/route.c:2564 [inline]
>  ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
>  ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
>  __ip_route_output_key include/net/route.h:126 [inline]
>  ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
>  ip_route_output_key include/net/route.h:142 [inline]
>  geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
>  geneve_xmit_skb drivers/net/geneve.c:899 [inline]
>  geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
>  __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
>  netdev_start_xmit include/linux/netdevice.h:5008 [inline]
>  xmit_one net/core/dev.c:3590 [inline]
>  dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
>  __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
> page last free stack trace:
>  reset_page_owner include/linux/page_owner.h:24 [inline]
>  free_pages_prepare mm/page_alloc.c:1338 [inline]
>  free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
>  free_unref_page_prepare mm/page_alloc.c:3309 [inline]
>  free_unref_page+0x19/0x690 mm/page_alloc.c:3388
>  qlink_free mm/kasan/quarantine.c:146 [inline]
>  qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
>  kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
>  __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
>  kasan_slab_alloc include/linux/kasan.h:259 [inline]
>  slab_post_alloc_hook mm/slab.h:519 [inline]
>  slab_alloc_node mm/slub.c:3234 [inline]
>  kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
>  __alloc_skb+0x215/0x340 net/core/skbuff.c:414
>  alloc_skb include/linux/skbuff.h:1126 [inline]
>  alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
>  sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
>  mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
>  add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
>  add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
>  mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
>  mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
>  mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
>  process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
> 
> Memory state around the buggy address:
>  ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
> >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                         ^
>  ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>  ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> Fixes: 41063e9dd119 ("ipv4: Early TCP socket demux.")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/net/sock.h   |  2 +-
>  net/ipv4/af_inet.c   |  2 +-
>  net/ipv4/tcp.c       |  3 +--
>  net/ipv4/tcp_input.c |  2 +-
>  net/ipv4/tcp_ipv4.c  | 11 +++++++----
>  net/ipv4/udp.c       |  6 +++---
>  net/ipv6/tcp_ipv6.c  | 11 +++++++----
>  net/ipv6/udp.c       |  4 ++--
>  8 files changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index bea21ff70e74d906216f4eaa2d5a712d12551216..d47e9658da28545c1f6afd9db0cf136b3e13d7b6 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -431,7 +431,7 @@ struct sock {
>  #ifdef CONFIG_XFRM
>  	struct xfrm_policy __rcu *sk_policy[2];
>  #endif
> -	struct dst_entry	*sk_rx_dst;
> +	struct dst_entry __rcu	*sk_rx_dst;
>  	int			sk_rx_dst_ifindex;
>  	u32			sk_rx_dst_cookie;
>  
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 0189e3cd4a7df2dc2ea7121182ee290e0164df90..6b5956500436187d5e9b801ebb6f8f52ba0db090 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -154,7 +154,7 @@ void inet_sock_destruct(struct sock *sk)
>  
>  	kfree(rcu_dereference_protected(inet->inet_opt, 1));
>  	dst_release(rcu_dereference_protected(sk->sk_dst_cache, 1));
> -	dst_release(sk->sk_rx_dst);
> +	dst_release(rcu_dereference_protected(sk->sk_rx_dst, 1));
>  	sk_refcnt_debug_dec(sk);
>  }
>  EXPORT_SYMBOL(inet_sock_destruct);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index bbb3d39c69afc2d5a42c6ace8d473657861da61f..2bb28bfd83bf621b5d9f3fb7ce5695195697b43a 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3012,8 +3012,7 @@ int tcp_disconnect(struct sock *sk, int flags)
>  	icsk->icsk_ack.rcv_mss = TCP_MIN_MSS;
>  	memset(&tp->rx_opt, 0, sizeof(tp->rx_opt));
>  	__sk_dst_reset(sk);
> -	dst_release(sk->sk_rx_dst);
> -	sk->sk_rx_dst = NULL;
> +	dst_release(xchg((__force struct dst_entry **)&sk->sk_rx_dst, NULL));
>  	tcp_saved_syn_free(tp);
>  	tp->compressed_ack = 0;
>  	tp->segs_in = 0;
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 246ab7b5e857eb9e802c4805075e89c98cf00636..0ce46849ec3d4595699dd54229919b2d66b70257 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5787,7 +5787,7 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
>  	trace_tcp_probe(sk, skb);
>  
>  	tcp_mstamp_refresh(tp);
> -	if (unlikely(!sk->sk_rx_dst))
> +	if (unlikely(!rcu_access_pointer(sk->sk_rx_dst)))
>  		inet_csk(sk)->icsk_af_ops->sk_rx_dst_set(sk, skb);
>  	/*
>  	 *	Header prediction.
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 13d868c43284584ee0c58ddfd411bb52c8b0c830..084df223b5dff8089a615a4a8d392b620fc0a28a 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1701,7 +1701,10 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
>  	struct sock *rsk;
>  
>  	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
> -		struct dst_entry *dst = sk->sk_rx_dst;
> +		struct dst_entry *dst;
> +
> +		dst = rcu_dereference_protected(sk->sk_rx_dst,
> +						lockdep_sock_is_held(sk));
>  
>  		sock_rps_save_rxhash(sk, skb);
>  		sk_mark_napi_id(sk, skb);
> @@ -1709,8 +1712,8 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
>  			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
>  			    !INDIRECT_CALL_1(dst->ops->check, ipv4_dst_check,
>  					     dst, 0)) {
> +				RCU_INIT_POINTER(sk->sk_rx_dst, NULL);
>  				dst_release(dst);
> -				sk->sk_rx_dst = NULL;
>  			}
>  		}
>  		tcp_rcv_established(sk, skb);
> @@ -1786,7 +1789,7 @@ int tcp_v4_early_demux(struct sk_buff *skb)
>  		skb->sk = sk;
>  		skb->destructor = sock_edemux;
>  		if (sk_fullsock(sk)) {
> -			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
> +			struct dst_entry *dst = rcu_dereference(sk->sk_rx_dst);
>  
>  			if (dst)
>  				dst = dst_check(dst, 0);
> @@ -2201,7 +2204,7 @@ void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
>  	struct dst_entry *dst = skb_dst(skb);
>  
>  	if (dst && dst_hold_safe(dst)) {
> -		sk->sk_rx_dst = dst;
> +		rcu_assign_pointer(sk->sk_rx_dst, dst);
>  		sk->sk_rx_dst_ifindex = skb->skb_iif;
>  	}
>  }
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 23b05e28490b0a0a690a837027f26167e353f8ce..15c6b450b8dba44d5f344a554eea6b991c1ea5f1 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2250,7 +2250,7 @@ bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
>  	struct dst_entry *old;
>  
>  	if (dst_hold_safe(dst)) {
> -		old = xchg(&sk->sk_rx_dst, dst);
> +		old = xchg((__force struct dst_entry **)&sk->sk_rx_dst, dst);
>  		dst_release(old);
>  		return old != dst;
>  	}
> @@ -2440,7 +2440,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  		struct dst_entry *dst = skb_dst(skb);
>  		int ret;
>  
> -		if (unlikely(sk->sk_rx_dst != dst))
> +		if (unlikely(rcu_dereference(sk->sk_rx_dst) != dst))
>  			udp_sk_rx_dst_set(sk, dst);
>  
>  		ret = udp_unicast_rcv_skb(sk, skb, uh);
> @@ -2599,7 +2599,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
>  
>  	skb->sk = sk;
>  	skb->destructor = sock_efree;
> -	dst = READ_ONCE(sk->sk_rx_dst);
> +	dst = rcu_dereference(sk->sk_rx_dst);
>  
>  	if (dst)
>  		dst = dst_check(dst, 0);
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 551fce49841d7f53a111b0435855634cece2b40a..680e6481b9672040ccb41fd08ab80166575bef50 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -107,7 +107,7 @@ static void inet6_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
>  	if (dst && dst_hold_safe(dst)) {
>  		const struct rt6_info *rt = (const struct rt6_info *)dst;
>  
> -		sk->sk_rx_dst = dst;
> +		rcu_assign_pointer(sk->sk_rx_dst, dst);
>  		sk->sk_rx_dst_ifindex = skb->skb_iif;
>  		sk->sk_rx_dst_cookie = rt6_get_cookie(rt);
>  	}
> @@ -1505,7 +1505,10 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
>  		opt_skb = skb_clone(skb, sk_gfp_mask(sk, GFP_ATOMIC));
>  
>  	if (sk->sk_state == TCP_ESTABLISHED) { /* Fast path */
> -		struct dst_entry *dst = sk->sk_rx_dst;
> +		struct dst_entry *dst;
> +
> +		dst = rcu_dereference_protected(sk->sk_rx_dst,
> +						lockdep_sock_is_held(sk));
>  
>  		sock_rps_save_rxhash(sk, skb);
>  		sk_mark_napi_id(sk, skb);
> @@ -1513,8 +1516,8 @@ static int tcp_v6_do_rcv(struct sock *sk, struct sk_buff *skb)
>  			if (sk->sk_rx_dst_ifindex != skb->skb_iif ||
>  			    INDIRECT_CALL_1(dst->ops->check, ip6_dst_check,
>  					    dst, sk->sk_rx_dst_cookie) == NULL) {
> +				RCU_INIT_POINTER(sk->sk_rx_dst, NULL);
>  				dst_release(dst);
> -				sk->sk_rx_dst = NULL;
>  			}
>  		}
>  
> @@ -1874,7 +1877,7 @@ INDIRECT_CALLABLE_SCOPE void tcp_v6_early_demux(struct sk_buff *skb)
>  		skb->sk = sk;
>  		skb->destructor = sock_edemux;
>  		if (sk_fullsock(sk)) {
> -			struct dst_entry *dst = READ_ONCE(sk->sk_rx_dst);
> +			struct dst_entry *dst = rcu_dereference(sk->sk_rx_dst);
>  
>  			if (dst)
>  				dst = dst_check(dst, sk->sk_rx_dst_cookie);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index e43b31d25fb61c7875f3bb8a93eb74da244d912a..a2caca6ccf114546f9e4ea854ad67208b2f3873e 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -956,7 +956,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  		struct dst_entry *dst = skb_dst(skb);
>  		int ret;
>  
> -		if (unlikely(sk->sk_rx_dst != dst))
> +		if (unlikely(rcu_dereference(sk->sk_rx_dst) != dst))
>  			udp6_sk_rx_dst_set(sk, dst);
>  
>  		if (!uh->check && !udp_sk(sk)->no_check6_rx) {
> @@ -1070,7 +1070,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
>  
>  	skb->sk = sk;
>  	skb->destructor = sock_efree;
> -	dst = READ_ONCE(sk->sk_rx_dst);
> +	dst = rcu_dereference(sk->sk_rx_dst);
>  
>  	if (dst)
>  		dst = dst_check(dst, sk->sk_rx_dst_cookie);
> -- 
> 2.34.1.173.g76aa8bc2d0-goog
> 
> 

Eric, this patch was picked for v5.15 stable and I wonder whether this
needs to be backported to older branches too. The Fixes commit quoted
here seems to go back all the way to v3.5. Would you know?

--
Carlos Llamas

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules
  2022-10-20 21:05 ` Carlos Llamas
@ 2022-10-20 21:53   ` Eric Dumazet
  2022-10-20 22:55     ` Carlos Llamas
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2022-10-20 21:53 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, liuwei.a

On Thu, Oct 20, 2022 at 2:05 PM Carlos Llamas <cmllamas@google.com> wrote:
>
> On Mon, Dec 20, 2021 at 06:33:30AM -0800, Eric Dumazet wrote:

>
> Eric, this patch was picked for v5.15 stable and I wonder whether this
> needs to be backported to older branches too. The Fixes commit quoted
> here seems to go back all the way to v3.5. Would you know?

I guess nobody cared to address some merge conflicts on older kernel versions.

If you want, you could handle the backport, this patch can help some
rare race windows
on preemptable kernels.

>
> --
> Carlos Llamas

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules
  2022-10-20 21:53   ` Eric Dumazet
@ 2022-10-20 22:55     ` Carlos Llamas
  0 siblings, 0 replies; 5+ messages in thread
From: Carlos Llamas @ 2022-10-20 22:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, liuwei.a

On Thu, Oct 20, 2022 at 02:53:06PM -0700, Eric Dumazet wrote:
> On Thu, Oct 20, 2022 at 2:05 PM Carlos Llamas <cmllamas@google.com> wrote:
> >
> > On Mon, Dec 20, 2021 at 06:33:30AM -0800, Eric Dumazet wrote:
> 
> >
> > Eric, this patch was picked for v5.15 stable and I wonder whether this
> > needs to be backported to older branches too. The Fixes commit quoted
> > here seems to go back all the way to v3.5. Would you know?
> 
> I guess nobody cared to address some merge conflicts on older kernel versions.
> 
> If you want, you could handle the backport, this patch can help some
> rare race windows
> on preemptable kernels.

Sounds good, I'll have a look at backporting this patch.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-10-20 22:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-20 14:33 [PATCH net] inet: fully convert sk->sk_rx_dst to RCU rules Eric Dumazet
2021-12-21  3:10 ` patchwork-bot+netdevbpf
2022-10-20 21:05 ` Carlos Llamas
2022-10-20 21:53   ` Eric Dumazet
2022-10-20 22:55     ` Carlos Llamas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.