All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 10:00 ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-18 10:00 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happanes in ksoftirq context which concurrently
on a differnt core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence, add NULL-ptr
check before callback call along with barriers to prevent kernel crash.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
 net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..b6496f3 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
-		svsk->sk_odata(sk);
+		rmb();
+		if (svsk->sk_odata)
+			svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
-		svsk->sk_owspace(sk);
+		rmb();
+		if (svsk->sk_owspace)
+			svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 }
@@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
-		svsk->sk_odata(sk);
+	if (svsk) { 
+		rmb();
+		if (svsk->sk_odata)
+			svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
-		svsk->sk_ostate(sk);
+		rmb();
+		if (svsk->sk_ostate)
+			svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 10:00 ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-18 10:00 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happanes in ksoftirq context which concurrently
on a differnt core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence, add NULL-ptr
check before callback call along with barriers to prevent kernel crash.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
  fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
 net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..b6496f3 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
-		svsk->sk_odata(sk);
+		rmb();
+		if (svsk->sk_odata)
+			svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
-		svsk->sk_owspace(sk);
+		rmb();
+		if (svsk->sk_owspace)
+			svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 }
@@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
-		svsk->sk_odata(sk);
+	if (svsk) { 
+		rmb();
+		if (svsk->sk_odata)
+			svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
-		svsk->sk_ostate(sk);
+		rmb();
+		if (svsk->sk_ostate)
+			svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 10:27   ` Jeff Layton
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-18 10:27 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust, anna.schumaker, bfields, davem,
	linux-nfs, netdev, linux-kernel, pabeni

On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happanes in ksoftirq context which concurrently
> on a differnt core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence, add NULL-ptr
> check before callback call along with barriers to prevent kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..b6496f3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_odata(sk);
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);

I'm not convinced these checks will really do much, though you might
need the rmb() calls there...see below...

>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_owspace(sk);
> +		rmb();
> +		if (svsk->sk_owspace)
> +			svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
> @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> -		svsk->sk_odata(sk);
> +	if (svsk) { 
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> -		svsk->sk_ostate(sk);
> +		rmb();
> +		if (svsk->sk_ostate)
> +			svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)

Since we always check whether svsk is NULL before calling the functions,
then the above hunk and maybe the rmb() calls might be all that's needed
here. I don't see how we'd ever get a sk_odata value (for instance)
that's NULL with the above change.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 10:27   ` Jeff Layton
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-18 10:27 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA

On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happanes in ksoftirq context which concurrently
> on a differnt core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence, add NULL-ptr
> check before callback call along with barriers to prevent kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
 lt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..b6496f3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_odata(sk);
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);

I'm not convinced these checks will really do much, though you might
need the rmb() calls there...see below...

>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_owspace(sk);
> +		rmb();
> +		if (svsk->sk_owspace)
> +			svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
> @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> -		svsk->sk_odata(sk);
> +	if (svsk) { 
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> -		svsk->sk_ostate(sk);
> +		rmb();
> +		if (svsk->sk_ostate)
> +			svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)

Since we always check whether svsk is NULL before calling the functions,
then the above hunk and maybe the rmb() calls might be all that's needed
here. I don't see how we'd ever get a sk_odata value (for instance)
that's NULL with the above change.

-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 11:08     ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-18 11:08 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, anna.schumaker, bfields, davem, linux-nfs,
	netdev, linux-kernel, pabeni, vlomovts

Hi Jeff,

On Fri, Aug 18, 2017 at 06:27:32AM -0400, Jeff Layton wrote:
> On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happanes in ksoftirq context which concurrently
> > on a differnt core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence, add NULL-ptr
> > check before callback call along with barriers to prevent kernel crash.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > ---
> >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> >  1 file changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..b6496f3 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_odata(sk);
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> 
> I'm not convinced these checks will really do much, though you might
> need the rmb() calls there...see below...
> 
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_owspace(sk);
> > +		rmb();
> > +		if (svsk->sk_owspace)
> > +			svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> >  }
> > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > -		svsk->sk_odata(sk);
> > +	if (svsk) { 
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > -		svsk->sk_ostate(sk);
> > +		rmb();
> > +		if (svsk->sk_ostate)
> > +			svsk->sk_ostate(sk);
> > +
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> 
> Since we always check whether svsk is NULL before calling the functions,
> then the above hunk and maybe the rmb() calls might be all that's needed
> here. I don't see how we'd ever get a sk_odata value (for instance)
> that's NULL with the above change.

Thanks for feed-back. I'll update patch then, run test cycle once again
(hope couple days would be enough) and re-send v2 to review. 

> 
> -- 
> Jeff Layton <jlayton@redhat.com>

WBR,
Vadim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 11:08     ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-18 11:08 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA, vlomovts-H+wXaHxf7aLQT0dZR+AlfA

Hi Jeff,

On Fri, Aug 18, 2017 at 06:27:32AM -0400, Jeff Layton wrote:
> On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happanes in ksoftirq context which concurrently
> > on a differnt core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence, add NULL-ptr
> > check before callback call along with barriers to prevent kernel crash.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysim
 gblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> >  1 file changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..b6496f3 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_odata(sk);
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> 
> I'm not convinced these checks will really do much, though you might
> need the rmb() calls there...see below...
> 
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_owspace(sk);
> > +		rmb();
> > +		if (svsk->sk_owspace)
> > +			svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> >  }
> > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > -		svsk->sk_odata(sk);
> > +	if (svsk) { 
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > -		svsk->sk_ostate(sk);
> > +		rmb();
> > +		if (svsk->sk_ostate)
> > +			svsk->sk_ostate(sk);
> > +
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> 
> Since we always check whether svsk is NULL before calling the functions,
> then the above hunk and maybe the rmb() calls might be all that's needed
> here. I don't see how we'd ever get a sk_odata value (for instance)
> that's NULL with the above change.

Thanks for feed-back. I'll update patch then, run test cycle once again
(hope couple days would be enough) and re-send v2 to review. 

> 
> -- 
> Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

WBR,
Vadim
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-18 11:08     ` Vadim Lomovtsev
@ 2017-08-18 11:16       ` Jeff Layton
  -1 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-18 11:16 UTC (permalink / raw)
  To: Vadim Lomovtsev
  Cc: trond.myklebust, anna.schumaker, bfields, davem, linux-nfs,
	netdev, linux-kernel, pabeni

On Fri, 2017-08-18 at 07:08 -0400, Vadim Lomovtsev wrote:
> Hi Jeff,
> 
> On Fri, Aug 18, 2017 at 06:27:32AM -0400, Jeff Layton wrote:
> > On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> > > While running nfs/connectathon tests kernel NULL-pointer exception
> > > has been observed due to races in svcsock.c.
> > > 
> > > Race is appear when kernel accepts connection by kernel_accept
> > > (which creates new socket) and start queuing ingress packets
> > > to new socket. This happanes in ksoftirq context which concurrently
> > > on a differnt core while new socket setup is not done yet.
> > > 
> > > The fix is to re-order socket user data init sequence, add NULL-ptr
> > > check before callback call along with barriers to prevent kernel crash.
> > > 
> > > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > > 
> > > Crash log:
> > > ---<-snip->---
> > > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > [ 6708.647093] pgd = ffff0000094e0000
> > > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > > [ 6708.773822] PC is at 0x0
> > > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > > [ 6708.789248] sp : ffff810ffbad3900
> > > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > > [ 6708.872084]
> > > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > > [ 6708.886075] Call trace:
> > > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > > [ 6708.973117] [<          (null)>]           (null)
> > > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > > [ 6709.207967] Code: bad PC value
> > > [ 6709.211061] SMP: stopping secondary CPUs
> > > [ 6709.218830] Starting crashdump kernel...
> > > [ 6709.222749] Bye!
> > > ---<-snip>---
> > > 
> > > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > > ---
> > >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> > >  1 file changed, 18 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > index 2b720fa..b6496f3 100644
> > > --- a/net/sunrpc/svcsock.c
> > > +++ b/net/sunrpc/svcsock.c
> > > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> > >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> > >  			svsk, sk,
> > >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > -		svsk->sk_odata(sk);
> > > +		rmb();
> > > +		if (svsk->sk_odata)
> > > +			svsk->sk_odata(sk);
> > 
> > I'm not convinced these checks will really do much, though you might
> > need the rmb() calls there...see below...
> > 
> > >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > >  	}
> > > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> > >  	if (svsk) {
> > >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> > >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > -		svsk->sk_owspace(sk);
> > > +		rmb();
> > > +		if (svsk->sk_owspace)
> > > +			svsk->sk_owspace(sk);
> > >  		svc_xprt_enqueue(&svsk->sk_xprt);
> > >  	}
> > >  }
> > > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> > >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> > >  		sk, sk->sk_state);
> > >  
> > > -	if (svsk)
> > > -		svsk->sk_odata(sk);
> > > +	if (svsk) { 
> > > +		rmb();
> > > +		if (svsk->sk_odata)
> > > +			svsk->sk_odata(sk);
> > > +	}
> > > +
> > >  	/*
> > >  	 * This callback may called twice when a new connection
> > >  	 * is established as a child socket inherits everything
> > > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> > >  	if (!svsk)
> > >  		printk("svc: socket %p: no user data\n", sk);
> > >  	else {
> > > -		svsk->sk_ostate(sk);
> > > +		rmb();
> > > +		if (svsk->sk_ostate)
> > > +			svsk->sk_ostate(sk);
> > > +
> > >  		if (sk->sk_state != TCP_ESTABLISHED) {
> > >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> > >  		return ERR_PTR(err);
> > >  	}
> > >  
> > > -	inet->sk_user_data = svsk;
> > >  	svsk->sk_sock = sock;
> > >  	svsk->sk_sk = inet;
> > >  	svsk->sk_ostate = inet->sk_state_change;
> > >  	svsk->sk_odata = inet->sk_data_ready;
> > >  	svsk->sk_owspace = inet->sk_write_space;
> > > +	wmb();
> > > +	inet->sk_user_data = svsk;
> > >  
> > >  	/* Initialize the socket */
> > >  	if (sock->type == SOCK_DGRAM)
> > 
> > Since we always check whether svsk is NULL before calling the functions,
> > then the above hunk and maybe the rmb() calls might be all that's needed
> > here. I don't see how we'd ever get a sk_odata value (for instance)
> > that's NULL with the above change.
> 
> Thanks for feed-back. I'll update patch then, run test cycle once again
> (hope couple days would be enough) and re-send v2 to review. 
> 

Great. It might also be nice to sprinkle in some comments that document
how the rmb/wmb calls are paired together too.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 11:16       ` Jeff Layton
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-18 11:16 UTC (permalink / raw)
  To: Vadim Lomovtsev
  Cc: trond.myklebust, anna.schumaker, bfields, davem, linux-nfs,
	netdev, linux-kernel, pabeni

On Fri, 2017-08-18 at 07:08 -0400, Vadim Lomovtsev wrote:
> Hi Jeff,
> 
> On Fri, Aug 18, 2017 at 06:27:32AM -0400, Jeff Layton wrote:
> > On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> > > While running nfs/connectathon tests kernel NULL-pointer exception
> > > has been observed due to races in svcsock.c.
> > > 
> > > Race is appear when kernel accepts connection by kernel_accept
> > > (which creates new socket) and start queuing ingress packets
> > > to new socket. This happanes in ksoftirq context which concurrently
> > > on a differnt core while new socket setup is not done yet.
> > > 
> > > The fix is to re-order socket user data init sequence, add NULL-ptr
> > > check before callback call along with barriers to prevent kernel crash.
> > > 
> > > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > > 
> > > Crash log:
> > > ---<-snip->---
> > > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > [ 6708.647093] pgd = ffff0000094e0000
> > > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sys
 imgblt fb_sys_fops
> > > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > > [ 6708.773822] PC is at 0x0
> > > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > > [ 6708.789248] sp : ffff810ffbad3900
> > > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > > [ 6708.872084]
> > > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > > [ 6708.886075] Call trace:
> > > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > > [ 6708.973117] [<          (null)>]           (null)
> > > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > > [ 6709.207967] Code: bad PC value
> > > [ 6709.211061] SMP: stopping secondary CPUs
> > > [ 6709.218830] Starting crashdump kernel...
> > > [ 6709.222749] Bye!
> > > ---<-snip>---
> > > 
> > > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > > ---
> > >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> > >  1 file changed, 18 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > index 2b720fa..b6496f3 100644
> > > --- a/net/sunrpc/svcsock.c
> > > +++ b/net/sunrpc/svcsock.c
> > > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> > >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> > >  			svsk, sk,
> > >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > -		svsk->sk_odata(sk);
> > > +		rmb();
> > > +		if (svsk->sk_odata)
> > > +			svsk->sk_odata(sk);
> > 
> > I'm not convinced these checks will really do much, though you might
> > need the rmb() calls there...see below...
> > 
> > >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > >  	}
> > > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> > >  	if (svsk) {
> > >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> > >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > -		svsk->sk_owspace(sk);
> > > +		rmb();
> > > +		if (svsk->sk_owspace)
> > > +			svsk->sk_owspace(sk);
> > >  		svc_xprt_enqueue(&svsk->sk_xprt);
> > >  	}
> > >  }
> > > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> > >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> > >  		sk, sk->sk_state);
> > >  
> > > -	if (svsk)
> > > -		svsk->sk_odata(sk);
> > > +	if (svsk) { 
> > > +		rmb();
> > > +		if (svsk->sk_odata)
> > > +			svsk->sk_odata(sk);
> > > +	}
> > > +
> > >  	/*
> > >  	 * This callback may called twice when a new connection
> > >  	 * is established as a child socket inherits everything
> > > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> > >  	if (!svsk)
> > >  		printk("svc: socket %p: no user data\n", sk);
> > >  	else {
> > > -		svsk->sk_ostate(sk);
> > > +		rmb();
> > > +		if (svsk->sk_ostate)
> > > +			svsk->sk_ostate(sk);
> > > +
> > >  		if (sk->sk_state != TCP_ESTABLISHED) {
> > >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> > >  		return ERR_PTR(err);
> > >  	}
> > >  
> > > -	inet->sk_user_data = svsk;
> > >  	svsk->sk_sock = sock;
> > >  	svsk->sk_sk = inet;
> > >  	svsk->sk_ostate = inet->sk_state_change;
> > >  	svsk->sk_odata = inet->sk_data_ready;
> > >  	svsk->sk_owspace = inet->sk_write_space;
> > > +	wmb();
> > > +	inet->sk_user_data = svsk;
> > >  
> > >  	/* Initialize the socket */
> > >  	if (sock->type == SOCK_DGRAM)
> > 
> > Since we always check whether svsk is NULL before calling the functions,
> > then the above hunk and maybe the rmb() calls might be all that's needed
> > here. I don't see how we'd ever get a sk_odata value (for instance)
> > that's NULL with the above change.
> 
> Thanks for feed-back. I'll update patch then, run test cycle once again
> (hope couple days would be enough) and re-send v2 to review. 
> 

Great. It might also be nice to sprinkle in some comments that document
how the rmb/wmb calls are paired together too.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 11:34         ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-18 11:34 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, anna.schumaker, bfields, davem, linux-nfs,
	netdev, linux-kernel, pabeni, vlomovts

On Fri, Aug 18, 2017 at 07:16:45AM -0400, Jeff Layton wrote:
> On Fri, 2017-08-18 at 07:08 -0400, Vadim Lomovtsev wrote:
> > Hi Jeff,
> > 
> > On Fri, Aug 18, 2017 at 06:27:32AM -0400, Jeff Layton wrote:
> > > On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> > > > While running nfs/connectathon tests kernel NULL-pointer exception
> > > > has been observed due to races in svcsock.c.
> > > > 
> > > > Race is appear when kernel accepts connection by kernel_accept
> > > > (which creates new socket) and start queuing ingress packets
> > > > to new socket. This happanes in ksoftirq context which concurrently
> > > > on a differnt core while new socket setup is not done yet.
> > > > 
> > > > The fix is to re-order socket user data init sequence, add NULL-ptr
> > > > check before callback call along with barriers to prevent kernel crash.
> > > > 
> > > > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > > > 
> > > > Crash log:
> > > > ---<-snip->---
> > > > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > > [ 6708.647093] pgd = ffff0000094e0000
> > > > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > > > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > > > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > > > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > > > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > > > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > > > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > > > [ 6708.773822] PC is at 0x0
> > > > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > > > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > > > [ 6708.789248] sp : ffff810ffbad3900
> > > > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > > > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > > > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > > > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > > > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > > > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > > > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > > > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > > > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > > > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > > > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > > > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > > > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > > > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > > > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > > > [ 6708.872084]
> > > > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > > > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > > > [ 6708.886075] Call trace:
> > > > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > > > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > > > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > > > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > > > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > > > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > > > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > > > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > > > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > > > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > > > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > > > [ 6708.973117] [<          (null)>]           (null)
> > > > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > > > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > > > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > > > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > > > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > > > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > > > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > > > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > > > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > > > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > > > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > > > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > > > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > > > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > > > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > > > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > > > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > > > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > > > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > > > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > > > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > > > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > > > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > > > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > > > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > > > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > > > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > > > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > > > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > > > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > > > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > > > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > > > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > > > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > > > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > > > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > > > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > > > [ 6709.207967] Code: bad PC value
> > > > [ 6709.211061] SMP: stopping secondary CPUs
> > > > [ 6709.218830] Starting crashdump kernel...
> > > > [ 6709.222749] Bye!
> > > > ---<-snip>---
> > > > 
> > > > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > > > ---
> > > >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> > > >  1 file changed, 18 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > > index 2b720fa..b6496f3 100644
> > > > --- a/net/sunrpc/svcsock.c
> > > > +++ b/net/sunrpc/svcsock.c
> > > > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> > > >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> > > >  			svsk, sk,
> > > >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > > -		svsk->sk_odata(sk);
> > > > +		rmb();
> > > > +		if (svsk->sk_odata)
> > > > +			svsk->sk_odata(sk);
> > > 
> > > I'm not convinced these checks will really do much, though you might
> > > need the rmb() calls there...see below...
> > > 
> > > >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> > > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > > >  	}
> > > > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> > > >  	if (svsk) {
> > > >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> > > >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > > -		svsk->sk_owspace(sk);
> > > > +		rmb();
> > > > +		if (svsk->sk_owspace)
> > > > +			svsk->sk_owspace(sk);
> > > >  		svc_xprt_enqueue(&svsk->sk_xprt);
> > > >  	}
> > > >  }
> > > > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> > > >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> > > >  		sk, sk->sk_state);
> > > >  
> > > > -	if (svsk)
> > > > -		svsk->sk_odata(sk);
> > > > +	if (svsk) { 
> > > > +		rmb();
> > > > +		if (svsk->sk_odata)
> > > > +			svsk->sk_odata(sk);
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * This callback may called twice when a new connection
> > > >  	 * is established as a child socket inherits everything
> > > > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> > > >  	if (!svsk)
> > > >  		printk("svc: socket %p: no user data\n", sk);
> > > >  	else {
> > > > -		svsk->sk_ostate(sk);
> > > > +		rmb();
> > > > +		if (svsk->sk_ostate)
> > > > +			svsk->sk_ostate(sk);
> > > > +
> > > >  		if (sk->sk_state != TCP_ESTABLISHED) {
> > > >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> > > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > > > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> > > >  		return ERR_PTR(err);
> > > >  	}
> > > >  
> > > > -	inet->sk_user_data = svsk;
> > > >  	svsk->sk_sock = sock;
> > > >  	svsk->sk_sk = inet;
> > > >  	svsk->sk_ostate = inet->sk_state_change;
> > > >  	svsk->sk_odata = inet->sk_data_ready;
> > > >  	svsk->sk_owspace = inet->sk_write_space;
> > > > +	wmb();
> > > > +	inet->sk_user_data = svsk;
> > > >  
> > > >  	/* Initialize the socket */
> > > >  	if (sock->type == SOCK_DGRAM)
> > > 
> > > Since we always check whether svsk is NULL before calling the functions,
> > > then the above hunk and maybe the rmb() calls might be all that's needed
> > > here. I don't see how we'd ever get a sk_odata value (for instance)
> > > that's NULL with the above change.
> > 
> > Thanks for feed-back. I'll update patch then, run test cycle once again
> > (hope couple days would be enough) and re-send v2 to review. 
> > 
> 
> Great. It might also be nice to sprinkle in some comments that document
> how the rmb/wmb calls are paired together too.
> 

Got it. Will do.

> -- 
> Jeff Layton <jlayton@redhat.com>

WBR,
Vadim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-18 11:34         ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-18 11:34 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA, vlomovts-H+wXaHxf7aLQT0dZR+AlfA

On Fri, Aug 18, 2017 at 07:16:45AM -0400, Jeff Layton wrote:
> On Fri, 2017-08-18 at 07:08 -0400, Vadim Lomovtsev wrote:
> > Hi Jeff,
> > 
> > On Fri, Aug 18, 2017 at 06:27:32AM -0400, Jeff Layton wrote:
> > > On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> > > > While running nfs/connectathon tests kernel NULL-pointer exception
> > > > has been observed due to races in svcsock.c.
> > > > 
> > > > Race is appear when kernel accepts connection by kernel_accept
> > > > (which creates new socket) and start queuing ingress packets
> > > > to new socket. This happanes in ksoftirq context which concurrently
> > > > on a differnt core while new socket setup is not done yet.
> > > > 
> > > > The fix is to re-order socket user data init sequence, add NULL-ptr
> > > > check before callback call along with barriers to prevent kernel crash.
> > > > 
> > > > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > > > 
> > > > Crash log:
> > > > ---<-snip->---
> > > > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > > > [ 6708.647093] pgd = ffff0000094e0000
> > > > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > > > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > > > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect s
 ysimgblt fb_sys_fops
> > > > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > > > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > > > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > > > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > > > [ 6708.773822] PC is at 0x0
> > > > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > > > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > > > [ 6708.789248] sp : ffff810ffbad3900
> > > > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > > > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > > > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > > > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > > > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > > > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > > > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > > > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > > > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > > > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > > > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > > > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > > > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > > > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > > > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > > > [ 6708.872084]
> > > > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > > > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > > > [ 6708.886075] Call trace:
> > > > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > > > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > > > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > > > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > > > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > > > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > > > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > > > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > > > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > > > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > > > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > > > [ 6708.973117] [<          (null)>]           (null)
> > > > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > > > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > > > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > > > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > > > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > > > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > > > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > > > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > > > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > > > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > > > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > > > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > > > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > > > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > > > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > > > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > > > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > > > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > > > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > > > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > > > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > > > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > > > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > > > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > > > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > > > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > > > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > > > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > > > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > > > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > > > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > > > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > > > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > > > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > > > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > > > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > > > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > > > [ 6709.207967] Code: bad PC value
> > > > [ 6709.211061] SMP: stopping secondary CPUs
> > > > [ 6709.218830] Starting crashdump kernel...
> > > > [ 6709.222749] Bye!
> > > > ---<-snip>---
> > > > 
> > > > Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > > ---
> > > >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> > > >  1 file changed, 18 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > > index 2b720fa..b6496f3 100644
> > > > --- a/net/sunrpc/svcsock.c
> > > > +++ b/net/sunrpc/svcsock.c
> > > > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> > > >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> > > >  			svsk, sk,
> > > >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > > -		svsk->sk_odata(sk);
> > > > +		rmb();
> > > > +		if (svsk->sk_odata)
> > > > +			svsk->sk_odata(sk);
> > > 
> > > I'm not convinced these checks will really do much, though you might
> > > need the rmb() calls there...see below...
> > > 
> > > >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> > > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > > >  	}
> > > > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> > > >  	if (svsk) {
> > > >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> > > >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > > > -		svsk->sk_owspace(sk);
> > > > +		rmb();
> > > > +		if (svsk->sk_owspace)
> > > > +			svsk->sk_owspace(sk);
> > > >  		svc_xprt_enqueue(&svsk->sk_xprt);
> > > >  	}
> > > >  }
> > > > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> > > >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> > > >  		sk, sk->sk_state);
> > > >  
> > > > -	if (svsk)
> > > > -		svsk->sk_odata(sk);
> > > > +	if (svsk) { 
> > > > +		rmb();
> > > > +		if (svsk->sk_odata)
> > > > +			svsk->sk_odata(sk);
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * This callback may called twice when a new connection
> > > >  	 * is established as a child socket inherits everything
> > > > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> > > >  	if (!svsk)
> > > >  		printk("svc: socket %p: no user data\n", sk);
> > > >  	else {
> > > > -		svsk->sk_ostate(sk);
> > > > +		rmb();
> > > > +		if (svsk->sk_ostate)
> > > > +			svsk->sk_ostate(sk);
> > > > +
> > > >  		if (sk->sk_state != TCP_ESTABLISHED) {
> > > >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> > > >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > > > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> > > >  		return ERR_PTR(err);
> > > >  	}
> > > >  
> > > > -	inet->sk_user_data = svsk;
> > > >  	svsk->sk_sock = sock;
> > > >  	svsk->sk_sk = inet;
> > > >  	svsk->sk_ostate = inet->sk_state_change;
> > > >  	svsk->sk_odata = inet->sk_data_ready;
> > > >  	svsk->sk_owspace = inet->sk_write_space;
> > > > +	wmb();
> > > > +	inet->sk_user_data = svsk;
> > > >  
> > > >  	/* Initialize the socket */
> > > >  	if (sock->type == SOCK_DGRAM)
> > > 
> > > Since we always check whether svsk is NULL before calling the functions,
> > > then the above hunk and maybe the rmb() calls might be all that's needed
> > > here. I don't see how we'd ever get a sk_odata value (for instance)
> > > that's NULL with the above change.
> > 
> > Thanks for feed-back. I'll update patch then, run test cycle once again
> > (hope couple days would be enough) and re-send v2 to review. 
> > 
> 
> Great. It might also be nice to sprinkle in some comments that document
> how the rmb/wmb calls are paired together too.
> 

Got it. Will do.

> -- 
> Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

WBR,
Vadim
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21  8:44   ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21  8:44 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
 net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..2be967f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_odata() callback. */
+		rmb();
 		svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_owspace() callback. */
+		rmb();
 		svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
+	if (svsk) {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_odata() callback.*/
+		rmb();
 		svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_odata() callback. */
+		rmb();
 		svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	/* this barrier is necessary in order to prevent race condition
+	   with svc_data_ready(), svc_listen_data_ready()
+	   and others when calling callbacks above */
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21  8:44   ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21  8:44 UTC (permalink / raw)
  To: trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, jlayton-vpEMnDpepFuMZCB2o+C8xQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
  fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..2be967f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_odata() callback. */
+		rmb();
 		svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_owspace() callback. */
+		rmb();
 		svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
+	if (svsk) {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_odata() callback.*/
+		rmb();
 		svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races aginst
+		   svc_setup_socket() while calling sk_odata() callback. */
+		rmb();
 		svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	/* this barrier is necessary in order to prevent race condition
+	   with svc_data_ready(), svc_listen_data_ready()
+	   and others when calling callbacks above */
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-21  8:44   ` Vadim Lomovtsev
@ 2017-08-21  8:50     ` Vadim Lomovtsev
  -1 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21  8:50 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: vlomovts


Sorry guys, please ignore this - being sent by mistake. 
Have some typos at comments by copy-paste, will correct and re-send

Vadim

On Mon, Aug 21, 2017 at 04:44:45AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..2be967f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_odata() callback. */
> +		rmb();
>  		svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_owspace() callback. */
> +		rmb();
>  		svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> +	if (svsk) {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_odata() callback.*/
> +		rmb();
>  		svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_odata() callback. */
> +		rmb();
>  		svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	/* this barrier is necessary in order to prevent race condition
> +	   with svc_data_ready(), svc_listen_data_ready()
> +	   and others when calling callbacks above */
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21  8:50     ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21  8:50 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: vlomovts


Sorry guys, please ignore this - being sent by mistake. 
Have some typos at comments by copy-paste, will correct and re-send

Vadim

On Mon, Aug 21, 2017 at 04:44:45AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
 lt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..2be967f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_odata() callback. */
> +		rmb();
>  		svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_owspace() callback. */
> +		rmb();
>  		svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> +	if (svsk) {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_odata() callback.*/
> +		rmb();
>  		svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races aginst
> +		   svc_setup_socket() while calling sk_odata() callback. */
> +		rmb();
>  		svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	/* this barrier is necessary in order to prevent race condition
> +	   with svc_data_ready(), svc_listen_data_ready()
> +	   and others when calling callbacks above */
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-21  8:44   ` Vadim Lomovtsev
@ 2017-08-21  8:56     ` Vadim Lomovtsev
  -1 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21  8:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
 net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..9ec16c5 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_odata() callback. */
+		rmb();
 		svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_owspace() callback. */
+		rmb();
 		svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
+	if (svsk) {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_odata() callback.*/
+		rmb();
 		svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_ostate() callback. */
+		rmb();
 		svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	/* this barrier is necessary in order to prevent race condition
+	   with svc_data_ready(), svc_listen_data_ready()
+	   and others when calling callbacks above */
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21  8:56     ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21  8:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
  fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
 net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..9ec16c5 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_odata() callback. */
+		rmb();
 		svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_owspace() callback. */
+		rmb();
 		svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
+	if (svsk) {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_odata() callback.*/
+		rmb();
 		svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
+		/* this barrier is necessary to prevent kernel crash
+		   (due to bad CPU-value) caused by races against
+		   svc_setup_socket() while calling sk_ostate() callback. */
+		rmb();
 		svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	/* this barrier is necessary in order to prevent race condition
+	   with svc_data_ready(), svc_listen_data_ready()
+	   and others when calling callbacks above */
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-21  8:56     ` Vadim Lomovtsev
@ 2017-08-21 10:42       ` Jeff Layton
  -1 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-21 10:42 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust, anna.schumaker, bfields, davem,
	linux-nfs, netdev, linux-kernel, pabeni

On Mon, 2017-08-21 at 04:56 -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..9ec16c5 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_odata() callback. */
> +		rmb();
>  		svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_owspace() callback. */
> +		rmb();
>  		svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> +	if (svsk) {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_odata() callback.*/
> +		rmb();
>  		svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_ostate() callback. */
> +		rmb();
>  		svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	/* this barrier is necessary in order to prevent race condition
> +	   with svc_data_ready(), svc_listen_data_ready()
> +	   and others when calling callbacks above */
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)

Patch itself looks fine -- nice catch!

The comment format probably ought to be in kernel-standard style though.
What I'd probably do is have this longer comment next to the wmb() call
and then have smaller comments by the rmb() calls referring them to the
longer comment in svc_setup_socket().

Once you fix the comments, you can add:

Reviewed-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21 10:42       ` Jeff Layton
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-21 10:42 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust, anna.schumaker, bfields, davem,
	linux-nfs, netdev, linux-kernel, pabeni

On Mon, 2017-08-21 at 04:56 -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
 lt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
>  1 file changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..9ec16c5 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_odata() callback. */
> +		rmb();
>  		svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_owspace() callback. */
> +		rmb();
>  		svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> +	if (svsk) {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_odata() callback.*/
> +		rmb();
>  		svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> +		/* this barrier is necessary to prevent kernel crash
> +		   (due to bad CPU-value) caused by races against
> +		   svc_setup_socket() while calling sk_ostate() callback. */
> +		rmb();
>  		svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	/* this barrier is necessary in order to prevent race condition
> +	   with svc_data_ready(), svc_listen_data_ready()
> +	   and others when calling callbacks above */
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)

Patch itself looks fine -- nice catch!

The comment format probably ought to be in kernel-standard style though.
What I'd probably do is have this longer comment next to the wmb() call
and then have smaller comments by the rmb() calls referring them to the
longer comment in svc_setup_socket().

Once you fix the comments, you can add:

Reviewed-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21 10:51         ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21 10:51 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust, anna.schumaker, bfields, davem, linux-nfs,
	netdev, linux-kernel, pabeni, vlomovts

On Mon, Aug 21, 2017 at 06:42:07AM -0400, Jeff Layton wrote:
> On Mon, 2017-08-21 at 04:56 -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happens in ksoftirq context which could run
> > concurrently on a different core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence and add
> > write/read barrier calls to be sure that we got proper values
> > for callback pointers before actually calling them.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > ---
> >  net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
> >  1 file changed, 25 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..9ec16c5 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_odata() callback. */
> > +		rmb();
> >  		svsk->sk_odata(sk);
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_owspace() callback. */
> > +		rmb();
> >  		svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > +	if (svsk) {
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_odata() callback.*/
> > +		rmb();
> >  		svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_ostate() callback. */
> > +		rmb();
> >  		svsk->sk_ostate(sk);
> > +
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	/* this barrier is necessary in order to prevent race condition
> > +	   with svc_data_ready(), svc_listen_data_ready()
> > +	   and others when calling callbacks above */
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> 
> Patch itself looks fine -- nice catch!
> 
> The comment format probably ought to be in kernel-standard style though.
> What I'd probably do is have this longer comment next to the wmb() call
> and then have smaller comments by the rmb() calls referring them to the
> longer comment in svc_setup_socket().
> 
> Once you fix the comments, you can add:
> 
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> 

Thanks for reply. Will update comments and re-send it.

WBR,
Vadim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21 10:51         ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21 10:51 UTC (permalink / raw)
  To: Jeff Layton
  Cc: trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA, vlomovts-H+wXaHxf7aLQT0dZR+AlfA

On Mon, Aug 21, 2017 at 06:42:07AM -0400, Jeff Layton wrote:
> On Mon, 2017-08-21 at 04:56 -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happens in ksoftirq context which could run
> > concurrently on a different core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence and add
> > write/read barrier calls to be sure that we got proper values
> > for callback pointers before actually calling them.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysim
 gblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  net/sunrpc/svcsock.c | 27 +++++++++++++++++++++++++--
> >  1 file changed, 25 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..9ec16c5 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,6 +421,10 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_odata() callback. */
> > +		rmb();
> >  		svsk->sk_odata(sk);
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -437,6 +441,10 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_owspace() callback. */
> > +		rmb();
> >  		svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -760,8 +768,14 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > +	if (svsk) {
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_odata() callback.*/
> > +		rmb();
> >  		svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,7 +808,12 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > +		/* this barrier is necessary to prevent kernel crash
> > +		   (due to bad CPU-value) caused by races against
> > +		   svc_setup_socket() while calling sk_ostate() callback. */
> > +		rmb();
> >  		svsk->sk_ostate(sk);
> > +
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -1381,12 +1400,16 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	/* this barrier is necessary in order to prevent race condition
> > +	   with svc_data_ready(), svc_listen_data_ready()
> > +	   and others when calling callbacks above */
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> 
> Patch itself looks fine -- nice catch!
> 
> The comment format probably ought to be in kernel-standard style though.
> What I'd probably do is have this longer comment next to the wmb() call
> and then have smaller comments by the rmb() calls referring them to the
> longer comment in svc_setup_socket().
> 
> Once you fix the comments, you can add:
> 
> Reviewed-by: Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> 

Thanks for reply. Will update comments and re-send it.

WBR,
Vadim
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-21  8:56     ` Vadim Lomovtsev
@ 2017-08-21 11:23       ` Vadim Lomovtsev
  -1 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21 11:23 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
---
 net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..e185001 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +440,9 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
+	if (svsk) {
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_ostate(sk);
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
@@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	/*
+	 * This barrier is necessary in order to prevent race condition
+	 * with svc_data_ready(), svc_listen_data_ready() and others
+	 * when calling callbacks above.
+	 */
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-21 11:23       ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-21 11:23 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happens in ksoftirq context which could run
concurrently on a different core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
  fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
---
 net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..e185001 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +440,9 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
+	if (svsk) {
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
 		svsk->sk_ostate(sk);
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
@@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	/*
+	 * This barrier is necessary in order to prevent race condition
+	 * with svc_data_ready(), svc_listen_data_ready() and others
+	 * when calling callbacks above.
+	 */
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-21 11:23       ` Vadim Lomovtsev
@ 2017-08-23 10:24         ` Vadim Lomovtsev
  -1 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-23 10:24 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: vlomovts


Hi all,

Any comments on this ?

WBR,
Vadim

On Mon, Aug 21, 2017 at 07:23:07AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..e185001 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +440,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> +	if (svsk) {
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_ostate(sk);
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> @@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	/*
> +	 * This barrier is necessary in order to prevent race condition
> +	 * with svc_data_ready(), svc_listen_data_ready() and others
> +	 * when calling callbacks above.
> +	 */
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-23 10:24         ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-23 10:24 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: vlomovts


Hi all,

Any comments on this ?

WBR,
Vadim

On Mon, Aug 21, 2017 at 07:23:07AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happens in ksoftirq context which could run
> concurrently on a different core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence and add
> write/read barrier calls to be sure that we got proper values
> for callback pointers before actually calling them.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
 lt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..e185001 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -437,6 +440,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> +
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> +	if (svsk) {
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
>  		svsk->sk_ostate(sk);
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> @@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	/*
> +	 * This barrier is necessary in order to prevent race condition
> +	 * with svc_data_ready(), svc_listen_data_ready() and others
> +	 * when calling callbacks above.
> +	 */
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-23 10:24         ` Vadim Lomovtsev
@ 2017-08-23 10:33           ` Jeff Layton
  -1 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-23 10:33 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust, anna.schumaker, bfields, davem,
	linux-nfs, netdev, linux-kernel, pabeni

On Wed, 2017-08-23 at 06:24 -0400, Vadim Lomovtsev wrote:
> Hi all,
> 
> Any comments on this ?
> 
> WBR,
> Vadim
> 
> On Mon, Aug 21, 2017 at 07:23:07AM -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happens in ksoftirq context which could run
> > concurrently on a different core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence and add
> > write/read barrier calls to be sure that we got proper values
> > for callback pointers before actually calling them.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > Reviewed-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..e185001 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_odata(sk);
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -437,6 +440,9 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > +	if (svsk) {
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_ostate(sk);
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> > @@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	/*
> > +	 * This barrier is necessary in order to prevent race condition
> > +	 * with svc_data_ready(), svc_listen_data_ready() and others
> > +	 * when calling callbacks above.
> > +	 */
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> > -- 
> > 1.8.3.1
> > 

I think this one looks fine. Bruce, do you mind picking this one up? It
might also be reasonable for stable...
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-23 10:33           ` Jeff Layton
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff Layton @ 2017-08-23 10:33 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust, anna.schumaker, bfields, davem,
	linux-nfs, netdev, linux-kernel, pabeni

On Wed, 2017-08-23 at 06:24 -0400, Vadim Lomovtsev wrote:
> Hi all,
> 
> Any comments on this ?
> 
> WBR,
> Vadim
> 
> On Mon, Aug 21, 2017 at 07:23:07AM -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happens in ksoftirq context which could run
> > concurrently on a different core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence and add
> > write/read barrier calls to be sure that we got proper values
> > for callback pointers before actually calling them.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysim
 gblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > Reviewed-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
> >  1 file changed, 20 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..e185001 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_odata(sk);
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -437,6 +440,9 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > +
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > +	if (svsk) {
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > +		/* Refer to svc_setup_socket() for details. */
> > +		rmb();
> >  		svsk->sk_ostate(sk);
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> > @@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	/*
> > +	 * This barrier is necessary in order to prevent race condition
> > +	 * with svc_data_ready(), svc_listen_data_ready() and others
> > +	 * when calling callbacks above.
> > +	 */
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> > -- 
> > 1.8.3.1
> > 

I think this one looks fine. Bruce, do you mind picking this one up? It
might also be reasonable for stable...
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-24 22:13             ` J. Bruce Fields
  0 siblings, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2017-08-24 22:13 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Vadim Lomovtsev, trond.myklebust, anna.schumaker, davem,
	linux-nfs, netdev, linux-kernel, pabeni

On Wed, Aug 23, 2017 at 06:33:42AM -0400, Jeff Layton wrote:
> I think this one looks fine. Bruce, do you mind picking this one up? It
> might also be reasonable for stable...

Yep, thanks to you both, I'll plan to send a pull request tonight or
tomorrow.

--b.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-24 22:13             ` J. Bruce Fields
  0 siblings, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2017-08-24 22:13 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Vadim Lomovtsev, trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-nfs-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA

On Wed, Aug 23, 2017 at 06:33:42AM -0400, Jeff Layton wrote:
> I think this one looks fine. Bruce, do you mind picking this one up? It
> might also be reasonable for stable...

Yep, thanks to you both, I'll plan to send a pull request tonight or
tomorrow.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-18 10:00 ` Vadim Lomovtsev
@ 2017-08-25 22:01   ` J. Bruce Fields
  -1 siblings, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2017-08-25 22:01 UTC (permalink / raw)
  To: Vadim Lomovtsev
  Cc: trond.myklebust, anna.schumaker, jlayton, davem, linux-nfs,
	netdev, linux-kernel, pabeni

On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happanes in ksoftirq context which concurrently
> on a differnt core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence, add NULL-ptr
> check before callback call along with barriers to prevent kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

By the way, is there anything special about your setup that allows you
to reproduce this?  There's nothing special about connectathon tests, so
I'm just wondering why we haven't had a lot of reports of this.

--b.

> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..b6496f3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_odata(sk);
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_owspace(sk);
> +		rmb();
> +		if (svsk->sk_owspace)
> +			svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
> @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> -		svsk->sk_odata(sk);
> +	if (svsk) { 
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> -		svsk->sk_ostate(sk);
> +		rmb();
> +		if (svsk->sk_ostate)
> +			svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-25 22:01   ` J. Bruce Fields
  0 siblings, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2017-08-25 22:01 UTC (permalink / raw)
  To: Vadim Lomovtsev
  Cc: trond.myklebust, anna.schumaker, jlayton, davem, linux-nfs,
	netdev, linux-kernel, pabeni

On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happanes in ksoftirq context which concurrently
> on a differnt core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence, add NULL-ptr
> check before callback call along with barriers to prevent kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

By the way, is there anything special about your setup that allows you
to reproduce this?  There's nothing special about connectathon tests, so
I'm just wondering why we haven't had a lot of reports of this.

--b.

> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
 lt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> ---
>  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..b6496f3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_odata(sk);
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_owspace(sk);
> +		rmb();
> +		if (svsk->sk_owspace)
> +			svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
> @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> -		svsk->sk_odata(sk);
> +	if (svsk) { 
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> -		svsk->sk_ostate(sk);
> +		rmb();
> +		if (svsk->sk_ostate)
> +			svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
> -- 
> 1.8.3.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
  2017-08-25 22:01   ` J. Bruce Fields
  (?)
@ 2017-08-28  8:38     ` Vadim Lomovtsev
  -1 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-28  8:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, anna.schumaker, jlayton, davem, linux-nfs,
	netdev, linux-kernel, pabeni, vlomovts

On Fri, Aug 25, 2017 at 06:01:28PM -0400, J. Bruce Fields wrote:
> On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happanes in ksoftirq context which concurrently
> > on a differnt core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence, add NULL-ptr
> > check before callback call along with barriers to prevent kernel crash.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> By the way, is there anything special about your setup that allows you
> to reproduce this?  There's nothing special about connectathon tests, so
> I'm just wondering why we haven't had a lot of reports of this.

>From what I have now - nothing special in test setup and/or configuration.
I believe it is because or high amount of CPU running. It was found at 32
cores CPU, while simply invoking test by "make run" command.

WBR,
Vadim

> 
> --b.
> 
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > ---
> >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> >  1 file changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..b6496f3 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_odata(sk);
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_owspace(sk);
> > +		rmb();
> > +		if (svsk->sk_owspace)
> > +			svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> >  }
> > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > -		svsk->sk_odata(sk);
> > +	if (svsk) { 
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > -		svsk->sk_ostate(sk);
> > +		rmb();
> > +		if (svsk->sk_ostate)
> > +			svsk->sk_ostate(sk);
> > +
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> > -- 
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-28  8:38     ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-28  8:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, anna.schumaker, jlayton, davem, linux-nfs,
	netdev, linux-kernel, pabeni, vlomovts

On Fri, Aug 25, 2017 at 06:01:28PM -0400, J. Bruce Fields wrote:
> On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happanes in ksoftirq context which concurrently
> > on a differnt core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence, add NULL-ptr
> > check before callback call along with barriers to prevent kernel crash.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> By the way, is there anything special about your setup that allows you
> to reproduce this?  There's nothing special about connectathon tests, so
> I'm just wondering why we haven't had a lot of reports of this.

>From what I have now - nothing special in test setup and/or configuration.
I believe it is because or high amount of CPU running. It was found at 32
cores CPU, while simply invoking test by "make run" command.

WBR,
Vadim

> 
> --b.
> 
> > 
> > Crash log:
> > ---<-snip->---
> > [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> > [ 6708.647093] pgd = ffff0000094e0000
> > [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> > [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> > [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysim
 gblt fb_sys_fops
> > [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> > [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> > [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> > [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> > [ 6708.773822] PC is at 0x0
> > [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> > [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> > [ 6708.789248] sp : ffff810ffbad3900
> > [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> > [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> > [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> > [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> > [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> > [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> > [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> > [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> > [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> > [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> > [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> > [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> > [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> > [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> > [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> > [ 6708.872084]
> > [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> > [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> > [ 6708.886075] Call trace:
> > [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> > [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> > [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> > [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> > [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> > [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> > [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> > [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> > [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> > [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> > [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> > [ 6708.973117] [<          (null)>]           (null)
> > [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> > [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> > [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> > [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> > [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> > [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> > [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> > [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> > [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> > [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> > [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> > [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> > [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> > [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> > [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> > [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> > [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> > [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> > [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> > [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> > [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> > [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> > [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> > [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> > [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> > [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> > [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> > [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> > [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> > [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> > [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> > [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> > [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> > [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> > [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> > [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> > [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> > [ 6709.207967] Code: bad PC value
> > [ 6709.211061] SMP: stopping secondary CPUs
> > [ 6709.218830] Starting crashdump kernel...
> > [ 6709.222749] Bye!
> > ---<-snip>---
> > 
> > Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
> > ---
> >  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
> >  1 file changed, 18 insertions(+), 6 deletions(-)
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 2b720fa..b6496f3 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
> >  		dprintk("svc: socket %p(inet %p), busy=%d\n",
> >  			svsk, sk,
> >  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_odata(sk);
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> >  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> > @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
> >  	if (svsk) {
> >  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
> >  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> > -		svsk->sk_owspace(sk);
> > +		rmb();
> > +		if (svsk->sk_owspace)
> > +			svsk->sk_owspace(sk);
> >  		svc_xprt_enqueue(&svsk->sk_xprt);
> >  	}
> >  }
> > @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
> >  	dprintk("svc: socket %p TCP (listen) state change %d\n",
> >  		sk, sk->sk_state);
> >  
> > -	if (svsk)
> > -		svsk->sk_odata(sk);
> > +	if (svsk) { 
> > +		rmb();
> > +		if (svsk->sk_odata)
> > +			svsk->sk_odata(sk);
> > +	}
> > +
> >  	/*
> >  	 * This callback may called twice when a new connection
> >  	 * is established as a child socket inherits everything
> > @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
> >  	if (!svsk)
> >  		printk("svc: socket %p: no user data\n", sk);
> >  	else {
> > -		svsk->sk_ostate(sk);
> > +		rmb();
> > +		if (svsk->sk_ostate)
> > +			svsk->sk_ostate(sk);
> > +
> >  		if (sk->sk_state != TCP_ESTABLISHED) {
> >  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
> >  			svc_xprt_enqueue(&svsk->sk_xprt);
> > @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> >  		return ERR_PTR(err);
> >  	}
> >  
> > -	inet->sk_user_data = svsk;
> >  	svsk->sk_sock = sock;
> >  	svsk->sk_sk = inet;
> >  	svsk->sk_ostate = inet->sk_state_change;
> >  	svsk->sk_odata = inet->sk_data_ready;
> >  	svsk->sk_owspace = inet->sk_write_space;
> > +	wmb();
> > +	inet->sk_user_data = svsk;
> >  
> >  	/* Initialize the socket */
> >  	if (sock->type == SOCK_DGRAM)
> > -- 
> > 1.8.3.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
@ 2017-08-28  8:38     ` Vadim Lomovtsev
  0 siblings, 0 replies; 33+ messages in thread
From: Vadim Lomovtsev @ 2017-08-28  8:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: trond.myklebust, anna.schumaker, jlayton, davem, linux-nfs,
	netdev, linux-kernel, pabeni, vlomovts

On Fri, Aug 25, 2017 at 06:01:28PM -0400, J. Bruce Fields wrote:
> On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> > While running nfs/connectathon tests kernel NULL-pointer exception
> > has been observed due to races in svcsock.c.
> > 
> > Race is appear when kernel accepts connection by kernel_accept
> > (which creates new socket) and start queuing ingress packets
> > to new socket. This happanes in ksoftirq context which concurrently
> > on a differnt core while new socket setup is not done yet.
> > 
> > The fix is to re-order socket user data init sequence, add NULL-ptr
> > check before callback call along with barriers to prevent kernel crash.
> > 
> > Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> By the way, is there anything special about your setup that allows you
> to reproduce this?  There's nothing special about connectathon tests, so
> I'm just wondering why we haven't had a lot of reports of this.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2017-08-28  8:38 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-18 10:00 [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception Vadim Lomovtsev
2017-08-18 10:00 ` Vadim Lomovtsev
2017-08-18 10:27 ` Jeff Layton
2017-08-18 10:27   ` Jeff Layton
2017-08-18 11:08   ` Vadim Lomovtsev
2017-08-18 11:08     ` Vadim Lomovtsev
2017-08-18 11:16     ` Jeff Layton
2017-08-18 11:16       ` Jeff Layton
2017-08-18 11:34       ` Vadim Lomovtsev
2017-08-18 11:34         ` Vadim Lomovtsev
2017-08-21  8:44 ` [PATCH v2] " Vadim Lomovtsev
2017-08-21  8:44   ` Vadim Lomovtsev
2017-08-21  8:50   ` Vadim Lomovtsev
2017-08-21  8:50     ` Vadim Lomovtsev
2017-08-21  8:56   ` [PATCH v3] " Vadim Lomovtsev
2017-08-21  8:56     ` Vadim Lomovtsev
2017-08-21 10:42     ` Jeff Layton
2017-08-21 10:42       ` Jeff Layton
2017-08-21 10:51       ` Vadim Lomovtsev
2017-08-21 10:51         ` Vadim Lomovtsev
2017-08-21 11:23     ` [PATCH v4] " Vadim Lomovtsev
2017-08-21 11:23       ` Vadim Lomovtsev
2017-08-23 10:24       ` Vadim Lomovtsev
2017-08-23 10:24         ` Vadim Lomovtsev
2017-08-23 10:33         ` Jeff Layton
2017-08-23 10:33           ` Jeff Layton
2017-08-24 22:13           ` J. Bruce Fields
2017-08-24 22:13             ` J. Bruce Fields
2017-08-25 22:01 ` [PATCH] " J. Bruce Fields
2017-08-25 22:01   ` J. Bruce Fields
2017-08-28  8:38   ` Vadim Lomovtsev
2017-08-28  8:38     ` Vadim Lomovtsev
2017-08-28  8:38     ` Vadim Lomovtsev

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.