From: Senn Klemens <klemens.senn@ims.co.at>
To: linux-nfs@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Subject: kernel soft lockup after stopping nfsd
Date: Wed, 02 Apr 2014 15:59:04 +0200
Message-ID: <lhh538$rqm$1@ger.gmane.org>

Hi,

I am getting a kernel soft lockup when the client reads data from an
RDMA mount after the NFS server has been stopped.

The export on the server side is configured as:
/data	*(fsid=0,crossmnt,rw,mp,no_root_squash,sync,no_subtree_check,insecure)

The following command is used to mount the NFSv4 share:
mount -t nfs -o port=20049,rdma,vers=4.0,timeo=900 172.16.100.19:/ /mnt/

For my tests I used both the vanilla 3.10.34 kernel with the nfsd
patches from Kinglong Mee applied and the nfsd-next kernel.

The backtrace on the nfsd-next kernel is:

[  545.442326] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:2:3124]
[  545.448861] Modules linked in: md5 nfsd auth_rpcgss oid_registry
svcrdma cpuid af_packet 8021q garp stp llc rdma_ucm ib_ucm rdma_cm iw_cm
ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core
ib_addr joydev usbhid mlx4_core x86_pkg_temp_thermal coretemp kvm_intel
kvm crc32c_intel ghash_clmulni_intel aesni_intel isci ehci_pci ehci_hcd
ablk_helper cryptd usbcore libsas lrw gf128mul glue_helper iTCO_wdt
aes_x86_64 iTCO_vendor_support sb_edac ioatdma edac_core lpc_ich
scsi_transport_sas i2c_i801 acpi_cpufreq sg pcspkr mfd_core tpm_tis
microcode usb_common tpm ipmi_si wmi ipmi_msghandler processor
thermal_sys button edd autofs4 xfs libcrc32c nfsv3 nfs fscache lockd
nfs_acl sunrpc igb dca i2c_algo_bit ptp pps_core
[  545.448920] CPU: 0 PID: 3124 Comm: kworker/0:2 Not tainted
3.14.0-rc8-net-next-master-20140323+ #1
[  545.448921] Hardware name: Supermicro B9DRG-E/B9DRG-E, BIOS 3.0
09/04/2013
[  545.448925] Workqueue: ib_cm cm_work_handler [ib_cm]
[  545.448926] task: ffff88085de02690 ti: ffff88087c056000 task.ti:
ffff88087c056000
[  545.448927] RIP: 0010:[<ffffffff815a5e37>]  [<ffffffff815a5e37>]
_raw_spin_lock_bh+0x27/0x40
[  545.448932] RSP: 0018:ffff88087c057c38  EFLAGS: 00000297
[  545.448933] RAX: 0000000000008810 RBX: ffffffff00000048 RCX:
0000000000000000
[  545.448934] RDX: 000000000000ffff RSI: 0000000000000000 RDI:
ffff8810590b0204
[  545.448935] RBP: ffff88087c057c38 R08: 000000000000000a R09:
0000000000000890
[  545.448936] R10: 0000000000000000 R11: 000000000000088f R12:
0000000000000246
[  545.448937] R13: ffff88087c057c18 R14: 0000000000000006 R15:
0000000000000048
[  545.448938] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000)
knlGS:0000000000000000
[  545.448939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  545.448940] CR2: 0000000000eb90f8 CR3: 0000000001a0c000 CR4:
00000000000407f0
[  545.448941] Stack:
[  545.448942]  ffff88087c057c78 ffffffffa007be50 ffff88087c057c88
ffff880858679400
[  545.448944]  ffff88085879ec00 ffff88087c057cb0 0000000000000000
0000000000000000
[  545.448947]  ffff88087c057c98 ffffffffa04dbd5b ffff88085879ec00
ffff88085ddd1ce8
[  545.448949] Call Trace:
[  545.448963]  [<ffffffffa007be50>] svc_xprt_enqueue+0x50/0x220 [sunrpc]
[  545.448967]  [<ffffffffa04dbd5b>] rdma_cma_handler+0xdb/0x180 [svcrdma]
[  545.448970]  [<ffffffffa04bbd61>] cma_ib_handler+0xe1/0x210 [rdma_cm]
[  545.448972]  [<ffffffffa03825e0>] cm_process_work+0x20/0x110 [ib_cm]
[  545.448975]  [<ffffffffa0384a7b>] cm_work_handler+0x98b/0x10a8 [ib_cm]
[  545.448979]  [<ffffffff8106b5b7>] process_one_work+0x177/0x410
[  545.448981]  [<ffffffff8106bcda>] worker_thread+0x11a/0x370
[  545.448983]  [<ffffffff8106bbc0>] ? rescuer_thread+0x330/0x330
[  545.448985]  [<ffffffff810727d4>] kthread+0xc4/0xe0
[  545.448987]  [<ffffffff81072710>] ? flush_kthread_worker+0x70/0x70
[  545.448990]  [<ffffffff815addfc>] ret_from_fork+0x7c/0xb0
[  545.448991]  [<ffffffff81072710>] ? flush_kthread_worker+0x70/0x70
[  545.448992] Code: 75 f6 5d c3 55 65 81 04 25 e0 b8 00 00 00 02 00 00
48 89 e5 b8 00 00 01 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 04 5d c3
f3 90 <0f> b7 07 66 39 d0 75 f6 5d c3 66 66 66 66 66 66 2e 0f 1f 84 00

The last log messages before the lockup are:
[  507.831530] svc: __svc_unregister(nfsaclv3), error -107
[  507.831531] RPC:       shutting down rpcbind client for localhost
[  507.831532] RPC:       rpc_release_client(ffff88105d7b6600)
[  507.831533] RPC:       destroying rpcbind client for localhost
[  507.831537] RPC:       shutting down rpcbind client for localhost
[  507.831538] RPC:       rpc_release_client(ffff88105d939e00)
[  507.831539] RPC:       destroying rpcbind client for localhost
[  507.831540] RPC:       destroying transport ffff88105d81d800
[  507.831542] RPC:       xs_destroy xprt ffff88105d81d800
[  507.831543] RPC:       xs_close xprt ffff88105d81d800
[  507.831544] RPC:       disconnected transport ffff88105d81d800
[  507.831547] nfsd: last server has exited, flushing export cache
[  507.831563] svc: svc_destroy(nfsd, 1)
[  518.809147] svcrdma: Disconnect on DTO xprt=ffff880858679400,
cm_id=ffff88085879ec00


The following patch fixed the problem on both kernels:

---
 net/sunrpc/svc_xprt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 06c6ff0..0b68f32 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -328,6 +328,8 @@ static void svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)

 static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
 {
+	if (xprt->xpt_flags & (1<<XPT_DEAD))
+		return false;
 	if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
 		return true;
 	if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED)))
-- 
1.8.1.4
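
For reference, this is how the complete function would look with the
check applied. This is a sketch against the 3.14-era
net/sunrpc/svc_xprt.c; the xpo_has_wspace branch and the final
"return false" come from that source tree, not from the quoted hunk:

	static bool svc_xprt_has_something_to_do(struct svc_xprt *xprt)
	{
		/* New check: a transport marked XPT_DEAD reports no work,
		 * so svc_xprt_enqueue() bails out before taking the pool
		 * lock of a server that may already have been torn down
		 * by svc_destroy(). */
		if (xprt->xpt_flags & (1<<XPT_DEAD))
			return false;
		if (xprt->xpt_flags & ((1<<XPT_CONN)|(1<<XPT_CLOSE)))
			return true;
		if (xprt->xpt_flags & ((1<<XPT_DATA)|(1<<XPT_DEFERRED)))
			return xprt->xpt_ops->xpo_has_wspace(xprt);
		return false;
	}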

Is this the correct place to fix the soft lockup?

Kind regards,
Klemens
