From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
linux-nvme@lists.infradead.org
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Tue, 28 Jul 2020 17:29:07 +0530 [thread overview]
Message-ID: <20200728115904.GA5508@chelsio.com> (raw)
In-Reply-To: <9b8dae53-1fcc-3c03-5fcd-cfb55cd8cc80@grimberg.me>
Sagi,
With the given patch, I am no more seeing the freeze_queue_wait hang
issue, but I am seeing another hang issue:
dmesg:
[Jul28 11:01] igb 0000:03:00.0 enp3s0f0: igb: enp3s0f0 NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX
[ +0.000137] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0f0: link becomes
ready
[Jul28 11:17] cxgb4 0000:02:00.4 enp2s0f4: passive DA module inserted
[ +0.579450] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000683] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:19] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full
support of multi-port devices.
[ +0.000159] nvme nvme0: creating 1 I/O queues.
[ +0.000350] nvme nvme0: mapped 1/0/0 default/read/poll queues.
[ +0.001316] nvme nvme0: new ctrl: NQN "nvme-ram0", addr 102.1.1.6:4420
[Jul28 11:20] DEBUG: cpu: 3: blk_queue_enter:448 process is "nvme" (pid
4011)
q->mq_freeze_depth: 1
(pm || (blk_pm_request_resume(q),!blk_queue_pm_only(q)))): 1
blk_queue_dying(q): 0
[ +21.511514] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.560355] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000941] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:21] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.552934] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.001076] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:22] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.615365] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000886] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:23] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.556661] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000837] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[ +3.765550] INFO: task bash:3014 blocked for more than 122 seconds.
[ +0.000067] Not tainted 5.8.0-rc7ekr+ #2
[ +0.000057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ +0.000064] bash D14272 3014 2417 0x00000000
[ +0.000066] Call Trace:
[ +0.000064] __schedule+0x32b/0x670
[ +0.000060] schedule+0x45/0xb0
[ +0.000059] schedule_timeout+0x216/0x330
[ +0.000060] ? enqueue_task_fair+0x196/0x7e0
[ +0.000059] wait_for_completion+0x81/0xe0
[ +0.000061] __flush_work+0x114/0x1c0
[ +0.000058] ? flush_workqueue_prep_pwqs+0x130/0x130
[ +0.000066] nvme_reset_ctrl_sync+0x25/0x40 [nvme_core]
[ +0.000125] nvme_sysfs_reset+0xd/0x20 [nvme_core]
[ +0.000137] kernfs_fop_write+0xbc/0x1a0
[ +0.000114] vfs_write+0xc2/0x1f0
[ +0.000120] ksys_write+0x5a/0xd0
[ +0.000106] do_syscall_64+0x3e/0x70
[ +0.000122] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000115] RIP: 0033:0x7f8124b93317
[ +0.000110] Code: Bad RIP value.
[ +0.000109] RSP: 002b:00007ffdbbbff1c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ +0.000182] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007f8124b93317
[ +0.000138] RDX: 0000000000000002 RSI: 0000559345c156d0 RDI:
0000000000000001
[ +0.000125] RBP: 0000559345c156d0 R08: 000000000000000a R09:
0000000000000001
[ +0.000117] R10: 00005593453d1471 R11: 0000000000000246 R12:
0000000000000002
[ +0.000116] R13: 00007f8124c6d6a0 R14: 00007f8124c6e4a0 R15:
00007f8124c6d8a0
[ +0.000121] INFO: task nvme:4011 blocked for more than 122 seconds.
[ +0.000118] Not tainted 5.8.0-rc7ekr+ #2
[ +0.000114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ +0.000190] nvme D14392 4011 2326 0x00004000
[ +0.000132] Call Trace:
[ +0.000117] __schedule+0x32b/0x670
[ +0.000109] schedule+0x45/0xb0
[ +0.000108] blk_queue_enter+0x1e9/0x250
[ +0.000109] ? wait_woken+0x70/0x70
[ +0.000108] blk_mq_alloc_request+0x53/0xc0
[ +0.000112] nvme_alloc_request+0x61/0x70 [nvme_core]
[ +0.000118] nvme_submit_user_cmd+0x50/0x310 [nvme_core]
[ +0.000126] nvme_user_cmd+0x12e/0x1c0 [nvme_core]
[ +0.000124] ? _copy_to_user+0x22/0x30
[ +0.000108] blkdev_ioctl+0x100/0x250
[ +0.000109] block_ioctl+0x34/0x40
[ +0.000110] ksys_ioctl+0x82/0xc0
[ +0.000106] __x64_sys_ioctl+0x11/0x20
[ +0.000126] do_syscall_64+0x3e/0x70
[ +0.000113] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000132] RIP: 0033:0x7fed0bd2967b
[ +0.000134] Code: Bad RIP value.
[ +0.000107] RSP: 002b:00007fff55b568a8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ +0.000172] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007fed0bd2967b
[ +0.000112] RDX: 00007fff55b568b0 RSI: 00000000c0484e43 RDI:
0000000000000003
[ +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[ +0.000130] R10: 0000000000000000 R11: 0000000000000246 R12:
00007fff55b5878a
[ +0.000119] R13: 0000000000000006 R14: 00007fff55b56f60 R15:
00005595f54554a0
[ +0.000135] Kernel panic - not syncing: hung_task: blocked tasks
[ +0.000141] CPU: 8 PID: 520 Comm: khungtaskd Not tainted 5.8.0-rc7ekr+
#2
Testcase:
while [ 1 ]; do nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller;
done
while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up;
sleep 28; done
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2020-07-28 11:59 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-27 18:19 Hang at NVME Host caused by Controller reset Krishnamraju Eraparaju
2020-07-27 18:47 ` Sagi Grimberg
2020-07-28 11:59 ` Krishnamraju Eraparaju [this message]
2020-07-28 15:54 ` Sagi Grimberg
2020-07-28 17:42 ` Krishnamraju Eraparaju
2020-07-28 18:35 ` Sagi Grimberg
2020-07-28 20:20 ` Sagi Grimberg
2020-07-29 8:57 ` Krishnamraju Eraparaju
2020-07-29 9:28 ` Sagi Grimberg
[not found] ` <20200730162056.GA17468@chelsio.com>
2020-07-30 20:59 ` Sagi Grimberg
2020-07-30 21:32 ` Krishnamraju Eraparaju
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200728115904.GA5508@chelsio.com \
--to=krishna2@chelsio.com \
--cc=bharat@chelsio.com \
--cc=linux-nvme@lists.infradead.org \
--cc=linux-rdma@vger.kernel.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).