From: Sagi Grimberg <sagi@grimberg.me>
To: Krishnamraju Eraparaju <krishna2@chelsio.com>
Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org, bharat@chelsio.com
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Tue, 28 Jul 2020 11:35:53 -0700
Message-ID: <3963dc58-1d64-b6e1-ea27-06f3030d5c6e@grimberg.me>
In-Reply-To: <20200728174224.GA5497@chelsio.com>

> Sagi,
>
> Yes, Multipath is disabled.

Thanks.

> This time, with "nvme-fabrics: allow to queue requests for live queues"
> patch applied, I see hang only at blk_queue_enter():

Interesting. Does the reset loop hang, or is it able to make forward progress?

> [Jul28 17:25] INFO: task nvme:21119 blocked for more than 122 seconds.
> [ +0.000061] Not tainted 5.8.0-rc7ekr+ #2
> [ +0.000052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ +0.000059] nvme D14392 21119 2456 0x00004000
> [ +0.000059] Call Trace:
> [ +0.000110] __schedule+0x32b/0x670
> [ +0.000108] schedule+0x45/0xb0
> [ +0.000107] blk_queue_enter+0x1e9/0x250
> [ +0.000109] ? wait_woken+0x70/0x70
> [ +0.000110] blk_mq_alloc_request+0x53/0xc0
> [ +0.000111] nvme_alloc_request+0x61/0x70 [nvme_core]
> [ +0.000121] nvme_submit_user_cmd+0x50/0x310 [nvme_core]
> [ +0.000118] nvme_user_cmd+0x12e/0x1c0 [nvme_core]
> [ +0.000163] ? _copy_to_user+0x22/0x30
> [ +0.000113] blkdev_ioctl+0x100/0x250
> [ +0.000115] block_ioctl+0x34/0x40
> [ +0.000110] ksys_ioctl+0x82/0xc0
> [ +0.000109] __x64_sys_ioctl+0x11/0x20
> [ +0.000109] do_syscall_64+0x3e/0x70
> [ +0.000120] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ +0.000112] RIP: 0033:0x7fbe9cdbb67b
> [ +0.000110] Code: Bad RIP value.
> [ +0.000124] RSP: 002b:00007ffd61ff5778 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ +0.000170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fbe9cdbb67b
> [ +0.000114] RDX: 00007ffd61ff5780 RSI: 00000000c0484e43 RDI: 0000000000000003
> [ +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [ +0.000115] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd61ff7219
> [ +0.000123] R13: 0000000000000006 R14: 00007ffd61ff5e30 R15: 000055e09c1854a0
> [ +0.000115] Kernel panic - not syncing: hung_task: blocked tasks

For some reason the ioctl is not woken up when the queue is unfrozen...

> You could easily reproduce this by running below, parallelly, for 10min:
> while [ 1 ]; do nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
> while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller; done
> while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up; sleep 28; done
>
> Not sure using nvme-write this way is valid or not..

Sure it is; it's I/O just like fs I/O.
Thread overview: 19+ messages
2020-07-27 18:19 Hang at NVME Host caused by Controller reset Krishnamraju Eraparaju
2020-07-27 18:47 ` Sagi Grimberg
2020-07-28 11:59 ` Krishnamraju Eraparaju
2020-07-28 15:54 ` Sagi Grimberg
2020-07-28 17:42 ` Krishnamraju Eraparaju
2020-07-28 18:35 ` Sagi Grimberg [this message]
2020-07-28 20:20 ` Sagi Grimberg
2020-07-29 8:57 ` Krishnamraju Eraparaju
2020-07-29 9:28 ` Sagi Grimberg
[not found] ` <20200730162056.GA17468@chelsio.com>
2020-07-30 20:59 ` Sagi Grimberg
2020-07-30 21:32 ` Krishnamraju Eraparaju