From: Sagi Grimberg <sagi@grimberg.me>
To: Krishnamraju Eraparaju <krishna2@chelsio.com>
Cc: linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	bharat@chelsio.com
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Tue, 28 Jul 2020 11:35:53 -0700	[thread overview]
Message-ID: <3963dc58-1d64-b6e1-ea27-06f3030d5c6e@grimberg.me> (raw)
In-Reply-To: <20200728174224.GA5497@chelsio.com>

> Sagi,
> 
> Yes, Multipath is disabled.

Thanks.

> This time, with "nvme-fabrics: allow to queue requests for live queues"
> patch applied, I see hang only at blk_queue_enter():

Interesting. Does the reset loop hang, or is it able to make forward
progress?
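One way to see whether the reset loop is making forward progress is to poll the controller state that the nvme core exposes in sysfs while the reproducer runs. This is only a sketch; the controller name `nvme0` is an assumption, adjust it for your setup:

```shell
# Poll the NVMe controller state exposed by the nvme core in sysfs.
# CTRL is an assumption -- adjust to your controller instance.
CTRL=${CTRL:-nvme0}

ctrl_state() {
    # Prints e.g. "live", "resetting", "connecting"; prints "unknown"
    # if the controller (or the sysfs attribute) is not present.
    cat "/sys/class/nvme/$CTRL/state" 2>/dev/null || echo "unknown"
}

# While the reset loop runs, this should keep cycling back to "live".
# If it sticks in "resetting"/"connecting", the loop made no progress:
# while true; do ctrl_state; sleep 1; done
```

If the state keeps returning to "live" while the ioctl stays stuck, that points at a lost wakeup on the frozen queue rather than a stuck reset.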

> [Jul28 17:25] INFO: task nvme:21119 blocked for more than 122 seconds.
> [  +0.000061]       Not tainted 5.8.0-rc7ekr+ #2
> [  +0.000052] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  +0.000059] nvme            D14392 21119   2456 0x00004000
> [  +0.000059] Call Trace:
> [  +0.000110]  __schedule+0x32b/0x670
> [  +0.000108]  schedule+0x45/0xb0
> [  +0.000107]  blk_queue_enter+0x1e9/0x250
> [  +0.000109]  ? wait_woken+0x70/0x70
> [  +0.000110]  blk_mq_alloc_request+0x53/0xc0
> [  +0.000111]  nvme_alloc_request+0x61/0x70 [nvme_core]
> [  +0.000121]  nvme_submit_user_cmd+0x50/0x310 [nvme_core]
> [  +0.000118]  nvme_user_cmd+0x12e/0x1c0 [nvme_core]
> [  +0.000163]  ? _copy_to_user+0x22/0x30
> [  +0.000113]  blkdev_ioctl+0x100/0x250
> [  +0.000115]  block_ioctl+0x34/0x40
> [  +0.000110]  ksys_ioctl+0x82/0xc0
> [  +0.000109]  __x64_sys_ioctl+0x11/0x20
> [  +0.000109]  do_syscall_64+0x3e/0x70
> [  +0.000120]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  +0.000112] RIP: 0033:0x7fbe9cdbb67b
> [  +0.000110] Code: Bad RIP value.
> [  +0.000124] RSP: 002b:00007ffd61ff5778 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  +0.000170] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
> 00007fbe9cdbb67b
> [  +0.000114] RDX: 00007ffd61ff5780 RSI: 00000000c0484e43 RDI:
> 0000000000000003
> [  +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09:
> 0000000000000000
> [  +0.000115] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00007ffd61ff7219
> [  +0.000123] R13: 0000000000000006 R14: 00007ffd61ff5e30 R15:
> 000055e09c1854a0
> [  +0.000115] Kernel panic - not syncing: hung_task: blocked tasks

For some reason the ioctl is not woken up when unfreezing the queue...

> You can easily reproduce this by running the commands below in parallel
> for about 10 minutes:
>   while [ 1 ]; do nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
>   while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller; done
>   while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up; sleep 28; done
>
>   Not sure whether using nvme write-zeroes this way is valid or not..

Sure it is; it's I/O just like fs I/O.
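FWIW, the three loops above can be folded into one parameterized script. This is a sketch: DEV, IFACE, and the sleep intervals are assumptions taken from the mail, and `ip link` stands in for the deprecated ifconfig. Set DRY_RUN=1 to print the commands instead of executing them:

```shell
#!/bin/sh
# Consolidated reproducer sketch. DEV and IFACE are assumptions from the
# original mail; adjust them for your setup. Needs root to actually run.
DEV=${DEV:-/dev/nvme0n1}
IFACE=${IFACE:-enp2s0f4}

run() {
    # Execute a command, or just echo it when DRY_RUN is set.
    if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi
}

io_step()    { run nvme write-zeroes "$DEV" -s 1 -c 1; }
reset_step() { run sh -c "echo 1 > /sys/block/${DEV#/dev/}/device/reset_controller"; }
link_down()  { run ip link set "$IFACE" down; }
link_up()    { run ip link set "$IFACE" up; }

# Uncomment to run the three loops in parallel for the actual reproduction:
# while true; do io_step; done &
# while true; do reset_step; done &
# while true; do link_down; sleep 24; link_up; sleep 28; done &
# wait
```

Running it for ~10 minutes and then checking dmesg for hung-task reports should reproduce the blk_queue_enter hang above.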



Thread overview: 19+ messages
2020-07-27 18:19 Hang at NVME Host caused by Controller reset Krishnamraju Eraparaju
2020-07-27 18:47 ` Sagi Grimberg
2020-07-28 11:59   ` Krishnamraju Eraparaju
2020-07-28 15:54     ` Sagi Grimberg
2020-07-28 17:42       ` Krishnamraju Eraparaju
2020-07-28 18:35         ` Sagi Grimberg [this message]
2020-07-28 20:20           ` Sagi Grimberg
2020-07-29  8:57             ` Krishnamraju Eraparaju
2020-07-29  9:28               ` Sagi Grimberg
     [not found]                 ` <20200730162056.GA17468@chelsio.com>
2020-07-30 20:59                   ` Sagi Grimberg
2020-07-30 21:32                     ` Krishnamraju Eraparaju
