Linux-RDMA Archive on lore.kernel.org
 help / color / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: Krishnamraju Eraparaju <krishna2@chelsio.com>,
	linux-nvme@lists.infradead.org
Cc: linux-rdma@vger.kernel.org
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Mon, 27 Jul 2020 11:47:03 -0700
Message-ID: <9b8dae53-1fcc-3c03-5fcd-cfb55cd8cc80@grimberg.me> (raw)
In-Reply-To: <20200727181944.GA5484@chelsio.com>


> kernel hang observed on NVME Host(TCP) while running iozone with link
> toggle:

Hi Krishnamraju,

Good timing, please see my patch:
[PATCH 1/2] nvme-tcp: fix controller reset hang during traffic

It will be great to get your tested-by.

> 
>      
> [ +42.773018] INFO: task kworker/u24:5:1243 blocked for more than 122
> seconds.
> [  +0.000124]       Not tainted 5.8.0-rc4ekr+ #19
> [  +0.000105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  +0.000164] kworker/u24:5   D12600  1243      2 0x00004000
> [  +0.000114] Workqueue: nvme-reset-wq nvme_reset_ctrl_work [nvme_tcp]
> [  +0.000109] Call Trace:
> [  +0.000105]  __schedule+0x270/0x5d0
> [  +0.000105]  schedule+0x45/0xb0
> [  +0.000125]  blk_mq_freeze_queue_wait+0x41/0xa0
> [  +0.000122]  ? wait_woken+0x80/0x80
> [  +0.000116]  blk_mq_update_nr_hw_queues+0x8a/0x380
> [  +0.000109]  nvme_tcp_setup_ctrl+0x345/0x510 [nvme_tcp]
> [  +0.000108]  nvme_reset_ctrl_work+0x45/0x60 [nvme_tcp]
> [  +0.000135]  process_one_work+0x149/0x380
> [  +0.000107]  worker_thread+0x1ae/0x3a0
> [  +0.000107]  ? process_one_work+0x380/0x380
> [  +0.000108]  kthread+0xf7/0x130
> [  +0.000135]  ? kthread_bind+0x10/0x10
> [  +0.000121]  ret_from_fork+0x22/0x30
> [  +0.000134] INFO: task bash:6000 blocked for more than 122 seconds.
> [  +0.000122]       Not tainted 5.8.0-rc4ekr+ #19
> [  +0.000109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  +0.000195] bash            D14232  6000   5967 0x00000080
> [  +0.000115] Call Trace:
> [  +0.000106]  __schedule+0x270/0x5d0
> [  +0.000138]  ? terminate_walk+0x8a/0x90
> [  +0.000123]  schedule+0x45/0xb0
> [  +0.000108]  schedule_timeout+0x1d6/0x290
> [  +0.000121]  wait_for_completion+0x82/0xe0
> [  +0.000120]  __flush_work.isra.37+0x10c/0x180
> [  +0.000115]  ? flush_workqueue_prep_pwqs+0x110/0x110
> [  +0.000119]  nvme_reset_ctrl_sync+0x1c/0x30 [nvme_core]
> [  +0.000110]  nvme_sysfs_reset+0xd/0x20 [nvme_core]
> [  +0.000137]  kernfs_fop_write+0x10a/0x1a0
> [  +0.000124]  vfs_write+0xa8/0x1a0
> [  +0.000122]  ksys_write+0x50/0xc0
> [  +0.000117]  do_syscall_64+0x3e/0x70
> [  +0.000108]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  +0.000111] RIP: 0033:0x7f4ed689dc60
> [  +0.000107] Code: Bad RIP value.
> [  +0.000105] RSP: 002b:00007ffe636b6fe8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [  +0.000188] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
> 00007f4ed689dc60
> ----
> 
> This issue got uncovered after commit fe35ec58f0d3, which does
> freeze-queue operation if set->nr_maps is greater than '1'(all nvmef
> trasnports sets nr_maps to '2' by default).
>    
> Issue will not occur with multipath enabled.
> Issue observed with RDMA transports also.
> 
> Steps to reproduce:
> nvme connect -t tcp -a 102.1.1.6 -s 4420 -n nvme-ram0 -i 1
> 
> Run below each while loop in different terminals parallelly, to reprodue
> instantaneously.
> while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller;
> done
> while [ 1 ]; do  nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
> 
> 
> 
> My understanding is: while performing reset-controller, nvme-write task
> tries to submit IO/blk_queue_enter, but fails at blk_mq_run_hw_queue()
> after seeing blk_queue_quiesced.
> And never succeeded to blk_queue_exit, may be due to out-of-sync percpu
> counter operations(q_usage_counter), causing this hang at freeze_queue.
> 
> Thanks,
> Krishna.
> 

  reply index

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-27 18:19 Krishnamraju Eraparaju
2020-07-27 18:47 ` Sagi Grimberg [this message]
2020-07-28 11:59   ` Krishnamraju Eraparaju
2020-07-28 15:54     ` Sagi Grimberg
2020-07-28 17:42       ` Krishnamraju Eraparaju
2020-07-28 18:35         ` Sagi Grimberg
2020-07-28 20:20           ` Sagi Grimberg
2020-07-29  8:57             ` Krishnamraju Eraparaju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b8dae53-1fcc-3c03-5fcd-cfb55cd8cc80@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=krishna2@chelsio.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-RDMA Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rdma/0 linux-rdma/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rdma linux-rdma/ https://lore.kernel.org/linux-rdma \
		linux-rdma@vger.kernel.org
	public-inbox-index linux-rdma

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rdma


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git