From: "jianchao.wang" <jianchao.w.wang@oracle.com>
To: Ming Lei <ming.lei@redhat.com>, Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>, Xiao Liang <xiliang@redhat.com>,
	linux-nvme@lists.infradead.org,
	Keith Busch <keith.busch@intel.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] nvme: don't retry request marked as NVME_REQ_CANCELLED
Date: Thu, 25 Jan 2018 16:52:31 +0800	[thread overview]
Message-ID: <db07601a-e298-3194-9b02-e7077d57e0a3@oracle.com> (raw)
In-Reply-To: <20180125081023.13303-1-ming.lei@redhat.com>

Hi Ming

Thanks for your patch and detailed comment.

On 01/25/2018 04:10 PM, Ming Lei wrote:
> If request is marked as NVME_REQ_CANCELLED, we don't need to retry for
> requeuing it, and it should be completed immediately. Even simply from
> the flag name, it needn't to be requeued.
> 
> Otherwise, it is easy to cause IO hang when IO is timed out in case of
> PCI NVMe:
> 
> 1) IO timeout is triggered, and nvme_timeout() tries to disable
> device(nvme_dev_disable) and reset controller(nvme_reset_ctrl)
> 
> 2) inside nvme_dev_disable(), queue is frozen and quiesced, and
> try to cancel every request, but the timeout request can't be canceled
> since it is completed by __blk_mq_complete_request() in blk_mq_rq_timed_out().
> 
> 3) this timeout req is requeued via nvme_complete_rq(), but can't be
> dispatched at all because the queue is quiesced and the hardware isn't
> ready; finally nvme_wait_freeze() waits forever in nvme_reset_work().

The queue will be unquiesced by nvme_start_queues() before nvme_reset_work() starts to wait for the freeze:

nvme_reset_work
....
		nvme_start_queues(&dev->ctrl);
		nvme_wait_freeze(&dev->ctrl);
		/* hit this only when allocating the tagset fails */
		if (nvme_dev_add(dev))
			new_state = NVME_CTRL_ADMIN_ONLY;
		nvme_unfreeze(&dev->ctrl);
....
And the relationship between nvme_timeout and nvme_dev_disable is really complicated.
nvme_timeout may run in parallel with nvme_dev_disable, and may also invoke it.
nvme_dev_disable may sleep while holding shutdown_lock, and nvme_timeout will try to take that same lock when it invokes nvme_dev_disable....

On the other hand, could we provide a helper interface to quiesce the request_queue's timeout_work?
Something like the following:

diff --git a/block/blk-core.c b/block/blk-core.c
index b888175..60469cd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -616,6 +616,44 @@ void blk_queue_bypass_end(struct request_queue *q)
 }
 EXPORT_SYMBOL_GPL(blk_queue_bypass_end);
 
+/*
+ * When this returns, the timeout_work will not be invoked anymore.
+ * The caller must take charge of the outstanding IOs.
+ */
+void blk_quiesce_queue_timeout(struct request_queue *q)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	queue_flag_set(QUEUE_FLAG_TIMEOUT_QUIESCED, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+
+	synchronize_rcu_bh();
+	cancel_work_sync(&q->timeout_work);
+	/*
+	 * The timeout work will not be invoked anymore.
+	 */
+	del_timer_sync(&q->timeout);
+	/*
+	 * If the caller doesn't want the timer to be re-armed,
+	 * it has to stop/quiesce the queue.
+	 */
+}
+EXPORT_SYMBOL_GPL(blk_quiesce_queue_timeout);
+
+/*
+ * Caller must ensure all the outstanding IOs have been handled.
+ */
+void blk_unquiesce_queue_timeout(struct request_queue *q)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	queue_flag_clear(QUEUE_FLAG_TIMEOUT_QUIESCED, q);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+EXPORT_SYMBOL_GPL(blk_unquiesce_queue_timeout);
+
 void blk_set_queue_dying(struct request_queue *q)
 {
 	spin_lock_irq(q->queue_lock);
@@ -867,6 +905,9 @@ static void blk_rq_timed_out_timer(struct timer_list *t)
 {
 	struct request_queue *q = from_timer(q, t, timeout);
 
+	if (blk_queue_timeout_quiesced(q))
+		return;
+
 	kblockd_schedule_work(&q->timeout_work);
 }

Thread overview: 16+ messages
2018-01-25  8:10 [PATCH] nvme: don't retry request marked as NVME_REQ_CANCELLED Ming Lei
2018-01-25  8:52 ` jianchao.wang [this message]
2018-01-25 10:15   ` Ming Lei
2018-01-27 12:33     ` jianchao.wang
2018-01-27 13:31       ` Ming Lei
2018-01-27 14:29         ` jianchao.wang
2018-01-27 15:44           ` Ming Lei
2018-01-28  9:01             ` jianchao.wang
