* [PATCH 0/3] Block layer patches for kernel v4.4
From: Bart Van Assche @ 2015-09-23 22:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Tejun Heo, linux-kernel

Hello Jens,

This is a short series of three patches that I came up with while
retesting the SRP initiator on v4.3-rc2. The individual patches are as
follows:

0001-blk-cgroup-Declare-local-symbols-static.patch
0002-bsg-Add-sparse-annotations-to-bsg_request_fn.patch
0003-blk-mq-Fix-the-queue-freezing-mechanism.patch

Bart.


* [PATCH 1/3] blk-cgroup: Declare local symbols static
From: Bart Van Assche @ 2015-09-23 22:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Tejun Heo, linux-kernel

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 55512dd..46966a3 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -896,7 +896,7 @@ static int blkcg_print_stat(struct seq_file *sf, void *v)
 	return 0;
 }
 
-struct cftype blkcg_files[] = {
+static struct cftype blkcg_files[] = {
 	{
 		.name = "stat",
 		.seq_show = blkcg_print_stat,
@@ -904,7 +904,7 @@ struct cftype blkcg_files[] = {
 	{ }	/* terminate */
 };
 
-struct cftype blkcg_legacy_files[] = {
+static struct cftype blkcg_legacy_files[] = {
 	{
 		.name = "reset_stats",
 		.write_u64 = blkcg_reset_stats,
-- 
2.1.4



* [PATCH 2/3] bsg: Add sparse annotations to bsg_request_fn()
From: Bart Van Assche @ 2015-09-23 22:12 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Tejun Heo, linux-kernel

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 block/bsg-lib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index 650f427..b2a61e3 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -161,6 +161,8 @@ failjob_rls_job:
  * Drivers/subsys should pass this to the queue init function.
  */
 void bsg_request_fn(struct request_queue *q)
+	__releases(q->queue_lock)
+	__acquires(q->queue_lock)
 {
 	struct device *dev = q->queuedata;
 	struct request *req;
-- 
2.1.4
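
For readers who have not used these sparse annotations: together they
document that the function is entered with q->queue_lock held, may drop
the lock internally, and must hold it again on return, which keeps
sparse's context tracking balanced. A minimal, hypothetical sketch of
the pattern (illustrative only, not part of the patch):

/* Illustrative only: mirrors the shape of bsg_request_fn(). */
static void example_request_fn(struct request_queue *q)
	__releases(q->queue_lock)
	__acquires(q->queue_lock)
{
	/* Entered with q->queue_lock held. */
	spin_unlock_irq(q->queue_lock);

	/* ... hand the request to the driver without the lock held ... */

	/* Must hold the lock again when returning. */
	spin_lock_irq(q->queue_lock);
}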



* [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Bart Van Assche @ 2015-09-23 22:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, Tejun Heo, linux-kernel

Ensure that blk_mq_queue_enter() waits if mq_freeze_depth is not
zero. Ensure that the update of mq_freeze_depth by blk_mq_freeze_queue()
is visible to all CPU cores before that function waits on
mq_usage_counter.

It is unfortunate that this patch introduces an smp_mb() in the
hot path (blk_mq_queue_enter()) but I have not yet found a way to
avoid this.
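
The pattern the two barriers implement is the classic store-buffering
one. A rough sketch, with X standing in for mq_freeze_depth and Y for
mq_usage_counter (illustrative only, not the actual blk-mq code):

static atomic_t X = ATOMIC_INIT(0), Y = ATOMIC_INIT(0);

/* CPU 0, the freezer: increment X, then read Y. */
static int cpu0(void)
{
	atomic_inc(&X);		/* "mq_freeze_depth++" */
	smp_mb();		/* pairs with the smp_mb() on CPU 1 */
	return atomic_read(&Y);	/* "read mq_usage_counter" */
}

/* CPU 1, the submitter: increment Y, then read X. */
static int cpu1(void)
{
	atomic_inc(&Y);		/* "take a queue reference" */
	smp_mb();		/* pairs with the smp_mb() on CPU 0 */
	return atomic_read(&X);	/* "read mq_freeze_depth" */
}

/* With both barriers in place, cpu0() and cpu1() cannot both return 0:
 * at least one side observes the other's increment, so a freeze cannot
 * slip past a submitter unnoticed. */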

I came across this code while analyzing a lockup triggered by
deleting a SCSI host created by the SRP initiator immediately
followed by a relogin.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
---
 block/blk-mq.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2077f0d..e3ad411 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -83,8 +83,13 @@ static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
 	while (true) {
 		int ret;
 
-		if (percpu_ref_tryget_live(&q->mq_usage_counter))
-			return 0;
+		if (percpu_ref_tryget_live(&q->mq_usage_counter)) {
+			/* Order mq_usage_counter and mq_freeze_depth accesses */
+			smp_mb();
+			if (!atomic_read(&q->mq_freeze_depth))
+				return 0;
+			percpu_ref_put(&q->mq_usage_counter);
+		}
 
 		if (!(gfp & __GFP_WAIT))
 			return -EBUSY;
@@ -136,6 +141,11 @@ static void blk_mq_freeze_queue_wait(struct request_queue *q)
 void blk_mq_freeze_queue(struct request_queue *q)
 {
 	blk_mq_freeze_queue_start(q);
+	/*
+	 * Ensure that the mq_freeze_depth update is visible before
+	 * mq_usage_counter is read.
+	 */
+	smp_mb();
 	blk_mq_freeze_queue_wait(q);
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);
-- 
2.1.4



* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Ming Lei @ 2015-09-24  3:22 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, Tejun Heo, linux-kernel,
	tom.leiming, Akinobu Mita

On Wed, 23 Sep 2015 15:14:10 -0700
Bart Van Assche <bart.vanassche@sandisk.com> wrote:

> Ensure that blk_mq_queue_enter() waits if mq_freeze_depth is not
> zero. Ensure that the update of mq_freeze_depth by blk_mq_freeze_queue()
> is visible to all CPU cores before that function waits on
> mq_usage_counter.
> 
> It is unfortunate that this patch introduces an smp_mb() in the
> hot path (blk_mq_queue_enter()) but I have not yet found a way to
> avoid this.
> 
> I came across this code while analyzing a lockup triggered by
> deleting a SCSI host created by the SRP initiator immediately
> followed by a relogin.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: <stable@vger.kernel.org>
> ---
>  block/blk-mq.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 2077f0d..e3ad411 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -83,8 +83,13 @@ static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
>  	while (true) {
>  		int ret;
>  
> -		if (percpu_ref_tryget_live(&q->mq_usage_counter))
> -			return 0;
> +		if (percpu_ref_tryget_live(&q->mq_usage_counter)) {
> +			/* Order mq_usage_counter and mq_freeze_depth accesses */
> +			smp_mb();
> +			if (!atomic_read(&q->mq_freeze_depth))
> +				return 0;
> +			percpu_ref_put(&q->mq_usage_counter);
> +		}

IMO, mq_freeze_depth should only be accessed in the slow path, and it
looks like the race happens only during the small window between
increasing 'mq_freeze_depth' and killing the percpu counter.

One solution I thought of is the following patch, which depends on
Akinobu's patch (blk-mq: fix freeze queue race
http://marc.info/?l=linux-kernel&m=143723697010781&w=2).

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f774f67..1c71c04 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -77,6 +77,17 @@ static void blk_mq_hctx_clear_pending(struct blk_mq_hw_ctx *hctx,
 	clear_bit(CTX_TO_BIT(hctx, ctx), &bm->word);
 }
 
+static inline int blk_mq_read_freeze_depth(struct request_queue *q)
+{
+	int  depth;
+
+	mutex_lock(&q->mq_freeze_lock);
+	depth = q->mq_freeze_depth;
+	mutex_unlock(&q->mq_freeze_lock);
+
+	return depth;
+}
+
 static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
 {
 	while (true) {
@@ -89,7 +100,7 @@ static int blk_mq_queue_enter(struct request_queue *q, gfp_t gfp)
 			return -EBUSY;
 
 		ret = wait_event_interruptible(q->mq_freeze_wq,
-				!atomic_read(&q->mq_freeze_depth) ||
+				!blk_mq_read_freeze_depth(q) ||
 				blk_queue_dying(q));
 		if (blk_queue_dying(q))
 			return -ENODEV;
@@ -113,12 +124,9 @@ static void blk_mq_usage_counter_release(struct percpu_ref *ref)
 
 void blk_mq_freeze_queue_start(struct request_queue *q)
 {
-	int freeze_depth;
-
 	mutex_lock(&q->mq_freeze_lock);
 
-	freeze_depth = atomic_inc_return(&q->mq_freeze_depth);
-	if (freeze_depth == 1) {
+	if (!q->mq_freeze_depth++) {
 		percpu_ref_kill(&q->mq_usage_counter);
 		blk_mq_run_hw_queues(q, false);
 	}
@@ -149,7 +157,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
 
 	mutex_lock(&q->mq_freeze_lock);
 
-	freeze_depth = atomic_dec_return(&q->mq_freeze_depth);
+	freeze_depth = --q->mq_freeze_depth;
 	WARN_ON_ONCE(freeze_depth < 0);
 	if (!freeze_depth) {
 		percpu_ref_reinit(&q->mq_usage_counter);
@@ -2084,7 +2092,7 @@ void blk_mq_free_queue(struct request_queue *q)
 /* Basically redo blk_mq_init_queue with queue frozen */
 static void blk_mq_queue_reinit(struct request_queue *q)
 {
-	WARN_ON_ONCE(!atomic_read(&q->mq_freeze_depth));
+	WARN_ON_ONCE(!ACCESS_ONCE(q->mq_freeze_depth));
 
 	blk_mq_sysfs_unregister(q);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6cdf2b7..86fedcc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -436,7 +436,7 @@ struct request_queue {
 	struct mutex		sysfs_lock;
 
 	int			bypass_depth;
-	atomic_t		mq_freeze_depth;
+	int			mq_freeze_depth;
 
 #if defined(CONFIG_BLK_DEV_BSG)
 	bsg_job_fn		*bsg_job_fn;



>  
>  		if (!(gfp & __GFP_WAIT))
>  			return -EBUSY;
> @@ -136,6 +141,11 @@ static void blk_mq_freeze_queue_wait(struct request_queue *q)
>  void blk_mq_freeze_queue(struct request_queue *q)
>  {
>  	blk_mq_freeze_queue_start(q);
> +	/*
> +	 * Ensure that the mq_freeze_depth update is visible before
> +	 * mq_usage_counter is read.
> +	 */
> +	smp_mb();
>  	blk_mq_freeze_queue_wait(q);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_freeze_queue);



* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Bart Van Assche @ 2015-09-24 16:43 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, Tejun Heo, linux-kernel, Akinobu Mita

On 09/23/2015 08:23 PM, Ming Lei wrote:
> IMO, mq_freeze_depth should only be accessed in the slow path, and it
> looks like the race happens only during the small window between
> increasing 'mq_freeze_depth' and killing the percpu counter.

Hello Ming,

My concern is that *not* checking mq_freeze_depth in the hot path can 
cause a livelock. If there is a software layer, e.g. multipathd, that 
periodically submits new commands, and if these commands take time to
process, e.g. because the transport layer is unavailable, how can we
guarantee that freezing ever succeeds without checking mq_freeze_depth
in the hot path?

Thanks,

Bart.


* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Tejun Heo @ 2015-09-24 16:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

On Thu, Sep 24, 2015 at 09:43:48AM -0700, Bart Van Assche wrote:
> On 09/23/2015 08:23 PM, Ming Lei wrote:
> >IMO, mq_freeze_depth should only be accessed in the slow path, and it
> >looks like the race happens only during the small window between
> >increasing 'mq_freeze_depth' and killing the percpu counter.
> 
> Hello Ming,
> 
> My concern is that *not* checking mq_freeze_depth in the hot path can cause
> a livelock. If there is a software layer, e.g. multipathd, that periodically
> submits new commands, and if these commands take time to process, e.g.
> because the transport layer is unavailable, how can we guarantee that
> freezing ever succeeds without checking mq_freeze_depth in the hot path?

I couldn't tell what the patch was trying to do from the patch
description, so including the above prolly is a good idea.  Isn't the
above guaranteed by percpu_ref_kill() preventing new tryget_live()'s?
Also, what do the barriers do in your patch?

The only race condition that I can see there is if unfreeze and freeze
race each other and freeze tries to kill the ref which hasn't finished
reinit yet.  We prolly want to put mutexes around freeze/unfreeze so
that they're serialized if something like that can happen (it isn't a
hot path to begin with).
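
Roughly like the following sketch (assuming a q->mq_freeze_lock mutex,
as in Ming's patch earlier in this thread; untested, for illustration
only):

void blk_mq_freeze_queue_start_sketch(struct request_queue *q)
{
	mutex_lock(&q->mq_freeze_lock);
	if (!q->mq_freeze_depth++)	/* first freezer kills the ref */
		percpu_ref_kill(&q->mq_usage_counter);
	mutex_unlock(&q->mq_freeze_lock);
}

void blk_mq_unfreeze_queue_sketch(struct request_queue *q)
{
	mutex_lock(&q->mq_freeze_lock);
	WARN_ON_ONCE(q->mq_freeze_depth <= 0);
	if (!--q->mq_freeze_depth) {	/* last unfreezer reinits it */
		percpu_ref_reinit(&q->mq_usage_counter);
		wake_up_all(&q->mq_freeze_wq);
	}
	mutex_unlock(&q->mq_freeze_lock);
}

The mutex guarantees that percpu_ref_kill() can never run concurrently
with percpu_ref_reinit().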

Thanks.

-- 
tejun


* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Bart Van Assche @ 2015-09-24 17:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

On 09/24/2015 09:54 AM, Tejun Heo wrote:
> On Thu, Sep 24, 2015 at 09:43:48AM -0700, Bart Van Assche wrote:
>> On 09/23/2015 08:23 PM, Ming Lei wrote:
>>> IMO, mq_freeze_depth should only be accessed in the slow path, and it
>>> looks like the race happens only during the small window between
>>> increasing 'mq_freeze_depth' and killing the percpu counter.
>>
>> Hello Ming,
>>
>> My concern is that *not* checking mq_freeze_depth in the hot path can cause
>> a livelock. If there is a software layer, e.g. multipathd, that periodically
>> submits new commands, and if these commands take time to process, e.g.
>> because the transport layer is unavailable, how can we guarantee that
>> freezing ever succeeds without checking mq_freeze_depth in the hot path?
>
> I couldn't tell what the patch was trying to do from the patch
> description, so including the above prolly is a good idea.  Isn't the
> above guaranteed by percpu_ref_kill() preventing new tryget_live()'s?

My interpretation of the percpu_ref_tryget_live() implementation in 
<linux/percpu-refcount.h> is that the tryget operation will only fail if 
the refcount is in atomic mode and additionally the __PERCPU_REF_DEAD 
flag has been set.
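
In other words, the logic is roughly the following (my simplified
sketch, not the exact code from the header):

static inline bool percpu_ref_tryget_live_sketch(struct percpu_ref *ref)
{
	unsigned long __percpu *percpu_count;
	bool ret = false;

	rcu_read_lock_sched();
	if (__ref_is_percpu(ref, &percpu_count)) {
		/* Percpu mode: the ref is live, tryget always succeeds. */
		this_cpu_inc(*percpu_count);
		ret = true;
	} else if (!(ref->percpu_count_ptr & __PERCPU_REF_DEAD)) {
		/* Atomic mode but not yet killed: atomic tryget. */
		ret = atomic_long_inc_not_zero(&ref->count);
	}
	rcu_read_unlock_sched();

	return ret;
}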

> Also, what do the barriers do in your patch?

My intention was to guarantee that, on architectures that do not provide
the same ordering guarantees as x86 (e.g. PPC or ARM), the store and
load operations on mq_freeze_depth and mq_usage_counter would not be
reordered. However, it is probably safe to leave out the barrier I
proposed to introduce in blk_mq_queue_enter() since it is acceptable
that there is some delay in communicating mq_freeze_depth updates from
the CPU that modifies that counter to the CPU that reads it.

> The only race condition that I can see there is if unfreeze and freeze
> race each other and freeze tries to kill the ref which hasn't finished
> reinit yet.  We prolly want to put mutexes around freeze/unfreeze so
> that they're serialized if something like that can happen (it isn't a
> hot path to begin with).

My concern is that the following could happen if mq_freeze_depth is not 
checked in the hot path of blk_mq_queue_enter():
* mq_usage_counter >= 1 before blk_mq_freeze_queue() is called.
* blk_mq_freeze_queue() keeps waiting forever if new requests are queued
   faster than these requests complete.

Bart.



* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Tejun Heo @ 2015-09-24 17:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

Hello, Bart.

On Thu, Sep 24, 2015 at 10:35:41AM -0700, Bart Van Assche wrote:
> My interpretation of the percpu_ref_tryget_live() implementation in
> <linux/percpu-refcount.h> is that the tryget operation will only fail if the
> refcount is in atomic mode and additionally the __PERCPU_REF_DEAD flag has
> been set.

Yeah and percpu_ref_kill() does both.

> >Also, what do the barriers do in your patch?
> 
> My intention was to guarantee that, on architectures that do not provide
> the same ordering guarantees as x86 (e.g. PPC or ARM), the store and load
> operations on mq_freeze_depth and mq_usage_counter would not be reordered.
> However, it is probably safe to leave out the barrier I proposed to
> introduce in blk_mq_queue_enter() since it is acceptable that there is some
> delay in communicating mq_freeze_depth updates from the CPU that modifies
> that counter to the CPU that reads it.

Hmmm... please don't use barriers this way.  Use them only when there's
a clear requirement for an interlocking writer and reader pair.  There
isn't one here.  All they do is confuse people trying to read the
code.
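
For contrast, the kind of interlocked writer/reader pairing where a
barrier is warranted looks like this (illustrative only):

static int data;
static int flag;

static void writer(int x)
{
	WRITE_ONCE(data, x);
	smp_wmb();		/* order the data store before the flag store */
	WRITE_ONCE(flag, 1);
}

static int reader(void)
{
	if (READ_ONCE(flag)) {
		smp_rmb();	/* pairs with the smp_wmb() in writer() */
		return READ_ONCE(data);
	}
	return -1;
}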

> >The only race condition that I can see there is if unfreeze and freeze
> >race each other and freeze tries to kill the ref which hasn't finished
> >reinit yet.  We prolly want to put mutexes around freeze/unfreeze so
> >that they're serialized if something like that can happen (it isn't a
> >hot path to begin with).
> 
> My concern is that the following could happen if mq_freeze_depth is not
> checked in the hot path of blk_mq_queue_enter():
> * mq_usage_counter >= 1 before blk_mq_freeze_queue() is called.
> * blk_mq_freeze_queue() keeps waiting forever if new requests are queued
>   faster than these requests complete.

Again, that doesn't happen.

Thanks.

-- 
tejun


* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Bart Van Assche @ 2015-09-24 18:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

On 09/24/2015 10:49 AM, Tejun Heo wrote:
> Again, that doesn't happen.

Hello Tejun,

In case anyone would be interested, the backtraces for the lockup I had
observed are as follows:

sysrq: SysRq : Show Blocked State
  task                        PC stack   pid father
kworker/4:0     D ffff88045c5d5a00     0    29      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff88045c767c08 0000000000000086 ffffffff810ba11d ffff88047fd15ad8
 ffff88045c5d5a00 ffff88045c768000 ffff88045c768000 ffff88041c737ab8
 ffff880036f7bcf8 ffff8804158b55e8 0000000000000100 ffff88045c767c20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/1:2     D ffff88045c648000     0   264      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff880444d47c08 0000000000000086 ffffffff810ba11d ffff88047fc55ad8
 ffff88045c648000 ffff880450a9da00 ffff880444d48000 ffff88042826c8d8
 ffff880428969790 ffff880413b27b50 0000000000000040 ffff880444d47c20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/7:0     D ffff88045c675a00     0 27179      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff8803f7333c08 0000000000000086 ffffffff810ba11d ffff88047fdd5ad8
 ffff88045c675a00 ffff8803f7139680 ffff8803f7334000 ffff880403d76e40
 ffff88040c412408 ffff8803fe493cf8 00000000000001c0 ffff8803f7333c20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/3:0     D ffff88045c649680     0 21529      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff880392f2fc08 0000000000000086 ffffffff810ba11d ffff88047fcd5ad8
 ffff88045c649680 ffff88039a72c380 ffff880392f30000 ffff880420f2fab8
 ffff880425896ed8 ffff880416043cf8 00000000000000c0 ffff880392f2fc20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/5:0     D ffff88045c64ad00     0 10950      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff88037111bc08 0000000000000086 ffffffff810ba11d ffff88047fd55ad8
 ffff88045c64ad00 ffff880055991680 ffff88037111c000 ffff88041ff348d8
 ffff880424463080 ffff8804145155e8 0000000000000140 ffff88037111bc20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/6:1     D ffff88045c670000     0 17446      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff880052effc08 0000000000000086 ffffffff810ba11d ffff88047fd95ad8
 ffff88045c670000 ffff88035e401680 ffff880052f00000 ffff8803ecf8ee40
 ffff8803eed58b18 ffff880067a88b18 0000000000000180 ffff880052effc20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
kworker/0:0     D ffff88035d205a00     0 23356      2 0x00000000
Workqueue: srp_remove srp_remove_work [ib_srp]
 ffff88035ec83c08 0000000000000086 ffffffff810ba11d ffff88047fc15ad8
 ffff88035d205a00 ffff880360a3ad00 ffff88035ec84000 ffff8803e75a61c8
 ffff8803eed59790 ffff8803e4c10b18 0000000000000000 ffff88035ec83c20
Call Trace:
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff81271a36>] blk_mq_freeze_queue_wait+0x56/0xb0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff81273b64>] blk_mq_update_tag_set_depth+0x44/0xb0
 [<ffffffff81275620>] blk_mq_free_queue+0x50/0x110
 [<ffffffff81266468>] blk_cleanup_queue+0x148/0x240
 [<ffffffffa001aaf5>] __scsi_remove_device+0x65/0xd0 [scsi_mod]
 [<ffffffffa0019249>] scsi_forget_host+0x69/0x70 [scsi_mod]
 [<ffffffffa000d292>] scsi_remove_host+0x82/0x130 [scsi_mod]
 [<ffffffffa02a5c80>] srp_remove_work+0x90/0x1f0 [ib_srp]
 [<ffffffff8108d9b8>] process_one_work+0x1d8/0x610
 [<ffffffff8108d92b>] ? process_one_work+0x14b/0x610
 [<ffffffff8108df04>] worker_thread+0x114/0x460
 [<ffffffff8108ddf0>] ? process_one_work+0x610/0x610
 [<ffffffff810941f8>] kthread+0xf8/0x110
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
 [<ffffffff814f5eaf>] ret_from_fork+0x3f/0x70
 [<ffffffff81094100>] ? kthread_create_on_node+0x200/0x200
modprobe        D ffff88045c64ad00     0 24137  17294 0x00000004
 ffff8804146e7c48 0000000000000082 ffff880400000001 ffff88047fd55ad8
 ffff88045c64ad00 ffff88045c5d0000 ffff8804146e8000 ffff8804146e7df0
 ffff8804146e7de8 ffff88045c5d0000 ffff8804146e7dd0 ffff8804146e7c60
Call Trace:
 [<ffffffff814effea>] schedule+0x3a/0x90
 [<ffffffff814f4463>] schedule_timeout+0x1f3/0x290
 [<ffffffff810b9f16>] ? mark_held_locks+0x66/0x90
 [<ffffffff814f520c>] ? _raw_spin_unlock_irq+0x2c/0x40
 [<ffffffff810ba042>] ? trace_hardirqs_on_caller+0x102/0x1d0
 [<ffffffff814f1246>] wait_for_completion+0xd6/0x110
 [<ffffffff8109ebd0>] ? wake_up_q+0x70/0x70
 [<ffffffff8108a7cf>] flush_workqueue+0x1cf/0x6b0
 [<ffffffff8108a605>] ? flush_workqueue+0x5/0x6b0
 [<ffffffffa02a51a1>] srp_remove_one+0xc1/0x130 [ib_srp]
 [<ffffffffa02b77ae>] ib_unregister_client+0x11e/0x1a0 [ib_core]
 [<ffffffffa02aa471>] srp_cleanup_module+0x10/0xb9f [ib_srp]
 [<ffffffff810f106f>] SyS_delete_module+0x16f/0x1f0
 [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
 [<ffffffff814f5af6>] entry_SYSCALL_64_fastpath+0x16/0x7a



* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Tejun Heo @ 2015-09-24 18:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

Hello,

On Thu, Sep 24, 2015 at 11:09:33AM -0700, Bart Van Assche wrote:
> On 09/24/2015 10:49 AM, Tejun Heo wrote:
> > Again, that doesn't happen.
> 
> Hello Tejun,
> 
> In case anyone would be interested, the backtraces for the lockup I had
> observed are as follows:

If this is happening and it's not caused by a hung in-flight request,
it's either percpu_ref being buggy or the aforementioned kill/reinit
race screwing it up.  percpu_ref_kill() is expected to disable
tryget_live() in a finite amount of time regardless of concurrent
tryget tries.

Thanks.

-- 
tejun


* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Bart Van Assche @ 2015-09-24 19:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, Christoph Hellwig, Tejun Heo, linux-kernel, Akinobu Mita

On 09/23/2015 08:23 PM, Ming Lei wrote:
> One solution I thought of is the following patch, which depends on
> Akinobu's patch (blk-mq: fix freeze queue race
> http://marc.info/?l=linux-kernel&m=143723697010781&w=2).

Has that patch been tested against a debug kernel? The following call
trace is triggered by that patch:

WARNING: CPU: 3 PID: 200 at kernel/sched/core.c:7485 __might_sleep+0x77/0x80()
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810b4593>] prepare_to_wait_event+0x63/0xf0
Modules linked in: ib_srp(O) scsi_transport_srp(O) netconsole configfs af_packet msr sg coretemp x86_pkg_temp_thermal crct10dif_pclmul crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ipmi_devintf ablk_helper cryptd tg3 lpc_ich microcode pcspkr libphy mfd_core ipmi_si ipmi_msghandler ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad dm_multipath dm_mod rdma_cm ib_cm acpi_power_meter hwmon iw_cm processor button wmi ext4 crc16 mbcache jbd2 mlx4_ib mlx4_en ib_sa ptp pps_core ib_mad ib_core ib_addr sd_mod sr_mod cdrom hid_generic usbhid hid ahci libahci libata ehci_pci ehci_hcd mlx4_core usbcore usb_common scsi_mod autofs4 [last unloaded: brd]
CPU: 3 PID: 200 Comm: kworker/u16:9 Tainted: G           O    4.3.0-rc2-debug+ #1
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Workqueue: events_unbound async_run_entry_fn
Call Trace:
 [<ffffffff81288603>] dump_stack+0x4b/0x68
 [<ffffffff810725c6>] warn_slowpath_common+0x86/0xc0
 [<ffffffff8107264c>] warn_slowpath_fmt+0x4c/0x50
 [<ffffffff810b4593>] ? prepare_to_wait_event+0x63/0xf0
 [<ffffffff810b4593>] ? prepare_to_wait_event+0x63/0xf0
 [<ffffffff81099e47>] __might_sleep+0x77/0x80
 [<ffffffff814f20a3>] mutex_lock_nested+0x33/0x3b0
 [<ffffffff810ba11d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff810b45c5>] ? prepare_to_wait_event+0x95/0xf0
 [<ffffffff812725fa>] blk_mq_queue_enter+0x17a/0x2d0
 [<ffffffff812724b3>] ? blk_mq_queue_enter+0x33/0x2d0
 [<ffffffff810b4620>] ? prepare_to_wait_event+0xf0/0xf0
 [<ffffffff8127368e>] blk_mq_map_request+0x2e/0x330
 [<ffffffff810b82e9>] ? __lock_is_held+0x49/0x70
 [<ffffffff81274941>] blk_sq_make_request+0x91/0x430
[ ... ]


* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Bart Van Assche @ 2015-09-24 22:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

On 09/24/2015 11:14 AM, Tejun Heo wrote:
> On Thu, Sep 24, 2015 at 11:09:33AM -0700, Bart Van Assche wrote:
>> On 09/24/2015 10:49 AM, Tejun Heo wrote:
>>> Again, that doesn't happen.
>>
>> In case anyone would be interested, the backtraces for the lockup I had
>> observed are as follows:
>
> If this is happening and it's not caused by a hung in-flight request,
> it's either percpu_ref being buggy or the aforementioned kill/reinit
> race screwing it up.  percpu_ref_kill() is expected to disable
> tryget_live() in a finite amount of time regardless of concurrent
> tryget tries.

Hello Tejun,

Sorry that I had not yet made this clear, but I agree with the analysis
in your two most recent e-mails. I think I have found the cause of the
loop: for one reason or another the scsi_dh_alua driver was not loaded
automatically. I think that caused the SCSI core to return a retryable
error code for reads and writes sent over paths in the SCSI ALUA state
"standby" instead of a non-retryable error code, which in turn caused
the dm-mpath driver to enter an infinite loop. Loading the scsi_dh_alua
driver resolved the infinite loop. Anyway, thank you for the feedback.

Bart.



* Re: [PATCH 3/3] blk-mq: Fix the queue freezing mechanism
From: Tejun Heo @ 2015-09-24 22:56 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Ming Lei, Jens Axboe, Christoph Hellwig, linux-kernel, Akinobu Mita

Hello, Bart.

On Thu, Sep 24, 2015 at 03:54:18PM -0700, Bart Van Assche wrote:
> Sorry that I had not yet made this clear, but I agree with the analysis in
> your two most recent e-mails. I think I have found the cause of the loop:
> for one reason or another the scsi_dh_alua driver was not loaded
> automatically. I think that caused the SCSI core to return a retryable error
> code for reads and writes sent over paths in the SCSI ALUA state "standby"
> instead of a non-retryable error code, which in turn caused the dm-mpath
> driver to enter an infinite loop. Loading the scsi_dh_alua driver resolved
> the infinite loop. Anyway, thank you for the feedback.

Great!  I think we still probably wanna fix the kill/reinit race tho.

Thanks.

-- 
tejun


* Re: [PATCH 1/3] blk-cgroup: Declare local symbols static
From: Tejun Heo @ 2015-09-25 15:35 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Jens Axboe, Christoph Hellwig, linux-kernel

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

