* [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-07 21:19 Paolo Valente
  2018-02-07 22:18 ` Jens Axboe
  2018-02-23 15:07 ` Ming Lei
  0 siblings, 2 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-07 21:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, alban.browaeys, ming.lei, ivan, 169364,
	holger, efault, Paolo Valente, Serena Ziviani

Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
be re-inserted into the active I/O scheduler for that device. As a
consequence, I/O schedulers may get the same request inserted again,
even several times, without a finish_request invoked on that request
before each re-insertion.

This fact is the cause of the failure reported in [1]. For an I/O
scheduler, every re-insertion of the same re-prepared request is
equivalent to the insertion of a new request. For schedulers like
mq-deadline or kyber, this fact causes no harm. In contrast, it
confuses a stateful scheduler like BFQ, which keeps state for an I/O
request, until the finish_request hook is invoked on the request. In
particular, BFQ may get stuck, waiting forever for the number of
request dispatches, of the same request, to be balanced by an equal
number of request completions (while there will be one completion for
that request). In this state, BFQ may refuse to serve I/O requests
from other bfq_queues. The hang reported in [1] then follows.

However, the above re-prepared requests undergo a requeue, thus the
requeue_request hook of the active elevator is invoked for these
requests, if set. This commit then addresses the above issue by
properly implementing the hook requeue_request in BFQ.

[1] https://marc.info/?l=linux-block&m=151211117608676

Reported-by: Ivan Kozik <ivan@ludios.org>
Reported-by: Alban Browaeys <alban.browaeys@gmail.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Serena Ziviani <ziviani.serena@gmail.com>
---
V2: contains fix to bug reported in [2]
V3: implements the improvement suggested in [3]

[2] https://lkml.org/lkml/2018/2/5/599
[3] https://lkml.org/lkml/2018/2/7/532

 block/bfq-iosched.c | 107 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 82 insertions(+), 25 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 47e6ec7427c4..aeca22d91101 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -3823,24 +3823,26 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 		}

 		/*
-		 * We exploit the bfq_finish_request hook to decrement
-		 * rq_in_driver, but bfq_finish_request will not be
-		 * invoked on this request. So, to avoid unbalance,
-		 * just start this request, without incrementing
-		 * rq_in_driver. As a negative consequence,
-		 * rq_in_driver is deceptively lower than it should be
-		 * while this request is in service. This may cause
-		 * bfq_schedule_dispatch to be invoked uselessly.
+		 * We exploit the bfq_finish_requeue_request hook to
+		 * decrement rq_in_driver, but
+		 * bfq_finish_requeue_request will not be invoked on
+		 * this request. So, to avoid unbalance, just start
+		 * this request, without incrementing rq_in_driver. As
+		 * a negative consequence, rq_in_driver is deceptively
+		 * lower than it should be while this request is in
+		 * service. This may cause bfq_schedule_dispatch to be
+		 * invoked uselessly.
 		 *
 		 * As for implementing an exact solution, the
-		 * bfq_finish_request hook, if defined, is probably
-		 * invoked also on this request. So, by exploiting
-		 * this hook, we could 1) increment rq_in_driver here,
-		 * and 2) decrement it in bfq_finish_request. Such a
-		 * solution would let the value of the counter be
-		 * always accurate, but it would entail using an extra
-		 * interface function. This cost seems higher than the
-		 * benefit, being the frequency of non-elevator-private
+		 * bfq_finish_requeue_request hook, if defined, is
+		 * probably invoked also on this request. So, by
+		 * exploiting this hook, we could 1) increment
+		 * rq_in_driver here, and 2) decrement it in
+		 * bfq_finish_requeue_request. Such a solution would
+		 * let the value of the counter be always accurate,
+		 * but it would entail using an extra interface
+		 * function. This cost seems higher than the benefit,
+		 * being the frequency of non-elevator-private
 		 * requests very low.
 		 */
 		goto start_rq;
@@ -4515,6 +4517,8 @@ static inline void bfq_update_insert_stats(struct request_queue *q,
 					   unsigned int cmd_flags) {}
 #endif

+static void bfq_prepare_request(struct request *rq, struct bio *bio);
+
 static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			       bool at_head)
 {
@@ -4541,6 +4545,18 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 		else
 			list_add_tail(&rq->queuelist, &bfqd->dispatch);
 	} else {
+		if (WARN_ON_ONCE(!bfqq)) {
+			/*
+			 * This should never happen. Most likely rq is
+			 * a requeued regular request, being
+			 * re-inserted without being first
+			 * re-prepared. Do a prepare, to avoid
+			 * failure.
+			 */
+			bfq_prepare_request(rq, rq->bio);
+			bfqq = RQ_BFQQ(rq);
+		}
+
 		idle_timer_disabled = __bfq_insert_request(bfqd, rq);
 		/*
 		 * Update bfqq, because, if a queue merge has occurred
@@ -4697,22 +4713,44 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
 		bfq_schedule_dispatch(bfqd);
 }

-static void bfq_finish_request_body(struct bfq_queue *bfqq)
+static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
 {
 	bfqq->allocated--;

 	bfq_put_queue(bfqq);
 }

-static void bfq_finish_request(struct request *rq)
+/*
+ * Handle either a requeue or a finish for rq. The things to do are
+ * the same in both cases: all references to rq are to be dropped. In
+ * particular, rq is considered completed from the point of view of
+ * the scheduler.
+ */
+static void bfq_finish_requeue_request(struct request *rq)
 {
-	struct bfq_queue *bfqq;
+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
 	struct bfq_data *bfqd;

-	if (!rq->elv.icq)
+	/*
+	 * Requeue and finish hooks are invoked in blk-mq without
+	 * checking whether the involved request is actually still
+	 * referenced in the scheduler. To handle this fact, the
+	 * following two checks make this function exit in case of
+	 * spurious invocations, for which there is nothing to do.
+	 *
+	 * First, check whether rq has nothing to do with an elevator.
+	 */
+	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
+		return;
+
+	/*
+	 * rq either is not associated with any icq, or is an already
+	 * requeued request that has not (yet) been re-inserted into
+	 * a bfq_queue.
+	 */
+	if (!rq->elv.icq || !bfqq)
 		return;

-	bfqq = RQ_BFQQ(rq);
 	bfqd = bfqq->bfqd;

 	if (rq->rq_flags & RQF_STARTED)
@@ -4727,13 +4765,14 @@ static void bfq_finish_request(struct request *rq)
 		spin_lock_irqsave(&bfqd->lock, flags);

 		bfq_completed_request(bfqq, bfqd);
-		bfq_finish_request_body(bfqq);
+		bfq_finish_requeue_request_body(bfqq);

 		spin_unlock_irqrestore(&bfqd->lock, flags);
 	} else {
 		/*
 		 * Request rq may be still/already in the scheduler,
-		 * in which case we need to remove it. And we cannot
+		 * in which case we need to remove it (this should
+		 * never happen in case of requeue). And we cannot
 		 * defer such a check and removal, to avoid
 		 * inconsistencies in the time interval from the end
 		 * of this function to the start of the deferred work.
@@ -4748,9 +4787,26 @@ static void bfq_finish_request(struct request *rq)
 			bfqg_stats_update_io_remove(bfqq_group(bfqq),
 						    rq->cmd_flags);
 		}
-		bfq_finish_request_body(bfqq);
+		bfq_finish_requeue_request_body(bfqq);
 	}

+	/*
+	 * Reset private fields. In case of a requeue, this allows
+	 * this function to correctly do nothing if it is spuriously
+	 * invoked again on this same request (see the check at the
+	 * beginning of the function). Probably, a better general
+	 * design would be to prevent blk-mq from invoking the requeue
+	 * or finish hooks of an elevator, for a request that is not
+	 * referred by that elevator.
+	 *
+	 * Resetting the following fields would break the
+	 * request-insertion logic if rq is re-inserted into a bfq
+	 * internal queue, without a re-preparation. Here we assume
+	 * that re-insertions of requeued requests, without
+	 * re-preparation, can happen only for pass_through or at_head
+	 * requests (which are not re-inserted into bfq internal
+	 * queues).
+	 */
 	rq->elv.priv[0] = NULL;
 	rq->elv.priv[1] = NULL;
 }
@@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
 	.ops.mq = {
 		.limit_depth		= bfq_limit_depth,
 		.prepare_request	= bfq_prepare_request,
-		.finish_request		= bfq_finish_request,
+		.requeue_request        = bfq_finish_requeue_request,
+		.finish_request		= bfq_finish_requeue_request,
 		.exit_icq		= bfq_exit_icq,
 		.insert_requests	= bfq_insert_requests,
 		.dispatch_request	= bfq_dispatch_request,
--
2.15.1

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-07 21:19 [PATCH BUGFIX V3] block, bfq: add requeue-request hook Paolo Valente
@ 2018-02-07 22:18 ` Jens Axboe
  2018-02-08  7:16     ` Paolo Valente
  2018-02-23 15:07 ` Ming Lei
  1 sibling, 1 reply; 54+ messages in thread
From: Jens Axboe @ 2018-02-07 22:18 UTC (permalink / raw)
  To: Paolo Valente
  Cc: linux-block, linux-kernel, ulf.hansson, broonie, linus.walleij,
	bfq-iosched, oleksandr, alban.browaeys, ming.lei, ivan, 169364,
	holger, efault, Serena Ziviani

On 2/7/18 2:19 PM, Paolo Valente wrote:
> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
> be re-inserted into the active I/O scheduler for that device. As a
> consequence, I/O schedulers may get the same request inserted again,
> even several times, without a finish_request invoked on that request
> before each re-insertion.
> 
> This fact is the cause of the failure reported in [1]. For an I/O
> scheduler, every re-insertion of the same re-prepared request is
> equivalent to the insertion of a new request. For schedulers like
> mq-deadline or kyber, this fact causes no harm. In contrast, it
> confuses a stateful scheduler like BFQ, which keeps state for an I/O
> request, until the finish_request hook is invoked on the request. In
> particular, BFQ may get stuck, waiting forever for the number of
> request dispatches, of the same request, to be balanced by an equal
> number of request completions (while there will be one completion for
> that request). In this state, BFQ may refuse to serve I/O requests
> from other bfq_queues. The hang reported in [1] then follows.
> 
> However, the above re-prepared requests undergo a requeue, thus the
> requeue_request hook of the active elevator is invoked for these
> requests, if set. This commit then addresses the above issue by
> properly implementing the hook requeue_request in BFQ.

Thanks, applied.

-- 
Jens Axboe



* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-08  7:16     ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-08  7:16 UTC (permalink / raw)
  To: Jens Axboe
  Cc: linux-block, Linux Kernel Mailing List, Ulf Hansson, Mark Brown,
	Linus Walleij, 'Paolo Valente' via bfq-iosched,
	Oleksandr Natalenko, Alban Browaeys, Ming Lei, Ivan Kozik,
	169364, holger, efault, Serena Ziviani



> Il giorno 07 feb 2018, alle ore 23:18, Jens Axboe <axboe@kernel.dk> ha scritto:
> 
> On 2/7/18 2:19 PM, Paolo Valente wrote:
>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>> be re-inserted into the active I/O scheduler for that device. As a
>> consequence, I/O schedulers may get the same request inserted again,
>> even several times, without a finish_request invoked on that request
>> before each re-insertion.
>> 
>> This fact is the cause of the failure reported in [1]. For an I/O
>> scheduler, every re-insertion of the same re-prepared request is
>> equivalent to the insertion of a new request. For schedulers like
>> mq-deadline or kyber, this fact causes no harm. In contrast, it
>> confuses a stateful scheduler like BFQ, which keeps state for an I/O
>> request, until the finish_request hook is invoked on the request. In
>> particular, BFQ may get stuck, waiting forever for the number of
>> request dispatches, of the same request, to be balanced by an equal
>> number of request completions (while there will be one completion for
>> that request). In this state, BFQ may refuse to serve I/O requests
>> from other bfq_queues. The hang reported in [1] then follows.
>> 
>> However, the above re-prepared requests undergo a requeue, thus the
>> requeue_request hook of the active elevator is invoked for these
>> requests, if set. This commit then addresses the above issue by
>> properly implementing the hook requeue_request in BFQ.
> 
> Thanks, applied.
> 

Hi Jens,
I forgot to add
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
in the patch.

Is it still possible to add it?

Thanks,
Paolo

> -- 
> Jens Axboe


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-08  7:16     ` Paolo Valente
  (?)
@ 2018-02-09 13:21     ` Oleksandr Natalenko
  2018-02-09 17:17       ` Jens Axboe
  2018-02-09 17:29         ` Mike Galbraith
  -1 siblings, 2 replies; 54+ messages in thread
From: Oleksandr Natalenko @ 2018-02-09 13:21 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Alban Browaeys,
	Ming Lei, Ivan Kozik, 169364, holger, efault, Serena Ziviani

Hi.

08.02.2018 08:16, Paolo Valente wrote:
>> Il giorno 07 feb 2018, alle ore 23:18, Jens Axboe <axboe@kernel.dk> ha 
>> scritto:
>> 
>> On 2/7/18 2:19 PM, Paolo Valente wrote:
>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq 
>>> via
>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a 
>>> device
>>> be re-inserted into the active I/O scheduler for that device. As a
>>> consequence, I/O schedulers may get the same request inserted again,
>>> even several times, without a finish_request invoked on that request
>>> before each re-insertion.
>>> 
>>> This fact is the cause of the failure reported in [1]. For an I/O
>>> scheduler, every re-insertion of the same re-prepared request is
>>> equivalent to the insertion of a new request. For schedulers like
>>> mq-deadline or kyber, this fact causes no harm. In contrast, it
>>> confuses a stateful scheduler like BFQ, which keeps state for an I/O
>>> request, until the finish_request hook is invoked on the request. In
>>> particular, BFQ may get stuck, waiting forever for the number of
>>> request dispatches, of the same request, to be balanced by an equal
>>> number of request completions (while there will be one completion for
>>> that request). In this state, BFQ may refuse to serve I/O requests
>>> from other bfq_queues. The hang reported in [1] then follows.
>>> 
>>> However, the above re-prepared requests undergo a requeue, thus the
>>> requeue_request hook of the active elevator is invoked for these
>>> requests, if set. This commit then addresses the above issue by
>>> properly implementing the hook requeue_request in BFQ.
>> 
>> Thanks, applied.
>> 
> 
> Hi Jens,
> I forgot to add
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> in the patch.
> 
> Is it still possible to add it?
> 

In addition to this I think it should be worth considering CC'ing Greg 
to pull this fix into 4.15 stable tree.

Oleksandr


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-09 13:21     ` Oleksandr Natalenko
@ 2018-02-09 17:17       ` Jens Axboe
  2018-02-09 17:29         ` Mike Galbraith
  1 sibling, 0 replies; 54+ messages in thread
From: Jens Axboe @ 2018-02-09 17:17 UTC (permalink / raw)
  To: Oleksandr Natalenko, Paolo Valente
  Cc: linux-block, Linux Kernel Mailing List, Ulf Hansson, Mark Brown,
	Linus Walleij, 'Paolo Valente' via bfq-iosched,
	Alban Browaeys, Ming Lei, Ivan Kozik, 169364, holger, efault,
	Serena Ziviani

On 2/9/18 6:21 AM, Oleksandr Natalenko wrote:
> Hi.
> 
> 08.02.2018 08:16, Paolo Valente wrote:
>>> Il giorno 07 feb 2018, alle ore 23:18, Jens Axboe <axboe@kernel.dk> ha 
>>> scritto:
>>>
>>> On 2/7/18 2:19 PM, Paolo Valente wrote:
>>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq 
>>>> via
>>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a 
>>>> device
>>>> be re-inserted into the active I/O scheduler for that device. As a
>>>> consequence, I/O schedulers may get the same request inserted again,
>>>> even several times, without a finish_request invoked on that request
>>>> before each re-insertion.
>>>>
>>>> This fact is the cause of the failure reported in [1]. For an I/O
>>>> scheduler, every re-insertion of the same re-prepared request is
>>>> equivalent to the insertion of a new request. For schedulers like
>>>> mq-deadline or kyber, this fact causes no harm. In contrast, it
>>>> confuses a stateful scheduler like BFQ, which keeps state for an I/O
>>>> request, until the finish_request hook is invoked on the request. In
>>>> particular, BFQ may get stuck, waiting forever for the number of
>>>> request dispatches, of the same request, to be balanced by an equal
>>>> number of request completions (while there will be one completion for
>>>> that request). In this state, BFQ may refuse to serve I/O requests
>>>> from other bfq_queues. The hang reported in [1] then follows.
>>>>
>>>> However, the above re-prepared requests undergo a requeue, thus the
>>>> requeue_request hook of the active elevator is invoked for these
>>>> requests, if set. This commit then addresses the above issue by
>>>> properly implementing the hook requeue_request in BFQ.
>>>
>>> Thanks, applied.
>>>
>>
>> Hi Jens,
>> I forgot to add
>> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
>> in the patch.
>>
>> Is it still possible to add it?
>>
> 
> In addition to this I think it should be worth considering CC'ing Greg 
> to pull this fix into 4.15 stable tree.

I can't add the tested-by anymore, but it's easy enough to target for
stable after-the-fact.


-- 
Jens Axboe


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-09 13:21     ` Oleksandr Natalenko
@ 2018-02-09 17:29         ` Mike Galbraith
  2018-02-09 17:29         ` Mike Galbraith
  1 sibling, 0 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-09 17:29 UTC (permalink / raw)
  To: Oleksandr Natalenko, Paolo Valente
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Alban Browaeys,
	Ming Lei, Ivan Kozik, 169364, holger, Serena Ziviani

On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
> 
> In addition to this I think it should be worth considering CC'ing Greg 
> to pull this fix into 4.15 stable tree.

This isn't one he can cherry-pick, some munging required, in which case
he usually wants a properly tested backport.

	-Mike




* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-10  8:29           ` Oleksandr Natalenko
  0 siblings, 0 replies; 54+ messages in thread
From: Oleksandr Natalenko @ 2018-02-10  8:29 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Paolo Valente, Jens Axboe, linux-block,
	Linux Kernel Mailing List, Ulf Hansson, Mark Brown,
	Linus Walleij, 'Paolo Valente' via bfq-iosched,
	Alban Browaeys, Ming Lei, Ivan Kozik, 169364, holger,
	Serena Ziviani

Hi.

On pátek 9. února 2018 18:29:39 CET Mike Galbraith wrote:
> On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
> > In addition to this I think it should be worth considering CC'ing Greg
> > to pull this fix into 4.15 stable tree.
> 
> This isn't one he can cherry-pick, some munging required, in which case
> he usually wants a properly tested backport.
> 
> 	-Mike

Maybe, this could be a good opportunity to push all the pending BFQ patches 
into the stable 4.15 branch? Because IIUC currently BFQ in 4.15 is just 
unusable.

Paolo?

---

block, bfq: add requeue-request hook
bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
block, bfq: release oom-queue ref to root group on exit
block, bfq: put async queues for root bfq groups too
block, bfq: limit sectors served with interactive weight raising
block, bfq: limit tags for writes and async I/O
block, bfq: increase threshold to deem I/O as random
block, bfq: remove superfluous check in queue-merging setup
block, bfq: let a queue be merged only shortly after starting I/O
block, bfq: check low_latency flag in bfq_bfqq_save_state()
block, bfq: add missing rq_pos_tree update on rq removal
block, bfq: fix occurrences of request finish method's old name
block, bfq: consider also past I/O in soft real-time detection
block, bfq: remove batches of confusing ifdefs



* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-12  7:24             ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-12  7:24 UTC (permalink / raw)
  To: Oleksandr Natalenko
  Cc: Mike Galbraith, Jens Axboe, linux-block,
	Linux Kernel Mailing List, Ulf Hansson, Mark Brown,
	Linus Walleij, 'Paolo Valente' via bfq-iosched,
	Alban Browaeys, Ming Lei, Ivan Kozik, 169364, holger,
	Serena Ziviani



> Il giorno 10 feb 2018, alle ore 09:29, Oleksandr Natalenko <oleksandr@natalenko.name> ha scritto:
> 
> Hi.
> 
> On pátek 9. února 2018 18:29:39 CET Mike Galbraith wrote:
>> On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
>>> In addition to this I think it should be worth considering CC'ing Greg
>>> to pull this fix into 4.15 stable tree.
>> 
>> This isn't one he can cherry-pick, some munging required, in which case
>> he usually wants a properly tested backport.
>> 
>> 	-Mike
> 
> Maybe, this could be a good opportunity to push all the pending BFQ patches 
> into the stable 4.15 branch? Because IIUC currently BFQ in 4.15 is just 
> unusable.
> 
> Paolo?
> 

Of course ok for me, and thanks Oleksandr for proposing this.  These
commits should apply cleanly on 4.15, and FWIW have been tested, by me
and BFQ users, on 4.15 too in these months.

Thanks,
Paolo

> ---
> 
> block, bfq: add requeue-request hook
> bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
> block, bfq: release oom-queue ref to root group on exit
> block, bfq: put async queues for root bfq groups too
> block, bfq: limit sectors served with interactive weight raising
> block, bfq: limit tags for writes and async I/O
> block, bfq: increase threshold to deem I/O as random
> block, bfq: remove superfluous check in queue-merging setup
> block, bfq: let a queue be merged only shortly after starting I/O
> block, bfq: check low_latency flag in bfq_bfqq_save_state()
> block, bfq: add missing rq_pos_tree update on rq removal
> block, bfq: fix occurrences of request finish method's old name
> block, bfq: consider also past I/O in soft real-time detection
> block, bfq: remove batches of confusing ifdefs
> 
> 


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-12  7:24             ` Paolo Valente
  (?)
@ 2018-02-12 11:22             ` Mike Galbraith
  2018-02-12 12:48               ` Oleksandr Natalenko
  -1 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-12 11:22 UTC (permalink / raw)
  To: Paolo Valente, Oleksandr Natalenko; +Cc: Jens Axboe, stable

On Mon, 2018-02-12 at 08:24 +0100, Paolo Valente wrote:
> 
> > Il giorno 10 feb 2018, alle ore 09:29, Oleksandr Natalenko <oleksandr@natalenko.name> ha scritto:
> > 
> > Hi.
> > 
> > On Friday 9 February 2018 18:29:39 CET Mike Galbraith wrote:
> >> On Fri, 2018-02-09 at 14:21 +0100, Oleksandr Natalenko wrote:
> >>> In addition to this I think it should be worth considering CC'ing Greg
> >>> to pull this fix into 4.15 stable tree.
> >> 
> >> This isn't one he can cherry-pick, some munging required, in which case
> >> he usually wants a properly tested backport.
> >> 
> >> 	-Mike
> > 
> > Maybe, this could be a good opportunity to push all the pending BFQ patches 
> > into the stable 4.15 branch? Because IIUC currently BFQ in 4.15 is just 
> > unusable.
> > 
> > Paolo?
> > 
> 
> Of course it's ok for me, and thanks Oleksandr for proposing this.  These
> commits should apply cleanly on 4.15, and FWIW they have been tested, by me
> and by BFQ users, on 4.15 too over the past months.

How does that work without someone actually submitting patches?  CC
stable and pass along a conveniently sorted cherry-pick list?

9b25bd0368d5 block, bfq: remove batches of confusing ifdefs
a34b024448eb block, bfq: consider also past I/O in soft real-time detection
4403e4e467c3 block, bfq: remove superfluous check in queue-merging setup
7b8fa3b900a0 block, bfq: let a queue be merged only shortly after starting I/O
1be6e8a964ee block, bfq: check low_latency flag in bfq_bfqq_save_state()
05e902835616 block, bfq: add missing rq_pos_tree update on rq removal
f0ba5ea2fe45 block, bfq: increase threshold to deem I/O as random
0d52af590552 block, bfq: release oom-queue ref to root group on exit
52257ffbfcaf block, bfq: put async queues for root bfq groups too
8abef10b3de1 bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
8993d445df38 block, bfq: fix occurrences of request finish method's old name
8a8747dc01ce block, bfq: limit sectors served with interactive weight raising
a52a69ea89dc block, bfq: limit tags for writes and async I/O
a78773906147 block, bfq: add requeue-request hook
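For reference, a hash list like this can be fed to git cherry-pick mechanically. A minimal sketch (the loop and the dry-run approach are illustrative, not a command from this thread; only the hashes and subjects come from the list above) that prints the commands for review rather than running them:

```shell
# Sketch: turn the sorted hash list into reviewable cherry-pick commands
# for a 4.15.y stable checkout.  Hashes are in the order Mike gave them.
cmds=$(
while read -r hash _subject; do
  [ -n "$hash" ] && echo "git cherry-pick -x $hash"
done <<'EOF'
9b25bd0368d5 block, bfq: remove batches of confusing ifdefs
a34b024448eb block, bfq: consider also past I/O in soft real-time detection
4403e4e467c3 block, bfq: remove superfluous check in queue-merging setup
7b8fa3b900a0 block, bfq: let a queue be merged only shortly after starting I/O
1be6e8a964ee block, bfq: check low_latency flag in bfq_bfqq_save_state()
05e902835616 block, bfq: add missing rq_pos_tree update on rq removal
f0ba5ea2fe45 block, bfq: increase threshold to deem I/O as random
0d52af590552 block, bfq: release oom-queue ref to root group on exit
52257ffbfcaf block, bfq: put async queues for root bfq groups too
8abef10b3de1 bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
8993d445df38 block, bfq: fix occurrences of request finish method's old name
8a8747dc01ce block, bfq: limit sectors served with interactive weight raising
a52a69ea89dc block, bfq: limit tags for writes and async I/O
a78773906147 block, bfq: add requeue-request hook
EOF
)
echo "$cmds"
```

Dropping the echo and running the picks directly would do the actual backport; -x makes each commit record its upstream hash, which stable wants, and any pick that fails to apply cleanly is exactly the "munging" mentioned earlier in the thread.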

> ---
> > 
> > block, bfq: add requeue-request hook
> > bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
> > block, bfq: release oom-queue ref to root group on exit
> > block, bfq: put async queues for root bfq groups too
> > block, bfq: limit sectors served with interactive weight raising
> > block, bfq: limit tags for writes and async I/O
> > block, bfq: increase threshold to deem I/O as random
> > block, bfq: remove superfluous check in queue-merging setup
> > block, bfq: let a queue be merged only shortly after starting I/O
> > block, bfq: check low_latency flag in bfq_bfqq_save_state()
> > block, bfq: add missing rq_pos_tree update on rq removal
> > block, bfq: fix occurrences of request finish method's old name
> > block, bfq: consider also past I/O in soft real-time detection
> > block, bfq: remove batches of confusing ifdefs
> > 
> > 
> 


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-12 11:22             ` Mike Galbraith
@ 2018-02-12 12:48               ` Oleksandr Natalenko
  2018-02-13  5:02                 ` Mike Galbraith
  2018-02-26 19:48                 ` Greg KH
  0 siblings, 2 replies; 54+ messages in thread
From: Oleksandr Natalenko @ 2018-02-12 12:48 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Paolo Valente, Jens Axboe, stable

Hi.

12.02.2018 12:22, Mike Galbraith wrote:
> How does that work without someone actually submitting patches?  CC
> stable and pass along a conveniently sorted cherry-pick list?

From my experience, Greg is fine with the list of hashes, but he needs
an approval from the relevant subtree maintainer (Jens in this case).


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-12 12:48               ` Oleksandr Natalenko
@ 2018-02-13  5:02                 ` Mike Galbraith
  2018-02-13  5:46                   ` Paolo Valente
  2018-02-26 19:48                 ` Greg KH
  1 sibling, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-13  5:02 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: Paolo Valente, Jens Axboe, stable

On Mon, 2018-02-12 at 13:48 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> 12.02.2018 12:22, Mike Galbraith wrote:
> > How does that work without someone actually submitting patches?  CC
> > stable and pass along a conveniently sorted cherry-pick list?
> 
>  From my experience, Greg is fine with the list of hashes, but he needs 
> an approval from the relevant subtree maintainer (Jens in this case).

Oh dear.  I'm a bit reluctant to even mention this, buuut, when
creating that hash list, I extracted/applied them to 4.15.3 via quilt
to make sure that they really do slip right in.. and later, while
building two fat enterprise config kernels concurrently, trees living
in separate spinning rust buckets, box IO stalled on me.  Not good.
Hard reboot, clean up and try it again, it reproduced.  Switch to
master, that reproduced.  Extra not good.

I found that rather odd, because I had earlier done this to 4.15..

patches/block-bfq:-fix-occurrences-of-request-finish-method-s-old-name.patch
patches/block-bfq:-add-requeue-request-hook.patch

..ie take only the rename, wedge the fix in on top of it, beat on box,
it worked fine.  This morning, I go back to only those two, and box is
again (at least pretending to be) perfectly happy to build the same
trees it hung on twice in a row yesterday, it _seems_ to be completely
uninterested in stalling no matter what I do.

So, something not so funny is still going on, here at least.  Test
results imply that something in that hash list _may_ not be all that
desirable a thing to send to stable, but who knows, some playful
gremlin may well be pulling my chain, will let box stall on me again a
few seconds after I write that, and poke send :)

Ok IO gremlin, go for it...

	-Mike


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-13  5:02                 ` Mike Galbraith
@ 2018-02-13  5:46                   ` Paolo Valente
  2018-02-13  6:42                     ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Valente @ 2018-02-13  5:46 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Oleksandr Natalenko, Jens Axboe, stable



> Il giorno 13 feb 2018, alle ore 06:02, Mike Galbraith <efault@gmx.de> ha scritto:
> 
> On Mon, 2018-02-12 at 13:48 +0100, Oleksandr Natalenko wrote:
>> Hi.
>> 
>> 12.02.2018 12:22, Mike Galbraith wrote:
>>> How does that work without someone actually submitting patches?  CC
>>> stable and pass along a conveniently sorted cherry-pick list?
>> 
>> From my experience, Greg is fine with the list of hashes, but he needs 
>> an approval from the relevant subtree maintainer (Jens in this case).
> 
> Oh dear.  I'm a bit reluctant to even mention this, buuut, when
> creating that hash list, I extracted/applied them to 4.15.3 via quilt
> to make sure that they really do slip right in.. and later, while
> building two fat enterprise config kernels concurrently, trees living
> in separate spinning rust buckets, box IO stalled on me.  Not good.
>  Hard reboot, clean up and try it again, it reproduced.  Switch to
> master, that reproduced.  Extra not good.
> 
> I found that rather odd, because I had earlier done this to 4.15..
> 
> patches/block-bfq:-fix-occurrences-of-request-finish-method-s-old-name.patch
> patches/block-bfq:-add-requeue-request-hook.patch
> 
> ..ie take only the rename, wedge the fix in on top of it, beat on box,
> it worked fine.  This morning, I go back to only those two, and box is
> again (at least pretending to be) perfectly happy to build the same
> trees it hung on twice in a row yesterday, it _seems_ to be completely
> uninterested in stalling no matter what I do.
> 
> So, something not so funny is still going on, here at least.  Test
> results imply that something in that hash list _may_ not be all that
> desirable a thing to send to stable, but who knows, some playful
> gremlin may well be pulling my chain, will let box stall on me again a
> few seconds after I write that, and poke send :)
> 
> Ok IO gremlin, go for it...
> 

Any chance your gremlin grants you some OOPS?

BTW, the only other commit in that series that affects bfq interaction
with the rest of the system is:
a52a69ea89dc block, bfq: limit tags for writes and async I/O

And the other commits that do something beyond changing some calculation are:
0d52af590552 block, bfq: release oom-queue ref to root group on exit
52257ffbfcaf block, bfq: put async queues for root bfq groups too

Thanks,
Paolo

> 	-Mike


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-13  5:46                   ` Paolo Valente
@ 2018-02-13  6:42                     ` Mike Galbraith
  2018-02-13  6:50                       ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-13  6:42 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Oleksandr Natalenko, Jens Axboe, stable

On Tue, 2018-02-13 at 06:46 +0100, Paolo Valente wrote:
> 
> Any chance your gremlin grants you some OOPS?

I crash dumped it, though I'm unlikely to have time to even figure out
what the hell _I'm_ doing in IO land, much less what it's doing :)

If you want it, and have a spot, I could try an overnight upload
(vmcore=14GB).  Hung tasks are not in short supply.

crash> ps | grep UN
      4      2   0  ffff880187f12940  UN   0.0       0      0  [kworker/0:0H]
      6      2   0  ffff880187f144c0  UN   0.0       0      0  [mm_percpu_wq]
      8      2   1  ffff880187f16040  UN   0.0       0      0  [rcu_sched]
      9      2   4  ffff880187f16e00  UN   0.0       0      0  [rcu_bh]
     18      2   1  ffff88018740d280  UN   0.0       0      0  [kworker/1:0H]
     24      2   2  ffff880187582940  UN   0.0       0      0  [kworker/2:0H]
     30      2   3  ffff8801875e8000  UN   0.0       0      0  [kworker/3:0H]
     36      2   4  ffff8801875ed280  UN   0.0       0      0  [kworker/4:0H]
     42      2   5  ffff88018769a940  UN   0.0       0      0  [kworker/5:0H]
     48      2   6  ffff880187700000  UN   0.0       0      0  [kworker/6:0H]
     54      2   7  ffff880187705280  UN   0.0       0      0  [kworker/7:0H]
     57      2   1  ffff8801877b1b80  UN   0.0       0      0  [netns]
     58      2   0  ffff8801877b2940  UN   0.0       0      0  [kworker/0:1]
     62      2   2  ffff8801877b6040  UN   0.0       0      0  [writeback]
     66      2   3  ffff88041e881b80  UN   0.0       0      0  [crypto]
     67      2   5  ffff88041e882940  UN   0.0       0      0  [kintegrityd]
     68      2   6  ffff88041e883700  UN   0.0       0      0  [kblockd]
     69      2   2  ffff88041e8844c0  UN   0.0       0      0  [kworker/2:1]
     73      2   4  ffff88041e910dc0  UN   0.0       0      0  [edac-poller]
     74      2   7  ffff88041e911b80  UN   0.0       0      0  [devfreq_wq]
     75      2   5  ffff8801877b0000  UN   0.0       0      0  [watchdogd]
     91      2   3  ffff8803fa08d280  UN   0.0       0      0  [nvme-wq]
     92      2   0  ffff8803fa08e040  UN   0.0       0      0  [ipv6_addrconf]
    160      2   1  ffff88041e886e00  UN   0.0       0      0  [kaluad]
    163      2   5  ffff8803fa346e00  UN   0.0       0      0  [kmpath_rdacd]
    167      2   7  ffff8803fa346040  UN   0.0       0      0  [kmpathd]
    168      2   3  ffff8803fa345280  UN   0.0       0      0  [kmpath_handlerd]
    290      2   7  ffff8803f784a940  UN   0.0       0      0  [kworker/7:2]
    356      2   6  ffff8803f677ee00  UN   0.0       0      0  [ata_sff]
    526      2   3  ffff8803fc678000  UN   0.0       0      0  [scsi_tmf_0]
    528      2   0  ffff8803fc679b80  UN   0.0       0      0  [scsi_tmf_1]
    530      2   3  ffff8803fc67b700  UN   0.0       0      0  [scsi_tmf_2]
    532      2   7  ffff8803fc67d280  UN   0.0       0      0  [scsi_tmf_3]
    534      2   7  ffff8803fc67ee00  UN   0.0       0      0  [scsi_tmf_4]
    536      2   3  ffff8803f6848dc0  UN   0.0       0      0  [scsi_tmf_5]
    540      2   2  ffff8803f684c4c0  UN   0.0       0      0  [kworker/u16:6]
    541      2   5  ffff8803f684d280  UN   0.0       0      0  [nvkm-disp]
    543      2   4  ffff8803f684ee00  UN   0.0       0      0  [ttm_swap]
    546      2   3  ffff8803f3ae8000  UN   0.0       0      0  [kworker/3:1H]
    547      2   2  ffff8803f3ae8dc0  UN   0.0       0      0  [kworker/2:1H]
    550      2   6  ffff8803f3ae9b80  UN   0.0       0      0  [kworker/6:1H]
    552      2   5  ffff8803f3aeb700  UN   0.0       0      0  [scsi_tmf_6]
    591      2   1  ffff8803f3aeee00  UN   0.0       0      0  [ext4-rsv-conver]
    601      2   5  ffff8803f880ee00  UN   0.0       0      0  [kworker/5:1H]
    632      2   1  ffff8803fa023700  UN   0.0       0      0  [kworker/1:1H]
    643      2   0  ffff8803fa025280  UN   0.0       0      0  [kworker/0:1H]
    659      2   4  ffff8803f677d280  UN   0.0       0      0  [kworker/4:1H]
    663      2   7  ffff8803f6779b80  UN   0.0       0      0  [kworker/7:1H]
    701      2   0  ffff8803f9eea940  UN   0.0       0      0  [kworker/0:2]
    702      2   1  ffff8803f9eeb700  UN   0.0       0      0  [rpciod]
    703      2   7  ffff8803f9eee040  UN   0.0       0      0  [xprtiod]
   1020      2   4  ffff8803fc1c44c0  UN   0.0       0      0  [acpi_thermal_pm]
   1170      2   3  ffff8803f1708dc0  UN   0.0       0      0  [jbd2/sdd1-8]
   1171      2   6  ffff8803f170c4c0  UN   0.0       0      0  [ext4-rsv-conver]
   1176      2   3  ffff8803fac58000  UN   0.0       0      0  [ext4-rsv-conver]
   4233      2   5  ffff8803a8be8dc0  UN   0.0       0      0  [kworker/5:1]
   8808  10134   6  ffff88037e789b80  UN   0.0    8104   3248  make
  15603      2   5  ffff88041e886040  UN   0.0       0      0  [kworker/5:0]
  15815      2   1  ffff88037e788dc0  UN   0.0       0      0  [kworker/1:0]
  19344   8808   6  ffff8803c6658dc0  UN   0.0    7720   2960  make
  19633  19632   3  ffff88041e916e00  UN   0.0    9924   2296  gcc
  19651  19649   3  ffff8803a8bee040  UN   0.0    9896   2276  gcc
  19683  19682   3  ffff8803b6e29b80  UN   0.0    9904   2300  gcc
  19709  19687   2  ffff8803a8a41b80  UN   0.0    5852   1868  rm
  19728  19727   2  ffff8803a8be9b80  UN   0.0    9932   2256  gcc
  19732  19731   5  ffff880396dc0dc0  UN   0.0    9920   2296  gcc
  19739  19606   4  ffff8803f394d280  UN   0.0    4560   1608  rm
  19740      1   5  ffff8803b6ec2940  UN   0.1   49792  19732  cc1
  19744  19656   2  ffff880396ceb700  UN   0.0    4524   1828  recordmcount
  19922      1   3  ffff8803a8a40dc0  UN   0.1  388140  13996  pool
  20278      2   2  ffff880396dc2940  UN   0.0       0      0  [kworker/u16:1]
  20505      2   2  ffff8803b6eb1b80  UN   0.0       0      0  [kworker/2:2]
  20648      2   3  ffff88037e78a940  UN   0.0       0      0  [kworker/3:1]
  20766      2   6  ffff880187f11b80  UN   0.0       0      0  [kworker/u16:0]
  21007      2   6  ffff8803c97844c0  UN   0.0       0      0  [kworker/6:0]
  21385      2   4  ffff8803a8a46040  UN   0.0       0      0  [kworker/4:1]
  21562      2   1  ffff8803f7b61b80  UN   0.0       0      0  [kworker/1:2]
  21730      2   7  ffff8803d6525280  UN   0.0       0      0  [kworker/7:3]
  22775      2   4  ffff880396dbe040  UN   0.0       0      0  [kworker/4:2]
  22879      2   6  ffff880396db8000  UN   0.0       0      0  [kworker/6:1]
  22937      2   7  ffff880396dbc4c0  UN   0.0       0      0  [kworker/7:0]
  22952      2   4  ffff880396dbee00  UN   0.0       0      0  [kworker/u16:2]
  23004      2   5  ffff880396dbb700  UN   0.0       0      0  [kworker/5:2]
  23070      2   1  ffff8803d6520dc0  UN   0.0       0      0  [kworker/1:1]
  23152   4478   2  ffff8803a8a444c0  UN   0.0    5748   1656  sync
  31874      2   3  ffff88018740c4c0  UN   0.0       0      0  [kworker/3:0]


> BTW, the only other commit in that series that affects bfq interaction
> with the rest of the system is:
> a52a69ea89dc block, bfq: limit tags for writes and async I/O

I'll see if that changes things on the side.  Work beckons.

> And the other commits that do something beyond changing some calculation are:
> 0d52af590552 block, bfq: release oom-queue ref to root group on exit
> 52257ffbfcaf block, bfq: put async queues for root bfq groups too
> 
> Thanks,
> Paolo
> 
> > 	-Mike
> 


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-13  6:42                     ` Mike Galbraith
@ 2018-02-13  6:50                       ` Mike Galbraith
  2018-02-13  8:15                         ` Paolo Valente
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-13  6:50 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Oleksandr Natalenko, Jens Axboe, stable

Oops, meant to provide the dmesg spew.

[ 1954.780829] INFO: task kworker/7:2:290 blocked for more than 480 seconds.
[ 1954.780837]       Tainted: G            E    4.15.3-smp #8
[ 1954.780839] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.780843] kworker/7:2     D    0   290      2 0x80000000
[ 1954.780858] Workqueue: events_freezable_power_ disk_events_workfn
[ 1954.780862] Call Trace:
[ 1954.780876]  ? __schedule+0x268/0x840
[ 1954.780883]  schedule+0x32/0x80
[ 1954.780891]  io_schedule+0x12/0x40
[ 1954.780896]  blk_mq_get_tag+0x11e/0x230
[ 1954.780902]  ? blk_mq_sched_dispatch_requests+0x108/0x180
[ 1954.780910]  ? remove_wait_queue+0x60/0x60
[ 1954.780920]  blk_mq_get_request+0xcc/0x340
[ 1954.780927]  blk_mq_alloc_request+0x5a/0xa0
[ 1954.780933]  blk_get_request_flags+0x44/0x170
[ 1954.780955]  scsi_execute+0x38/0x1e0 [scsi_mod]
[ 1954.780971]  scsi_test_unit_ready+0x9c/0xe0 [scsi_mod]
[ 1954.780981]  sd_check_events+0xf7/0x170 [sd_mod]
[ 1954.780988]  disk_check_events+0x55/0x120
[ 1954.780997]  process_one_work+0x147/0x350
[ 1954.781003]  worker_thread+0x47/0x3e0
[ 1954.781009]  kthread+0xf8/0x130
[ 1954.781015]  ? rescuer_thread+0x360/0x360
[ 1954.781019]  ? kthread_stop+0x120/0x120
[ 1954.781025]  ret_from_fork+0x35/0x40
[ 1954.781033] INFO: task kworker/u16:6:540 blocked for more than 480 seconds.
[ 1954.781037]       Tainted: G            E    4.15.3-smp #8
[ 1954.781039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.781042] kworker/u16:6   D    0   540      2 0x80000000
[ 1954.781053] Workqueue: writeback wb_workfn (flush-8:48)
[ 1954.781057] Call Trace:
[ 1954.781064]  ? __schedule+0x268/0x840
[ 1954.781071]  schedule+0x32/0x80
[ 1954.781076]  io_schedule+0x12/0x40
[ 1954.781081]  blk_mq_get_tag+0x11e/0x230
[ 1954.781087]  ? remove_wait_queue+0x60/0x60
[ 1954.781094]  blk_mq_get_request+0xcc/0x340
[ 1954.781099]  blk_mq_make_request+0xf8/0x580
[ 1954.781105]  generic_make_request+0xfd/0x2d0
[ 1954.781110]  submit_bio+0x5e/0x120
[ 1954.781151]  ext4_io_submit+0x48/0x60 [ext4]
[ 1954.781178]  ext4_bio_write_page+0x199/0x3a0 [ext4]
[ 1954.781202]  mpage_submit_page+0x51/0x70 [ext4]
[ 1954.781223]  mpage_map_and_submit_buffers+0x109/0x1d0 [ext4]
[ 1954.781248]  ext4_writepages+0x831/0xe50 [ext4]
[ 1954.781255]  ? bfq_deactivate_entity+0x21/0x50
[ 1954.781263]  ? cpumask_next_and+0x2b/0x40
[ 1954.781272]  do_writepages+0x1a/0x70
[ 1954.781279]  __writeback_single_inode+0x3d/0x340
[ 1954.781286]  writeback_sb_inodes+0x21f/0x4c0
[ 1954.781294]  __writeback_inodes_wb+0x87/0xb0
[ 1954.781300]  wb_writeback+0x258/0x2f0
[ 1954.781306]  ? cpumask_next+0x16/0x20
[ 1954.781313]  wb_workfn+0x2f0/0x400
[ 1954.781319]  process_one_work+0x147/0x350
[ 1954.781325]  worker_thread+0x47/0x3e0
[ 1954.781330]  kthread+0xf8/0x130
[ 1954.781336]  ? rescuer_thread+0x360/0x360
[ 1954.781340]  ? kthread_stop+0x120/0x120
[ 1954.781345]  ret_from_fork+0x35/0x40
[ 1954.781356] INFO: task jbd2/sdd1-8:1170 blocked for more than 480 seconds.
[ 1954.781359]       Tainted: G            E    4.15.3-smp #8
[ 1954.781361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.781364] jbd2/sdd1-8     D    0  1170      2 0x80000000
[ 1954.781368] Call Trace:
[ 1954.781375]  ? __schedule+0x268/0x840
[ 1954.781381]  ? __wake_up_common_lock+0x77/0x90
[ 1954.781387]  schedule+0x32/0x80
[ 1954.781401]  jbd2_journal_commit_transaction+0x262/0x1b50 [jbd2]
[ 1954.781409]  ? rb_erase_cached+0x31b/0x390
[ 1954.781415]  ? pick_next_task_fair+0x58d/0x8c0
[ 1954.781424]  ? __switch_to+0x98/0x3f0
[ 1954.781430]  ? remove_wait_queue+0x60/0x60
[ 1954.781441]  kjournald2+0xb1/0x230 [jbd2]
[ 1954.781447]  ? remove_wait_queue+0x60/0x60
[ 1954.781452]  kthread+0xf8/0x130
[ 1954.781460]  ? commit_timeout+0x10/0x10 [jbd2]
[ 1954.781464]  ? kthread_stop+0x120/0x120
[ 1954.781469]  ret_from_fork+0x35/0x40
[ 1954.781493] INFO: task pool:19922 blocked for more than 480 seconds.
[ 1954.781498]       Tainted: G            E    4.15.3-smp #8
[ 1954.781502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.781506] pool            D    0 19922      1 0x00000000
[ 1954.781512] Call Trace:
[ 1954.781520]  ? __schedule+0x268/0x840
[ 1954.781526]  schedule+0x32/0x80
[ 1954.781531]  schedule_timeout+0x1d2/0x2e0
[ 1954.781538]  wait_for_completion+0xff/0x160
[ 1954.781543]  ? wake_up_q+0x70/0x70
[ 1954.781549]  flush_work+0x109/0x180
[ 1954.781554]  ? wake_up_worker+0x30/0x30
[ 1954.781560]  __cancel_work_timer+0xe3/0x160
[ 1954.781569]  ? kobj_lookup+0x111/0x160
[ 1954.781574]  ? disk_map_sector_rcu+0x70/0x70
[ 1954.781580]  disk_block_events+0x6f/0x90
[ 1954.781587]  __blkdev_get+0x5a/0x470
[ 1954.781592]  ? bd_acquire+0xd0/0xd0
[ 1954.781597]  blkdev_get+0x195/0x2e0
[ 1954.781605]  ? __inode_permission+0x2d/0x110
[ 1954.781609]  ? bd_acquire+0x37/0xd0
[ 1954.781613]  ? bd_acquire+0xd0/0xd0
[ 1954.781619]  do_dentry_open.isra.20+0x1ee/0x300
[ 1954.781625]  path_openat+0x28a/0x1050
[ 1954.781631]  do_filp_open+0x7e/0xd0
[ 1954.781640]  ? read_null+0x10/0x10
[ 1954.781646]  ? _cond_resched+0x15/0x30
[ 1954.781653]  do_sys_open+0x116/0x1e0
[ 1954.781659]  do_syscall_64+0x75/0x1a0
[ 1954.781665]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.781670] RIP: 0033:0x7f81a021d44d
[ 1954.781673] RSP: 002b:00007f819d542b30 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
[ 1954.781679] RAX: ffffffffffffffda RBX: 000000000234b0c0 RCX: 00007f81a021d44d
[ 1954.781682] RDX: 00007f819fc62f03 RSI: 0000000000000800 RDI: 00000000023606b0
[ 1954.781685] RBP: 00007f8190011ec0 R08: 0000000000000073 R09: 0000000000000003
[ 1954.781689] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f819d542cc8
[ 1954.781692] R13: 00007f819d542bf0 R14: 00007f8190011ec0 R15: 0000000000000001
[ 1954.781735] INFO: task gcc:19633 blocked for more than 480 seconds.
[ 1954.781738]       Tainted: G            E    4.15.3-smp #8
[ 1954.781741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.781743] gcc             D    0 19633  19632 0x00000000
[ 1954.781748] Call Trace:
[ 1954.781755]  ? __schedule+0x268/0x840
[ 1954.781761]  schedule+0x32/0x80
[ 1954.781766]  io_schedule+0x12/0x40
[ 1954.781771]  blk_mq_get_tag+0x11e/0x230
[ 1954.781779]  ? elv_merge+0x66/0xe0
[ 1954.781786]  ? remove_wait_queue+0x60/0x60
[ 1954.781796]  blk_mq_get_request+0xcc/0x340
[ 1954.781803]  blk_mq_make_request+0xf8/0x580
[ 1954.781810]  generic_make_request+0xfd/0x2d0
[ 1954.781815]  submit_bio+0x5e/0x120
[ 1954.781835]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
[ 1954.781857]  ext4_io_submit+0x48/0x60 [ext4]
[ 1954.781878]  ext4_writepages+0x64a/0xe50 [ext4]
[ 1954.781888]  do_writepages+0x1a/0x70
[ 1954.781893]  __filemap_fdatawrite_range+0x71/0x90
[ 1954.781915]  ext4_rename+0x600/0x830 [ext4]
[ 1954.781923]  ? legitimize_path.isra.54+0x28/0x50
[ 1954.781929]  ? terminate_walk+0x55/0xb0
[ 1954.781935]  vfs_rename+0x1df/0x6d0
[ 1954.781942]  ? __d_lookup+0x111/0x130
[ 1954.781948]  SyS_rename+0x325/0x390
[ 1954.781954]  do_syscall_64+0x75/0x1a0
[ 1954.781960]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.781964] RIP: 0033:0x147b429b7a67
[ 1954.781967] RSP: 002b:00007ffca7063be8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
[ 1954.781971] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 0000147b429b7a67
[ 1954.781975] RDX: 0000000000000000 RSI: 000000000212e200 RDI: 000000000212f1c0
[ 1954.781978] RBP: 000000000211a390 R08: 000000000212f310 R09: 0000147b433d4700
[ 1954.781981] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000212f230
[ 1954.781984] R13: 000000000212e200 R14: 0000000000000000 R15: 0000000000000006
[ 1954.781990] INFO: task gcc:19651 blocked for more than 480 seconds.
[ 1954.781993]       Tainted: G            E    4.15.3-smp #8
[ 1954.781995] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.781998] gcc             D    0 19651  19649 0x00000000
[ 1954.782002] Call Trace:
[ 1954.782008]  ? __schedule+0x268/0x840
[ 1954.782014]  schedule+0x32/0x80
[ 1954.782020]  io_schedule+0x12/0x40
[ 1954.782025]  blk_mq_get_tag+0x11e/0x230
[ 1954.782030]  ? elv_merge+0x66/0xe0
[ 1954.782036]  ? remove_wait_queue+0x60/0x60
[ 1954.782042]  blk_mq_get_request+0xcc/0x340
[ 1954.782047]  blk_mq_make_request+0xf8/0x580
[ 1954.782053]  generic_make_request+0xfd/0x2d0
[ 1954.782058]  submit_bio+0x5e/0x120
[ 1954.782076]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
[ 1954.782095]  ext4_io_submit+0x48/0x60 [ext4]
[ 1954.782114]  ext4_writepages+0x64a/0xe50 [ext4]
[ 1954.782123]  do_writepages+0x1a/0x70
[ 1954.782128]  __filemap_fdatawrite_range+0x71/0x90
[ 1954.782147]  ext4_rename+0x600/0x830 [ext4]
[ 1954.782154]  ? legitimize_path.isra.54+0x28/0x50
[ 1954.782160]  ? terminate_walk+0x55/0xb0
[ 1954.782165]  vfs_rename+0x1df/0x6d0
[ 1954.782172]  ? __d_lookup+0x111/0x130
[ 1954.782177]  SyS_rename+0x325/0x390
[ 1954.782183]  do_syscall_64+0x75/0x1a0
[ 1954.782188]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.782192] RIP: 0033:0x146b9f33ca67
[ 1954.782194] RSP: 002b:00007fffe33828b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
[ 1954.782199] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 0000146b9f33ca67
[ 1954.782202] RDX: 0000000000000000 RSI: 000000000233b910 RDI: 000000000233c110
[ 1954.782205] RBP: 000000000232a390 R08: 000000000233c260 R09: 0000146b9fd59700
[ 1954.782208] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000233c180
[ 1954.782211] R13: 000000000233b910 R14: 0000000000000000 R15: 0000000000000006
[ 1954.782217] INFO: task gcc:19683 blocked for more than 480 seconds.
[ 1954.782220]       Tainted: G            E    4.15.3-smp #8
[ 1954.782222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.782225] gcc             D    0 19683  19682 0x00000000
[ 1954.782229] Call Trace:
[ 1954.782236]  ? __schedule+0x268/0x840
[ 1954.782242]  schedule+0x32/0x80
[ 1954.782247]  io_schedule+0x12/0x40
[ 1954.782252]  blk_mq_get_tag+0x11e/0x230
[ 1954.782257]  ? elv_merge+0x66/0xe0
[ 1954.782262]  ? remove_wait_queue+0x60/0x60
[ 1954.782269]  blk_mq_get_request+0xcc/0x340
[ 1954.782274]  blk_mq_make_request+0xf8/0x580
[ 1954.782279]  generic_make_request+0xfd/0x2d0
[ 1954.782284]  submit_bio+0x5e/0x120
[ 1954.782302]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
[ 1954.782320]  ext4_io_submit+0x48/0x60 [ext4]
[ 1954.782338]  ext4_writepages+0x64a/0xe50 [ext4]
[ 1954.782347]  do_writepages+0x1a/0x70
[ 1954.782352]  __filemap_fdatawrite_range+0x71/0x90
[ 1954.782370]  ext4_rename+0x600/0x830 [ext4]
[ 1954.782376]  ? legitimize_path.isra.54+0x28/0x50
[ 1954.782382]  ? terminate_walk+0x55/0xb0
[ 1954.782388]  vfs_rename+0x1df/0x6d0
[ 1954.782394]  ? __d_lookup+0x111/0x130
[ 1954.782400]  SyS_rename+0x325/0x390
[ 1954.782405]  do_syscall_64+0x75/0x1a0
[ 1954.782411]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.782414] RIP: 0033:0x14d933ff7a67
[ 1954.782417] RSP: 002b:00007fffc5db42c8 EFLAGS: 00000206 ORIG_RAX: 0000000000000052
[ 1954.782421] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 000014d933ff7a67
[ 1954.782424] RDX: 0000000000000000 RSI: 00000000016eba00 RDI: 00000000016e7b30
[ 1954.782427] RBP: 00000000016d6390 R08: 00000000016e7c80 R09: 000014d934a14700
[ 1954.782430] R10: 0000000000000000 R11: 0000000000000206 R12: 00000000016e7ba0
[ 1954.782433] R13: 00000000016eba00 R14: 0000000000000000 R15: 0000000000000006
[ 1954.782438] INFO: task rm:19709 blocked for more than 480 seconds.
[ 1954.782442]       Tainted: G            E    4.15.3-smp #8
[ 1954.782444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.782446] rm              D    0 19709  19687 0x00000000
[ 1954.782450] Call Trace:
[ 1954.782457]  ? __schedule+0x268/0x840
[ 1954.782463]  schedule+0x32/0x80
[ 1954.782471]  wait_transaction_locked+0x76/0xa0 [jbd2]
[ 1954.782478]  ? remove_wait_queue+0x60/0x60
[ 1954.782486]  add_transaction_credits+0x1a4/0x280 [jbd2]
[ 1954.782494]  start_this_handle+0x189/0x3e0 [jbd2]
[ 1954.782500]  ? _cond_resched+0x15/0x30
[ 1954.782505]  ? kmem_cache_alloc+0x142/0x150
[ 1954.782513]  jbd2__journal_start+0xe9/0x1f0 [jbd2]
[ 1954.782532]  ext4_unlink+0x1ac/0x330 [ext4]
[ 1954.782537]  vfs_unlink+0xb1/0x1a0
[ 1954.782542]  do_unlinkat+0x23b/0x280
[ 1954.782548]  do_syscall_64+0x75/0x1a0
[ 1954.782553]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.782556] RIP: 0033:0x7fca75fcfba7
[ 1954.782559] RSP: 002b:00007ffdaa5b75a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 1954.782563] RAX: ffffffffffffffda RBX: 00000000011bf770 RCX: 00007fca75fcfba7
[ 1954.782566] RDX: 0000000000000000 RSI: 00000000011bf878 RDI: 0000000000000006
[ 1954.782569] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1954.782573] R10: 000000000000004a R11: 0000000000000246 R12: 00000000011b3fdc
[ 1954.782576] R13: 00007ffdaa5b7700 R14: 00000000011bf770 R15: 0000000000000000
[ 1954.782581] INFO: task gcc:19728 blocked for more than 480 seconds.
[ 1954.782584]       Tainted: G            E    4.15.3-smp #8
[ 1954.782586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.782588] gcc             D    0 19728  19727 0x00000000
[ 1954.782592] Call Trace:
[ 1954.782599]  ? __schedule+0x268/0x840
[ 1954.782603]  ? filename_parentat+0xc9/0x120
[ 1954.782609]  schedule+0x32/0x80
[ 1954.782614]  rwsem_down_write_failed+0x1e9/0x360
[ 1954.782620]  call_rwsem_down_write_failed+0x13/0x20
[ 1954.782627]  down_write+0x29/0x40
[ 1954.782632]  filename_create+0x66/0x150
[ 1954.782637]  SyS_symlink+0x4e/0xc0
[ 1954.782642]  do_syscall_64+0x75/0x1a0
[ 1954.782647]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.782650] RIP: 0033:0x147656b2cab7
[ 1954.782653] RSP: 002b:00007fff10c42988 EFLAGS: 00000206 ORIG_RAX: 0000000000000058
[ 1954.782657] RAX: ffffffffffffffda RBX: 0000000000643250 RCX: 0000147656b2cab7
[ 1954.782661] RDX: 0000000000643390 RSI: 0000000000643250 RDI: 0000000000643390
[ 1954.782664] RBP: 0000000000000000 R08: 3836333633343800 R09: 0000000000658be0
[ 1954.782667] R10: 00000000ba5a914c R11: 0000000000000206 R12: 0000000000643390
[ 1954.782670] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000006
[ 1954.782675] INFO: task gcc:19732 blocked for more than 480 seconds.
[ 1954.782678]       Tainted: G            E    4.15.3-smp #8
[ 1954.782680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1954.782683] gcc             D    0 19732  19731 0x00000000
[ 1954.782687] Call Trace:
[ 1954.782693]  ? __schedule+0x268/0x840
[ 1954.782699]  schedule+0x32/0x80
[ 1954.782704]  io_schedule+0x12/0x40
[ 1954.782709]  blk_mq_get_tag+0x11e/0x230
[ 1954.782715]  ? elv_merge+0x66/0xe0
[ 1954.782720]  ? remove_wait_queue+0x60/0x60
[ 1954.782727]  blk_mq_get_request+0xcc/0x340
[ 1954.782732]  blk_mq_make_request+0xf8/0x580
[ 1954.782737]  generic_make_request+0xfd/0x2d0
[ 1954.782741]  submit_bio+0x5e/0x120
[ 1954.782759]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
[ 1954.782777]  ext4_io_submit+0x48/0x60 [ext4]
[ 1954.782795]  ext4_writepages+0x64a/0xe50 [ext4]
[ 1954.782805]  do_writepages+0x1a/0x70
[ 1954.782809]  __filemap_fdatawrite_range+0x71/0x90
[ 1954.782828]  ext4_rename+0x600/0x830 [ext4]
[ 1954.782834]  ? legitimize_path.isra.54+0x28/0x50
[ 1954.782840]  ? terminate_walk+0x55/0xb0
[ 1954.782846]  vfs_rename+0x1df/0x6d0
[ 1954.782853]  ? __d_lookup+0x111/0x130
[ 1954.782858]  SyS_rename+0x325/0x390
[ 1954.782864]  do_syscall_64+0x75/0x1a0
[ 1954.782869]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 1954.782872] RIP: 0033:0x15499c5b4a67
[ 1954.782875] RSP: 002b:00007fff60bd92b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
[ 1954.782879] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 000015499c5b4a67
[ 1954.782882] RDX: 0000000000000000 RSI: 0000000001966b60 RDI: 00000000019673c0
[ 1954.782886] RBP: 000000000194d2e0 R08: 0000000001967510 R09: 000015499cfd1700
[ 1954.782888] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001967430
[ 1954.782891] R13: 0000000001966b60 R14: 0000000000000000 R15: 0000000000000006

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-13  6:50                       ` Mike Galbraith
@ 2018-02-13  8:15                         ` Paolo Valente
  2018-02-13 14:30                           ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Valente @ 2018-02-13  8:15 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Oleksandr Natalenko, Jens Axboe, stable



> Il giorno 13 feb 2018, alle ore 07:50, Mike Galbraith <efault@gmx.de> ha scritto:
> 
> Oops, meant to provide the dmesg spew.
> 
> [ 1954.780829] INFO: task kworker/7:2:290 blocked for more than 480 seconds.
> [ 1954.780837]       Tainted: G            E    4.15.3-smp #8
> [ 1954.780839] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.780843] kworker/7:2     D    0   290      2 0x80000000
> [ 1954.780858] Workqueue: events_freezable_power_ disk_events_workfn
> [ 1954.780862] Call Trace:
> [ 1954.780876]  ? __schedule+0x268/0x840
> [ 1954.780883]  schedule+0x32/0x80
> [ 1954.780891]  io_schedule+0x12/0x40
> [ 1954.780896]  blk_mq_get_tag+0x11e/0x230
> [ 1954.780902]  ? blk_mq_sched_dispatch_requests+0x108/0x180
> [ 1954.780910]  ? remove_wait_queue+0x60/0x60
> [ 1954.780920]  blk_mq_get_request+0xcc/0x340
> [ 1954.780927]  blk_mq_alloc_request+0x5a/0xa0
> [ 1954.780933]  blk_get_request_flags+0x44/0x170
> [ 1954.780955]  scsi_execute+0x38/0x1e0 [scsi_mod]
> [ 1954.780971]  scsi_test_unit_ready+0x9c/0xe0 [scsi_mod]
> [ 1954.780981]  sd_check_events+0xf7/0x170 [sd_mod]
> [ 1954.780988]  disk_check_events+0x55/0x120
> [ 1954.780997]  process_one_work+0x147/0x350
> [ 1954.781003]  worker_thread+0x47/0x3e0
> [ 1954.781009]  kthread+0xf8/0x130
> [ 1954.781015]  ? rescuer_thread+0x360/0x360
> [ 1954.781019]  ? kthread_stop+0x120/0x120
> [ 1954.781025]  ret_from_fork+0x35/0x40
> [ 1954.781033] INFO: task kworker/u16:6:540 blocked for more than 480 seconds.
> [ 1954.781037]       Tainted: G            E    4.15.3-smp #8
> [ 1954.781039] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.781042] kworker/u16:6   D    0   540      2 0x80000000
> [ 1954.781053] Workqueue: writeback wb_workfn (flush-8:48)
> [ 1954.781057] Call Trace:
> [ 1954.781064]  ? __schedule+0x268/0x840
> [ 1954.781071]  schedule+0x32/0x80
> [ 1954.781076]  io_schedule+0x12/0x40
> [ 1954.781081]  blk_mq_get_tag+0x11e/0x230
> [ 1954.781087]  ? remove_wait_queue+0x60/0x60
> [ 1954.781094]  blk_mq_get_request+0xcc/0x340
> [ 1954.781099]  blk_mq_make_request+0xf8/0x580
> [ 1954.781105]  generic_make_request+0xfd/0x2d0
> [ 1954.781110]  submit_bio+0x5e/0x120
> [ 1954.781151]  ext4_io_submit+0x48/0x60 [ext4]
> [ 1954.781178]  ext4_bio_write_page+0x199/0x3a0 [ext4]
> [ 1954.781202]  mpage_submit_page+0x51/0x70 [ext4]
> [ 1954.781223]  mpage_map_and_submit_buffers+0x109/0x1d0 [ext4]
> [ 1954.781248]  ext4_writepages+0x831/0xe50 [ext4]
> [ 1954.781255]  ? bfq_deactivate_entity+0x21/0x50
> [ 1954.781263]  ? cpumask_next_and+0x2b/0x40
> [ 1954.781272]  do_writepages+0x1a/0x70
> [ 1954.781279]  __writeback_single_inode+0x3d/0x340
> [ 1954.781286]  writeback_sb_inodes+0x21f/0x4c0
> [ 1954.781294]  __writeback_inodes_wb+0x87/0xb0
> [ 1954.781300]  wb_writeback+0x258/0x2f0
> [ 1954.781306]  ? cpumask_next+0x16/0x20
> [ 1954.781313]  wb_workfn+0x2f0/0x400
> [ 1954.781319]  process_one_work+0x147/0x350
> [ 1954.781325]  worker_thread+0x47/0x3e0
> [ 1954.781330]  kthread+0xf8/0x130
> [ 1954.781336]  ? rescuer_thread+0x360/0x360
> [ 1954.781340]  ? kthread_stop+0x120/0x120
> [ 1954.781345]  ret_from_fork+0x35/0x40
> [ 1954.781356] INFO: task jbd2/sdd1-8:1170 blocked for more than 480 seconds.
> [ 1954.781359]       Tainted: G            E    4.15.3-smp #8
> [ 1954.781361] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.781364] jbd2/sdd1-8     D    0  1170      2 0x80000000
> [ 1954.781368] Call Trace:
> [ 1954.781375]  ? __schedule+0x268/0x840
> [ 1954.781381]  ? __wake_up_common_lock+0x77/0x90
> [ 1954.781387]  schedule+0x32/0x80
> [ 1954.781401]  jbd2_journal_commit_transaction+0x262/0x1b50 [jbd2]
> [ 1954.781409]  ? rb_erase_cached+0x31b/0x390
> [ 1954.781415]  ? pick_next_task_fair+0x58d/0x8c0
> [ 1954.781424]  ? __switch_to+0x98/0x3f0
> [ 1954.781430]  ? remove_wait_queue+0x60/0x60
> [ 1954.781441]  kjournald2+0xb1/0x230 [jbd2]
> [ 1954.781447]  ? remove_wait_queue+0x60/0x60
> [ 1954.781452]  kthread+0xf8/0x130
> [ 1954.781460]  ? commit_timeout+0x10/0x10 [jbd2]
> [ 1954.781464]  ? kthread_stop+0x120/0x120
> [ 1954.781469]  ret_from_fork+0x35/0x40
> [ 1954.781493] INFO: task pool:19922 blocked for more than 480 seconds.
> [ 1954.781498]       Tainted: G            E    4.15.3-smp #8
> [ 1954.781502] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.781506] pool            D    0 19922      1 0x00000000
> [ 1954.781512] Call Trace:
> [ 1954.781520]  ? __schedule+0x268/0x840
> [ 1954.781526]  schedule+0x32/0x80
> [ 1954.781531]  schedule_timeout+0x1d2/0x2e0
> [ 1954.781538]  wait_for_completion+0xff/0x160
> [ 1954.781543]  ? wake_up_q+0x70/0x70
> [ 1954.781549]  flush_work+0x109/0x180
> [ 1954.781554]  ? wake_up_worker+0x30/0x30
> [ 1954.781560]  __cancel_work_timer+0xe3/0x160
> [ 1954.781569]  ? kobj_lookup+0x111/0x160
> [ 1954.781574]  ? disk_map_sector_rcu+0x70/0x70
> [ 1954.781580]  disk_block_events+0x6f/0x90
> [ 1954.781587]  __blkdev_get+0x5a/0x470
> [ 1954.781592]  ? bd_acquire+0xd0/0xd0
> [ 1954.781597]  blkdev_get+0x195/0x2e0
> [ 1954.781605]  ? __inode_permission+0x2d/0x110
> [ 1954.781609]  ? bd_acquire+0x37/0xd0
> [ 1954.781613]  ? bd_acquire+0xd0/0xd0
> [ 1954.781619]  do_dentry_open.isra.20+0x1ee/0x300
> [ 1954.781625]  path_openat+0x28a/0x1050
> [ 1954.781631]  do_filp_open+0x7e/0xd0
> [ 1954.781640]  ? read_null+0x10/0x10
> [ 1954.781646]  ? _cond_resched+0x15/0x30
> [ 1954.781653]  do_sys_open+0x116/0x1e0
> [ 1954.781659]  do_syscall_64+0x75/0x1a0
> [ 1954.781665]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.781670] RIP: 0033:0x7f81a021d44d
> [ 1954.781673] RSP: 002b:00007f819d542b30 EFLAGS: 00000293 ORIG_RAX: 0000000000000002
> [ 1954.781679] RAX: ffffffffffffffda RBX: 000000000234b0c0 RCX: 00007f81a021d44d
> [ 1954.781682] RDX: 00007f819fc62f03 RSI: 0000000000000800 RDI: 00000000023606b0
> [ 1954.781685] RBP: 00007f8190011ec0 R08: 0000000000000073 R09: 0000000000000003
> [ 1954.781689] R10: 0000000000000000 R11: 0000000000000293 R12: 00007f819d542cc8
> [ 1954.781692] R13: 00007f819d542bf0 R14: 00007f8190011ec0 R15: 0000000000000001
> [ 1954.781735] INFO: task gcc:19633 blocked for more than 480 seconds.
> [ 1954.781738]       Tainted: G            E    4.15.3-smp #8
> [ 1954.781741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.781743] gcc             D    0 19633  19632 0x00000000
> [ 1954.781748] Call Trace:
> [ 1954.781755]  ? __schedule+0x268/0x840
> [ 1954.781761]  schedule+0x32/0x80
> [ 1954.781766]  io_schedule+0x12/0x40
> [ 1954.781771]  blk_mq_get_tag+0x11e/0x230
> [ 1954.781779]  ? elv_merge+0x66/0xe0
> [ 1954.781786]  ? remove_wait_queue+0x60/0x60
> [ 1954.781796]  blk_mq_get_request+0xcc/0x340
> [ 1954.781803]  blk_mq_make_request+0xf8/0x580
> [ 1954.781810]  generic_make_request+0xfd/0x2d0
> [ 1954.781815]  submit_bio+0x5e/0x120
> [ 1954.781835]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
> [ 1954.781857]  ext4_io_submit+0x48/0x60 [ext4]
> [ 1954.781878]  ext4_writepages+0x64a/0xe50 [ext4]
> [ 1954.781888]  do_writepages+0x1a/0x70
> [ 1954.781893]  __filemap_fdatawrite_range+0x71/0x90
> [ 1954.781915]  ext4_rename+0x600/0x830 [ext4]
> [ 1954.781923]  ? legitimize_path.isra.54+0x28/0x50
> [ 1954.781929]  ? terminate_walk+0x55/0xb0
> [ 1954.781935]  vfs_rename+0x1df/0x6d0
> [ 1954.781942]  ? __d_lookup+0x111/0x130
> [ 1954.781948]  SyS_rename+0x325/0x390
> [ 1954.781954]  do_syscall_64+0x75/0x1a0
> [ 1954.781960]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.781964] RIP: 0033:0x147b429b7a67
> [ 1954.781967] RSP: 002b:00007ffca7063be8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
> [ 1954.781971] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 0000147b429b7a67
> [ 1954.781975] RDX: 0000000000000000 RSI: 000000000212e200 RDI: 000000000212f1c0
> [ 1954.781978] RBP: 000000000211a390 R08: 000000000212f310 R09: 0000147b433d4700
> [ 1954.781981] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000212f230
> [ 1954.781984] R13: 000000000212e200 R14: 0000000000000000 R15: 0000000000000006
> [ 1954.781990] INFO: task gcc:19651 blocked for more than 480 seconds.
> [ 1954.781993]       Tainted: G            E    4.15.3-smp #8
> [ 1954.781995] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.781998] gcc             D    0 19651  19649 0x00000000
> [ 1954.782002] Call Trace:
> [ 1954.782008]  ? __schedule+0x268/0x840
> [ 1954.782014]  schedule+0x32/0x80
> [ 1954.782020]  io_schedule+0x12/0x40
> [ 1954.782025]  blk_mq_get_tag+0x11e/0x230
> [ 1954.782030]  ? elv_merge+0x66/0xe0
> [ 1954.782036]  ? remove_wait_queue+0x60/0x60
> [ 1954.782042]  blk_mq_get_request+0xcc/0x340
> [ 1954.782047]  blk_mq_make_request+0xf8/0x580
> [ 1954.782053]  generic_make_request+0xfd/0x2d0
> [ 1954.782058]  submit_bio+0x5e/0x120
> [ 1954.782076]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
> [ 1954.782095]  ext4_io_submit+0x48/0x60 [ext4]
> [ 1954.782114]  ext4_writepages+0x64a/0xe50 [ext4]
> [ 1954.782123]  do_writepages+0x1a/0x70
> [ 1954.782128]  __filemap_fdatawrite_range+0x71/0x90
> [ 1954.782147]  ext4_rename+0x600/0x830 [ext4]
> [ 1954.782154]  ? legitimize_path.isra.54+0x28/0x50
> [ 1954.782160]  ? terminate_walk+0x55/0xb0
> [ 1954.782165]  vfs_rename+0x1df/0x6d0
> [ 1954.782172]  ? __d_lookup+0x111/0x130
> [ 1954.782177]  SyS_rename+0x325/0x390
> [ 1954.782183]  do_syscall_64+0x75/0x1a0
> [ 1954.782188]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.782192] RIP: 0033:0x146b9f33ca67
> [ 1954.782194] RSP: 002b:00007fffe33828b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
> [ 1954.782199] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 0000146b9f33ca67
> [ 1954.782202] RDX: 0000000000000000 RSI: 000000000233b910 RDI: 000000000233c110
> [ 1954.782205] RBP: 000000000232a390 R08: 000000000233c260 R09: 0000146b9fd59700
> [ 1954.782208] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000233c180
> [ 1954.782211] R13: 000000000233b910 R14: 0000000000000000 R15: 0000000000000006
> [ 1954.782217] INFO: task gcc:19683 blocked for more than 480 seconds.
> [ 1954.782220]       Tainted: G            E    4.15.3-smp #8
> [ 1954.782222] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.782225] gcc             D    0 19683  19682 0x00000000
> [ 1954.782229] Call Trace:
> [ 1954.782236]  ? __schedule+0x268/0x840
> [ 1954.782242]  schedule+0x32/0x80
> [ 1954.782247]  io_schedule+0x12/0x40
> [ 1954.782252]  blk_mq_get_tag+0x11e/0x230
> [ 1954.782257]  ? elv_merge+0x66/0xe0
> [ 1954.782262]  ? remove_wait_queue+0x60/0x60
> [ 1954.782269]  blk_mq_get_request+0xcc/0x340
> [ 1954.782274]  blk_mq_make_request+0xf8/0x580
> [ 1954.782279]  generic_make_request+0xfd/0x2d0
> [ 1954.782284]  submit_bio+0x5e/0x120
> [ 1954.782302]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
> [ 1954.782320]  ext4_io_submit+0x48/0x60 [ext4]
> [ 1954.782338]  ext4_writepages+0x64a/0xe50 [ext4]
> [ 1954.782347]  do_writepages+0x1a/0x70
> [ 1954.782352]  __filemap_fdatawrite_range+0x71/0x90
> [ 1954.782370]  ext4_rename+0x600/0x830 [ext4]
> [ 1954.782376]  ? legitimize_path.isra.54+0x28/0x50
> [ 1954.782382]  ? terminate_walk+0x55/0xb0
> [ 1954.782388]  vfs_rename+0x1df/0x6d0
> [ 1954.782394]  ? __d_lookup+0x111/0x130
> [ 1954.782400]  SyS_rename+0x325/0x390
> [ 1954.782405]  do_syscall_64+0x75/0x1a0
> [ 1954.782411]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.782414] RIP: 0033:0x14d933ff7a67
> [ 1954.782417] RSP: 002b:00007fffc5db42c8 EFLAGS: 00000206 ORIG_RAX: 0000000000000052
> [ 1954.782421] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 000014d933ff7a67
> [ 1954.782424] RDX: 0000000000000000 RSI: 00000000016eba00 RDI: 00000000016e7b30
> [ 1954.782427] RBP: 00000000016d6390 R08: 00000000016e7c80 R09: 000014d934a14700
> [ 1954.782430] R10: 0000000000000000 R11: 0000000000000206 R12: 00000000016e7ba0
> [ 1954.782433] R13: 00000000016eba00 R14: 0000000000000000 R15: 0000000000000006
> [ 1954.782438] INFO: task rm:19709 blocked for more than 480 seconds.
> [ 1954.782442]       Tainted: G            E    4.15.3-smp #8
> [ 1954.782444] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.782446] rm              D    0 19709  19687 0x00000000
> [ 1954.782450] Call Trace:
> [ 1954.782457]  ? __schedule+0x268/0x840
> [ 1954.782463]  schedule+0x32/0x80
> [ 1954.782471]  wait_transaction_locked+0x76/0xa0 [jbd2]
> [ 1954.782478]  ? remove_wait_queue+0x60/0x60
> [ 1954.782486]  add_transaction_credits+0x1a4/0x280 [jbd2]
> [ 1954.782494]  start_this_handle+0x189/0x3e0 [jbd2]
> [ 1954.782500]  ? _cond_resched+0x15/0x30
> [ 1954.782505]  ? kmem_cache_alloc+0x142/0x150
> [ 1954.782513]  jbd2__journal_start+0xe9/0x1f0 [jbd2]
> [ 1954.782532]  ext4_unlink+0x1ac/0x330 [ext4]
> [ 1954.782537]  vfs_unlink+0xb1/0x1a0
> [ 1954.782542]  do_unlinkat+0x23b/0x280
> [ 1954.782548]  do_syscall_64+0x75/0x1a0
> [ 1954.782553]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.782556] RIP: 0033:0x7fca75fcfba7
> [ 1954.782559] RSP: 002b:00007ffdaa5b75a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
> [ 1954.782563] RAX: ffffffffffffffda RBX: 00000000011bf770 RCX: 00007fca75fcfba7
> [ 1954.782566] RDX: 0000000000000000 RSI: 00000000011bf878 RDI: 0000000000000006
> [ 1954.782569] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 1954.782573] R10: 000000000000004a R11: 0000000000000246 R12: 00000000011b3fdc
> [ 1954.782576] R13: 00007ffdaa5b7700 R14: 00000000011bf770 R15: 0000000000000000
> [ 1954.782581] INFO: task gcc:19728 blocked for more than 480 seconds.
> [ 1954.782584]       Tainted: G            E    4.15.3-smp #8
> [ 1954.782586] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.782588] gcc             D    0 19728  19727 0x00000000
> [ 1954.782592] Call Trace:
> [ 1954.782599]  ? __schedule+0x268/0x840
> [ 1954.782603]  ? filename_parentat+0xc9/0x120
> [ 1954.782609]  schedule+0x32/0x80
> [ 1954.782614]  rwsem_down_write_failed+0x1e9/0x360
> [ 1954.782620]  call_rwsem_down_write_failed+0x13/0x20
> [ 1954.782627]  down_write+0x29/0x40
> [ 1954.782632]  filename_create+0x66/0x150
> [ 1954.782637]  SyS_symlink+0x4e/0xc0
> [ 1954.782642]  do_syscall_64+0x75/0x1a0
> [ 1954.782647]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.782650] RIP: 0033:0x147656b2cab7
> [ 1954.782653] RSP: 002b:00007fff10c42988 EFLAGS: 00000206 ORIG_RAX: 0000000000000058
> [ 1954.782657] RAX: ffffffffffffffda RBX: 0000000000643250 RCX: 0000147656b2cab7
> [ 1954.782661] RDX: 0000000000643390 RSI: 0000000000643250 RDI: 0000000000643390
> [ 1954.782664] RBP: 0000000000000000 R08: 3836333633343800 R09: 0000000000658be0
> [ 1954.782667] R10: 00000000ba5a914c R11: 0000000000000206 R12: 0000000000643390
> [ 1954.782670] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000006
> [ 1954.782675] INFO: task gcc:19732 blocked for more than 480 seconds.
> [ 1954.782678]       Tainted: G            E    4.15.3-smp #8
> [ 1954.782680] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1954.782683] gcc             D    0 19732  19731 0x00000000
> [ 1954.782687] Call Trace:
> [ 1954.782693]  ? __schedule+0x268/0x840
> [ 1954.782699]  schedule+0x32/0x80
> [ 1954.782704]  io_schedule+0x12/0x40
> [ 1954.782709]  blk_mq_get_tag+0x11e/0x230
> [ 1954.782715]  ? elv_merge+0x66/0xe0
> [ 1954.782720]  ? remove_wait_queue+0x60/0x60
> [ 1954.782727]  blk_mq_get_request+0xcc/0x340
> [ 1954.782732]  blk_mq_make_request+0xf8/0x580
> [ 1954.782737]  generic_make_request+0xfd/0x2d0
> [ 1954.782741]  submit_bio+0x5e/0x120
> [ 1954.782759]  ? __ext4_journal_stop+0x37/0xa0 [ext4]
> [ 1954.782777]  ext4_io_submit+0x48/0x60 [ext4]
> [ 1954.782795]  ext4_writepages+0x64a/0xe50 [ext4]
> [ 1954.782805]  do_writepages+0x1a/0x70
> [ 1954.782809]  __filemap_fdatawrite_range+0x71/0x90
> [ 1954.782828]  ext4_rename+0x600/0x830 [ext4]
> [ 1954.782834]  ? legitimize_path.isra.54+0x28/0x50
> [ 1954.782840]  ? terminate_walk+0x55/0xb0
> [ 1954.782846]  vfs_rename+0x1df/0x6d0
> [ 1954.782853]  ? __d_lookup+0x111/0x130
> [ 1954.782858]  SyS_rename+0x325/0x390
> [ 1954.782864]  do_syscall_64+0x75/0x1a0
> [ 1954.782869]  entry_SYSCALL_64_after_hwframe+0x21/0x86
> [ 1954.782872] RIP: 0033:0x15499c5b4a67
> [ 1954.782875] RSP: 002b:00007fff60bd92b8 EFLAGS: 00000202 ORIG_RAX: 0000000000000052
> [ 1954.782879] RAX: ffffffffffffffda RBX: 000000000000001d RCX: 000015499c5b4a67
> [ 1954.782882] RDX: 0000000000000000 RSI: 0000000001966b60 RDI: 00000000019673c0
> [ 1954.782886] RBP: 000000000194d2e0 R08: 0000000001967510 R09: 000015499cfd1700
> [ 1954.782888] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001967430
> [ 1954.782891] R13: 0000000001966b60 R14: 0000000000000000 R15: 0000000000000006

No bfq in the trace, so probably yet another problem stemming from the
interaction between bfq and blk-mq.  Hung tasks seem to wait
indefinitely for a tag, and the only bfq patch that affects tag
distribution is:
a52a69ea89dc block, bfq: limit tags for writes and async I/O

So, for the next step (and as you already decided to do), please retry
after adding the above commit to your initial pair:
patches/block-bfq:-fix-occurrences-of-request-finish-method-s-old-name.patch
patches/block-bfq:-add-requeue-request-hook.patch

Thanks,
Paolo


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-13  8:15                         ` Paolo Valente
@ 2018-02-13 14:30                           ` Mike Galbraith
  2018-02-14  5:58                             ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-13 14:30 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Oleksandr Natalenko, Jens Axboe, stable

On Tue, 2018-02-13 at 09:15 +0100, Paolo Valente wrote:
> 
> 
> No bfq in the trace, so probably yet another problem stemming from the
> interaction between bfq and blk-mq.  Hung tasks seem to wait
> indefinitely for a tag, and the only bfq patch that affects tag
> distribution is:
> a52a69ea89dc block, bfq: limit tags for writes and async I/O
> 
> So, for the next step (and as you already decided to do), please retry
> after adding the above commit to your initial pair:
> patches/block-bfq:-fix-occurrences-of-request-finish-method-s-old-name.patch
> patches/block-bfq:-add-requeue-request-hook.patch

I did it a bit differently, revert that commit in master, and skip it
in 4.15.3 with the otherwise full set first, and saw no hangs testing
either. I then added it to the 2 patch series, and was rewarded
with... no bloody hang. So it's not as deterministic as it had
appeared to be, there's a phase-of-moon factor. Oh. Joy.

	-Mike


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-13 14:30                           ` Mike Galbraith
@ 2018-02-14  5:58                             ` Mike Galbraith
  2018-02-14  7:04                               ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-14  5:58 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Oleksandr Natalenko, Jens Axboe, stable

On Tue, 2018-02-13 at 15:30 +0100, Mike Galbraith wrote:
> On Tue, 2018-02-13 at 09:15 +0100, Paolo Valente wrote:
> > 
> > 
> > No bfq in the trace, so probably yet another problem stemming from the
> > interaction between bfq and blk-mq.  Hung tasks seem to wait
> > indefinitely for a tag, and the only bfq patch that affects tag
> > distribution is:
> > a52a69ea89dc block, bfq: limit tags for writes and async I/O
> > 
> > So, for the next step (and as you already decided to do), please retry
> > after adding the above commit to your initial pair:
> > patches/block-bfq:-fix-occurrences-of-request-finish-method-s-old-name.patch
> > patches/block-bfq:-add-requeue-request-hook.patch
> 
> I did it a bit differently, revert that commit in master, and skip it
> in 4.15.3 with the otherwise full set first, and saw no hangs testing
> either. I then added it to the 2 patch series, and was rewarded
> with... no bloody hang. So it's not as deterministic as it had
> appeared to be, there's a phase-of-moon factor. Oh. Joy.

Bah. I'm back on 4.15.3 with the full set, and it's holding up fine
too, so file this report under "elves 'n' gremlins".

	-Mike


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14  5:58                             ` Mike Galbraith
@ 2018-02-14  7:04                               ` Mike Galbraith
  2018-02-14  7:15                                 ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-14  7:04 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Oleksandr Natalenko, Jens Axboe, stable

On Wed, 2018-02-14 at 06:58 +0100, Mike Galbraith wrote:
> On Tue, 2018-02-13 at 15:30 +0100, Mike Galbraith wrote:
> > On Tue, 2018-02-13 at 09:15 +0100, Paolo Valente wrote:
> > > 
> > > 
> > > No bfq in the trace, so probably yet another problem stemming from the
> > > interaction between bfq and blk-mq.  Hung tasks seem to wait
> > > indefinitely for a tag, and the only bfq patch that affects tag
> > > distribution is:
> > > a52a69ea89dc block, bfq: limit tags for writes and async I/O
> > > 
> > > So, for the next step (and as you already decided to do), please retry
> > > after adding the above commit to your initial pair:
> > > patches/block-bfq:-fix-occurrences-of-request-finish-method-s-old-name.patch
> > > patches/block-bfq:-add-requeue-request-hook.patch
> > 
> > I did it a bit differently, revert that commit in master, and skip it
> > in 4.15.3 with the otherwise full set first, and saw no hangs testing
> > either. I then added it to the 2 patch series, and was rewarded
> > with... no bloody hang. So it's not as deterministic as it had
> > appeared to be, there's a phase-of-moon factor. Oh. Joy.
> 
> Bah. I'm back on 4.15.3 with the full set, and it's holding up fine
> too, so file this report under "elves 'n' gremlins".

And _of course_, roughly two minutes later, IO stalled.  Ok gremlin,
back to deadline I go, scoot, go wreck somebody else's day ;-)

	-Mike
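
[Editor's note: the scheduler swap Mike mentions is a runtime sysfs operation. A minimal sketch, assuming the disk is sdd (suggested by the jbd2/sdd1-8 task in the earlier trace) and that the write is done as root; on a system without that disk the write is simply skipped:]

```shell
#!/bin/sh
# Sketch: move one disk from bfq back to mq-deadline at runtime.
# The disk name "sdd" is an assumption; writing the file needs root.
disk=sdd
sched=/sys/block/$disk/queue/scheduler

if [ -w "$sched" ]; then
    cat "$sched"                 # active scheduler is shown in brackets
    echo mq-deadline > "$sched"
fi
echo "requested mq-deadline for $disk"
```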


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14  7:04                               ` Mike Galbraith
@ 2018-02-14  7:15                                 ` Mike Galbraith
  2018-02-14  8:56                                   ` Paolo Valente
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-14  7:15 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Oleksandr Natalenko, Jens Axboe, stable

On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
> 
> And _of course_, roughly two minutes later, IO stalled.

P.S.

crash> bt 19117
PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
 #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
 #1 [ffff8803f7207c40] schedule at ffffffff81596422
 #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
 #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
 #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
 #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
 #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
 #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
 #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
 #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
#10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
#11 [ffff8803f7207e78] process_one_work at ffffffff81079867
#12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
#13 [ffff8803f7207f10] kthread at ffffffff8107ef48
#14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
crash>


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14  7:15                                 ` Mike Galbraith
@ 2018-02-14  8:56                                   ` Paolo Valente
  2018-02-14  9:55                                     ` Oleksandr Natalenko
  2018-02-14 15:19                                     ` Jens Axboe
  0 siblings, 2 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-14  8:56 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Oleksandr Natalenko, Jens Axboe, stable



> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
> 
> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>> 
>> And _of course_, roughly two minutes later, IO stalled.
> 
> P.S.
> 
> crash> bt 19117
> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
> #1 [ffff8803f7207c40] schedule at ffffffff81596422
> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
> crash>

This evidently has to do with tag pressure.  I've looked for a way to
easily reduce the number of tags online, so as to put your system in
the bad spot deterministically, but to no avail.  Does anyone know a
way to do it?

Thanks,
Paolo


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14  8:56                                   ` Paolo Valente
@ 2018-02-14  9:55                                     ` Oleksandr Natalenko
  2018-02-14 13:25                                       ` Mike Galbraith
  2018-02-14 15:19                                     ` Jens Axboe
  1 sibling, 1 reply; 54+ messages in thread
From: Oleksandr Natalenko @ 2018-02-14  9:55 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Mike Galbraith, Jens Axboe, stable

Hi.

14.02.2018 09:56, Paolo Valente wrote:
> This has evidently to do with tag pressure.  I've looked for a way to
> easily reduce the number of tags online, so as to put your system in
> the bad spot deterministically.  But at no avail.  Does anyone know a
> way to do it?

Isn't this a case for the block device queue_depth? IIRC, the number of
tags is derived from it.

Oleksandr


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14  9:55                                     ` Oleksandr Natalenko
@ 2018-02-14 13:25                                       ` Mike Galbraith
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-14 13:25 UTC (permalink / raw)
  To: Oleksandr Natalenko, Paolo Valente; +Cc: Jens Axboe, stable

On Wed, 2018-02-14 at 10:55 +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> 14.02.2018 09:56, Paolo Valente wrote:
> > This has evidently to do with tag pressure.  I've looked for a way to
> > easily reduce the number of tags online, so as to put your system in
> > the bad spot deterministically.  But at no avail.  Does anyone know a
> > way to do it?
> 
> Isn't this a case for block device queue_depth? IIRC, the amount of tags 
> is derived from it.

/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/queue_depth
/sys/devices/pci0000:00/0000:00:1f.2/ata6/host5/target5:0:0/5:0:0:0/queue_depth
/sys/devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/queue_depth
/sys/devices/pci0000:00/0000:00:1f.2/ata5/host4/target4:0:0/4:0:0:0/queue_depth

FWIW, these guys are all 31.
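
[Editor's note: following Oleksandr's queue_depth suggestion, one way to tighten tag pressure would be to lower the depth on one of the paths above; whether that deterministically reaches the scheduler's tag space is exactly the open question in this thread. A rough sketch, with the device path taken from the list above (an assumption about which disk matters) and a fallback to the reported depth of 31; the actual write needs root:]

```shell
#!/bin/sh
# Sketch: halve a SCSI device's queue_depth to increase tag pressure.
# The path and the fallback value 31 come from the messages above.
dev=/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0

if [ -r "$dev/queue_depth" ]; then
    cur=$(cat "$dev/queue_depth")
else
    cur=31                        # the depth Mike reports for these devices
fi

new=$(( cur > 1 ? cur / 2 : 1 )) # never drop below 1
echo "queue_depth: $cur -> $new"
# echo "$new" > "$dev/queue_depth"   # uncomment to apply, as root
```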


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14  8:56                                   ` Paolo Valente
  2018-02-14  9:55                                     ` Oleksandr Natalenko
@ 2018-02-14 15:19                                     ` Jens Axboe
  2018-02-14 15:39                                       ` Paolo Valente
  1 sibling, 1 reply; 54+ messages in thread
From: Jens Axboe @ 2018-02-14 15:19 UTC (permalink / raw)
  To: Paolo Valente, Mike Galbraith; +Cc: Oleksandr Natalenko, stable

On 2/14/18 1:56 AM, Paolo Valente wrote:
> 
> 
>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
>>
>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>
>>> And _of course_, roughly two minutes later, IO stalled.
>>
>> P.S.
>>
>> crash> bt 19117
>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>> crash>
> 
> This has evidently to do with tag pressure.  I've looked for a way to
> easily reduce the number of tags online, so as to put your system in
> the bad spot deterministically.  But at no avail.  Does anyone know a
> way to do it?

The key here might be that it's not a regular file system request,
which I'm sure bfq probably handles differently. So it's possible
that you are slowly leaking those tags, and we end up in this
miserable situation after a while.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14 15:19                                     ` Jens Axboe
@ 2018-02-14 15:39                                       ` Paolo Valente
  2018-02-14 15:44                                         ` Jens Axboe
  0 siblings, 1 reply; 54+ messages in thread
From: Paolo Valente @ 2018-02-14 15:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mike Galbraith, Oleksandr Natalenko, stable



> Il giorno 14 feb 2018, alle ore 16:19, Jens Axboe <axboe@kernel.dk> ha scritto:
> 
> On 2/14/18 1:56 AM, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
>>> 
>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>> 
>>>> And _of course_, roughly two minutes later, IO stalled.
>>> 
>>> P.S.
>>> 
>>> crash> bt 19117
>>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>>> crash>
>> 
>> This has evidently to do with tag pressure.  I've looked for a way to
>> easily reduce the number of tags online, so as to put your system in
>> the bad spot deterministically.  But to no avail.  Does anyone know a
>> way to do it?
> 
> The key here might be that it's not a regular file system request,
> which I'm sure bfq probably handles differently. So it's possible
> that you are slowly leaking those tags, and we end up in this
> miserable situation after a while.
> 

Could you elaborate more on this?  My mental model of bfq hooks in
this respect is that they do only side operations, which AFAIK cannot
block the putting of a tag.  IOW, tag getting and putting is done
outside bfq, regardless of what bfq does with I/O requests.  Is there
a flaw in this?

In any case, is there any flag, or the like, in the requests passed to
bfq that I could make bfq check, to raise some warning?

Thanks,
Paolo

> -- 
> Jens Axboe


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14 15:39                                       ` Paolo Valente
@ 2018-02-14 15:44                                         ` Jens Axboe
  2018-02-15 18:13                                           ` Paolo Valente
  2018-02-17  5:30                                           ` Mike Galbraith
  0 siblings, 2 replies; 54+ messages in thread
From: Jens Axboe @ 2018-02-14 15:44 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Mike Galbraith, Oleksandr Natalenko, stable

On 2/14/18 8:39 AM, Paolo Valente wrote:
> 
> 
>> Il giorno 14 feb 2018, alle ore 16:19, Jens Axboe <axboe@kernel.dk> ha scritto:
>>
>> On 2/14/18 1:56 AM, Paolo Valente wrote:
>>>
>>>
>>>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
>>>>
>>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>>>
>>>>> And _of course_, roughly two minutes later, IO stalled.
>>>>
>>>> P.S.
>>>>
>>>> crash> bt 19117
>>>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>>>> crash>
>>>
>>> This has evidently to do with tag pressure.  I've looked for a way to
>>> easily reduce the number of tags online, so as to put your system in
>>> the bad spot deterministically.  But to no avail.  Does anyone know a
>>> way to do it?
>>
>> The key here might be that it's not a regular file system request,
>> which I'm sure bfq probably handles differently. So it's possible
>> that you are slowly leaking those tags, and we end up in this
>> miserable situation after a while.
>>
> 
> Could you elaborate more on this?  My mental model of bfq hooks in
> this respect is that they do only side operations, which AFAIK cannot
> block the putting of a tag.  IOW, tag getting and putting is done
> outside bfq, regardless of what bfq does with I/O requests.  Is there
> a flaw in this?
> 
> In any case, is there any flag in or the like, in requests passed to
> bfq, that I could make bfq check, to raise some warning?

I'm completely guessing, and I don't know if this trace is always what
Mike sees when things hang. It just seems suspect that we end up with a
"special" request here, since I'm sure the regular file system requests
outnumber them greatly. That raises my suspicion that the type is
related.

But no, there should be no special handling on the freeing side, my
guess was that BFQ ends them a bit differently.

Mike, when you see a hang like that, would it be possible for you to
dump the contents of /sys/kernel/debug/block/<dev in question>/* for us
to inspect? That will tell us a lot about the internal state at that
time.
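As an aside, a slow tag leak could also be caught before the machine stalls outright by sampling the busy counters in those same debugfs files periodically. A minimal sketch from the editor, not from the thread: the device name sdb, the hctx0 path, and the 5-second interval are illustrative assumptions, and it relies on CONFIG_BLK_DEBUG_FS being enabled.

```shell
#!/bin/sh
# Hypothetical watcher (editor's sketch): sample the in-flight tag counts
# every few seconds, so a slow leak shows up as a "busy=" value that grows
# monotonically even when the disk is otherwise idle.
# DEV is an illustrative assumption; pick the device under test.
DEV=${DEV:-sdb}
D=/sys/kernel/debug/block/$DEV/hctx0

if [ -d "$D" ]; then
	# 60 samples, 5 s apart: enough to cover Mike's "roughly two minutes".
	for i in $(seq 1 60); do
		printf '%s ' "$(date +%T)"
		# Both the driver tags and the scheduler tags report a busy= line.
		grep -h '^busy=' "$D/tags" "$D/sched_tags" | tr '\n' ' '
		echo
		sleep 5
	done
else
	echo "no debugfs dir $D (CONFIG_BLK_DEBUG_FS off, or debugfs not mounted?)" >&2
fi
```

Redirecting the output to a file would leave a record of the minutes leading up to a stall, which is exactly the kind of instrumentation that can be left in place for a while.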

-- 
Jens Axboe


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14 15:44                                         ` Jens Axboe
@ 2018-02-15 18:13                                           ` Paolo Valente
  2018-02-16  5:39                                             ` Mike Galbraith
  2018-02-17  5:30                                           ` Mike Galbraith
  1 sibling, 1 reply; 54+ messages in thread
From: Paolo Valente @ 2018-02-15 18:13 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Mike Galbraith, Oleksandr Natalenko, stable



> Il giorno 14 feb 2018, alle ore 16:44, Jens Axboe <axboe@kernel.dk> ha scritto:
> 
> On 2/14/18 8:39 AM, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 14 feb 2018, alle ore 16:19, Jens Axboe <axboe@kernel.dk> ha scritto:
>>> 
>>> On 2/14/18 1:56 AM, Paolo Valente wrote:
>>>> 
>>>> 
>>>>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
>>>>> 
>>>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>>>> 
>>>>>> And _of course_, roughly two minutes later, IO stalled.
>>>>> 
>>>>> P.S.
>>>>> 
>>>>> crash> bt 19117
>>>>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>>>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>>>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>>>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>>>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>>>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>>>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>>>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>>>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>>>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>>>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>>>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>>>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>>>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>>>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>>>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>>>>> crash>
>>>> 
>>>> This has evidently to do with tag pressure.  I've looked for a way to
>>>> easily reduce the number of tags online, so as to put your system in
>>>> the bad spot deterministically.  But to no avail.  Does anyone know a
>>>> way to do it?
>>> 
>>> The key here might be that it's not a regular file system request,
>>> which I'm sure bfq probably handles differently. So it's possible
>>> that you are slowly leaking those tags, and we end up in this
>>> miserable situation after a while.
>>> 
>> 
>> Could you elaborate more on this?  My mental model of bfq hooks in
>> this respect is that they do only side operations, which AFAIK cannot
>> block the putting of a tag.  IOW, tag getting and putting is done
>> outside bfq, regardless of what bfq does with I/O requests.  Is there
>> a flaw in this?
>> 
>> In any case, is there any flag in or the like, in requests passed to
>> bfq, that I could make bfq check, to raise some warning?
> 
> I'm completely guessing, and I don't know if this trace is always what
> Mike sees when things hang. It just seems suspect that we end up with a
> "special" request here, since I'm sure the regular file system requests
> outnumber them greatly. That raises my suspicion that the type is
> related.
> 
> But no, there should be no special handling on the freeing side, my
> guess was that BFQ ends them a bit differently.
> 

Hi Jens,
whatever the exact cause of the leak is, a leak does in turn sound
like a reasonable cause for these hangs.  But even if a leak is the
cause, it seems to me that reducing the tags to just 1 might help
trigger the problem quickly and reliably on Mike's machine.  If you
agree, Jens, what would be the quickest/easiest way to reduce the tags?

Thanks,
Paolo
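
For what it's worth, one candidate knob for this is nr_requests: on blk-mq, lowering it also shrinks the scheduler tag pool, so a leak would exhaust tags far sooner. The sketch below is the editor's assumption, not something proposed in the thread; the device name sdb and the depth of 4 are illustrative, and note that values below BLKDEV_MIN_RQ (4) are clamped by the block layer, so "just 1" is not actually reachable this way.

```shell
#!/bin/sh
# Hypothetical sketch: shrink the queue depth of the device under test so
# that tag exhaustion (if tags really are being leaked) shows up quickly.
# DEV and QDEPTH are illustrative assumptions, not values from the thread.
DEV=${DEV:-sdb}
QDEPTH=${QDEPTH:-4}   # values below BLKDEV_MIN_RQ (4) are silently clamped
NRQ=/sys/block/$DEV/queue/nr_requests

if [ -w "$NRQ" ]; then
	echo "$QDEPTH" > "$NRQ"
	echo "$DEV: nr_requests now $(cat "$NRQ")"
else
	echo "cannot write $NRQ (need root, or $DEV not present)" >&2
fi
```

With an active I/O scheduler such as bfq, updating nr_requests resizes the sched_tags depth as well, so even a leak of one tag every few hours would then stall I/O within a day rather than after weeks.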

> Mike, when you see a hang like that, would it be possible for you to
> dump the contents of /sys/kernel/debug/block/<dev in question>/* for us
> to inspect? That will tell us a lot about the internal state at that
> time.
> 
> -- 
> Jens Axboe


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-15 18:13                                           ` Paolo Valente
@ 2018-02-16  5:39                                             ` Mike Galbraith
  2018-02-16  7:16                                               ` Paolo Valente
  2018-02-16  7:20                                               ` Paolo Valente
  0 siblings, 2 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-16  5:39 UTC (permalink / raw)
  To: Paolo Valente, Jens Axboe; +Cc: Oleksandr Natalenko, stable

On Thu, 2018-02-15 at 19:13 +0100, Paolo Valente wrote:
> 
> > Il giorno 14 feb 2018, alle ore 16:44, Jens Axboe <axboe@kernel.dk> ha scritto:
> > 
> > On 2/14/18 8:39 AM, Paolo Valente wrote:
> >> 
> >> 
> >>> Il giorno 14 feb 2018, alle ore 16:19, Jens Axboe <axboe@kernel.dk> ha scritto:
> >>> 
> >>> On 2/14/18 1:56 AM, Paolo Valente wrote:
> >>>> 
> >>>> 
> >>>>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
> >>>>> 
> >>>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
> >>>>>> 
> >>>>>> And _of course_, roughly two minutes later, IO stalled.
> >>>>> 
> >>>>> P.S.
> >>>>> 
> >>>>> crash> bt 19117
> >>>>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
> >>>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
> >>>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
> >>>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
> >>>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
> >>>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
> >>>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
> >>>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
> >>>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
> >>>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
> >>>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
> >>>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
> >>>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
> >>>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
> >>>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
> >>>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
> >>>>> crash>
> >>>> 
> >>>> This has evidently to do with tag pressure.  I've looked for a way to
> >>>> easily reduce the number of tags online, so as to put your system in
> >>>> the bad spot deterministically.  But to no avail.  Does anyone know a
> >>>> way to do it?
> >>> 
> >>> The key here might be that it's not a regular file system request,
> >>> which I'm sure bfq probably handles differently. So it's possible
> >>> that you are slowly leaking those tags, and we end up in this
> >>> miserable situation after a while.
> >>> 
> >> 
> >> Could you elaborate more on this?  My mental model of bfq hooks in
> >> this respect is that they do only side operations, which AFAIK cannot
> >> block the putting of a tag.  IOW, tag getting and putting is done
> >> outside bfq, regardless of what bfq does with I/O requests.  Is there
> >> a flaw in this?
> >> 
> >> In any case, is there any flag in or the like, in requests passed to
> >> bfq, that I could make bfq check, to raise some warning?
> > 
> > I'm completely guessing, and I don't know if this trace is always what
> > Mike sees when things hang. It just seems suspect that we end up with a
> > "special" request here, since I'm sure the regular file system requests
> > outnumber them greatly. That raises my suspicion that the type is
> > related.
> > 
> > But no, there should be no special handling on the freeing side, my
> > guess was that BFQ ends them a bit differently.
> > 
> 
> Hi Jens,
> whatever the exact cause of leakage is, a leakage in its turn does
> sound like a reasonable cause for these hangs.  But also if leakage is
> the cause, it seems to me that reducing tags to just 1 might help
> trigger the problem quickly and reliably on Mike's machine.  If you
> agree, Jens, which would be the quickest/easiest way to reduce tags?

Whatever the cause, seems this wants some instrumentation that can be
left in place for a while.  I turned on CONFIG_BLK_DEBUG_FS for Jens,
but the little bugger didn't raise its ugly head all day long.

What you need most is more reproducers.  My box swears there's
something amiss, and that something is BFQ.. but it's alone.

	-Mike


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-16  5:39                                             ` Mike Galbraith
@ 2018-02-16  7:16                                               ` Paolo Valente
  2018-02-16  7:20                                               ` Paolo Valente
  1 sibling, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-16  7:16 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Jens Axboe, Oleksandr Natalenko, stable



> Il giorno 16 feb 2018, alle ore 06:39, Mike Galbraith <efault@gmx.de> ha scritto:
> 
> On Thu, 2018-02-15 at 19:13 +0100, Paolo Valente wrote:
>> 
>>> Il giorno 14 feb 2018, alle ore 16:44, Jens Axboe <axboe@kernel.dk> ha scritto:
>>> 
>>> On 2/14/18 8:39 AM, Paolo Valente wrote:
>>>> 
>>>> 
>>>>> Il giorno 14 feb 2018, alle ore 16:19, Jens Axboe <axboe@kernel.dk> ha scritto:
>>>>> 
>>>>> On 2/14/18 1:56 AM, Paolo Valente wrote:
>>>>>> 
>>>>>> 
>>>>>>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
>>>>>>> 
>>>>>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>>>>>> 
>>>>>>>> And _of course_, roughly two minutes later, IO stalled.
>>>>>>> 
>>>>>>> P.S.
>>>>>>> 
>>>>>>> crash> bt 19117
>>>>>>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>>>>>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>>>>>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>>>>>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>>>>>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>>>>>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>>>>>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>>>>>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>>>>>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>>>>>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>>>>>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>>>>>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>>>>>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>>>>>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>>>>>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>>>>>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>>>>>>> crash>
>>>>>> 
>>>>>> This has evidently to do with tag pressure.  I've looked for a way to
>>>>>> easily reduce the number of tags online, so as to put your system in
>>>>>> the bad spot deterministically.  But to no avail.  Does anyone know a
>>>>>> way to do it?
>>>>> 
>>>>> The key here might be that it's not a regular file system request,
>>>>> which I'm sure bfq probably handles differently. So it's possible
>>>>> that you are slowly leaking those tags, and we end up in this
>>>>> miserable situation after a while.
>>>>> 
>>>> 
>>>> Could you elaborate more on this?  My mental model of bfq hooks in
>>>> this respect is that they do only side operations, which AFAIK cannot
>>>> block the putting of a tag.  IOW, tag getting and putting is done
>>>> outside bfq, regardless of what bfq does with I/O requests.  Is there
>>>> a flaw in this?
>>>> 
>>>> In any case, is there any flag in or the like, in requests passed to
>>>> bfq, that I could make bfq check, to raise some warning?
>>> 
>>> I'm completely guessing, and I don't know if this trace is always what
>>> Mike sees when things hang. It just seems suspect that we end up with a
>>> "special" request here, since I'm sure the regular file system requests
>>> outnumber them greatly. That raises my suspicion that the type is
>>> related.
>>> 
>>> But no, there should be no special handling on the freeing side, my
>>> guess was that BFQ ends them a bit differently.
>>> 
>> 
>> Hi Jens,
>> whatever the exact cause of leakage is, a leakage in its turn does
>> sound like a reasonable cause for these hangs.  But also if leakage is
>> the cause, it seems to me that reducing tags to just 1 might help
>> trigger the problem quickly and reliably on Mike's machine.  If you
>> agree, Jens, which would be the quickest/easiest way to reduce tags?
> 
> Whatever the cause, seems this wants some instrumentation that can be
> left in place for a while.  I turned on CONFIG_BLK_DEBUG_FS for Jens,
> but the little bugger didn't raise its ugly head all day long.
> 
> What you need most is more reproducers.

Yeah, but that is out of our control.  That's why I'm nagging Jens,
and other knowledgeable people, for a trick that will hopefully push
your system to a quick failure.

Paolo


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-16  5:39                                             ` Mike Galbraith
  2018-02-16  7:16                                               ` Paolo Valente
@ 2018-02-16  7:20                                               ` Paolo Valente
  1 sibling, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-16  7:20 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Jens Axboe, Oleksandr Natalenko, stable



> Il giorno 16 feb 2018, alle ore 06:39, Mike Galbraith <efault@gmx.de> ha scritto:
> 
> On Thu, 2018-02-15 at 19:13 +0100, Paolo Valente wrote:
>> 
>>> Il giorno 14 feb 2018, alle ore 16:44, Jens Axboe <axboe@kernel.dk> ha scritto:
>>> 
>>> On 2/14/18 8:39 AM, Paolo Valente wrote:
>>>> 
>>>> 
>>>>> Il giorno 14 feb 2018, alle ore 16:19, Jens Axboe <axboe@kernel.dk> ha scritto:
>>>>> 
>>>>> On 2/14/18 1:56 AM, Paolo Valente wrote:
>>>>>> 
>>>>>> 
>>>>>>> Il giorno 14 feb 2018, alle ore 08:15, Mike Galbraith <efault@gmx.de> ha scritto:
>>>>>>> 
>>>>>>> On Wed, 2018-02-14 at 08:04 +0100, Mike Galbraith wrote:
>>>>>>>> 
>>>>>>>> And _of course_, roughly two minutes later, IO stalled.
>>>>>>> 
>>>>>>> P.S.
>>>>>>> 
>>>>>>> crash> bt 19117
>>>>>>> PID: 19117  TASK: ffff8803d2dcd280  CPU: 7   COMMAND: "kworker/7:2"
>>>>>>> #0 [ffff8803f7207bb8] __schedule at ffffffff81595e18
>>>>>>> #1 [ffff8803f7207c40] schedule at ffffffff81596422
>>>>>>> #2 [ffff8803f7207c50] io_schedule at ffffffff8108a832
>>>>>>> #3 [ffff8803f7207c60] blk_mq_get_tag at ffffffff8129cd1e
>>>>>>> #4 [ffff8803f7207cc0] blk_mq_get_request at ffffffff812987cc
>>>>>>> #5 [ffff8803f7207d00] blk_mq_alloc_request at ffffffff81298a9a
>>>>>>> #6 [ffff8803f7207d38] blk_get_request_flags at ffffffff8128e674
>>>>>>> #7 [ffff8803f7207d60] scsi_execute at ffffffffa0025b58 [scsi_mod]
>>>>>>> #8 [ffff8803f7207d98] scsi_test_unit_ready at ffffffffa002611c [scsi_mod]
>>>>>>> #9 [ffff8803f7207df8] sd_check_events at ffffffffa0212747 [sd_mod]
>>>>>>> #10 [ffff8803f7207e20] disk_check_events at ffffffff812a0f85
>>>>>>> #11 [ffff8803f7207e78] process_one_work at ffffffff81079867
>>>>>>> #12 [ffff8803f7207eb8] worker_thread at ffffffff8107a127
>>>>>>> #13 [ffff8803f7207f10] kthread at ffffffff8107ef48
>>>>>>> #14 [ffff8803f7207f50] ret_from_fork at ffffffff816001a5
>>>>>>> crash>
>>>>>> 
>>>>>> This has evidently to do with tag pressure.  I've looked for a way to
>>>>>> easily reduce the number of tags online, so as to put your system in
>>>>>> the bad spot deterministically.  But to no avail.  Does anyone know a
>>>>>> way to do it?
>>>>> 
>>>>> The key here might be that it's not a regular file system request,
>>>>> which I'm sure bfq probably handles differently. So it's possible
>>>>> that you are slowly leaking those tags, and we end up in this
>>>>> miserable situation after a while.
>>>>> 
>>>> 
>>>> Could you elaborate more on this?  My mental model of bfq hooks in
>>>> this respect is that they do only side operations, which AFAIK cannot
>>>> block the putting of a tag.  IOW, tag getting and putting is done
>>>> outside bfq, regardless of what bfq does with I/O requests.  Is there
>>>> a flaw in this?
>>>> 
>>>> In any case, is there any flag in or the like, in requests passed to
>>>> bfq, that I could make bfq check, to raise some warning?
>>> 
>>> I'm completely guessing, and I don't know if this trace is always what
>>> Mike sees when things hang. It just seems suspect that we end up with a
>>> "special" request here, since I'm sure the regular file system requests
>>> outnumber them greatly. That raises my suspicion that the type is
>>> related.
>>> 
>>> But no, there should be no special handling on the freeing side, my
>>> guess was that BFQ ends them a bit differently.
>>> 
>> 
>> Hi Jens,
>> whatever the exact cause of leakage is, a leakage in its turn does
>> sound like a reasonable cause for these hangs.  But also if leakage is
>> the cause, it seems to me that reducing tags to just 1 might help
>> trigger the problem quickly and reliably on Mike's machine.  If you
>> agree, Jens, which would be the quickest/easiest way to reduce tags?
> 
> Whatever the cause, seems this wants some instrumentation that can be
> left in place for a while.  I turned on CONFIG_BLK_DEBUG_FS for Jens,
> but the little bugger didn't raise its ugly head all day long.
> 
> What you need most is more reproducers.

Yeah, but that is out of our control.  That's why I'm nagging Jens,
and other knowledgeable people, for a trick that will hopefully push
your system to a quick failure.

Paolo


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-14 15:44                                         ` Jens Axboe
  2018-02-15 18:13                                           ` Paolo Valente
@ 2018-02-17  5:30                                           ` Mike Galbraith
  2018-02-17  5:33                                             ` Mike Galbraith
  2018-02-17  5:39                                             ` Mike Galbraith
  1 sibling, 2 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-17  5:30 UTC (permalink / raw)
  To: Jens Axboe, Paolo Valente; +Cc: Oleksandr Natalenko, stable

On Wed, 2018-02-14 at 08:44 -0700, Jens Axboe wrote:
> 
> Mike, when you see a hang like that, would it be possible for you to
> dump the contents of /sys/kernel/debug/block/<dev in question>/* for us
> to inspect? That will tell us a lot about the internal state at that
> time.

Finally.  4.15.4 stalled this morning while building other updated
trees.

#!/bin/sh

for i in `find /sys/kernel/debug/block/sd[bd] -type f`
do
	echo $i:
	cat $i
	echo
done

/sys/kernel/debug/block/sdb/hctx0/cpu7/completed:
7483 4884

/sys/kernel/debug/block/sdb/hctx0/cpu7/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu7/dispatched:
7483 4884

/sys/kernel/debug/block/sdb/hctx0/cpu7/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu6/completed:
12240 7814

/sys/kernel/debug/block/sdb/hctx0/cpu6/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu6/dispatched:
12240 7814

/sys/kernel/debug/block/sdb/hctx0/cpu6/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu5/completed:
8549 7842

/sys/kernel/debug/block/sdb/hctx0/cpu5/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu5/dispatched:
8549 7842

/sys/kernel/debug/block/sdb/hctx0/cpu5/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu4/completed:
10115 5073

/sys/kernel/debug/block/sdb/hctx0/cpu4/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu4/dispatched:
10115 5073

/sys/kernel/debug/block/sdb/hctx0/cpu4/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu3/completed:
13181 4867

/sys/kernel/debug/block/sdb/hctx0/cpu3/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu3/dispatched:
13181 4867

/sys/kernel/debug/block/sdb/hctx0/cpu3/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu2/completed:
101270 9089

/sys/kernel/debug/block/sdb/hctx0/cpu2/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu2/dispatched:
101270 9089

/sys/kernel/debug/block/sdb/hctx0/cpu2/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu1/completed:
8158 4392

/sys/kernel/debug/block/sdb/hctx0/cpu1/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu1/dispatched:
8158 4392

/sys/kernel/debug/block/sdb/hctx0/cpu1/rq_list:

/sys/kernel/debug/block/sdb/hctx0/cpu0/completed:
8605 5819

/sys/kernel/debug/block/sdb/hctx0/cpu0/merged:
0

/sys/kernel/debug/block/sdb/hctx0/cpu0/dispatched:
8605 5819

/sys/kernel/debug/block/sdb/hctx0/cpu0/rq_list:

/sys/kernel/debug/block/sdb/hctx0/active:
0

/sys/kernel/debug/block/sdb/hctx0/run:
373736

/sys/kernel/debug/block/sdb/hctx0/queued:
219336

/sys/kernel/debug/block/sdb/hctx0/dispatched:
       0	7629
       1	216748
       2	446
       4	0
       8	0
      16	0
      32+	0

/sys/kernel/debug/block/sdb/hctx0/io_poll:
considered=0
invoked=0
success=0

/sys/kernel/debug/block/sdb/hctx0/sched_tags_bitmap:
00000000: 0000 0000 0000 0000

/sys/kernel/debug/block/sdb/hctx0/sched_tags:
nr_tags=62
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=62
busy=0
bits_per_word=8
map_nr=8
alloc_hint={9, 1, 3, 0, 17, 0, 9, 17}
wake_batch=7
wake_index=4
ws={
	{.wait_cnt=7, .wait=active},
	{.wait_cnt=7, .wait=active},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=1, .wait=active},
	{.wait_cnt=7, .wait=active},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=active},
}
round_robin=1

/sys/kernel/debug/block/sdb/hctx0/tags_bitmap:
00000000: 0000 0000

/sys/kernel/debug/block/sdb/hctx0/tags:
nr_tags=31
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=31
busy=0
bits_per_word=4
map_nr=8
alloc_hint={3, 12, 2, 9, 18, 2, 4, 11}
wake_batch=3
wake_index=0
ws={
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
}
round_robin=1

/sys/kernel/debug/block/sdb/hctx0/ctx_map:
00000000: 00

/sys/kernel/debug/block/sdb/hctx0/busy:

/sys/kernel/debug/block/sdb/hctx0/dispatch:

/sys/kernel/debug/block/sdb/hctx0/flags:
alloc_policy=RR SHOULD_MERGE|SG_MERGE

/sys/kernel/debug/block/sdb/hctx0/state:


/sys/kernel/debug/block/sdb/write_hints:
hint0: 0
hint1: 0
hint2: 0
hint3: 0
hint4: 0

/sys/kernel/debug/block/sdb/state:
SAME_COMP|IO_STAT|INIT_DONE|POLL|WC|FLUSH_NQ|REGISTERED|SCSI_PASSTHROUGH

/sys/kernel/debug/block/sdb/requeue_list:

/sys/kernel/debug/block/sdb/poll_stat:
read  (512 Bytes): samples=0
write (512 Bytes): samples=0
read  (1024 Bytes): samples=0
write (1024 Bytes): samples=0
read  (2048 Bytes): samples=0
write (2048 Bytes): samples=0
read  (4096 Bytes): samples=0
write (4096 Bytes): samples=0
read  (8192 Bytes): samples=0
write (8192 Bytes): samples=0
read  (16384 Bytes): samples=0
write (16384 Bytes): samples=0
read  (32768 Bytes): samples=0
write (32768 Bytes): samples=0
read  (65536 Bytes): samples=0
write (65536 Bytes): samples=0

/sys/kernel/debug/block/sdd/hctx0/cpu7/completed:
2642 5235

/sys/kernel/debug/block/sdd/hctx0/cpu7/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu7/dispatched:
2642 5235

/sys/kernel/debug/block/sdd/hctx0/cpu7/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu6/completed:
2873 15051

/sys/kernel/debug/block/sdd/hctx0/cpu6/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu6/dispatched:
2873 15051

/sys/kernel/debug/block/sdd/hctx0/cpu6/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu5/completed:
3212 13912

/sys/kernel/debug/block/sdd/hctx0/cpu5/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu5/dispatched:
3212 13912

/sys/kernel/debug/block/sdd/hctx0/cpu5/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu4/completed:
4304 8158

/sys/kernel/debug/block/sdd/hctx0/cpu4/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu4/dispatched:
4304 8158

/sys/kernel/debug/block/sdd/hctx0/cpu4/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu3/completed:
3034 6275

/sys/kernel/debug/block/sdd/hctx0/cpu3/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu3/dispatched:
3034 6275

/sys/kernel/debug/block/sdd/hctx0/cpu3/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu2/completed:
10339 21025

/sys/kernel/debug/block/sdd/hctx0/cpu2/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu2/dispatched:
10339 21025

/sys/kernel/debug/block/sdd/hctx0/cpu2/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu1/completed:
3237 9737

/sys/kernel/debug/block/sdd/hctx0/cpu1/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu1/dispatched:
3237 9737

/sys/kernel/debug/block/sdd/hctx0/cpu1/rq_list:

/sys/kernel/debug/block/sdd/hctx0/cpu0/completed:
1490 6101

/sys/kernel/debug/block/sdd/hctx0/cpu0/merged:
0

/sys/kernel/debug/block/sdd/hctx0/cpu0/dispatched:
1490 6101

/sys/kernel/debug/block/sdd/hctx0/cpu0/rq_list:

/sys/kernel/debug/block/sdd/hctx0/active:
0

/sys/kernel/debug/block/sdd/hctx0/run:
116625

/sys/kernel/debug/block/sdd/hctx0/queued:
116625

/sys/kernel/debug/block/sdd/hctx0/dispatched:
       0	2438
       1	116700
       2	92
       4	0
       8	0
      16	0
      32+	0

/sys/kernel/debug/block/sdd/hctx0/io_poll:
considered=0
invoked=0
success=0

/sys/kernel/debug/block/sdd/hctx0/sched_tags_bitmap:
00000000: 0000 0000 0000 0000

/sys/kernel/debug/block/sdd/hctx0/sched_tags:
nr_tags=62
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=62
busy=0
bits_per_word=8
map_nr=8
alloc_hint={9, 51, 57, 0, 0, 2, 2, 21}
wake_batch=7
wake_index=0
ws={
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
	{.wait_cnt=7, .wait=inactive},
}
round_robin=1

/sys/kernel/debug/block/sdd/hctx0/tags_bitmap:
00000000: 0000 0000

/sys/kernel/debug/block/sdd/hctx0/tags:
nr_tags=31
nr_reserved_tags=0
active_queues=0

bitmap_tags:
depth=31
busy=0
bits_per_word=4
map_nr=8
alloc_hint={7, 21, 26, 26, 25, 2, 22, 6}
wake_batch=3
wake_index=0
ws={
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
	{.wait_cnt=3, .wait=inactive},
}
round_robin=1

/sys/kernel/debug/block/sdd/hctx0/ctx_map:
00000000: 00

/sys/kernel/debug/block/sdd/hctx0/busy:

/sys/kernel/debug/block/sdd/hctx0/dispatch:

/sys/kernel/debug/block/sdd/hctx0/flags:
alloc_policy=RR SHOULD_MERGE|SG_MERGE

/sys/kernel/debug/block/sdd/hctx0/state:


/sys/kernel/debug/block/sdd/write_hints:
hint0: 0
hint1: 0
hint2: 0
hint3: 0
hint4: 0

/sys/kernel/debug/block/sdd/state:
SAME_COMP|IO_STAT|INIT_DONE|POLL|WC|FLUSH_NQ|REGISTERED|SCSI_PASSTHROUGH

/sys/kernel/debug/block/sdd/requeue_list:

/sys/kernel/debug/block/sdd/poll_stat:
read  (512 Bytes): samples=0
write (512 Bytes): samples=0
read  (1024 Bytes): samples=0
write (1024 Bytes): samples=0
read  (2048 Bytes): samples=0
write (2048 Bytes): samples=0
read  (4096 Bytes): samples=0
write (4096 Bytes): samples=0
read  (8192 Bytes): samples=0
write (8192 Bytes): samples=0
read  (16384 Bytes): samples=0
write (16384 Bytes): samples=0
read  (32768 Bytes): samples=0
write (32768 Bytes): samples=0
read  (65536 Bytes): samples=0
write (65536 Bytes): samples=0

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-17  5:30                                           ` Mike Galbraith
@ 2018-02-17  5:33                                             ` Mike Galbraith
  2018-02-19  5:34                                               ` Mike Galbraith
  2018-02-17  5:39                                             ` Mike Galbraith
  1 sibling, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-17  5:33 UTC (permalink / raw)
  To: Jens Axboe, Paolo Valente; +Cc: Oleksandr Natalenko, stable


homer:..crash/2018-02-17-06:12 # grep blk_mq_get_tag bta
 #3 [ffff8803f794b7e8] blk_mq_get_tag at ffffffff8129cf8e
 #3 [ffff88035ef139b8] blk_mq_get_tag at ffffffff8129cf8e
 #3 [ffff8802430439b8] blk_mq_get_tag at ffffffff8129cf8e
 #3 [ffff880290fbb9b8] blk_mq_get_tag at ffffffff8129cf8e

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-17  5:30                                           ` Mike Galbraith
  2018-02-17  5:33                                             ` Mike Galbraith
@ 2018-02-17  5:39                                             ` Mike Galbraith
  1 sibling, 0 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-17  5:39 UTC (permalink / raw)
  To: Jens Axboe, Paolo Valente; +Cc: Oleksandr Natalenko, stable

On Sat, 2018-02-17 at 06:30 +0100, Mike Galbraith wrote:
> On Wed, 2018-02-14 at 08:44 -0700, Jens Axboe wrote:
> > 
> > Mike, when you see a hang like that, would it be possible for you to
> > dump the contents of /sys/kernel/debug/block/<dev in question/* for us
> > to inspect? That will tell us a lot about the internal state at that
> > time.
> 
> Finally.  4.15.4 stalled this morning while building other updated
> trees.
> 
> #!/bin/sh
> 
> for i in `find /sys/kernel/debug/block/sd[bd] -type f`
> do
> 	echo $i:
> 	cat $i
> 	echo
> done

sdd was idle.. should have chopped that off.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-17  5:33                                             ` Mike Galbraith
@ 2018-02-19  5:34                                               ` Mike Galbraith
  2018-02-19  6:05                                                 ` Oleksandr Natalenko
  2018-02-19  9:37                                                 ` Paolo Valente
  0 siblings, 2 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-19  5:34 UTC (permalink / raw)
  To: Jens Axboe, Paolo Valente; +Cc: Oleksandr Natalenko, stable

FWIW, my box spent the weekend running a kbuild marathon with a52a69ea
reverted, has convinced me that it _is_ the source of the hang woes
(directly or indirectly).  No outright stalls, no stall for way too
long but then recover, nada, not the slightest hint of trouble.

	-Mike

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-19  5:34                                               ` Mike Galbraith
@ 2018-02-19  6:05                                                 ` Oleksandr Natalenko
  2018-02-19  7:09                                                   ` Mike Galbraith
  2018-02-19  9:37                                                 ` Paolo Valente
  1 sibling, 1 reply; 54+ messages in thread
From: Oleksandr Natalenko @ 2018-02-19  6:05 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Jens Axboe, Paolo Valente, stable

On Monday, 19 February 2018 6:34:44 CET Mike Galbraith wrote:
> a52a69ea

…which is strange as I've been running it since the very beginning w/o any troubles
on 4 machines (hence my tested-by tag there)…

Oleksandr

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-19  6:05                                                 ` Oleksandr Natalenko
@ 2018-02-19  7:09                                                   ` Mike Galbraith
  2018-02-19  7:58                                                     ` Mike Galbraith
  0 siblings, 1 reply; 54+ messages in thread
From: Mike Galbraith @ 2018-02-19  7:09 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: Jens Axboe, Paolo Valente, stable

On Mon, 2018-02-19 at 07:05 +0100, Oleksandr Natalenko wrote:
> On Monday, 19 February 2018 6:34:44 CET Mike Galbraith wrote:
> > a52a69ea
> 
> …which is strange as I'm running it since the very beginning w/o any troubles 
> on 4 machines (hence my tested-by tag there)…

My box routinely bitches about stuff with signed-off-by's and tested-
by's, it has no respect at all for other boxen's opinions :)  It's the
only box bitching, so salt to taste.. but it remains adamant.

	-Mike

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-19  7:09                                                   ` Mike Galbraith
@ 2018-02-19  7:58                                                     ` Mike Galbraith
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-19  7:58 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: Jens Axboe, Paolo Valente, stable

On Mon, 2018-02-19 at 08:09 +0100, Mike Galbraith wrote:
> On Mon, 2018-02-19 at 07:05 +0100, Oleksandr Natalenko wrote:
> > On Monday, 19 February 2018 6:34:44 CET Mike Galbraith wrote:
> > > a52a69ea
> > 
> > …which is strange as I'm running it since the very beginning w/o any troubles 
> > on 4 machines (hence my tested-by tag there)…
> 
> My box routinely bitches about stuff with signed-off-by's and tested-
> by's, it has no respect at all for other boxen's opinions :)  It's the
> only box bitching, so salt to taste.. but it remains adamant.

Given it is the only one, and whatever the problem being so non-
deterministic, ignoring my box, shipping the lot off to stable then
wait and see seems perfectly reasonable.  It won't be any worse off.

	-Mike

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-19  5:34                                               ` Mike Galbraith
  2018-02-19  6:05                                                 ` Oleksandr Natalenko
@ 2018-02-19  9:37                                                 ` Paolo Valente
  2018-02-19 13:18                                                   ` Mike Galbraith
  1 sibling, 1 reply; 54+ messages in thread
From: Paolo Valente @ 2018-02-19  9:37 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Jens Axboe, Oleksandr Natalenko, stable



> Il giorno 19 feb 2018, alle ore 06:34, Mike Galbraith <efault@gmx.de> ha scritto:
> 
> FWIW, my box spent the weekend running a kbuild marathon with a52a69ea
> reverted, has convinced me that it _is_ the source of the hang woes
> (directly or indirectly).  No outright stalls, no stall for way too
> long but then recover, nada, not the slightest hint of trouble.
> 

ok, after this piece of evidence, and while waiting for more
authoritative suggestions, I have investigated a little bit how to
reduce the number of your angry tags available, to hopefully push your
system so close to failure that your problem gets reproduced reliably
and quickly.

The only easy way seems to be that you, as root, issue an
echo 4 > /sys/block/sda/queue/nr_requests
(nr_requests is lower-bounded to 4)

NOTE: do/repeat this *after* you switch to a new scheduler, because a
scheduler switch restores the default value of 62.

Reducing nr_requests does not change the value of the nr_tags
parameter, but, according to the code, it should de facto reduce the
number of available tags, by reducing the depth of the bitmaps of free
tags.
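In code, the effect Paolo describes could be sketched roughly as follows. This is a toy Python model, not kernel code (in the kernel the resize happens on the scheduler's sbitmap of free tags when nr_requests is written): nr_tags stays the same, but the usable depth shrinks, so fewer tags can be outstanding at once.

```python
# Toy model of shrinking nr_requests: the bitmap keeps its size (nr_tags),
# but only the first `depth` bits remain usable for allocation.

class ToyTagMap:
    def __init__(self, nr_tags):
        self.nr_tags = nr_tags     # size of the bitmap, never changes
        self.depth = nr_tags       # usable depth, tuned via nr_requests
        self.busy = set()

    def resize(self, nr_requests):
        # nr_requests is lower-bounded to 4, as noted above
        self.depth = max(4, min(nr_requests, self.nr_tags))

    def get_tag(self):
        for tag in range(self.depth):  # only the first `depth` bits are usable
            if tag not in self.busy:
                self.busy.add(tag)
                return tag
        return None                    # the real allocator would sleep here

tags = ToyTagMap(62)                   # sched_tags show nr_tags=62 above
tags.resize(4)
got = [tags.get_tag() for _ in range(5)]
assert got[:4] == [0, 1, 2, 3] and got[4] is None
assert tags.nr_tags == 62              # nr_tags unchanged; only depth shrank
```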

Thanks,
Paolo

> 	-Mike

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-19  9:37                                                 ` Paolo Valente
@ 2018-02-19 13:18                                                   ` Mike Galbraith
  0 siblings, 0 replies; 54+ messages in thread
From: Mike Galbraith @ 2018-02-19 13:18 UTC (permalink / raw)
  To: Paolo Valente; +Cc: Jens Axboe, Oleksandr Natalenko, stable

On Mon, 2018-02-19 at 10:37 +0100, Paolo Valente wrote:
> 
> > Il giorno 19 feb 2018, alle ore 06:34, Mike Galbraith <efault@gmx.de> ha scritto:
> > 
> > FWIW, my box spent the weekend running a kbuild marathon with a52a69ea
> > reverted, has convinced me that it _is_ the source of the hang woes
> > (directly or indirectly).  No outright stalls, no stall for way too
> > long but then recover, nada, not the slightest hint of trouble.
> > 
> 
> ok, after this piece of evidence, and while waiting for more
> authoritative suggestions, I have investigated a little bit how to
> reduce the number of your angry tags available, to hopefully push your
> system so close to failure to have your problem reproduced reliably
> and quickly.

Neither wish came true.

	-Mike

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-07 21:19 [PATCH BUGFIX V3] block, bfq: add requeue-request hook Paolo Valente
  2018-02-07 22:18 ` Jens Axboe
@ 2018-02-23 15:07 ` Ming Lei
  2018-02-23 15:41     ` Paolo Valente
  1 sibling, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-23 15:07 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, linux-kernel, ulf.hansson, broonie,
	linus.walleij, bfq-iosched, oleksandr, alban.browaeys, ivan,
	169364, holger, efault, Serena Ziviani

Hi Paolo,

On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
> be re-inserted into the active I/O scheduler for that device. As a

No, this behaviour isn't related to commit a6a252e64914, and
it has been there since blk_mq_requeue_request() was introduced.

And you can see blk_mq_requeue_request() is called by lots of drivers;
in particular, it is often used in error handlers (see SCSI for an example).

> consequence, I/O schedulers may get the same request inserted again,
> even several times, without a finish_request invoked on that request
> before each re-insertion.
> 
> This fact is the cause of the failure reported in [1]. For an I/O
> scheduler, every re-insertion of the same re-prepared request is
> equivalent to the insertion of a new request. For schedulers like
> mq-deadline or kyber, this fact causes no harm. In contrast, it
> confuses a stateful scheduler like BFQ, which keeps state for an I/O
> request, until the finish_request hook is invoked on the request. In
> particular, BFQ may get stuck, waiting forever for the number of
> request dispatches, of the same request, to be balanced by an equal
> number of request completions (while there will be one completion for
> that request). In this state, BFQ may refuse to serve I/O requests
> from other bfq_queues. The hang reported in [1] then follows.
> 
> However, the above re-prepared requests undergo a requeue, thus the
> requeue_request hook of the active elevator is invoked for these
> requests, if set. This commit then addresses the above issue by
> properly implementing the hook requeue_request in BFQ.
> 
> [1] https://marc.info/?l=linux-block&m=151211117608676
> 
> Reported-by: Ivan Kozik <ivan@ludios.org>
> Reported-by: Alban Browaeys <alban.browaeys@gmail.com>
> Tested-by: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
> Signed-off-by: Serena Ziviani <ziviani.serena@gmail.com>
> ---
> V2: contains fix to bug reported in [2]
> V3: implements the improvement suggested in [3]
> 
> [2] https://lkml.org/lkml/2018/2/5/599
> [3] https://lkml.org/lkml/2018/2/7/532
> 
>  block/bfq-iosched.c | 107 ++++++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 82 insertions(+), 25 deletions(-)
> 
> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> index 47e6ec7427c4..aeca22d91101 100644
> --- a/block/bfq-iosched.c
> +++ b/block/bfq-iosched.c
> @@ -3823,24 +3823,26 @@ static struct request *__bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
>  		}
> 
>  		/*
> -		 * We exploit the bfq_finish_request hook to decrement
> -		 * rq_in_driver, but bfq_finish_request will not be
> -		 * invoked on this request. So, to avoid unbalance,
> -		 * just start this request, without incrementing
> -		 * rq_in_driver. As a negative consequence,
> -		 * rq_in_driver is deceptively lower than it should be
> -		 * while this request is in service. This may cause
> -		 * bfq_schedule_dispatch to be invoked uselessly.
> +		 * We exploit the bfq_finish_requeue_request hook to
> +		 * decrement rq_in_driver, but
> +		 * bfq_finish_requeue_request will not be invoked on
> +		 * this request. So, to avoid unbalance, just start
> +		 * this request, without incrementing rq_in_driver. As
> +		 * a negative consequence, rq_in_driver is deceptively
> +		 * lower than it should be while this request is in
> +		 * service. This may cause bfq_schedule_dispatch to be
> +		 * invoked uselessly.
>  		 *
>  		 * As for implementing an exact solution, the
> -		 * bfq_finish_request hook, if defined, is probably
> -		 * invoked also on this request. So, by exploiting
> -		 * this hook, we could 1) increment rq_in_driver here,
> -		 * and 2) decrement it in bfq_finish_request. Such a
> -		 * solution would let the value of the counter be
> -		 * always accurate, but it would entail using an extra
> -		 * interface function. This cost seems higher than the
> -		 * benefit, being the frequency of non-elevator-private
> +		 * bfq_finish_requeue_request hook, if defined, is
> +		 * probably invoked also on this request. So, by
> +		 * exploiting this hook, we could 1) increment
> +		 * rq_in_driver here, and 2) decrement it in
> +		 * bfq_finish_requeue_request. Such a solution would
> +		 * let the value of the counter be always accurate,
> +		 * but it would entail using an extra interface
> +		 * function. This cost seems higher than the benefit,
> +		 * being the frequency of non-elevator-private
>  		 * requests very low.
>  		 */
>  		goto start_rq;
> @@ -4515,6 +4517,8 @@ static inline void bfq_update_insert_stats(struct request_queue *q,
>  					   unsigned int cmd_flags) {}
>  #endif
> 
> +static void bfq_prepare_request(struct request *rq, struct bio *bio);
> +
>  static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  			       bool at_head)
>  {
> @@ -4541,6 +4545,18 @@ static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  		else
>  			list_add_tail(&rq->queuelist, &bfqd->dispatch);
>  	} else {
> +		if (WARN_ON_ONCE(!bfqq)) {
> +			/*
> +			 * This should never happen. Most likely rq is
> +			 * a requeued regular request, being
> +			 * re-inserted without being first
> +			 * re-prepared. Do a prepare, to avoid
> +			 * failure.
> +			 */
> +			bfq_prepare_request(rq, rq->bio);
> +			bfqq = RQ_BFQQ(rq);
> +		}
> +
>  		idle_timer_disabled = __bfq_insert_request(bfqd, rq);
>  		/*
>  		 * Update bfqq, because, if a queue merge has occurred
> @@ -4697,22 +4713,44 @@ static void bfq_completed_request(struct bfq_queue *bfqq, struct bfq_data *bfqd)
>  		bfq_schedule_dispatch(bfqd);
>  }
> 
> -static void bfq_finish_request_body(struct bfq_queue *bfqq)
> +static void bfq_finish_requeue_request_body(struct bfq_queue *bfqq)
>  {
>  	bfqq->allocated--;
> 
>  	bfq_put_queue(bfqq);
>  }
> 
> -static void bfq_finish_request(struct request *rq)
> +/*
> + * Handle either a requeue or a finish for rq. The things to do are
> + * the same in both cases: all references to rq are to be dropped. In
> + * particular, rq is considered completed from the point of view of
> + * the scheduler.
> + */
> +static void bfq_finish_requeue_request(struct request *rq)
>  {
> -	struct bfq_queue *bfqq;
> +	struct bfq_queue *bfqq = RQ_BFQQ(rq);
>  	struct bfq_data *bfqd;
> 
> -	if (!rq->elv.icq)
> +	/*
> +	 * Requeue and finish hooks are invoked in blk-mq without
> +	 * checking whether the involved request is actually still
> +	 * referenced in the scheduler. To handle this fact, the
> +	 * following two checks make this function exit in case of
> +	 * spurious invocations, for which there is nothing to do.
> +	 *
> +	 * First, check whether rq has nothing to do with an elevator.
> +	 */
> +	if (unlikely(!(rq->rq_flags & RQF_ELVPRIV)))
> +		return;
> +
> +	/*
> +	 * rq either is not associated with any icq, or is an already
> +	 * requeued request that has not (yet) been re-inserted into
> +	 * a bfq_queue.
> +	 */
> +	if (!rq->elv.icq || !bfqq)
>  		return;
> 
> -	bfqq = RQ_BFQQ(rq);
>  	bfqd = bfqq->bfqd;
> 
>  	if (rq->rq_flags & RQF_STARTED)
> @@ -4727,13 +4765,14 @@ static void bfq_finish_request(struct request *rq)
>  		spin_lock_irqsave(&bfqd->lock, flags);
> 
>  		bfq_completed_request(bfqq, bfqd);
> -		bfq_finish_request_body(bfqq);
> +		bfq_finish_requeue_request_body(bfqq);
> 
>  		spin_unlock_irqrestore(&bfqd->lock, flags);
>  	} else {
>  		/*
>  		 * Request rq may be still/already in the scheduler,
> -		 * in which case we need to remove it. And we cannot
> +		 * in which case we need to remove it (this should
> +		 * never happen in case of requeue). And we cannot
>  		 * defer such a check and removal, to avoid
>  		 * inconsistencies in the time interval from the end
>  		 * of this function to the start of the deferred work.
> @@ -4748,9 +4787,26 @@ static void bfq_finish_request(struct request *rq)
>  			bfqg_stats_update_io_remove(bfqq_group(bfqq),
>  						    rq->cmd_flags);
>  		}
> -		bfq_finish_request_body(bfqq);
> +		bfq_finish_requeue_request_body(bfqq);
>  	}
> 
> +	/*
> +	 * Reset private fields. In case of a requeue, this allows
> +	 * this function to correctly do nothing if it is spuriously
> +	 * invoked again on this same request (see the check at the
> +	 * beginning of the function). Probably, a better general
> +	 * design would be to prevent blk-mq from invoking the requeue
> +	 * or finish hooks of an elevator, for a request that is not
> +	 * referred by that elevator.
> +	 *
> +	 * Resetting the following fields would break the
> +	 * request-insertion logic if rq is re-inserted into a bfq
> +	 * internal queue, without a re-preparation. Here we assume
> +	 * that re-insertions of requeued requests, without
> +	 * re-preparation, can happen only for pass_through or at_head
> +	 * requests (which are not re-inserted into bfq internal
> +	 * queues).
> +	 */
>  	rq->elv.priv[0] = NULL;
>  	rq->elv.priv[1] = NULL;
>  }
> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
>  	.ops.mq = {
>  		.limit_depth		= bfq_limit_depth,
>  		.prepare_request	= bfq_prepare_request,
> -		.finish_request		= bfq_finish_request,
> +		.requeue_request        = bfq_finish_requeue_request,
> +		.finish_request		= bfq_finish_requeue_request,
>  		.exit_icq		= bfq_exit_icq,
>  		.insert_requests	= bfq_insert_requests,
>  		.dispatch_request	= bfq_dispatch_request,

This way may not be correct since blk_mq_sched_requeue_request() can be
called for a request which won't enter the io scheduler.

__blk_mq_requeue_request() is called for two cases:

	- one is that the requeued request is added to hctx->dispatch, such
	as blk_mq_dispatch_rq_list()
	- another case is that the request is requeued to io scheduler, such as
	blk_mq_requeue_request().

For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
since it has nothing to do with the scheduler; it seems we only need to
do that for the 2nd case.

So it looks like we need the following patch:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 23de7fd8099a..a216f3c3c3ce 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
 
 	trace_block_rq_requeue(q, rq);
 	wbt_requeue(q->rq_wb, &rq->issue_stat);
-	blk_mq_sched_requeue_request(rq);
 
 	if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) {
 		blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
@@ -725,6 +726,9 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	__blk_mq_requeue_request(rq);
 
+	/* this request will be re-inserted to io scheduler queue */
+	blk_mq_sched_requeue_request(rq);
+
 	BUG_ON(blk_queued_rq(rq));
 	blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
 }


Thanks,
Ming

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-23 15:07 ` Ming Lei
@ 2018-02-23 15:41     ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-23 15:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, ivan, 169364, holger, efault, Serena Ziviani



> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
> 
> Hi Paolo,
> 
> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>> be re-inserted into the active I/O scheduler for that device. As a
> 
> No, this behaviour isn't related with commit a6a252e64914, and
> it has been there since blk_mq_requeue_request() is introduced.
> 

Hi Ming,
actually, we wrote the above statement after simply following the call
chain that led to the failure.  In this respect, the change in commit
a6a252e64914:

 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
+                                      bool has_sched,
                                       struct request *rq)
 {
-       if (rq->tag == -1) {
+       /* dispatch flush rq directly */
+       if (rq->rq_flags & RQF_FLUSH_SEQ) {
+               spin_lock(&hctx->lock);
+               list_add(&rq->queuelist, &hctx->dispatch);
+               spin_unlock(&hctx->lock);
+               return true;
+       }
+
+       if (has_sched) {
                rq->rq_flags |= RQF_SORTED;
-               return false;
+               WARN_ON(rq->tag != -1);
        }
 
-       /*
-        * If we already have a real request tag, send directly to
-        * the dispatch list.
-        */
-       spin_lock(&hctx->lock);
-       list_add(&rq->queuelist, &hctx->dispatch);
-       spin_unlock(&hctx->lock);
-       return true;
+       return false;
 }

makes blk_mq_sched_bypass_insert return false for all non-flush
requests.  From that, the anomaly described in our commit follows, for
bfq, or any stateful scheduler that waits for the completion of requests
that passed through it.  I'm elaborating again a little bit on this in
my replies to your next points below.

I don't mean that this change is an error, it simply sends a stateful
scheduler into an inconsistent state, unless the scheduler properly
handles the requeue that precedes the re-insertion into the
scheduler.

If this clarifies the situation, but there is still some misleading
statement in the commit, just let me better understand, and I'll be
glad to rectify it, if possible somehow.


> And you can see blk_mq_requeue_request() is called by lots of drivers,
> especially it is often used in error handler, see SCSI's example.
>=20
>> consequence, I/O schedulers may get the same request inserted again,
>> even several times, without a finish_request invoked on that request
>> before each re-insertion.
>>=20

...

>> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
>> 	.ops.mq = {
>> 		.limit_depth		= bfq_limit_depth,
>> 		.prepare_request	= bfq_prepare_request,
>> -		.finish_request		= bfq_finish_request,
>> +		.requeue_request        = bfq_finish_requeue_request,
>> +		.finish_request		= bfq_finish_requeue_request,
>> 		.exit_icq		= bfq_exit_icq,
>> 		.insert_requests	= bfq_insert_requests,
>> 		.dispatch_request	= bfq_dispatch_request,
> 
> This way may not be correct since blk_mq_sched_requeue_request() can be
> called for one request which won't enter io scheduler.
> 

Exactly, there are two cases: requeues that lead to subsequent
re-insertions, and requeues that don't.  The function
bfq_finish_requeue_request handles both, and both need to be handled,
to inform bfq that it has not to wait for the completion of rq any
longer.

One special case is when bfq_finish_requeue_request gets invoked even
if rq has nothing to do with any scheduler.  In that case,
bfq_finish_requeue_request exits immediately.
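That early-exit behaviour can be sketched in plain Python rather than kernel C (the names here are illustrative; in the actual patch the guards are RQF_ELVPRIV and the rq->elv.priv[] fields, which the hook resets at its end):

```python
# Toy model of a finish/requeue hook that tolerates spurious invocations.
# The first real invocation releases the scheduler's state and clears the
# private field, so any repeated call becomes a harmless no-op.

class ToyRequest:
    def __init__(self):
        self.priv = "scheduler-state"  # set by a prepare_request-like hook

def finish_or_requeue(rq):
    if rq.priv is None:
        # Spurious invocation: the scheduler holds no state for rq.
        return "noop"
    # ... drop references, update counters ...
    rq.priv = None                     # make a repeat invocation harmless
    return "released"

rq = ToyRequest()
assert finish_or_requeue(rq) == "released"
assert finish_or_requeue(rq) == "noop"   # safe to call again
```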


> __blk_mq_requeue_request() is called for two cases:
> 
> 	- one is that the requeued request is added to hctx->dispatch, such
> 	as blk_mq_dispatch_rq_list()

yes

> 	- another case is that the request is requeued to io scheduler, such as
> 	blk_mq_requeue_request().
> 

yes

> For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
> since it is nothing to do with scheduler,

No, if that rq has been inserted and then extracted from the scheduler
through a dispatch_request, then it has.  The scheduler is stateful,
and keeps state for rq, because it must do so, until a completion or a
requeue arrives.  In particular, bfq may decide that no other
bfq_queues must be served before rq has been completed, because this
is necessary to preserve its target service guarantees.  If bfq is not
informed, either through its completion or its requeue hook, then it
will wait forever.
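The bookkeeping described above can be sketched with a toy model (illustrative Python, not kernel code; the real counter in BFQ is rq_in_driver and the notifications arrive via the finish_request/requeue_request hooks):

```python
# Toy model of a stateful scheduler: it counts dispatched requests and
# refuses to serve other queues until every dispatch has been balanced
# by a completion or a requeue notification.

class ToyStatefulScheduler:
    def __init__(self):
        self.in_flight = 0

    def dispatch(self, rq):
        self.in_flight += 1

    def complete_or_requeue(self, rq):
        # Both hooks perform the same accounting: the scheduler stops
        # waiting for rq.
        self.in_flight -= 1

    def may_serve_other_queues(self):
        return self.in_flight == 0

sched = ToyStatefulScheduler()
sched.dispatch("rq1")

# If the requeue path never calls back into the scheduler, the counter
# stays above zero and other queues are stalled indefinitely:
assert not sched.may_serve_other_queues()

# Invoking either hook restores balance and unblocks service:
sched.complete_or_requeue("rq1")
assert sched.may_serve_other_queues()
```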

> seems we only need to do that
> for 2nd case.
> 

> So looks we need the following patch:
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 23de7fd8099a..a216f3c3c3ce 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
> 
> 	trace_block_rq_requeue(q, rq);
> 	wbt_requeue(q->rq_wb, &rq->issue_stat);
> -	blk_mq_sched_requeue_request(rq);
> 

By doing so, if rq has 'passed' through bfq, then you again block bfq
forever.

Of course, I may be wrong, as I don't have a complete mental model of
all that blk-mq does around bfq.  But I think that you can quickly
check whether a hang occurs again if you add this change.  In
particular, it should happen in the former USB failure case.

Thanks,
Paolo

> 	if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) {
> 		blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
> @@ -725,6 +726,9 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
> {
> 	__blk_mq_requeue_request(rq);
> 
> +	/* this request will be re-inserted to io scheduler queue */
> +	blk_mq_sched_requeue_request(rq);
> +
> 	BUG_ON(blk_queued_rq(rq));
> 	blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
> }
> 
> 
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-23 15:41     ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-23 15:41 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, ivan, 169364, holger, efault, Serena Ziviani



> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
> 
> Hi Paolo,
> 
> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>> be re-inserted into the active I/O scheduler for that device. As a
> 
> No, this behaviour isn't related with commit a6a252e64914, and
> it has been there since blk_mq_requeue_request() is introduced.
> 

Hi Ming,
actually, we wrote the above statement after simply following the call
chain that led to the failure.  In this respect, the change in commit
a6a252e64914:

 static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
+                                      bool has_sched,
                                       struct request *rq)
 {
-       if (rq->tag == -1) {
+       /* dispatch flush rq directly */
+       if (rq->rq_flags & RQF_FLUSH_SEQ) {
+               spin_lock(&hctx->lock);
+               list_add(&rq->queuelist, &hctx->dispatch);
+               spin_unlock(&hctx->lock);
+               return true;
+       }
+
+       if (has_sched) {
                rq->rq_flags |= RQF_SORTED;
-               return false;
+               WARN_ON(rq->tag != -1);
        }
 
-       /*
-        * If we already have a real request tag, send directly to
-        * the dispatch list.
-        */
-       spin_lock(&hctx->lock);
-       list_add(&rq->queuelist, &hctx->dispatch);
-       spin_unlock(&hctx->lock);
-       return true;
+       return false;
 }

makes blk_mq_sched_bypass_insert return false for all non-flush
requests.  From that, the anomaly described in our commit follows, for
bfq any stateful scheduler that waits for the completion of requests
that passed through it.  I'm elaborating again a little bit on this in
my replies to your next points below.

I don't mean that this change is an error, it simply sends a stateful
scheduler in an inconsistent state, unless the scheduler properly
handles the requeue that precedes the re-insertion into the
scheduler.

If this clarifies the situation, but there is still some misleading
statement in the commit, then help me understand it better, and I'll
gladly rectify it, if somehow possible.


> And you can see blk_mq_requeue_request() is called by lots of drivers,
> especially it is often used in error handler, see SCSI's example.
> 
>> consequence, I/O schedulers may get the same request inserted again,
>> even several times, without a finish_request invoked on that request
>> before each re-insertion.
>> 

...

>> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
>> 	.ops.mq = {
>> 		.limit_depth		= bfq_limit_depth,
>> 		.prepare_request	= bfq_prepare_request,
>> -		.finish_request		= bfq_finish_request,
>> +		.requeue_request        = bfq_finish_requeue_request,
>> +		.finish_request		= bfq_finish_requeue_request,
>> 		.exit_icq		= bfq_exit_icq,
>> 		.insert_requests	= bfq_insert_requests,
>> 		.dispatch_request	= bfq_dispatch_request,
> 
> This way may not be correct since blk_mq_sched_requeue_request() can be
> called for one request which won't enter io scheduler.
> 

Exactly, there are two cases: requeues that lead to subsequent
re-insertions, and requeues that don't.  The function
bfq_finish_requeue_request handles both, and both need to be handled,
to inform bfq that it no longer has to wait for the completion of rq.

One special case is when bfq_finish_requeue_request gets invoked even
if rq has nothing to do with any scheduler.  In that case,
bfq_finish_requeue_request exits immediately.


> __blk_mq_requeue_request() is called for two cases:
> 
> 	- one is that the requeued request is added to hctx->dispatch, such
> 	as blk_mq_dispatch_rq_list()

yes

> 	- another case is that the request is requeued to io scheduler, such as
> 	blk_mq_requeue_request().
> 

yes

> For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
> since it is nothing to do with scheduler,

No, if that rq has been inserted and then extracted from the scheduler
through a dispatch_request, then it has.  The scheduler is stateful,
and keeps state for rq, because it must do so, until a completion or a
requeue arrives.  In particular, bfq may decide that no other
bfq_queues must be served before rq has been completed, because this
is necessary to preserve its target service guarantees.  If bfq is not
informed, either through its completion or its requeue hook, then it
will wait forever.

> seems we only need to do that
> for 2nd case.
> 

> So looks we need the following patch:
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 23de7fd8099a..a216f3c3c3ce 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
> 
> 	trace_block_rq_requeue(q, rq);
> 	wbt_requeue(q->rq_wb, &rq->issue_stat);
> -	blk_mq_sched_requeue_request(rq);
> 

By doing so, if rq has 'passed' through bfq, then you block bfq
forever again.

Of course, I may be wrong, as I don't have a complete mental model of
all that blk-mq does around bfq.  But I think that you can quickly
check whether a hang occurs again if you add this change.  In
particular, it should happen in the former USB failure case.

Thanks,
Paolo

> 	if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) {
> 		blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
> @@ -725,6 +726,9 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
> {
> 	__blk_mq_requeue_request(rq);
> 
> +	/* this request will be re-inserted to io scheduler queue */
> +	blk_mq_sched_requeue_request(rq);
> +
> 	BUG_ON(blk_queued_rq(rq));
> 	blk_mq_add_to_requeue_list(rq, true, kick_requeue_list);
> }
> 
> 
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-23 15:41     ` Paolo Valente
  (?)
@ 2018-02-23 16:17     ` Ming Lei
  2018-02-24  7:54         ` Paolo Valente
  -1 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-23 16:17 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, ivan, 169364, holger, efault, Serena Ziviani

Hi Paolo,

On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
> 
> 
> > Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
> > 
> > Hi Paolo,
> > 
> > On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
> >> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
> >> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
> >> be re-inserted into the active I/O scheduler for that device. As a
> > 
> > No, this behaviour isn't related with commit a6a252e64914, and
> > it has been there since blk_mq_requeue_request() is introduced.
> > 
> 
> Hi Ming,
> actually, we wrote the above statement after simply following the call
> chain that led to the failure.  In this respect, the change in commit
> a6a252e64914:
> 
>  static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> +                                      bool has_sched,
>                                        struct request *rq)
>  {
> -       if (rq->tag == -1) {
> +       /* dispatch flush rq directly */
> +       if (rq->rq_flags & RQF_FLUSH_SEQ) {
> +               spin_lock(&hctx->lock);
> +               list_add(&rq->queuelist, &hctx->dispatch);
> +               spin_unlock(&hctx->lock);
> +               return true;
> +       }
> +
> +       if (has_sched) {
>                 rq->rq_flags |= RQF_SORTED;
> -               return false;
> +               WARN_ON(rq->tag != -1);
>         }
>  
> -       /*
> -        * If we already have a real request tag, send directly to
> -        * the dispatch list.
> -        */
> -       spin_lock(&hctx->lock);
> -       list_add(&rq->queuelist, &hctx->dispatch);
> -       spin_unlock(&hctx->lock);
> -       return true;
> +       return false;
>  }
> 
> makes blk_mq_sched_bypass_insert return false for all non-flush
> requests.  From that, the anomaly described in our commit follows, for
> bfq or any stateful scheduler that waits for the completion of requests
> that passed through it.  I'm elaborating again a little bit on this in
> my replies to your next points below.

Before a6a252e64914, blk_mq_sched_bypass_insert() read as follows:

       if (rq->tag == -1) {
               rq->rq_flags |= RQF_SORTED;
               return false;
       }

       /*
        * If we already have a real request tag, send directly to
        * the dispatch list.
        */
       spin_lock(&hctx->lock);
       list_add(&rq->queuelist, &hctx->dispatch);
       spin_unlock(&hctx->lock);
       return true;

This function still returns false for all non-flush requests, so nothing
changes w.r.t. this kind of handling.

> 
> I don't mean that this change is an error, it simply sends a stateful
> scheduler in an inconsistent state, unless the scheduler properly
> handles the requeue that precedes the re-insertion into the
> scheduler.
> 
> If this clarifies the situation, but there is still some misleading
> statement in the commit, just let me better understand, and I'll be
> glad to rectify it, if possible somehow.
> 
> 
> > And you can see blk_mq_requeue_request() is called by lots of drivers,
> > especially it is often used in error handler, see SCSI's example.
> > 
> >> consequence, I/O schedulers may get the same request inserted again,
> >> even several times, without a finish_request invoked on that request
> >> before each re-insertion.
> >> 
> 
> ...
> 
> >> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
> >> 	.ops.mq = {
> >> 		.limit_depth		= bfq_limit_depth,
> >> 		.prepare_request	= bfq_prepare_request,
> >> -		.finish_request		= bfq_finish_request,
> >> +		.requeue_request        = bfq_finish_requeue_request,
> >> +		.finish_request		= bfq_finish_requeue_request,
> >> 		.exit_icq		= bfq_exit_icq,
> >> 		.insert_requests	= bfq_insert_requests,
> >> 		.dispatch_request	= bfq_dispatch_request,
> > 
> > This way may not be correct since blk_mq_sched_requeue_request() can be
> > called for one request which won't enter io scheduler.
> > 
> 
> Exactly, there are two cases: requeues that lead to subsequent
> re-insertions, and requeues that don't.  The function
> bfq_finish_requeue_request handles both, and both need to be handled,
> to inform bfq that it has not to wait for the completion of rq any
> longer.
> 
> One special case is when bfq_finish_requeue_request gets invoked even
> if rq has nothing to do with any scheduler.  In that case,
> bfq_finish_requeue_request exits immediately.
> 
> 
> > __blk_mq_requeue_request() is called for two cases:
> > 
> > 	- one is that the requeued request is added to hctx->dispatch, such
> > 	as blk_mq_dispatch_rq_list()
> 
> yes
> 
> > 	- another case is that the request is requeued to io scheduler, such as
> > 	blk_mq_requeue_request().
> > 
> 
> yes
> 
> > For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
> > since it is nothing to do with scheduler,
> 
> No, if that rq has been inserted and then extracted from the scheduler
> through a dispatch_request, then it has.  The scheduler is stateful,
> and keeps state for rq, because it must do so, until a completion or a
> requeue arrive.  In particular, bfq may decide that no other

This 'requeue' to hctx->dispatch is invisible to the io scheduler, and
from the io scheduler's view there seems to be no difference between
STS_OK and STS_RESOURCE, since this request will be submitted to the
lld automatically by the blk-mq core.

Also, in the legacy path there is no such notification to the io
scheduler either.

> bfq_queues must be served before rq has been completed, because this
> is necessary to preserve its target service guarantees.  If bfq is not
> informed, either through its completion or its requeue hook, then it
> will wait forever.

In this case, .finish_request will be called after the request is
completed, either when the I/O is completed by the device, or on
timeout.

Or .requeue_request will be called if the driver needs to requeue the
request to the io scheduler explicitly, via blk_mq_requeue_request().

So the io scheduler is guaranteed to get notified.

> 
> > seems we only need to do that
> > for 2nd case.
> > 
> 
> > So looks we need the following patch:
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 23de7fd8099a..a216f3c3c3ce 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
> > 
> > 	trace_block_rq_requeue(q, rq);
> > 	wbt_requeue(q->rq_wb, &rq->issue_stat);
> > -	blk_mq_sched_requeue_request(rq);
> > 
> 
> By doing so, if rq has 'passed' through bfq, then you block again bfq
> forever.

Could you explain it a bit why bfq is blocked by this way?

> 
> Of course, I may be wrong, as I don't have a complete mental model of
> all that blk-mq does around bfq.  But I think that you can quickly
> check whether a hang occurs again if you add this change.  In
> particular, it should happen in the former USB failure case.

USB/BFQ runs well on my parted test with this change, and this test can
trigger the IO hang issue easily on kyber or without your patch of 'block,
bfq: add requeue-request hook'.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-24  7:54         ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-24  7:54 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, Ivan Kozik, SERENA ZIVIANI,
	Holger Hoffstätte, efault, Serena Ziviani



> Il giorno 23 feb 2018, alle ore 17:17, Ming Lei <ming.lei@redhat.com> ha scritto:
> 
> Hi Paolo,
> 
> On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
>>> 
>>> Hi Paolo,
>>> 
>>> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
>>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>>>> be re-inserted into the active I/O scheduler for that device. As a
>>> 
>>> No, this behaviour isn't related with commit a6a252e64914, and
>>> it has been there since blk_mq_requeue_request() is introduced.
>>> 
>> 
>> Hi Ming,
>> actually, we wrote the above statement after simply following the call
>> chain that led to the failure.  In this respect, the change in commit
>> a6a252e64914:
>> 
>> static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>> +                                      bool has_sched,
>>                                       struct request *rq)
>> {
>> -       if (rq->tag == -1) {
>> +       /* dispatch flush rq directly */
>> +       if (rq->rq_flags & RQF_FLUSH_SEQ) {
>> +               spin_lock(&hctx->lock);
>> +               list_add(&rq->queuelist, &hctx->dispatch);
>> +               spin_unlock(&hctx->lock);
>> +               return true;
>> +       }
>> +
>> +       if (has_sched) {
>>                rq->rq_flags |= RQF_SORTED;
>> -               return false;
>> +               WARN_ON(rq->tag != -1);
>>        }
>> 
>> -       /*
>> -        * If we already have a real request tag, send directly to
>> -        * the dispatch list.
>> -        */
>> -       spin_lock(&hctx->lock);
>> -       list_add(&rq->queuelist, &hctx->dispatch);
>> -       spin_unlock(&hctx->lock);
>> -       return true;
>> +       return false;
>> }
>> 
>> makes blk_mq_sched_bypass_insert return false for all non-flush
>> requests.  From that, the anomaly described in our commit follows, for
>> bfq or any stateful scheduler that waits for the completion of requests
>> that passed through it.  I'm elaborating again a little bit on this in
>> my replies to your next points below.
> 
> Before a6a252e64914, follows blk_mq_sched_bypass_insert()
> 
>       if (rq->tag == -1) {
>               rq->rq_flags |= RQF_SORTED;
>               return false;
> 	   }
> 
>       /*
>        * If we already have a real request tag, send directly to
>        * the dispatch list.
>        */
>       spin_lock(&hctx->lock);
>       list_add(&rq->queuelist, &hctx->dispatch);
>       spin_unlock(&hctx->lock);
>       return true;
> 
> This function still returns false for all non-flush request, so nothing
> changes wrt. this kind of handling.
> 

Yep Ming.  I don't have the expertise to tell you why, but the failure
in the USB case was caused by an rq that is not a flush, but for which
rq->tag != -1.  So, the previous version of blk_mq_sched_bypass_insert
returned true, and there was no failure, while after commit
a6a252e64914 the function returns false, and the failure occurs if bfq
does not exploit the requeue hook.

You have actually shown it yourself, several months ago, through the
simple code instrumentation you made and used to show that bfq was
stuck.  And I guess it can still be reproduced very easily, unless
something else has changed in blk-mq.

BTW, if you can shed some light on this fact, that would be a great
occasion for me to learn.

>> 
>> I don't mean that this change is an error, it simply sends a stateful
>> scheduler in an inconsistent state, unless the scheduler properly
>> handles the requeue that precedes the re-insertion into the
>> scheduler.
>> 
>> If this clarifies the situation, but there is still some misleading
>> statement in the commit, just let me better understand, and I'll be
>> glad to rectify it, if possible somehow.
>> 
>> 
>>> And you can see blk_mq_requeue_request() is called by lots of drivers,
>>> especially it is often used in error handler, see SCSI's example.
>>> 
>>>> consequence, I/O schedulers may get the same request inserted again,
>>>> even several times, without a finish_request invoked on that request
>>>> before each re-insertion.
>>>> 
>> 
>> ...
>> 
>>>> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
>>>> 	.ops.mq = {
>>>> 		.limit_depth		= bfq_limit_depth,
>>>> 		.prepare_request	= bfq_prepare_request,
>>>> -		.finish_request		= bfq_finish_request,
>>>> +		.requeue_request        = bfq_finish_requeue_request,
>>>> +		.finish_request		= bfq_finish_requeue_request,
>>>> 		.exit_icq		= bfq_exit_icq,
>>>> 		.insert_requests	= bfq_insert_requests,
>>>> 		.dispatch_request	= bfq_dispatch_request,
>>> 
>>> This way may not be correct since blk_mq_sched_requeue_request() can be
>>> called for one request which won't enter io scheduler.
>>> 
>> 
>> Exactly, there are two cases: requeues that lead to subsequent
>> re-insertions, and requeues that don't.  The function
>> bfq_finish_requeue_request handles both, and both need to be handled,
>> to inform bfq that it has not to wait for the completion of rq any
>> longer.
>> 
>> One special case is when bfq_finish_requeue_request gets invoked even
>> if rq has nothing to do with any scheduler.  In that case,
>> bfq_finish_requeue_request exits immediately.
>> 
>> 
>>> __blk_mq_requeue_request() is called for two cases:
>>> 
>>> 	- one is that the requeued request is added to hctx->dispatch, such
>>> 	as blk_mq_dispatch_rq_list()
>> 
>> yes
>> 
>>> 	- another case is that the request is requeued to io scheduler, such as
>>> 	blk_mq_requeue_request().
>>> 
>> 
>> yes
>> 
>>> For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
>>> since it is nothing to do with scheduler,
>> 
>> No, if that rq has been inserted and then extracted from the scheduler
>> through a dispatch_request, then it has.  The scheduler is stateful,
>> and keeps state for rq, because it must do so, until a completion or a
>> requeue arrive.  In particular, bfq may decide that no other
> 
> This 'requeue' to hctx->dispatch is invisible to io scheduler, and from
> io scheduler view, seems no difference between STS_OK and STS_RESOURCE,
> since this request will be submitted to lld automatically by blk-mq
> core.
> 
> Also in legacy path, no such notification to io scheduler too.

ok; still, for reasons that at this point I don't know, no failure
caused by a request first dispatched by bfq and then re-inserted into
bfq without being completed has ever been reported in the last 9-10
years.  This new event is the problem.


>> bfq_queues must be served before rq has been completed, because this
>> is necessary to preserve its target service guarantees.  If bfq is not
>> informed, either through its completion or its requeue hook, then it
>> will wait forever.
> 
> The .finish_request will be called after this request is completed for this
> case, either after this io is completed by device, or timeout.
> 
> Or .requeue_request will be called if the driver need to requeue it to
> io scheduler via blk_mq_requeue_request() explicitly.
> 
> So io scheduler will be made sure to get notified.
> 

So, you mean that the scheduler will get notified one way or the
other, if it has a requeue hook defined, right?  Or, simply, that the
request will eventually be completed, thus unblocking the scheduler?


>> 
>>> seems we only need to do that
>>> for 2nd case.
>>> 
>> 
>>> So looks we need the following patch:
>>> 
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 23de7fd8099a..a216f3c3c3ce 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
>>> 
>>> 	trace_block_rq_requeue(q, rq);
>>> 	wbt_requeue(q->rq_wb, &rq->issue_stat);
>>> -	blk_mq_sched_requeue_request(rq);
>>> 
>> 
>> By doing so, if rq has 'passed' through bfq, then you block again bfq
>> forever.
> 
> Could you explain it a bit why bfq is blocked by this way?

It's a theoretical problem (unfortunately not just a contingency you
can work around).  If you want to guarantee that a process doing sync
I/O, and with a higher weight than other processes, gets a share of
the total throughput proportional to its weight, then you have to idle
the device during the service slot of that process.  Otherwise, the
pending requests of the other processes will simply slip through
*every time* the high-weight process has no pending request because it
is blocked on one of its sync requests.  The final result will be loss
of control over the percentage of requests of the high-weight process
that get dispatched w.r.t. the total number of requests
dispatched.  So, failure to provide the process with its due share of
the throughput.

I'm working on solutions to compensate for this nasty service
constraint, and its effects on throughput, but this is another story.

> 
>> 
>> Of course, I may be wrong, as I don't have a complete mental model of
>> all that blk-mq does around bfq.  But I think that you can quickly
>> check whether a hang occurs again if you add this change.  In
>> particular, it should happen in the former USB failure case.
> 
> USB/BFQ runs well on my parted test with this change, and this test can
> trigger the IO hang issue easily on kyber or without your patch of 'block,
> bfq: add requeue-request hook'.
> 

Great!  According to the facts I reported at the beginning of
this email, I guess your patch works because it avoids the
nasty case for bfq, i.e., a request re-inserted into bfq without being
completed first, right?  And then, for the requeues without
re-insertions, everything works because those requests
eventually get a completion.  Is that correct?

Sorry for being pedantic, but because of our lack of expertise on
these requeueing mechanisms, we worked really hard on this commit, and
it would be frustrating if we bumped again into troubles, after
removing it.  Apart from that, if this commit is useless, it is
perfectly fine for me that it is reverted and replaced by a better
solution.  In the end, less code in bfq means less room for bugs ;)

Thanks,
Paolo

> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-24  7:54         ` Paolo Valente
@ 2018-02-24 12:05         ` Ming Lei
  2018-02-24 16:15             ` Paolo Valente
  -1 siblings, 1 reply; 54+ messages in thread
From: Ming Lei @ 2018-02-24 12:05 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, Ivan Kozik, SERENA ZIVIANI,
	Holger Hoffstätte, efault, Serena Ziviani

On Sat, Feb 24, 2018 at 08:54:31AM +0100, Paolo Valente wrote:
> 
> 
> > Il giorno 23 feb 2018, alle ore 17:17, Ming Lei <ming.lei@redhat.com> ha scritto:
> > 
> > Hi Paolo,
> > 
> > On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
> >> 
> >> 
> >>> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
> >>> 
> >>> Hi Paolo,
> >>> 
> >>> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
> >>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
> >>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
> >>>> be re-inserted into the active I/O scheduler for that device. As a
> >>> 
> >>> No, this behaviour isn't related with commit a6a252e64914, and
> >>> it has been there since blk_mq_requeue_request() is introduced.
> >>> 
> >> 
> >> Hi Ming,
> >> actually, we wrote the above statement after simply following the call
> >> chain that led to the failure.  In this respect, the change in commit
> >> a6a252e64914:
> >> 
> >> static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
> >> +                                      bool has_sched,
> >>                                       struct request *rq)
> >> {
> >> -       if (rq->tag == -1) {
> >> +       /* dispatch flush rq directly */
> >> +       if (rq->rq_flags & RQF_FLUSH_SEQ) {
> >> +               spin_lock(&hctx->lock);
> >> +               list_add(&rq->queuelist, &hctx->dispatch);
> >> +               spin_unlock(&hctx->lock);
> >> +               return true;
> >> +       }
> >> +
> >> +       if (has_sched) {
> >>                rq->rq_flags |= RQF_SORTED;
> >> -               return false;
> >> +               WARN_ON(rq->tag != -1);
> >>        }
> >> 
> >> -       /*
> >> -        * If we already have a real request tag, send directly to
> >> -        * the dispatch list.
> >> -        */
> >> -       spin_lock(&hctx->lock);
> >> -       list_add(&rq->queuelist, &hctx->dispatch);
> >> -       spin_unlock(&hctx->lock);
> >> -       return true;
> >> +       return false;
> >> }
> >> 
> >> makes blk_mq_sched_bypass_insert return false for all non-flush
> >> requests.  From that, the anomaly described in our commit follows, for
> >> bfq or any stateful scheduler that waits for the completion of requests
> >> that passed through it.  I'm elaborating again a little bit on this in
> >> my replies to your next points below.
> > 
> > Before a6a252e64914, follows blk_mq_sched_bypass_insert()
> > 
> >       if (rq->tag == -1) {
> >               rq->rq_flags |= RQF_SORTED;
> >               return false;
> > 	   }
> > 
> >       /*
> >        * If we already have a real request tag, send directly to
> >        * the dispatch list.
> >        */
> >       spin_lock(&hctx->lock);
> >       list_add(&rq->queuelist, &hctx->dispatch);
> >       spin_unlock(&hctx->lock);
> >       return true;
> > 
> > This function still returns false for all non-flush request, so nothing
> > changes wrt. this kind of handling.
> > 
> 
> Yep Ming.  I don't have the expertise to tell you why, but the failure
> in the USB case was caused by an rq that is not flush, but for which
> rq->tag != -1.  So, the previous version of blk_mq_sched_bypass_insert
> returned true, and there was no failure, while after commit
> a6a252e64914 the function returns false and the failure occurs if bfq
> does not exploit the requeue hook.
> 
> You have actually shown it yourself, several months ago, through the
> simple code instrumentation you made and used to show that bfq was
> stuck.  And I guess it can still be reproduced very easily, unless
> something else has changed in blk-mq.
> 
> BTW, if you can shed a light on this fact, that would be a great
> occasion to learn for me.

The difference should be made by commit 923218f6166a84 (blk-mq: don't
allocate driver tag upfront for flush rq), which releases driver tag
before requeuing the request, but that is definitely the correct thing
to do.

> 
> >> 
> >> I don't mean that this change is an error, it simply sends a stateful
> >> scheduler in an inconsistent state, unless the scheduler properly
> >> handles the requeue that precedes the re-insertion into the
> >> scheduler.
> >> 
> >> If this clarifies the situation, but there is still some misleading
> >> statement in the commit, just let me better understand, and I'll be
> >> glad to rectify it, if possible somehow.
> >> 
> >> 
> >>> And you can see blk_mq_requeue_request() is called by lots of drivers,
> >>> especially it is often used in error handler, see SCSI's example.
> >>> 
> >>>> consequence, I/O schedulers may get the same request inserted again,
> >>>> even several times, without a finish_request invoked on that request
> >>>> before each re-insertion.
> >>>> 
> >> 
> >> ...
> >> 
> >>>> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
> >>>> 	.ops.mq = {
> >>>> 		.limit_depth		= bfq_limit_depth,
> >>>> 		.prepare_request	= bfq_prepare_request,
> >>>> -		.finish_request		= bfq_finish_request,
> >>>> +		.requeue_request        = bfq_finish_requeue_request,
> >>>> +		.finish_request		= bfq_finish_requeue_request,
> >>>> 		.exit_icq		= bfq_exit_icq,
> >>>> 		.insert_requests	= bfq_insert_requests,
> >>>> 		.dispatch_request	= bfq_dispatch_request,
> >>> 
> >>> This way may not be correct since blk_mq_sched_requeue_request() can be
> >>> called for one request which won't enter io scheduler.
> >>> 
> >> 
> >> Exactly, there are two cases: requeues that lead to subsequent
> >> re-insertions, and requeues that don't.  The function
> >> bfq_finish_requeue_request handles both, and both need to be handled,
> >> to inform bfq that it has not to wait for the completion of rq any
> >> longer.
> >> 
> >> One special case is when bfq_finish_requeue_request gets invoked even
> >> if rq has nothing to do with any scheduler.  In that case,
> >> bfq_finish_requeue_request exits immediately.
> >> 
> >> 
> >>> __blk_mq_requeue_request() is called for two cases:
> >>> 
> >>> 	- one is that the requeued request is added to hctx->dispatch, such
> >>> 	as blk_mq_dispatch_rq_list()
> >> 
> >> yes
> >> 
> >>> 	- another case is that the request is requeued to io scheduler, such as
> >>> 	blk_mq_requeue_request().
> >>> 
> >> 
> >> yes
> >> 
> >>> For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
> >>> since it is nothing to do with scheduler,
> >> 
> >> No, if that rq has been inserted and then extracted from the scheduler
> >> through a dispatch_request, then it has.  The scheduler is stateful,
> >> and keeps state for rq, because it must do so, until a completion or a
> >> requeue arrive.  In particular, bfq may decide that no other
> > 
> > This 'requeue' to hctx->dispatch is invisible to io scheduler, and from
> > io scheduler view, seems no difference between STS_OK and STS_RESOURCE,
> > since this request will be submitted to lld automatically by blk-mq
> > core.
> > 
> > Also in legacy path, no such notification to io scheduler too.
> 
> ok, still, for reasons that at this point I don't know, in the last
> 9-10 years no failure has ever been reported that was caused by a
> request first dispatched by bfq and then re-inserted into bfq
> without being completed.  This new event is the problem.

This mechanism was invented by blk-mq, and I guess it is for avoiding
disabling irqs when acquiring hctx->lock, since blk-mq's completion
handler runs in irq context.

Legacy's requeue won't re-insert one request to scheduler queue.

> 
> 
> >> bfq_queues must be served before rq has been completed, because this
> >> is necessary to preserve its target service guarantees.  If bfq is not
> >> informed, either through its completion or its requeue hook, then it
> >> will wait forever.
> > 
> > The .finish_request will be called after this request is completed for this
> > case, either after this io is completed by device, or timeout.
> > 
> > Or .requeue_request will be called if the driver need to requeue it to
> > io scheduler via blk_mq_requeue_request() explicitly.
> > 
> > So io scheduler will be made sure to get notified.
> > 
> 
> So, you mean that the scheduler will get notified in one way or the
> other, if it has a requeue hook defined, right?  Or, simply, the
> request will be eventually completed, thus unblocking the scheduler?

Right, once .queue_rq() returns BLK_STS_RESOURCE, this rq is added to
the hctx->dispatch list, and it will finally be completed via the driver
or the timeout handler, or it may be re-inserted into the io scheduler
queue via blk_mq_requeue_request() by the driver in its completion
handler.

So io scheduler will get the notification finally.

> 
> 
> >> 
> >>> seems we only need to do that
> >>> for 2nd case.
> >>> 
> >> 
> >>> So looks we need the following patch:
> >>> 
> >>> diff --git a/block/blk-mq.c b/block/blk-mq.c
> >>> index 23de7fd8099a..a216f3c3c3ce 100644
> >>> --- a/block/blk-mq.c
> >>> +++ b/block/blk-mq.c
> >>> @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
> >>> 
> >>> 	trace_block_rq_requeue(q, rq);
> >>> 	wbt_requeue(q->rq_wb, &rq->issue_stat);
> >>> -	blk_mq_sched_requeue_request(rq);
> >>> 
> >> 
> >> By doing so, if rq has 'passed' through bfq, then you block again bfq
> >> forever.
> > 
> > Could you explain it a bit why bfq is blocked by this way?
> 
> It's a theoretical problem (unfortunately not just a contingency you
> can work around).  If you want to guarantee that a process doing sync
> I/O, and with a higher weight than other processes, gets a share of
> the total throughput proportional to its weight, then you have to idle
> the device during the service slot of that process.  Otherwise, the
> pending requests of the other processes will simply slip through
> *every time* the high-weight process has no pending request because it
> is blocked by one of its sync request.  The final result will be loss
> of control on the percentage of requests of the high-weight process
> that get dispatched w.r.t. to the total number of requests
> dispatched.  So, failure to provide the process with its due share of
> the throughput.
> 
> I'm working on solutions to compensate for this nasty service
> constraint, and its effects on throughput, but this is another story.

As I explained, the io scheduler will finally get notified via either
.finish_request or .requeue_request, so there shouldn't be
such a block if this patch is in.

> 
> > 
> >> 
> >> Of course, I may be wrong, as I don't have a complete mental model of
> >> all what blk-mq does around bfq.  But I thin that you can quickly
> >> check whether a hang occurs again if you add this change.  In
> >> particular, it should happen in the USB former failure case.
> > 
> > USB/BFQ runs well on my parted test with this change, and this test can
> > trigger the IO hang issue easily on kyber or without your patch of 'block,
> > bfq: add requeue-request hook'.
> > 
> 
> Great!  According to the facts I reported at the beginning of
> this email, I guess your patch works because it avoids the
> nasty case for bfq, i.e., a request re-inserted into bfq without being
> completed first, right? 

BFQ can still get notified via .requeue_request when a request is re-inserted
into BFQ with this patch.

> And then, about the requeues without
> re-insertions, everything works because the latter type of requests
> eventually get a completion.  Is it correct?

Right.

> 
> Sorry for being pedantic, but because of our lack of expertise on
> these requeueing mechanism, we worked really a lot on this commit, and
> it would be frustrating if we bumped again into troubles, after
> removing it.  Apart from that, if this commit is useless, it is
> perfectly fine for me that it is reverted and replaced by a better
> solution.  In the end, less code in bfq means less room for bugs ;)

I agree. We should deal with this issue in a simple/clean way, just like
the patch of 'block: kyber: fix domain token leak during requeue'.

Now could you please let us know if you are fine with the patch of 'blk-mq:
don't call io sched's .requeue_request when requeueing rq to ->dispatch'?


Thanks,
Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-24 12:05         ` Ming Lei
@ 2018-02-24 16:15             ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-24 16:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, Ivan Kozik, SERENA ZIVIANI,
	Holger Hoffstätte, efault, Serena Ziviani



> Il giorno 24 feb 2018, alle ore 13:05, Ming Lei <ming.lei@redhat.com> ha scritto:
> 
> On Sat, Feb 24, 2018 at 08:54:31AM +0100, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 23 feb 2018, alle ore 17:17, Ming Lei <ming.lei@redhat.com> ha scritto:
>>> 
>>> Hi Paolo,
>>> 
>>> On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
>>>> 
>>>> 
>>>>> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
>>>>> 
>>>>> Hi Paolo,
>>>>> 
>>>>> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
>>>>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>>>>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>>>>>> be re-inserted into the active I/O scheduler for that device. As a
>>>>> 
>>>>> No, this behaviour isn't related with commit a6a252e64914, and
>>>>> it has been there since blk_mq_requeue_request() is introduced.
>>>>> 
>>>> 
>>>> Hi Ming,
>>>> actually, we wrote the above statement after simply following the call
>>>> chain that led to the failure.  In this respect, the change in commit
>>>> a6a252e64914:
>>>> 
>>>> static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>>>> +                                      bool has_sched,
>>>>                                      struct request *rq)
>>>> {
>>>> -       if (rq->tag == -1) {
>>>> +       /* dispatch flush rq directly */
>>>> +       if (rq->rq_flags & RQF_FLUSH_SEQ) {
>>>> +               spin_lock(&hctx->lock);
>>>> +               list_add(&rq->queuelist, &hctx->dispatch);
>>>> +               spin_unlock(&hctx->lock);
>>>> +               return true;
>>>> +       }
>>>> +
>>>> +       if (has_sched) {
>>>>               rq->rq_flags |= RQF_SORTED;
>>>> -               return false;
>>>> +               WARN_ON(rq->tag != -1);
>>>>       }
>>>> 
>>>> -       /*
>>>> -        * If we already have a real request tag, send directly to
>>>> -        * the dispatch list.
>>>> -        */
>>>> -       spin_lock(&hctx->lock);
>>>> -       list_add(&rq->queuelist, &hctx->dispatch);
>>>> -       spin_unlock(&hctx->lock);
>>>> -       return true;
>>>> +       return false;
>>>> }
>>>> 
>>>> makes blk_mq_sched_bypass_insert return false for all non-flush
>>>> requests.  From that, the anomaly described in our commit follows, for
>>>> bfq any stateful scheduler that waits for the completion of requests
>>>> that passed through it.  I'm elaborating again a little bit on this in
>>>> my replies to your next points below.
>>> 
>>> Before a6a252e64914, follows blk_mq_sched_bypass_insert()
>>> 
>>>      if (rq->tag == -1) {
>>>              rq->rq_flags |= RQF_SORTED;
>>>              return false;
>>> 	   }
>>> 
>>>      /*
>>>       * If we already have a real request tag, send directly to
>>>       * the dispatch list.
>>>       */
>>>      spin_lock(&hctx->lock);
>>>      list_add(&rq->queuelist, &hctx->dispatch);
>>>      spin_unlock(&hctx->lock);
>>>      return true;
>>> 
>>> This function still returns false for all non-flush request, so nothing
>>> changes wrt. this kind of handling.
>>> 
>> 
>> Yep Ming.  I don't have the expertise to tell you why, but the failure
>> in the USB case was caused by an rq that is not flush, but for which
>> rq->tag != -1.  So, the previous version of blk_mq_sched_bypass_insert
>> returned true, and there was not failure, while after commit
>> a6a252e64914 the function returns true and the failure occurs if bfq
>> does not exploit the requeue hook.
>> 
>> You have actually shown it yourself, several months ago, through the
>> simple code instrumentation you made and used to show that bfq was
>> stuck.  And I guess it can still be reproduced very easily, unless
>> something else has changed in blk-mq.
>> 
>> BTW, if you can shed a light on this fact, that would be a great
>> occasion to learn for me.
> 
> The difference should be made by commit 923218f6166a84 (blk-mq: don't
> allocate driver tag upfront for flush rq), which releases driver tag
> before requeuing the request, but that is definitely the correct thing
> to do.
> 

Ok.  We just did a bisection, guided by changes in the
function blk_mq_sched_bypass_insert, as that is the function whose
different return value led to the hang.  And commit a6a252e64914
apparently was the commit after which the hang started to occur.

Anyway, what matters is just that, from some commit on, a requeued
request that before that commit was not re-inserted into the scheduler
started to be re-inserted.  And this was the trigger of the failure.

>> 
>>>> 
>>>> I don't mean that this change is an error, it simply sends a stateful
>>>> scheduler in an inconsistent state, unless the scheduler properly
>>>> handles the requeue that precedes the re-insertion into the
>>>> scheduler.
>>>> 
>>>> If this clarifies the situation, but there is still some misleading
>>>> statement in the commit, just let me better understand, and I'll be
>>>> glad to rectify it, if possible somehow.
>>>> 
>>>> 
>>>>> And you can see blk_mq_requeue_request() is called by lots of drivers,
>>>>> especially it is often used in error handler, see SCSI's example.
>>>>> 
>>>>>> consequence, I/O schedulers may get the same request inserted again,
>>>>>> even several times, without a finish_request invoked on that request
>>>>>> before each re-insertion.
>>>>>> 
>>>> 
>>>> ...
>>>> 
>>>>>> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
>>>>>> 	.ops.mq = {
>>>>>> 		.limit_depth		= bfq_limit_depth,
>>>>>> 		.prepare_request	= bfq_prepare_request,
>>>>>> -		.finish_request		= bfq_finish_request,
>>>>>> +		.requeue_request        = bfq_finish_requeue_request,
>>>>>> +		.finish_request		= bfq_finish_requeue_request,
>>>>>> 		.exit_icq		= bfq_exit_icq,
>>>>>> 		.insert_requests	= bfq_insert_requests,
>>>>>> 		.dispatch_request	= bfq_dispatch_request,
>>>>> 
>>>>> This way may not be correct since blk_mq_sched_requeue_request() can be
>>>>> called for one request which won't enter io scheduler.
>>>>> 
>>>> 
>>>> Exactly, there are two cases: requeues that lead to subsequent
>>>> re-insertions, and requeues that don't.  The function
>>>> bfq_finish_requeue_request handles both, and both need to be handled,
>>>> to inform bfq that it has not to wait for the completion of rq any
>>>> longer.
>>>> 
>>>> One special case is when bfq_finish_requeue_request gets invoked even
>>>> if rq has nothing to do with any scheduler.  In that case,
>>>> bfq_finish_requeue_request exists immediately.
>>>> 
>>>> 
>>>>> __blk_mq_requeue_request() is called for two cases:
>>>>> 
>>>>> 	- one is that the requeued request is added to hctx->dispatch, such
>>>>> 	as blk_mq_dispatch_rq_list()
>>>> 
>>>> yes
>>>> 
>>>>> 	- another case is that the request is requeued to io scheduler, such as
>>>>> 	blk_mq_requeue_request().
>>>>> 
>>>> 
>>>> yes
>>>> 
>>>>> For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
>>>>> since it is nothing to do with scheduler,
>>>> 
>>>> No, if that rq has been inserted and then extracted from the scheduler
>>>> through a dispatch_request, then it has.  The scheduler is stateful,
>>>> and keeps state for rq, because it must do so, until a completion or a
>>>> requeue arrive.  In particular, bfq may decide that no other
>>> 
>>> This 'requeue' to hctx->dispatch is invisible to io scheduler, and from
>>> io scheduler view, seems no difference between STS_OK and STS_RESOURCE,
>>> since this request will be submitted to lld automatically by blk-mq
>>> core.
>>> 
>>> Also in legacy path, no such notification to io scheduler too.
>> 
>> ok, still, for reasons that at this point I don't know, in the last
>> 9-10 years it has never been reported a failure caused by a request
>> that has been first dispatched by bfq and then re-inserted into bfq
>> without being completed.  This new event is the problem.
> 
> This mechanism is invented by blk-mq, and I guess it is for avoiding to
> disable irq when acquiring hctx->lock since blk-mq's completion handler
> is done in irq context.
> 
> Legacy's requeue won't re-insert one request to scheduler queue.

ok, that's the root cause of this issue then.

> 
>> 
>> 
>>>> bfq_queues must be served before rq has been completed, because this
>>>> is necessary to preserve its target service guarantees.  If bfq is not
>>>> informed, either through its completion or its requeue hook, then it
>>>> will wait forever.
>>> 
>>> The .finish_request will be called after this request is completed for this
>>> case, either after this io is completed by device, or timeout.
>>> 
>>> Or .requeue_request will be called if the driver need to requeue it to
>>> io scheduler via blk_mq_requeue_request() explicitly.
>>> 
>>> So io scheduler will be made sure to get notified.
>>> 
>> 
>> So, you mean that the scheduler will get notified in a way or the
>> other, if it has a requeue hook defined, right?  Or, simply, the
>> request will be eventually completed, thus unblocking the scheduler?
> 
> Right, once .queue_rq() returns BLK_STS_RESOURCE, this rq is added to
> hctx->dispatch list, and finally it will be completed via driver or
> timeout handler, or it may be re-inserted to io scheduler queue via
> blk_mq_requeue_request() by driver in driver's completion handler.
> 
> So io scheduler will get the notification finally.

Yes, provided that the scheduler has a requeue hook; otherwise, in one
case it is not notified before the re-insertion.  And from this,
troubles follow with a stateful scheduler like bfq.

> 
>> 
>> 
>>>> 
>>>>> seems we only need to do that
>>>>> for 2nd case.
>>>>> 
>>>> 
>>>>> So looks we need the following patch:
>>>>> 
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 23de7fd8099a..a216f3c3c3ce 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
>>>>> 
>>>>> 	trace_block_rq_requeue(q, rq);
>>>>> 	wbt_requeue(q->rq_wb, &rq->issue_stat);
>>>>> -	blk_mq_sched_requeue_request(rq);
>>>>> 
>>>> 
>>>> By doing so, if rq has 'passed' through bfq, then you block again bfq
>>>> forever.
>>> 
>>> Could you explain it a bit why bfq is blocked by this way?
>> 
>> It's a theoretical problem (unfortunately not just a contingency you
>> can work around).  If you want to guarantee that a process doing sync
>> I/O, and with a higher weight than other processes, gets a share of
>> the total throughput proportional to its weight, then you have to idle
>> the device during the service slot of that process.  Otherwise, the
>> pending requests of the other processes will simply slip through
>> *every time* the high-weight process has no pending request because it
>> is blocked by one of its sync request.  The final result will be loss
>> of control on the percentage of requests of the high-weight process
>> that get dispatched w.r.t. to the total number of requests
>> dispatched.  So, failure to provide the process with its due share of
>> the throughput.
>> 
>> I'm working on solutions to compensate for this nasty service
>> constraint, and its effects on throughput, but this is another story.
> 
> As I explained, io scheduler will get notified via either
> .finish_request or .requeue_request finally, so there shouldn't be
> such block if this patch is in.
> 

Well, it depends on which patch is your "this patch" :)

I mean, if you refer to your new patch, then I'm afraid the
answer is "no, exactly because of what you explained".  In fact, in
case of a re-insertion, the scheduler gets notified through its
requeue hook.  But for this mechanism to work, the scheduler must have
its requeue hook defined.  bfq had no such hook defined, and our
patch defines exactly that hook.  So, I don't see how the re-insertion
case might be handled correctly without our patch.  Maybe I'm just
missing something here.


>> 
>>> 
>>>> 
>>>> Of course, I may be wrong, as I don't have a complete mental model of
>>>> all what blk-mq does around bfq.  But I thin that you can quickly
>>>> check whether a hang occurs again if you add this change.  In
>>>> particular, it should happen in the USB former failure case.
>>> 
>>> USB/BFQ runs well on my parted test with this change, and this test can
>>> trigger the IO hang issue easily on kyber or without your patch of 'block,
>>> bfq: add requeue-request hook'.
>>> 
>> 
>> Great!  According to the facts I reported at the beginning of
>> this email, I guess your patch works because it avoids the
>> nasty case for bfq, i.e., a request re-inserted into bfq without being
>> completed first, right?
> 
> BFQ still can get notified via .requeue_request when one request is re-inserted
> to BFQ with this patch.
> 
>> And then, about the requeues without
>> re-insertions, everything works because the latter type of requests
>> eventually get a completion.  Is it correct?
> 
> Right.
> 
>> 
>> Sorry for being pedantic, but because of our lack of expertise on
>> these requeueing mechanism, we worked really a lot on this commit, and
>> it would be frustrating if we bumped again into troubles, after
>> removing it.  Apart from that, if this commit is useless, it is
>> perfectly fine for me that it is reverted and replaced by a better
>> solution.  In the end, less code in bfq means less room for bugs ;)
> 
> I agree. We should deal with this issue in a simple/clean way, just like
> the patch of 'block: kyber: fix domain token leak during requeue'.
> 

Our patch "block, bfq: add requeue-request hook" is exactly the bfq
counterpart of the patch 'block: kyber: fix domain token leak during
requeue'.

> Now could you please let us know if you are fine with the patch of 'blk-mq:
> don't call io sched's .requeue_request when requeueing rq to ->dispatch'?
> 

Absolutely.  Just, in view of what you explained, and of what I wrote
in this reply, such a patch does not seem to make our patch useless.

Thanks,
Paolo


> 
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
@ 2018-02-24 16:15             ` Paolo Valente
  0 siblings, 0 replies; 54+ messages in thread
From: Paolo Valente @ 2018-02-24 16:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Linux Kernel Mailing List, Ulf Hansson,
	Mark Brown, Linus Walleij,
	'Paolo Valente' via bfq-iosched, Oleksandr Natalenko,
	Alban Browaeys, Ivan Kozik, SERENA ZIVIANI,
	Holger Hoffstätte, efault, Serena Ziviani



> Il giorno 24 feb 2018, alle ore 13:05, Ming Lei <ming.lei@redhat.com> ha scritto:
> 
> On Sat, Feb 24, 2018 at 08:54:31AM +0100, Paolo Valente wrote:
>> 
>> 
>>> Il giorno 23 feb 2018, alle ore 17:17, Ming Lei <ming.lei@redhat.com> ha scritto:
>>> 
>>> Hi Paolo,
>>> 
>>> On Fri, Feb 23, 2018 at 04:41:36PM +0100, Paolo Valente wrote:
>>>> 
>>>> 
>>>>> Il giorno 23 feb 2018, alle ore 16:07, Ming Lei <ming.lei@redhat.com> ha scritto:
>>>>> 
>>>>> Hi Paolo,
>>>>> 
>>>>> On Wed, Feb 07, 2018 at 10:19:20PM +0100, Paolo Valente wrote:
>>>>>> Commit 'a6a252e64914 ("blk-mq-sched: decide how to handle flush rq via
>>>>>> RQF_FLUSH_SEQ")' makes all non-flush re-prepared requests for a device
>>>>>> be re-inserted into the active I/O scheduler for that device. As a
>>>>> 
>>>>> No, this behaviour isn't related with commit a6a252e64914, and
>>>>> it has been there since blk_mq_requeue_request() is introduced.
>>>>> 
>>>> 
>>>> Hi Ming,
>>>> actually, we wrote the above statement after simply following the call
>>>> chain that led to the failure.  In this respect, the change in commit
>>>> a6a252e64914:
>>>> 
>>>> static bool blk_mq_sched_bypass_insert(struct blk_mq_hw_ctx *hctx,
>>>> +                                      bool has_sched,
>>>>                                      struct request *rq)
>>>> {
>>>> -       if (rq->tag == -1) {
>>>> +       /* dispatch flush rq directly */
>>>> +       if (rq->rq_flags & RQF_FLUSH_SEQ) {
>>>> +               spin_lock(&hctx->lock);
>>>> +               list_add(&rq->queuelist, &hctx->dispatch);
>>>> +               spin_unlock(&hctx->lock);
>>>> +               return true;
>>>> +       }
>>>> +
>>>> +       if (has_sched) {
>>>>               rq->rq_flags |= RQF_SORTED;
>>>> -               return false;
>>>> +               WARN_ON(rq->tag != -1);
>>>>       }
>>>> 
>>>> -       /*
>>>> -        * If we already have a real request tag, send directly to
>>>> -        * the dispatch list.
>>>> -        */
>>>> -       spin_lock(&hctx->lock);
>>>> -       list_add(&rq->queuelist, &hctx->dispatch);
>>>> -       spin_unlock(&hctx->lock);
>>>> -       return true;
>>>> +       return false;
>>>> }
>>>> 
>>>> makes blk_mq_sched_bypass_insert return false for all non-flush
>>>> requests.  From that, the anomaly described in our commit follows, for
>>>> bfq any stateful scheduler that waits for the completion of requests
>>>> that passed through it.  I'm elaborating again a little bit on this in
>>>> my replies to your next points below.
>>> 
>>> Before a6a252e64914, follows blk_mq_sched_bypass_insert()
>>> 
>>>      if (rq->tag == -1) {
>>>              rq->rq_flags |= RQF_SORTED;
>>>              return false;
>>> 	   }
>>> 
>>>      /*
>>>       * If we already have a real request tag, send directly to
>>>       * the dispatch list.
>>>       */
>>>      spin_lock(&hctx->lock);
>>>      list_add(&rq->queuelist, &hctx->dispatch);
>>>      spin_unlock(&hctx->lock);
>>>      return true;
>>> 
>>> This function still returns false for all non-flush requests, so nothing
>>> changes wrt. this kind of handling.
>>> 
>> 
>> Yep, Ming.  I don't have the expertise to tell you why, but the failure
>> in the USB case was caused by an rq that is not flush, but for which
>> rq->tag != -1.  So, the previous version of blk_mq_sched_bypass_insert
>> returned true, and there was no failure, while after commit
>> a6a252e64914 the function returns false and the failure occurs if bfq
>> does not exploit the requeue hook.
>> 
>> You have actually shown it yourself, several months ago, through the
>> simple code instrumentation you made and used to show that bfq was
>> stuck.  And I guess it can still be reproduced very easily, unless
>> something else has changed in blk-mq.
>> 
>> BTW, if you can shed a light on this fact, that would be a great
>> occasion to learn for me.
> 
> The difference should be made by commit 923218f6166a84 (blk-mq: don't
> allocate driver tag upfront for flush rq), which releases driver tag
> before requeuing the request, but that is definitely the correct thing
> to do.
> 

Ok.  We just did a bisection, guided by changes in the
function blk_mq_sched_bypass_insert, as that is the function whose
different return value led to the hang.  And commit a6a252e64914
is apparently the first commit with which the hang occurs.

Anyway, what matters is just that, from some commit on, a requeued
request that before that commit was not re-inserted into the scheduler
started to be re-inserted.  And this was the trigger of the failure.

>> 
>>>> 
>>>> I don't mean that this change is an error, it simply sends a stateful
>>>> scheduler in an inconsistent state, unless the scheduler properly
>>>> handles the requeue that precedes the re-insertion into the
>>>> scheduler.
>>>> 
>>>> If this clarifies the situation, but there is still some misleading
>>>> statement in the commit, just let me better understand, and I'll be
>>>> glad to rectify it, if possible somehow.
>>>> 
>>>> 
>>>>> And you can see blk_mq_requeue_request() is called by lots of drivers,
>>>>> especially it is often used in error handler, see SCSI's example.
>>>>> 
>>>>>> consequence, I/O schedulers may get the same request inserted again,
>>>>>> even several times, without a finish_request invoked on that request
>>>>>> before each re-insertion.
>>>>>> 
>>>> 
>>>> ...
>>>> 
>>>>>> @@ -5426,7 +5482,8 @@ static struct elevator_type iosched_bfq_mq = {
>>>>>> 	.ops.mq = {
>>>>>> 		.limit_depth		= bfq_limit_depth,
>>>>>> 		.prepare_request	= bfq_prepare_request,
>>>>>> -		.finish_request		= bfq_finish_request,
>>>>>> +		.requeue_request        = bfq_finish_requeue_request,
>>>>>> +		.finish_request		= bfq_finish_requeue_request,
>>>>>> 		.exit_icq		= bfq_exit_icq,
>>>>>> 		.insert_requests	= bfq_insert_requests,
>>>>>> 		.dispatch_request	= bfq_dispatch_request,
>>>>> 
>>>>> This way may not be correct since blk_mq_sched_requeue_request() can be
>>>>> called for one request which won't enter io scheduler.
>>>>> 
>>>> 
>>>> Exactly, there are two cases: requeues that lead to subsequent
>>>> re-insertions, and requeues that don't.  The function
>>>> bfq_finish_requeue_request handles both, and both need to be handled,
>>>> to inform bfq that it no longer has to wait for the completion of
>>>> rq.
>>>> 
>>>> One special case is when bfq_finish_requeue_request gets invoked even
>>>> if rq has nothing to do with any scheduler.  In that case,
>>>> bfq_finish_requeue_request exits immediately.
>>>> 
>>>> 
>>>>> __blk_mq_requeue_request() is called for two cases:
>>>>> 
>>>>> 	- one is that the requeued request is added to hctx->dispatch, such
>>>>> 	as blk_mq_dispatch_rq_list()
>>>> 
>>>> yes
>>>> 
>>>>> 	- another case is that the request is requeued to io scheduler, such as
>>>>> 	blk_mq_requeue_request().
>>>>> 
>>>> 
>>>> yes
>>>> 
>>>>> For the 1st case, blk_mq_sched_requeue_request() shouldn't be called
>>>>> since it is nothing to do with scheduler,
>>>> 
>>>> No, if that rq has been inserted and then extracted from the scheduler
>>>> through a dispatch_request, then it has.  The scheduler is stateful,
>>>> and keeps state for rq, because it must do so, until a completion or a
>>>> requeue arrive.  In particular, bfq may decide that no other
>>> 
>>> This 'requeue' to hctx->dispatch is invisible to io scheduler, and from
>>> io scheduler view, seems no difference between STS_OK and STS_RESOURCE,
>>> since this request will be submitted to lld automatically by blk-mq
>>> core.
>>> 
>>> Also in legacy path, no such notification to io scheduler too.
>> 
>> ok, still, for reasons that at this point I don't know, in the last
>> 9-10 years no failure has ever been reported that was caused by a request
>> that was first dispatched by bfq and then re-inserted into bfq
>> without being completed.  This new event is the problem.
> 
> This mechanism is invented by blk-mq, and I guess it is for avoiding to
> disable irq when acquiring hctx->lock since blk-mq's completion handler
> is done in irq context.
> 
> Legacy's requeue won't re-insert one request to scheduler queue.

ok, that's the root cause of this issue then.

> 
>> 
>> 
>>>> bfq_queues must be served before rq has been completed, because this
>>>> is necessary to preserve its target service guarantees.  If bfq is not
>>>> informed, either through its completion or its requeue hook, then it
>>>> will wait forever.
>>> 
>>> The .finish_request will be called after this request is completed for this
>>> case, either after this io is completed by device, or timeout.
>>> 
>>> Or .requeue_request will be called if the driver need to requeue it to
>>> io scheduler via blk_mq_requeue_request() explicitly.
>>> 
>>> So io scheduler will be made sure to get notified.
>>> 
>> 
>> So, you mean that the scheduler will get notified in one way or the
>> other, if it has a requeue hook defined, right?  Or, simply, the
>> request will be eventually completed, thus unblocking the scheduler?
> 
> Right, once .queue_rq() returns BLK_STS_RESOURCE, this rq is added to
> hctx->dispatch list, and finally it will be completed via driver or
> timeout handler, or it may be re-inserted to io scheduler queue via
> blk_mq_requeue_request() by driver in driver's completion handler.
> 
> So io scheduler will get the notification finally.

Yes, provided that the scheduler has a requeue hook; otherwise, in one
case it is not notified before the re-insertion.  And from this,
troubles follow with a stateful scheduler like bfq.

> 
>> 
>> 
>>>> 
>>>>> seems we only need to do that
>>>>> for 2nd case.
>>>>> 
>>>> 
>>>>> So looks we need the following patch:
>>>>> 
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 23de7fd8099a..a216f3c3c3ce 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -712,7 +714,6 @@ static void __blk_mq_requeue_request(struct request *rq)
>>>>> 
>>>>> 	trace_block_rq_requeue(q, rq);
>>>>> 	wbt_requeue(q->rq_wb, &rq->issue_stat);
>>>>> -	blk_mq_sched_requeue_request(rq);
>>>>> 
>>>> 
>>>> By doing so, if rq has 'passed' through bfq, then you again block
>>>> bfq forever.
>>> 
>>> Could you explain it a bit why bfq is blocked by this way?
>> 
>> It's a theoretical problem (unfortunately not just a contingency you
>> can work around).  If you want to guarantee that a process doing sync
>> I/O, and with a higher weight than other processes, gets a share of
>> the total throughput proportional to its weight, then you have to idle
>> the device during the service slot of that process.  Otherwise, the
>> pending requests of the other processes will simply slip through
>> *every time* the high-weight process has no pending request because it
>> is blocked by one of its sync requests.  The final result will be loss
>> of control on the percentage of requests of the high-weight process
>> that get dispatched w.r.t. the total number of requests
>> dispatched.  So, failure to provide the process with its due share of
>> the throughput.
>> 
>> I'm working on solutions to compensate for this nasty service
>> constraint, and its effects on throughput, but this is another story.
> 
> As I explained, io scheduler will get notified via either
> .finish_request or .requeue_request finally, so there shouldn't be
> such block if this patch is in.
> 

Well, it depends on which patch is your "this patch" :)

I mean, if you refer to your new patch, then I'm afraid the
answer is "no, exactly because of what you explained".  In fact, in
case of a re-insertion, the scheduler gets notified through its
requeue hook.  But for this mechanism to work, the scheduler must have
its requeue hook defined.  bfq had no such hook defined, and our
patch defines exactly that hook.  So, I don't see how the re-insertion case
might be handled correctly without our patch.  Maybe I'm just missing
something here.


>> 
>>> 
>>>> 
>>>> Of course, I may be wrong, as I don't have a complete mental model of
>>>> all that blk-mq does around bfq.  But I think that you can quickly
>>>> check whether a hang occurs again if you add this change.  In
>>>> particular, it should happen in the USB former failure case.
>>> 
>>> USB/BFQ runs well on my parted test with this change, and this test can
>>> trigger the IO hang issue easily on kyber or without your patch of 'block,
>>> bfq: add requeue-request hook'.
>>> 
>> 
>> Great!  According to the facts I reported at the beginning of
>> this email, I guess your patch works because it avoids the
>> nasty case for bfq, i.e., a request re-inserted into bfq without being
>> completed first, right? 
> 
> BFQ still can get notified via .requeue_request when one request is re-inserted
> to BFQ with this patch.
> 
>> And then, about the requeues without
>> re-insertions, everything works because the latter type of requests
>> eventually get a completion.  Is it correct?
> 
> Right.
> 
>> 
>> Sorry for being pedantic, but because of our lack of expertise on
>> these requeueing mechanisms, we worked really hard on this commit, and
>> it would be frustrating if we bumped again into troubles, after
>> removing it.  Apart from that, if this commit is useless, it is
>> perfectly fine for me that it is reverted and replaced by a better
>> solution.  In the end, less code in bfq means less room for bugs ;)
> 
> I agree. We should deal with this issue in a simple/clean way, just like
> the patch of 'block: kyber: fix domain token leak during requeue'.
> 

Our patch "block, bfq: add requeue-request hook" is exactly the bfq
counterpart of the patch 'block: kyber: fix domain token leak during requeue'.

> Now could you please let us know if you are fine with the patch of 'blk-mq:
> don't call io sched's .requeue_request when requeueing rq to ->dispatch'?
> 

Absolutely.  Just, in view of what you explained, and of what I wrote
in this reply, such a patch does not seem to make our patch useless.

Thanks,
Paolo


> 
> Thanks,
> Ming

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-12 12:48               ` Oleksandr Natalenko
  2018-02-13  5:02                 ` Mike Galbraith
@ 2018-02-26 19:48                 ` Greg KH
  2018-02-26 20:10                   ` Oleksandr Natalenko
  1 sibling, 1 reply; 54+ messages in thread
From: Greg KH @ 2018-02-26 19:48 UTC (permalink / raw)
  To: Oleksandr Natalenko; +Cc: Mike Galbraith, Paolo Valente, Jens Axboe, stable

On Mon, Feb 12, 2018 at 01:48:28PM +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> 12.02.2018 12:22, Mike Galbraith wrote:
> > How does that work without someone actually submitting patches?  CC
> > stable and pass along a conveniently sorted cherry-pick list?
> 
> From my experience, Greg is fine with the list of hashes, but he needs an
> approval from the relevant subtree maintainer (Jens in this case).

Yes, that's fine with me.

What ever happened to this?  Do you have a list of patches I should
apply?  Or a patch series?

thanks,

greg k-h


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-26 19:48                 ` Greg KH
@ 2018-02-26 20:10                   ` Oleksandr Natalenko
  2018-02-26 20:18                     ` Jens Axboe
  0 siblings, 1 reply; 54+ messages in thread
From: Oleksandr Natalenko @ 2018-02-26 20:10 UTC (permalink / raw)
  To: Greg KH; +Cc: Mike Galbraith, Paolo Valente, Jens Axboe, stable

Hi.

On pondělí 26. února 2018 20:48:33 CET Greg KH wrote:
> Yes, that's fine with me.
> 
> What ever happened to this?  Do you have a list of patches I should
> apply?  Or a patch series?

Here are BFQ commits that are already merged into Linus' tree:

a78773906147 block, bfq: add requeue-request hook
8a8747dc01ce block, bfq: limit sectors served with interactive weight raising
a52a69ea89dc block, bfq: limit tags for writes and async I/O
8993d445df38 block, bfq: fix occurrences of request finish method's old name
8abef10b3de1 bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
0d52af590552 block, bfq: release oom-queue ref to root group on exit
52257ffbfcaf block, bfq: put async queues for root bfq groups too
9b25bd0368d5 block, bfq: remove batches of confusing ifdefs
a34b024448eb block, bfq: consider also past I/O in soft real-time detection
4403e4e467c3 block, bfq: remove superfluous check in queue-merging setup
7b8fa3b900a0 block, bfq: let a queue be merged only shortly after starting I/O
1be6e8a964ee block, bfq: check low_latency flag in bfq_bfqq_save_state()
05e902835616 block, bfq: add missing rq_pos_tree update on rq removal
f0ba5ea2fe45 block, bfq: increase threshold to deem I/O as random

Paolo, could you please confirm we are not missing anything here?

Oleksandr


* Re: [PATCH BUGFIX V3] block, bfq: add requeue-request hook
  2018-02-26 20:10                   ` Oleksandr Natalenko
@ 2018-02-26 20:18                     ` Jens Axboe
  0 siblings, 0 replies; 54+ messages in thread
From: Jens Axboe @ 2018-02-26 20:18 UTC (permalink / raw)
  To: Oleksandr Natalenko, Greg KH; +Cc: Mike Galbraith, Paolo Valente, stable

On 2/26/18 1:10 PM, Oleksandr Natalenko wrote:
> Hi.
> 
> On pondělí 26. února 2018 20:48:33 CET Greg KH wrote:
>> Yes, that's fine with me.
>>
>> What ever happened to this?  Do you have a list of patches I should
>> apply?  Or a patch series?
> 
> Here are BFQ commits that are already merged into Linus' tree:
> 
> a78773906147 block, bfq: add requeue-request hook
> 8a8747dc01ce block, bfq: limit sectors served with interactive weight raising
> a52a69ea89dc block, bfq: limit tags for writes and async I/O
> 8993d445df38 block, bfq: fix occurrences of request finish method's old name
> 8abef10b3de1 bfq-iosched: don't call bfqg_and_blkg_put for !CONFIG_BFQ_GROUP_IOSCHED
> 0d52af590552 block, bfq: release oom-queue ref to root group on exit
> 52257ffbfcaf block, bfq: put async queues for root bfq groups too
> 9b25bd0368d5 block, bfq: remove batches of confusing ifdefs
> a34b024448eb block, bfq: consider also past I/O in soft real-time detection
> 4403e4e467c3 block, bfq: remove superfluous check in queue-merging setup
> 7b8fa3b900a0 block, bfq: let a queue be merged only shortly after starting I/O
> 1be6e8a964ee block, bfq: check low_latency flag in bfq_bfqq_save_state()
> 05e902835616 block, bfq: add missing rq_pos_tree update on rq removal
> f0ba5ea2fe45 block, bfq: increase threshold to deem I/O as random
> 
> Paolo, could you please confirm we are not missing anything here?

Guys, this is crazy, I'm not going to ack this for stable backports.
This is everything including code cleanups.

-- 
Jens Axboe


end of thread, other threads:[~2018-02-26 20:18 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-07 21:19 [PATCH BUGFIX V3] block, bfq: add requeue-request hook Paolo Valente
2018-02-07 22:18 ` Jens Axboe
2018-02-08  7:16   ` Paolo Valente
2018-02-08  7:16     ` Paolo Valente
2018-02-09 13:21     ` Oleksandr Natalenko
2018-02-09 17:17       ` Jens Axboe
2018-02-09 17:29       ` Mike Galbraith
2018-02-09 17:29         ` Mike Galbraith
2018-02-10  8:29         ` Oleksandr Natalenko
2018-02-10  8:29           ` Oleksandr Natalenko
2018-02-12  7:24           ` Paolo Valente
2018-02-12  7:24             ` Paolo Valente
2018-02-12 11:22             ` Mike Galbraith
2018-02-12 12:48               ` Oleksandr Natalenko
2018-02-13  5:02                 ` Mike Galbraith
2018-02-13  5:46                   ` Paolo Valente
2018-02-13  6:42                     ` Mike Galbraith
2018-02-13  6:50                       ` Mike Galbraith
2018-02-13  8:15                         ` Paolo Valente
2018-02-13 14:30                           ` Mike Galbraith
2018-02-14  5:58                             ` Mike Galbraith
2018-02-14  7:04                               ` Mike Galbraith
2018-02-14  7:15                                 ` Mike Galbraith
2018-02-14  8:56                                   ` Paolo Valente
2018-02-14  9:55                                     ` Oleksandr Natalenko
2018-02-14 13:25                                       ` Mike Galbraith
2018-02-14 15:19                                     ` Jens Axboe
2018-02-14 15:39                                       ` Paolo Valente
2018-02-14 15:44                                         ` Jens Axboe
2018-02-15 18:13                                           ` Paolo Valente
2018-02-16  5:39                                             ` Mike Galbraith
2018-02-16  7:16                                               ` Paolo Valente
2018-02-16  7:20                                               ` Paolo Valente
2018-02-17  5:30                                           ` Mike Galbraith
2018-02-17  5:33                                             ` Mike Galbraith
2018-02-19  5:34                                               ` Mike Galbraith
2018-02-19  6:05                                                 ` Oleksandr Natalenko
2018-02-19  7:09                                                   ` Mike Galbraith
2018-02-19  7:58                                                     ` Mike Galbraith
2018-02-19  9:37                                                 ` Paolo Valente
2018-02-19 13:18                                                   ` Mike Galbraith
2018-02-17  5:39                                             ` Mike Galbraith
2018-02-26 19:48                 ` Greg KH
2018-02-26 20:10                   ` Oleksandr Natalenko
2018-02-26 20:18                     ` Jens Axboe
2018-02-23 15:07 ` Ming Lei
2018-02-23 15:41   ` Paolo Valente
2018-02-23 15:41     ` Paolo Valente
2018-02-23 16:17     ` Ming Lei
2018-02-24  7:54       ` Paolo Valente
2018-02-24  7:54         ` Paolo Valente
2018-02-24 12:05         ` Ming Lei
2018-02-24 16:15           ` Paolo Valente
2018-02-24 16:15             ` Paolo Valente
