linux-block.vger.kernel.org archive mirror
* [PATCH v3] blk-mq: punt failed direct issue to dispatch list
@ 2018-12-07  5:17 Jens Axboe
  2018-12-07  8:24 ` Ming Lei
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jens Axboe @ 2018-12-07  5:17 UTC (permalink / raw)
  To: linux-block; +Cc: Mike Snitzer, Bart Van Assche, Ming Lei

After the direct dispatch corruption fix, we permanently disallow direct
dispatch of non read/write requests. This works fine off the normal IO
path, as they will be retried like any other failed direct dispatch
request. But for the blk_insert_cloned_request() that only DM uses to
bypass the bottom level scheduler, we always first attempt direct
dispatch. For some types of requests, that's now a permanent failure,
and no amount of retrying will make that succeed. This results in a
livelock.

Instead of making special cases for what we can direct issue, and now
having to deal with DM solving the livelock while still retaining a BUSY
condition feedback loop, always just add a request that has been through
->queue_rq() to the hardware queue dispatch list. These are safe to use
as no merging can take place there. Additionally, if requests do have
prepped data from drivers, we aren't dependent on them not sharing space
in the request structure to safely add them to the IO scheduler lists.

This basically reverts ffe81d45322c and is based on a patch from Ming,
but with the list insert case covered as well.

Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
Cc: stable@vger.kernel.org
Suggested-by: Ming Lei <ming.lei@redhat.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

I've thrown the initial hang test reported by Bart at it, works fine.
My reproducer for the corruption case is also happy, as expected.

I'm running blktests and xfstests on it overnight. If that passes as
expected, it should quell my initial worries about using ->dispatch as a
holding place for these types of requests.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3262d83b9e07..6a7566244de3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1715,15 +1715,6 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_DEV_RESOURCE:
-		/*
-		 * If direct dispatch fails, we cannot allow any merging on
-		 * this IO. Drivers (like SCSI) may have set up permanent state
-		 * for this request, like SG tables and mappings, and if we
-		 * merge to it later on then we'll still only do IO to the
-		 * original part.
-		 */
-		rq->cmd_flags |= REQ_NOMERGE;
-
 		blk_mq_update_dispatch_busy(hctx, true);
 		__blk_mq_requeue_request(rq);
 		break;
@@ -1736,18 +1727,6 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
 	return ret;
 }
 
-/*
- * Don't allow direct dispatch of anything but regular reads/writes,
- * as some of the other commands can potentially share request space
- * with data we need for the IO scheduler. If we attempt a direct dispatch
- * on those and fail, we can't safely add it to the scheduler afterwards
- * without potentially overwriting data that the driver has already written.
- */
-static bool blk_rq_can_direct_dispatch(struct request *rq)
-{
-	return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
-}
-
 static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 						struct request *rq,
 						blk_qc_t *cookie,
@@ -1769,7 +1748,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 		goto insert;
 	}
 
-	if (!blk_rq_can_direct_dispatch(rq) || (q->elevator && !bypass_insert))
+	if (q->elevator && !bypass_insert)
 		goto insert;
 
 	if (!blk_mq_get_dispatch_budget(hctx))
@@ -1785,7 +1764,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 	if (bypass_insert)
 		return BLK_STS_RESOURCE;
 
-	blk_mq_sched_insert_request(rq, false, run_queue, false);
+	blk_mq_request_bypass_insert(rq, run_queue);
 	return BLK_STS_OK;
 }
 
@@ -1801,7 +1780,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
 	ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
 	if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
-		blk_mq_sched_insert_request(rq, false, true, false);
+		blk_mq_request_bypass_insert(rq, true);
 	else if (ret != BLK_STS_OK)
 		blk_mq_end_request(rq, ret);
 
@@ -1831,15 +1810,13 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
 		struct request *rq = list_first_entry(list, struct request,
 				queuelist);
 
-		if (!blk_rq_can_direct_dispatch(rq))
-			break;
-
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq);
 		if (ret != BLK_STS_OK) {
 			if (ret == BLK_STS_RESOURCE ||
 					ret == BLK_STS_DEV_RESOURCE) {
-				list_add(&rq->queuelist, list);
+				blk_mq_request_bypass_insert(rq,
+							list_empty(list));
 				break;
 			}
 			blk_mq_end_request(rq, ret);


-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07  5:17 [PATCH v3] blk-mq: punt failed direct issue to dispatch list Jens Axboe
@ 2018-12-07  8:24 ` Ming Lei
  2018-12-07 10:50 ` Ming Lei
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2018-12-07  8:24 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Mike Snitzer, Bart Van Assche

On Thu, Dec 06, 2018 at 10:17:44PM -0700, Jens Axboe wrote:
> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for the blk_insert_cloned_request() that only DM uses to
> bypass the bottom level scheduler, we always first attempt direct
> dispatch. For some types of requests, that's now a permanent failure,
> and no amount of retrying will make that succeed. This results in a
> livelock.
> 
> Instead of making special cases for what we can direct issue, and now
> having to deal with DM solving the livelock while still retaining a BUSY
> condition feedback loop, always just add a request that has been through
> ->queue_rq() to the hardware queue dispatch list. These are safe to use
> as no merging can take place there. Additionally, if requests do have
> prepped data from drivers, we aren't dependent on them not sharing space
> in the request structure to safely add them to the IO scheduler lists.
> 
> This basically reverts ffe81d45322c and is based on a patch from Ming,
> but with the list insert case covered as well.
> 
> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> Cc: stable@vger.kernel.org
> Suggested-by: Ming Lei <ming.lei@redhat.com>
> Reported-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> [ cover letter and diff snipped ]

Looks fine, I will run my test with this patch first.

thanks,
Ming


* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07  5:17 [PATCH v3] blk-mq: punt failed direct issue to dispatch list Jens Axboe
  2018-12-07  8:24 ` Ming Lei
@ 2018-12-07 10:50 ` Ming Lei
  2018-12-07 15:15 ` Mike Snitzer
  2018-12-07 16:19 ` Bart Van Assche
  3 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2018-12-07 10:50 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Mike Snitzer, Bart Van Assche

On Thu, Dec 06, 2018 at 10:17:44PM -0700, Jens Axboe wrote:
> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for the blk_insert_cloned_request() that only DM uses to
> bypass the bottom level scheduler, we always first attempt direct
> dispatch. For some types of requests, that's now a permanent failure,
> and no amount of retrying will make that succeed. This results in a
> livelock.
> 
> Instead of making special cases for what we can direct issue, and now
> having to deal with DM solving the livelock while still retaining a BUSY
> condition feedback loop, always just add a request that has been through
> ->queue_rq() to the hardware queue dispatch list. These are safe to use
> as no merging can take place there. Additionally, if requests do have
> prepped data from drivers, we aren't dependent on them not sharing space
> in the request structure to safely add them to the IO scheduler lists.
> 
> This basically reverts ffe81d45322c and is based on a patch from Ming,
> but with the list insert case covered as well.
> 
> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
> Cc: stable@vger.kernel.org
> Suggested-by: Ming Lei <ming.lei@redhat.com>
> Reported-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> [ cover letter and diff snipped ]

Tested-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming


* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07  5:17 [PATCH v3] blk-mq: punt failed direct issue to dispatch list Jens Axboe
  2018-12-07  8:24 ` Ming Lei
  2018-12-07 10:50 ` Ming Lei
@ 2018-12-07 15:15 ` Mike Snitzer
  2018-12-07 16:19 ` Bart Van Assche
  3 siblings, 0 replies; 9+ messages in thread
From: Mike Snitzer @ 2018-12-07 15:15 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Bart Van Assche, Ming Lei

On Fri, Dec 07 2018 at 12:17am -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> After the direct dispatch corruption fix, we permanently disallow direct
> dispatch of non read/write requests. This works fine off the normal IO
> path, as they will be retried like any other failed direct dispatch
> request. But for the blk_insert_cloned_request() that only DM uses to
> bypass the bottom level scheduler, we always first attempt direct
> dispatch. For some types of requests, that's now a permanent failure,
> and no amount of retrying will make that succeed. This results in a
> livelock.
> [ ... ]

Looks good, thanks!

Acked-by: Mike Snitzer <snitzer@redhat.com>


* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07  5:17 [PATCH v3] blk-mq: punt failed direct issue to dispatch list Jens Axboe
                   ` (2 preceding siblings ...)
  2018-12-07 15:15 ` Mike Snitzer
@ 2018-12-07 16:19 ` Bart Van Assche
  2018-12-07 16:24   ` Jens Axboe
  3 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-12-07 16:19 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Mike Snitzer, Ming Lei

On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote:
> Instead of making special cases for what we can direct issue, and now
> having to deal with DM solving the livelock while still retaining a BUSY
> condition feedback loop, always just add a request that has been through
> ->queue_rq() to the hardware queue dispatch list. These are safe to use
> as no merging can take place there. Additionally, if requests do have
> prepped data from drivers, we aren't dependent on them not sharing space
> in the request structure to safely add them to the IO scheduler lists.

How about making blk_mq_sched_insert_request() complain if a request with the
RQF_DONTPREP flag set is passed to it, to avoid reintroducing this problem in
the future? Otherwise this patch looks fine to me.

Bart.


* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07 16:19 ` Bart Van Assche
@ 2018-12-07 16:24   ` Jens Axboe
  2018-12-07 16:35     ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-12-07 16:24 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: Mike Snitzer, Ming Lei

On 12/7/18 9:19 AM, Bart Van Assche wrote:
> On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote:
>> Instead of making special cases for what we can direct issue, and now
>> having to deal with DM solving the livelock while still retaining a BUSY
>> condition feedback loop, always just add a request that has been through
>> ->queue_rq() to the hardware queue dispatch list. These are safe to use
>> as no merging can take place there. Additionally, if requests do have
>> prepped data from drivers, we aren't dependent on them not sharing space
>> in the request structure to safely add them to the IO scheduler lists.
> 
> How about making blk_mq_sched_insert_request() complain if a request is passed
> to it in which the RQF_DONTPREP flag has been set to avoid that this problem is
> reintroduced in the future? Otherwise this patch looks fine to me.

I agree, but I think we should do that as a follow-up patch. I don't want to
touch this one if we can avoid it. The thought did cross my mind, too. It
should be impossible now that everything goes to the dispatch list.

-- 
Jens Axboe



* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07 16:24   ` Jens Axboe
@ 2018-12-07 16:35     ` Jens Axboe
  2018-12-07 16:41       ` Bart Van Assche
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2018-12-07 16:35 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: Mike Snitzer, Ming Lei

On 12/7/18 9:24 AM, Jens Axboe wrote:
> On 12/7/18 9:19 AM, Bart Van Assche wrote:
>> On Thu, 2018-12-06 at 22:17 -0700, Jens Axboe wrote:
>>> Instead of making special cases for what we can direct issue, and now
>>> having to deal with DM solving the livelock while still retaining a BUSY
>>> condition feedback loop, always just add a request that has been through
>>> ->queue_rq() to the hardware queue dispatch list. These are safe to use
>>> as no merging can take place there. Additionally, if requests do have
>>> prepped data from drivers, we aren't dependent on them not sharing space
>>> in the request structure to safely add them to the IO scheduler lists.
>>
>> How about making blk_mq_sched_insert_request() complain if a request is passed
>> to it in which the RQF_DONTPREP flag has been set to avoid that this problem is
>> reintroduced in the future? Otherwise this patch looks fine to me.
> 
> I agree, but I think we should do that as a follow up patch. I don't want to
> touch this one if we can avoid it. The thought did cross my mind, too. It
> should be impossible now that everything goes to the dispatch list.

Something like the below.


diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 29bfe8017a2d..9e5bda8800f8 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -377,6 +377,16 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
 
 	WARN_ON(e && (rq->tag != -1));
 
+	/*
+	 * It's illegal to insert a request into the scheduler that has
+	 * been through ->queue_rq(). Warn for that case, and use a bypass
+	 * insert to be safe.
+	 */
+	if (WARN_ON_ONCE(rq->rq_flags & RQF_DONTPREP)) {
+		blk_mq_request_bypass_insert(rq, false);
+		goto run;
+	}
+
 	if (blk_mq_sched_bypass_insert(hctx, !!e, rq))
 		goto run;
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 6a7566244de3..d5f890d5c814 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1595,15 +1595,25 @@ void blk_mq_insert_requests(struct blk_mq_hw_ctx *hctx, struct blk_mq_ctx *ctx,
 			    struct list_head *list)
 
 {
-	struct request *rq;
+	struct request *rq, *tmp;
 
 	/*
 	 * preemption doesn't flush plug list, so it's possible ctx->cpu is
 	 * offline now
 	 */
-	list_for_each_entry(rq, list, queuelist) {
+	list_for_each_entry_safe(rq, tmp, list, queuelist) {
 		BUG_ON(rq->mq_ctx != ctx);
 		trace_block_rq_insert(hctx->queue, rq);
+
+		/*
+		 * It's illegal to insert a request into the scheduler that has
+		 * been through ->queue_rq(). Warn for that case, and use a
+		 * bypass insert to be safe.
+		 */
+		if (WARN_ON_ONCE(rq->rq_flags & RQF_DONTPREP)) {
+			list_del_init(&rq->queuelist);
+			blk_mq_request_bypass_insert(rq, false);
+		}
 	}
 
 	spin_lock(&ctx->lock);

-- 
Jens Axboe



* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07 16:35     ` Jens Axboe
@ 2018-12-07 16:41       ` Bart Van Assche
  2018-12-07 16:45         ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2018-12-07 16:41 UTC (permalink / raw)
  To: Jens Axboe, linux-block; +Cc: Mike Snitzer, Ming Lei

On Fri, 2018-12-07 at 09:35 -0700, Jens Axboe wrote:
> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 29bfe8017a2d..9e5bda8800f8 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -377,6 +377,16 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
>  
>  	WARN_ON(e && (rq->tag != -1));
>  
> +	/*
> +	 * It's illegal to insert a request into the scheduler that has
> +	 * been through ->queue_rq(). Warn for that case, and use a bypass
> +	 * insert to be safe.
> +	 */

Shouldn't this refer to requests that have been prepared instead of requests
that have been through ->queue_rq()? I think this function is called for
requests that are requeued. Requeued requests have been through ->queue_rq()
but are unprepared before being requeued.

Thanks,

Bart.


* Re: [PATCH v3] blk-mq: punt failed direct issue to dispatch list
  2018-12-07 16:41       ` Bart Van Assche
@ 2018-12-07 16:45         ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2018-12-07 16:45 UTC (permalink / raw)
  To: Bart Van Assche, linux-block; +Cc: Mike Snitzer, Ming Lei

On 12/7/18 9:41 AM, Bart Van Assche wrote:
> On Fri, 2018-12-07 at 09:35 -0700, Jens Axboe wrote:
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 29bfe8017a2d..9e5bda8800f8 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -377,6 +377,16 @@ void blk_mq_sched_insert_request(struct request *rq, bool at_head,
>>  
>>  	WARN_ON(e && (rq->tag != -1));
>>  
>> +	/*
>> +	 * It's illegal to insert a request into the scheduler that has
>> +	 * been through ->queue_rq(). Warn for that case, and use a bypass
>> +	 * insert to be safe.
>> +	 */
> 
> Shouldn't this refer to requests that have been prepared instead of requests
> that have been through ->queue_rq()? I think this function is called for
> requests that are requeued. Requeued requests have been through ->queue_rq()
> but are unprepared before being requeued.

If they are unprepared, RQF_DONTPREP should have been cleared. But that needs
testing and verification, which is exactly why I didn't want to bundle it with
the fix.

I'll test it later today.

-- 
Jens Axboe


