* [PATCH] blk-mq: Return invalid cookie if bio was split
@ 2016-09-26 23:00 Keith Busch
  2016-09-27  9:25 ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2016-09-26 23:00 UTC (permalink / raw)
  To: linux-block, Jens Axboe; +Cc: Christoph Hellwig

The only user of polling requires its original request be completed in
its entirety before continuing execution. If the bio needs to be split
and chained for any reason, the direct IO path would have waited for just
that split portion to complete, leading to potential data corruption if
the remaining transfer has not yet completed.

This patch has blk-mq return an invalid cookie if a bio requires splitting
so that polling does not occur.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 block/blk-mq.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c207fa9..6385985 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1311,6 +1311,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 	unsigned int request_count = 0;
 	struct blk_plug *plug;
 	struct request *same_queue_rq = NULL;
+	struct bio *orig = bio;
 	blk_qc_t cookie;
 
 	blk_queue_bounce(q, &bio);
@@ -1389,7 +1390,7 @@ run_queue:
 	}
 	blk_mq_put_ctx(data.ctx);
 done:
-	return cookie;
+	return bio == orig ? cookie : BLK_QC_T_NONE;
 }
 
 /*
@@ -1404,6 +1405,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 	unsigned int request_count = 0;
 	struct blk_map_ctx data;
 	struct request *rq;
+	struct bio *orig = bio;
 	blk_qc_t cookie;
 
 	blk_queue_bounce(q, &bio);
@@ -1467,7 +1469,7 @@ run_queue:
 	}
 
 	blk_mq_put_ctx(data.ctx);
-	return cookie;
+	return bio == orig ? cookie : BLK_QC_T_NONE;
 }
 
 /*
-- 
2.7.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-09-26 23:00 [PATCH] blk-mq: Return invalid cookie if bio was split Keith Busch
@ 2016-09-27  9:25 ` Ming Lei
  2016-09-27 18:24   ` Keith Busch
  2016-10-03 22:00   ` Keith Busch
  0 siblings, 2 replies; 8+ messages in thread
From: Ming Lei @ 2016-09-27  9:25 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, Jens Axboe, Christoph Hellwig, Ming Lei

On Mon, 26 Sep 2016 19:00:30 -0400
Keith Busch <keith.busch@intel.com> wrote:

> The only user of polling requires its original request be completed in
> its entirety before continuing execution. If the bio needs to be split
> and chained for any reason, the direct IO path would have waited for just
> that split portion to complete, leading to potential data corruption if
> the remaining transfer has not yet completed.

The issue looks a bit tricky because there is no per-bio place for holding
the cookie, and generic_make_request() only returns the cookie for the
last bio in the current bio list, so maybe we need the following patch too.

Also it seems the merge case needs to be taken care of too.

---
diff --git a/block/blk-core.c b/block/blk-core.c
index 14d7c0740dc0..f1ab547173f8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1996,6 +1996,8 @@ blk_qc_t generic_make_request(struct bio *bio)
 {
 	struct bio_list bio_list_on_stack;
 	blk_qc_t ret = BLK_QC_T_NONE;
+	blk_qc_t lret;
+	const struct bio *orig = bio;
 
 	if (!generic_make_request_checks(bio))
 		goto out;
@@ -2036,7 +2038,9 @@ blk_qc_t generic_make_request(struct bio *bio)
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, false) == 0)) {
-			ret = q->make_request_fn(q, bio);
+			lret = q->make_request_fn(q, bio);
+			if (bio == orig)
+				ret = lret;
 
 			blk_queue_exit(q);
 


> 
> This patch has blk-mq return an invalid cookie if a bio requires splitting
> so that polling does not occur.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  block/blk-mq.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index c207fa9..6385985 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1311,6 +1311,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
>  	unsigned int request_count = 0;
>  	struct blk_plug *plug;
>  	struct request *same_queue_rq = NULL;
> +	struct bio *orig = bio;
>  	blk_qc_t cookie;
>  
>  	blk_queue_bounce(q, &bio);
> @@ -1389,7 +1390,7 @@ run_queue:
>  	}
>  	blk_mq_put_ctx(data.ctx);
>  done:
> -	return cookie;
> +	return bio == orig ? cookie : BLK_QC_T_NONE;
>  }
>  
>  /*
> @@ -1404,6 +1405,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
>  	unsigned int request_count = 0;
>  	struct blk_map_ctx data;
>  	struct request *rq;
> +	struct bio *orig = bio;
>  	blk_qc_t cookie;
>  
>  	blk_queue_bounce(q, &bio);
> @@ -1467,7 +1469,7 @@ run_queue:
>  	}
>  
>  	blk_mq_put_ctx(data.ctx);
> -	return cookie;
> +	return bio == orig ? cookie : BLK_QC_T_NONE;
>  }
>  
>  /*



* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-09-27  9:25 ` Ming Lei
@ 2016-09-27 18:24   ` Keith Busch
  2016-10-03 22:00   ` Keith Busch
  1 sibling, 0 replies; 8+ messages in thread
From: Keith Busch @ 2016-09-27 18:24 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Jens Axboe, Christoph Hellwig

On Tue, Sep 27, 2016 at 05:25:36PM +0800, Ming Lei wrote:
> On Mon, 26 Sep 2016 19:00:30 -0400
> Keith Busch <keith.busch@intel.com> wrote:
> 
> > The only user of polling requires its original request be completed in
> > its entirety before continuing execution. If the bio needs to be split
> > and chained for any reason, the direct IO path would have waited for just
> > that split portion to complete, leading to potential data corruption if
> > the remaining transfer has not yet completed.
> 
> The issue looks a bit tricky because there is no per-bio place for holding
> the cookie, and generic_make_request() only returns the cookie for the
> last bio in the current bio list, so maybe we need the following patch too.
> 
> Also seems merge case need to take care of too.

Yah, I think you're right. I'll try to work out a test case that more
readily exposes the problem so I can better verify the fix.


* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-09-27  9:25 ` Ming Lei
  2016-09-27 18:24   ` Keith Busch
@ 2016-10-03 22:00   ` Keith Busch
  2016-10-05  3:19     ` Ming Lei
  1 sibling, 1 reply; 8+ messages in thread
From: Keith Busch @ 2016-10-03 22:00 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Jens Axboe, Christoph Hellwig

On Tue, Sep 27, 2016 at 05:25:36PM +0800, Ming Lei wrote:
> On Mon, 26 Sep 2016 19:00:30 -0400
> Keith Busch <keith.busch@intel.com> wrote:
> 
> > The only user of polling requires its original request be completed in
> > its entirety before continuing execution. If the bio needs to be split
> > and chained for any reason, the direct IO path would have waited for just
> > that split portion to complete, leading to potential data corruption if
> > the remaining transfer has not yet completed.
> 
> The issue looks a bit tricky because there is no per-bio place for holding
> the cookie, and generic_make_request() only returns the cookie for the
> last bio in the current bio list, so maybe we need the following patch too.

I'm looking more into this, and I can't see why we're returning a cookie
to poll on in the first place. blk_poll is only invoked when we could have
called io_schedule, so we expect the task state to be set back to
TASK_RUNNING when all the work completes. So why do we need to poll for a
specific tag instead of polling until the task state is set back to running?

I've tried this out and it seems to work fine, and should fix any issues
from split IO requests. It also helps direct IO polling, since it can
have a list of bios, but can only save one cookie.


* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-10-03 22:00   ` Keith Busch
@ 2016-10-05  3:19     ` Ming Lei
  2016-10-05 16:51       ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2016-10-05  3:19 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, Jens Axboe, Christoph Hellwig

On Tue, Oct 4, 2016 at 6:00 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, Sep 27, 2016 at 05:25:36PM +0800, Ming Lei wrote:
>> On Mon, 26 Sep 2016 19:00:30 -0400
>> Keith Busch <keith.busch@intel.com> wrote:
>>
>> > The only user of polling requires its original request be completed in
>> > its entirety before continuing execution. If the bio needs to be split
>> > and chained for any reason, the direct IO path would have waited for just
>> > that split portion to complete, leading to potential data corruption if
>> > the remaining transfer has not yet completed.
>>
>> The issue looks a bit tricky because there is no per-bio place for holding
>> the cookie, and generic_make_request() only returns the cookie for the
>> last bio in the current bio list, so maybe we need the following patch too.
>
> I'm looking more into this, and I can't see why we're returning a cookie
> to poll on in the first place. blk_poll is only invoked when we could have
> called io_schedule, so we expect the task state gets set to TASK_RUNNING
> when all the work completes. So why do we need to poll for a specific
> tag instead of polling until task state is set back to running?

But .poll() needs to check whether the specific request is completed or not;
then blk_poll() can set 'current' to RUNNING if it is completed.

blk_poll():
                ...
                ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
                if (ret > 0) {
                        hctx->poll_success++;
                        set_current_state(TASK_RUNNING);
                        return true;
                }
                ...


>
> I've tried this out and it seems to work fine, and should fix any issues
> from split IO requests. It also helps direct IO polling, since it can
> have a list of bios, but can only save one cookie.

I am glad to take a look at the patch if you post it.

Thanks,
Ming Lei


* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-10-05  3:19     ` Ming Lei
@ 2016-10-05 16:51       ` Keith Busch
  2016-10-06 16:06         ` Ming Lei
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2016-10-05 16:51 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Jens Axboe, Christoph Hellwig

On Wed, Oct 05, 2016 at 11:19:39AM +0800, Ming Lei wrote:
> But .poll() need to check if the specific request is completed or not,
> then blk_poll() can set 'current' as RUNNING if it is completed.
> 
> blk_poll():
>                 ...
>                 ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
>                 if (ret > 0) {
>                         hctx->poll_success++;
>                         set_current_state(TASK_RUNNING);
>                         return true;
>                 }


Right, but the task could be waiting on a whole lot more than just that
one tag, so setting the task to running before knowing they have all
completed doesn't sound right.
 
> I am glad to take a look the patch if you post it out.

Here's what I got for block + nvme. It relies on all the requests to
complete to set the task back to running.

---
diff --git a/block/blk-core.c b/block/blk-core.c
index b993f88..3c1cfbf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3342,6 +3342,8 @@ EXPORT_SYMBOL(blk_finish_plug);
 bool blk_poll(struct request_queue *q, blk_qc_t cookie)
 {
 	struct blk_plug *plug;
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int queue_num;
 	long state;
 
 	if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
@@ -3353,27 +3355,15 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
 		blk_flush_plug_list(plug, false);
 
 	state = current->state;
+	queue_num = blk_qc_t_to_queue_num(cookie);
+	hctx = q->queue_hw_ctx[queue_num];
 	while (!need_resched()) {
-		unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
-		struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
-		int ret;
-
-		hctx->poll_invoked++;
-
-		ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
-		if (ret > 0) {
-			hctx->poll_success++;
-			set_current_state(TASK_RUNNING);
-			return true;
-		}
+		q->mq_ops->poll(hctx);
 
 		if (signal_pending_state(state, current))
 			set_current_state(TASK_RUNNING);
-
 		if (current->state == TASK_RUNNING)
 			return true;
-		if (ret < 0)
-			break;
 		cpu_relax();
 	}
 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index befac5b..2e359e0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -649,7 +651,7 @@ static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
 	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
 }
 
-static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
+static void nvme_process_cq(struct nvme_queue *nvmeq)
 {
 	u16 head, phase;
 
@@ -665,9 +667,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 			phase = !phase;
 		}
 
-		if (tag && *tag == cqe.command_id)
-			*tag = -1;
-
 		if (unlikely(cqe.command_id >= nvmeq->q_depth)) {
 			dev_warn(nvmeq->dev->ctrl.device,
 				"invalid id %d completed on queue %d\n",
@@ -711,11 +710,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 	nvmeq->cqe_seen = 1;
 }
 
-static void nvme_process_cq(struct nvme_queue *nvmeq)
-{
-	__nvme_process_cq(nvmeq, NULL);
-}
-
 static irqreturn_t nvme_irq(int irq, void *data)
 {
 	irqreturn_t result;
@@ -736,20 +730,15 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
 	return IRQ_NONE;
 }
 
-static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
+static void nvme_poll(struct blk_mq_hw_ctx *hctx)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
 
 	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
 		spin_lock_irq(&nvmeq->q_lock);
-		__nvme_process_cq(nvmeq, &tag);
+		nvme_process_cq(nvmeq);
 		spin_unlock_irq(&nvmeq->q_lock);
-
-		if (tag == -1)
-			return 1;
 	}
-
-	return 0;
 }
 
 static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl, int aer_idx)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2498fdf..a723af9 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -100,7 +100,7 @@ typedef void (exit_request_fn)(void *, struct request *, unsigned int,
 typedef void (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
 		bool);
 typedef void (busy_tag_iter_fn)(struct request *, void *, bool);
-typedef int (poll_fn)(struct blk_mq_hw_ctx *, unsigned int);
+typedef void (poll_fn)(struct blk_mq_hw_ctx *);
 
 
 struct blk_mq_ops {
--


* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-10-05 16:51       ` Keith Busch
@ 2016-10-06 16:06         ` Ming Lei
  2016-10-06 16:39           ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Ming Lei @ 2016-10-06 16:06 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, Jens Axboe, Christoph Hellwig

Hi Keith,

On Thu, Oct 6, 2016 at 12:51 AM, Keith Busch <keith.busch@intel.com> wrote:
> On Wed, Oct 05, 2016 at 11:19:39AM +0800, Ming Lei wrote:
>> But .poll() need to check if the specific request is completed or not,
>> then blk_poll() can set 'current' as RUNNING if it is completed.
>>
>> blk_poll():
>>                 ...
>>                 ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
>>                 if (ret > 0) {
>>                         hctx->poll_success++;
>>                         set_current_state(TASK_RUNNING);
>>                         return true;
>>                 }
>
>
> Right, but the task could be waiting on a whole lot more than just that
> one tag, so setting the task to running before knowing those all complete
> doesn't sound right.
>
>> I am glad to take a look the patch if you post it out.
>
> Here's what I got for block + nvme. It relies on all the requests to
> complete to set the task back to running.

Yeah, but your patch doesn't add that change, and it looks like the
'task_struct *' in the submission path needs to be stored in the request
or somewhere else.

There are some issues with the current polling approach:
- in dio, one dio may include lots of bios, but only the last submitted bio
is polled
- one bio can be split into several bios, but submit_bio() only returns
one cookie for polling

Looks like your approach of polling the current task state can fix this issue.

>
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b993f88..3c1cfbf 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -3342,6 +3342,8 @@ EXPORT_SYMBOL(blk_finish_plug);
>  bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>  {
>         struct blk_plug *plug;
> +       struct blk_mq_hw_ctx *hctx;
> +       unsigned int queue_num;
>         long state;
>
>         if (!q->mq_ops || !q->mq_ops->poll || !blk_qc_t_valid(cookie) ||
> @@ -3353,27 +3355,15 @@ bool blk_poll(struct request_queue *q, blk_qc_t cookie)
>                 blk_flush_plug_list(plug, false);
>
>         state = current->state;
> +       queue_num = blk_qc_t_to_queue_num(cookie);
> +       hctx = q->queue_hw_ctx[queue_num];
>         while (!need_resched()) {
> -               unsigned int queue_num = blk_qc_t_to_queue_num(cookie);
> -               struct blk_mq_hw_ctx *hctx = q->queue_hw_ctx[queue_num];
> -               int ret;
> -
> -               hctx->poll_invoked++;
> -
> -               ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
> -               if (ret > 0) {
> -                       hctx->poll_success++;
> -                       set_current_state(TASK_RUNNING);
> -                       return true;
> -               }
> +               q->mq_ops->poll(hctx);
>
>                 if (signal_pending_state(state, current))
>                         set_current_state(TASK_RUNNING);
> -
>                 if (current->state == TASK_RUNNING)
>                         return true;
> -               if (ret < 0)
> -                       break;
>                 cpu_relax();
>         }

Then it looks like the whole hw queue is polled and only the queue num
in the cookie matters.

In theory, there might be one race:

- one dio needs to submit several bios (suppose there are two bios: A and B)
- A is submitted to hardware queue M
- B is submitted to hardware queue N because the current task is migrated
to another CPU
- then only hardware queue N is polled

>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index befac5b..2e359e0 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -649,7 +651,7 @@ static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
>         return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
>  }
>
> -static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
> +static void nvme_process_cq(struct nvme_queue *nvmeq)
>  {
>         u16 head, phase;
>
> @@ -665,9 +667,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>                         phase = !phase;
>                 }
>
> -               if (tag && *tag == cqe.command_id)
> -                       *tag = -1;
> -
>                 if (unlikely(cqe.command_id >= nvmeq->q_depth)) {
>                         dev_warn(nvmeq->dev->ctrl.device,
>                                 "invalid id %d completed on queue %d\n",
> @@ -711,11 +710,6 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>         nvmeq->cqe_seen = 1;
>  }
>
> -static void nvme_process_cq(struct nvme_queue *nvmeq)
> -{
> -       __nvme_process_cq(nvmeq, NULL);
> -}
> -
>  static irqreturn_t nvme_irq(int irq, void *data)
>  {
>         irqreturn_t result;
> @@ -736,20 +730,15 @@ static irqreturn_t nvme_irq_check(int irq, void *data)
>         return IRQ_NONE;
>  }
>
> -static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
> +static void nvme_poll(struct blk_mq_hw_ctx *hctx)
>  {
>         struct nvme_queue *nvmeq = hctx->driver_data;
>
>         if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
>                 spin_lock_irq(&nvmeq->q_lock);
> -               __nvme_process_cq(nvmeq, &tag);
> +               nvme_process_cq(nvmeq);
>                 spin_unlock_irq(&nvmeq->q_lock);
> -
> -               if (tag == -1)
> -                       return 1;
>         }
> -
> -       return 0;
>  }
>
>  static void nvme_pci_submit_async_event(struct nvme_ctrl *ctrl, int aer_idx)
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 2498fdf..a723af9 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -100,7 +100,7 @@ typedef void (exit_request_fn)(void *, struct request *, unsigned int,
>  typedef void (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
>                 bool);
>  typedef void (busy_tag_iter_fn)(struct request *, void *, bool);
> -typedef int (poll_fn)(struct blk_mq_hw_ctx *, unsigned int);
> +typedef void (poll_fn)(struct blk_mq_hw_ctx *);
>
>
>  struct blk_mq_ops {
> --



-- 
Ming Lei


* Re: [PATCH] blk-mq: Return invalid cookie if bio was split
  2016-10-06 16:06         ` Ming Lei
@ 2016-10-06 16:39           ` Keith Busch
  0 siblings, 0 replies; 8+ messages in thread
From: Keith Busch @ 2016-10-06 16:39 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, Jens Axboe, Christoph Hellwig

On Fri, Oct 07, 2016 at 12:06:27AM +0800, Ming Lei wrote:
> Hi Keith,
> 
> On Thu, Oct 6, 2016 at 12:51 AM, Keith Busch <keith.busch@intel.com> wrote:
> > On Wed, Oct 05, 2016 at 11:19:39AM +0800, Ming Lei wrote:
> >> But .poll() need to check if the specific request is completed or not,
> >> then blk_poll() can set 'current' as RUNNING if it is completed.
> >>
> >> blk_poll():
> >>                 ...
> >>                 ret = q->mq_ops->poll(hctx, blk_qc_t_to_tag(cookie));
> >>                 if (ret > 0) {
> >>                         hctx->poll_success++;
> >>                         set_current_state(TASK_RUNNING);
> >>                         return true;
> >>                 }
> >
> >
> > Right, but the task could be waiting on a whole lot more than just that
> > one tag, so setting the task to running before knowing those all complete
> > doesn't sound right.
> >
> >> I am glad to take a look the patch if you post it out.
> >
> > Here's what I got for block + nvme. It relies on all the requests to
> > complete to set the task back to running.
> 
> Yeah, but your patch doesn't add that change, and looks 'task_struct *'
> in submission path need to be stored in request or somewhere else.

The polling function shouldn't have to set the task to running. The
task_struct is the dio's "waiter", and dio_bio_end_io sets its state
back to normal running once every submitted bio and split chained bio
has completed. Hopefully those all complete through ->poll(), and
blk_poll will automatically observe the state change.

> Then looks the whole hw queue is polled and only the queue num
> in cookie matters.
>
> In theory, there might be one race:
> 
> - one dio need to submit several bios(suppose there are two bios: A and B)
> - A is submitted to hardware queue M
> - B is submitted to hardware queue N because the current task is migrated
> to another CPU
> - then only hardware queue N is polled

Yeah, in that case we'd rely on queue M's interrupt handler to do
the completion.

Avoiding the context switch was the biggest win for polling, as I
understand it. If the task is being migrated to other CPUs, I think
we've already lost the latency benefit we'd have got.

