blk: optimization for classic polling
diff mbox series

Message ID 3578876466-3733-1-git-send-email-nj.shetty@samsung.com
State New, archived
Headers show
Series
  • blk: optimization for classic polling
Related show

Commit Message

Nitesh Shetty May 30, 2083, 4:21 a.m. UTC
This removes the dependency on interrupts to wake up task. Set task
state as TASK_RUNNING, if need_resched() returns true,
while polling for IO completion.
Earlier, polling task used to sleep, relying on interrupt to wake it up.
This made some IO take very long when interrupt-coalescing is enabled in
NVMe.

Reference:
http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
---
 fs/block_dev.c | 16 ++++++++++++----
 fs/direct-io.c |  8 ++++++--
 fs/iomap.c     | 10 +++++++---
 3 files changed, 25 insertions(+), 9 deletions(-)

Comments

Keith Busch Feb. 8, 2018, 3:27 p.m. UTC | #1
On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.
> 
> Reference:
> http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> ---
>  fs/block_dev.c | 16 ++++++++++++----
>  fs/direct-io.c |  8 ++++++--
>  fs/iomap.c     | 10 +++++++---
>  3 files changed, 25 insertions(+), 9 deletions(-)

I think it'd be simpler to have blk_poll set it back to running if
need_resched is true rather than repeat this patter across all the
callers:

---
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df93102e2149..40285fe1c8ad 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3164,6 +3164,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
 		cpu_relax();
 	}
 
+	set_current_state(TASK_RUNNING);
 	return false;
 }
 
--
Sagi Grimberg Feb. 8, 2018, 4:03 p.m. UTC | #2
> I think it'd be simpler to have blk_poll set it back to running if
> need_resched is true rather than repeat this patter across all the
> callers:
> 
> ---
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index df93102e2149..40285fe1c8ad 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3164,6 +3164,7 @@ static bool __blk_mq_poll(struct blk_mq_hw_ctx *hctx, struct request *rq)
>   		cpu_relax();
>   	}
>   
> +	set_current_state(TASK_RUNNING);
>   	return false;
>   }
>   
> --

Nice!
Bart Van Assche Feb. 12, 2018, 3:50 p.m. UTC | #3
On 05/29/83 20:21, Nitesh Shetty wrote:
> [ ... ]

Hello Nitesh,

Can you check the clock of the system you used to send this e-mail? In 
the header of your e-mail I found the following:

Date: Sun, 30 May 2083 09:51:06 +0530

Thanks,

Bart.
Peter Zijlstra Feb. 20, 2018, 1:21 p.m. UTC | #4
On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.

This is a horrible Changelog.. it does not in fact explain why the patch
works or is correct.

Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
__blk_mq_poll), why do you need that memory barrier?


> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> ---
>  fs/block_dev.c | 16 ++++++++++++----
>  fs/direct-io.c |  8 ++++++--
>  fs/iomap.c     | 10 +++++++---
>  3 files changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..a87d8b7 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		if (!READ_ONCE(bio.bi_private))
>  			break;
> -		if (!(iocb->ki_flags & IOCB_HIPRI) ||
> -		    !blk_poll(bdev_get_queue(bdev), qc))
> +		if (!(iocb->ki_flags & IOCB_HIPRI))
>  			io_schedule();
> +		else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> +			if (need_resched())
> +				set_current_state(TASK_RUNNING);
> +			io_schedule();
> +		}
>  	}
>  	__set_current_state(TASK_RUNNING);
>
Keith Busch Feb. 20, 2018, 4:27 p.m. UTC | #5
On Tue, Feb 20, 2018 at 02:21:37PM +0100, Peter Zijlstra wrote:
> Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> __blk_mq_poll), why do you need that memory barrier?

You're right. The subsequent revision that was committed removed the
barrier. The commit is here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=67b4110f8c8d16e588d7730db8e8b01b32c1bd8b

I hope the code at least looks more reasonable. The changelog isn't much
different, though.
Jens Axboe Feb. 20, 2018, 10:37 p.m. UTC | #6
On 2/20/18 3:21 AM, Peter Zijlstra wrote:
> On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
>> This removes the dependency on interrupts to wake up task. Set task
>> state as TASK_RUNNING, if need_resched() returns true,
>> while polling for IO completion.
>> Earlier, polling task used to sleep, relying on interrupt to wake it up.
>> This made some IO take very long when interrupt-coalescing is enabled in
>> NVMe.
> 
> This is a horrible Changelog.. it does not in fact explain why the patch
> works or is correct.

Yeah, that should have been improved.

> Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> __blk_mq_poll), why do you need that memory barrier?

I pointed that out in the review, and v2 fixed it. v2 is the
one that got merged.
Peter Zijlstra Feb. 21, 2018, 8:32 a.m. UTC | #7
On Tue, Feb 20, 2018 at 12:37:07PM -1000, Jens Axboe wrote:
> On 2/20/18 3:21 AM, Peter Zijlstra wrote:
> > On Sun, May 30, 2083 at 09:51:06AM +0530, Nitesh Shetty wrote:
> >> This removes the dependency on interrupts to wake up task. Set task
> >> state as TASK_RUNNING, if need_resched() returns true,
> >> while polling for IO completion.
> >> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> >> This made some IO take very long when interrupt-coalescing is enabled in
> >> NVMe.
> > 
> > This is a horrible Changelog.. it does not in fact explain why the patch
> > works or is correct.
> 
> Yeah, that should have been improved.

Being ever more forgetful (I blame the kids) I find Changelogs more and
more important, ymmv ;-)

> > Also, set_current_state(TASK_RUNNING) is dodgy (similarly in
> > __blk_mq_poll), why do you need that memory barrier?
> 
> I pointed that out in the review, and v2 fixed it. v2 is the
> one that got merged.

Right missed that. In fact, possibly the only reason I saw this is that
Nitesh had this computer configured wrong and the email is from the
future and thus the very first entry in my lkml folder.
Florian Fainelli Oct. 10, 2018, 6:52 p.m. UTC | #8
On 05/29/2083 08:21 PM, Nitesh Shetty wrote:
> This removes the dependency on interrupts to wake up task. Set task
> state as TASK_RUNNING, if need_resched() returns true,
> while polling for IO completion.
> Earlier, polling task used to sleep, relying on interrupt to wake it up.
> This made some IO take very long when interrupt-coalescing is enabled in
> NVMe.
> 
> Reference:
> http://lists.infradead.org/pipermail/linux-nvme/2018-February/015435.html
> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>

There is something seriously wrong with the date on your computer, this
patch is dated from May 29th 2083,... I would not be surprised if it
messed up people's email client and you did not get any response because
of that...

Patch
diff mbox series

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4a181fc..a87d8b7 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -236,9 +236,13 @@  __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio.bi_private))
 			break;
-		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		if (!(iocb->ki_flags & IOCB_HIPRI))
 			io_schedule();
+		else if (!blk_poll(bdev_get_queue(bdev), qc)) {
+			if (need_resched())
+				set_current_state(TASK_RUNNING);
+			io_schedule();
+		}
 	}
 	__set_current_state(TASK_RUNNING);
 
@@ -401,9 +405,13 @@  __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 		if (!READ_ONCE(dio->waiter))
 			break;
 
-		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		if (!(iocb->ki_flags & IOCB_HIPRI))
 			io_schedule();
+		else if (!blk_poll(bdev_get_queue(bdev), qc)) {
+			if (need_resched())
+				set_current_state(TASK_RUNNING);
+			io_schedule();
+		}
 	}
 	__set_current_state(TASK_RUNNING);
 
diff --git a/fs/direct-io.c b/fs/direct-io.c
index a0ca9e4..c815ac9 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -518,9 +518,13 @@  static struct bio *dio_await_one(struct dio *dio)
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		dio->waiter = current;
 		spin_unlock_irqrestore(&dio->bio_lock, flags);
-		if (!(dio->iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(dio->bio_disk->queue, dio->bio_cookie))
+		if (!(dio->iocb->ki_flags & IOCB_HIPRI))
 			io_schedule();
+		else if (!blk_poll(dio->bio_disk->queue, dio->bio_cookie)) {
+			if (need_resched())
+				__set_current_state(TASK_RUNNING);
+			io_schedule();
+		}
 		/* wake up sets us TASK_RUNNING */
 		spin_lock_irqsave(&dio->bio_lock, flags);
 		dio->waiter = NULL;
diff --git a/fs/iomap.c b/fs/iomap.c
index afd1635..b51569d 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -1072,10 +1072,14 @@  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 				break;
 
 			if (!(iocb->ki_flags & IOCB_HIPRI) ||
-			    !dio->submit.last_queue ||
-			    !blk_poll(dio->submit.last_queue,
-					 dio->submit.cookie))
+			    !dio->submit.last_queue)
 				io_schedule();
+			else if (!blk_poll(dio->submit.last_queue,
+					 dio->submit.cookie)) {
+				if (need_resched())
+					set_current_state(TASK_RUNNING);
+				io_schedule();
+			}
 		}
 		__set_current_state(TASK_RUNNING);
 	}