From: Corrado Zoccolo <czoccolo@gmail.com>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Aaron Carroll <aaronc@cse.unsw.edu.au>,
	Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: Re: Reduce latencies for synchronous writes and high I/O priority requests in deadline IO scheduler
Date: Fri, 24 Apr 2009 18:07:53 +0200	[thread overview]
Message-ID: <4e5e476b0904240907h61efc0ej93d04488003ec104@mail.gmail.com> (raw)
In-Reply-To: <20090424063944.GA4593@kernel.dk>


On Fri, Apr 24, 2009 at 8:39 AM, Jens Axboe <jens.axboe@oracle.com> wrote:
> I find your solution quite confusing - the statement is that CFQ
> isn't optimal on SSD, so you modify deadline? ;-)

Well, I find CFQ too confusing to start with, so I chose a simpler one.
If I can prove something with deadline, maybe you will decide to
implement it in CFQ as well ;)

>
> Most of the "CFQ doesn't work well on SSD" statements are wrong.
> Now, you seem to have done some testing, so when you say that, you
> probably have actual test results telling you that this is the
> case. But let's attempt to fix that issue, then!
>
> One thing you pointed out is that CFQ doesn't treat the device as a
> "real" SSD unless it does queueing. This is very much on purpose, for
> two reasons:
>
> 1) I have never seen a non-queueing SSD that actually performs well for
>   reads-vs-write situations, so CFQ still does idling for those.

Does CFQ idle only when switching between reads and writes, or also
when switching between reads from one process and reads from another?
I think I'll have to instrument CFQ a bit to understand how it works.
Is there a better way than scattering printks all around?

> 2) It's a problem that is going away. SSD that are coming out today and
>   in the future WILL definitely do queuing. We can attribute most of
>   the crap behaviour to the lacking jmicron flash controller, which
>   also has a crappy SATA interface.

I think SD cards will be around for a long time yet, and I don't expect
them to gain queuing, so some support for them might still be needed.

> What I am worried about in the future is even faster SSD devices. CFQ is
> already down a percent or two when we are doing 100k iops and such, this
> problem will only get worse. So I'm very much interested in speeding up
> CFQ for such devices, which I think will mainly be slimming down the IO
> path and bypassing much of the (unneeded) complexity for them. The last
> thing I want is to have to tell people to use deadline or noop on SSD
> devices.
>

Totally agree. Having the main IO scheduler perform well in most
scenarios is surely needed, but this could be achieved in various ways.
What if the main IO scheduler had several strategies in its toolbox,
and could switch between them based on the workload or the type of
hardware?
FIFO scheduling for reads could be one such strategy, used only when
the conditions are right for it.
Another possibility is to use auto-tuning strategies, but those are
more difficult to devise and test.

>> In the meantime, I wanted to overcome also deadline limitations, i.e.
>> the high latencies on fsync/fdatasync.
>
> This is very much something you could pull out of the patchset and we
> could include without much questioning.
>

Ok, this is the first patch of the series; it contains the code cleanup
needed before changing read/write to sync/async. No behavioral change
is introduced by this patch.

I found where the random read performance is gained, but I didn't
include it in this patch, because it requires the sync/async separation
to avoid negatively impacting sync write latencies.

If the following new code, which replicates the existing behaviour:

       if (!dd->next_rq
           || rq_data_dir(dd->next_rq) != data_dir
           || deadline_check_fifo(dd, data_dir)) {
                /*
                 * A deadline has expired, the last request was in the other
                 * direction, or we have run out of higher-sectored requests.

is changed to:

       if (!dd->next_rq
           || rq_data_dir(dd->next_rq) > data_dir
           || deadline_check_fifo(dd, data_dir)) {
                /*
                 * A deadline has expired, the last request was less
                 * important (WRITE is less important than READ),
                 * or we have run out of higher-sectored requests.

you get both higher random read throughput and higher write latencies.

Corrado

> --
> Jens Axboe
>
>

-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------

[-- Attachment #2: deadline-patch-cleanup --]
[-- Type: application/octet-stream, Size: 4874 bytes --]

Deadline IO scheduler code cleanup, preparation for sync/async patch

This is the first patch of the series, and contains the code cleanup
needed before changing read/write to sync/async.
No behavioral change is introduced by this patch.

Code cleanups:
* A single next_rq pointer is sufficient.
* Store the fifo insertion time on the request, and compute the deadline
  on the fly, to handle fifo_expire changes better (fifos remain sorted).
* Remove the unused last_sector field.
* deadline_latter_request becomes deadline_next_request.

Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>

diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index c4d991d..5713595 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -35,11 +35,10 @@ struct deadline_data {
 	struct list_head fifo_list[2];
 
 	/*
-	 * next in sort order. read, write or both are NULL
+	 * next in sort order.
 	 */
-	struct request *next_rq[2];
+	struct request *next_rq;
 	unsigned int batching;		/* number of sequential requests made */
-	sector_t last_sector;		/* head position */
 	unsigned int starved;		/* times reads have starved writes */
 
 	/*
@@ -63,7 +62,7 @@ deadline_rb_root(struct deadline_data *dd, struct request *rq)
  * get the request after `rq' in sector-sorted order
  */
 static inline struct request *
-deadline_latter_request(struct request *rq)
+deadline_next_request(struct request *rq)
 {
 	struct rb_node *node = rb_next(&rq->rb_node);
 
@@ -86,10 +85,8 @@ deadline_add_rq_rb(struct deadline_data *dd, struct request *rq)
 static inline void
 deadline_del_rq_rb(struct deadline_data *dd, struct request *rq)
 {
-	const int data_dir = rq_data_dir(rq);
-
-	if (dd->next_rq[data_dir] == rq)
-		dd->next_rq[data_dir] = deadline_latter_request(rq);
+	if (dd->next_rq == rq)
+		dd->next_rq = deadline_next_request(rq);
 
 	elv_rb_del(deadline_rb_root(dd, rq), rq);
 }
@@ -101,15 +98,14 @@ static void
 deadline_add_request(struct request_queue *q, struct request *rq)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-	const int data_dir = rq_data_dir(rq);
 
 	deadline_add_rq_rb(dd, rq);
 
 	/*
-	 * set expire time and add to fifo list
+	 * set request creation time and add to fifo list
 	 */
-	rq_set_fifo_time(rq, jiffies + dd->fifo_expire[data_dir]);
-	list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
+	rq_set_fifo_time(rq, jiffies);
+	list_add_tail(&rq->queuelist, &dd->fifo_list[rq_data_dir(rq)]);
 }
 
 /*
@@ -206,13 +202,7 @@ deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
 static void
 deadline_move_request(struct deadline_data *dd, struct request *rq)
 {
-	const int data_dir = rq_data_dir(rq);
-
-	dd->next_rq[READ] = NULL;
-	dd->next_rq[WRITE] = NULL;
-	dd->next_rq[data_dir] = deadline_latter_request(rq);
-
-	dd->last_sector = rq_end_sector(rq);
+	dd->next_rq = deadline_next_request(rq);
 
 	/*
 	 * take it off the sort and fifo list, move
@@ -227,15 +217,13 @@ deadline_move_request(struct deadline_data *dd, struct request *rq)
  */
 static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 {
-	struct request *rq = rq_entry_fifo(dd->fifo_list[ddir].next);
-
+	BUG_ON(list_empty(&dd->fifo_list[ddir]));
 	/*
-	 * rq is expired!
+	 * deadline is expired!
 	 */
-	if (time_after(jiffies, rq_fifo_time(rq)))
-		return 1;
-
-	return 0;
+	return time_after(jiffies, dd->fifo_expire[ddir] +
+			  rq_fifo_time(rq_entry_fifo(dd->fifo_list[ddir].next))
+			  );
 }
 
 /*
@@ -247,20 +235,13 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	struct deadline_data *dd = q->elevator->elevator_data;
 	const int reads = !list_empty(&dd->fifo_list[READ]);
 	const int writes = !list_empty(&dd->fifo_list[WRITE]);
-	struct request *rq;
+	struct request *rq = dd->next_rq;
 	int data_dir;
 
-	/*
-	 * batches are currently reads XOR writes
-	 */
-	if (dd->next_rq[WRITE])
-		rq = dd->next_rq[WRITE];
-	else
-		rq = dd->next_rq[READ];
-
-	if (rq && dd->batching < dd->fifo_batch)
+	if (rq && dd->batching < dd->fifo_batch) {
 		/* we have a next request are still entitled to batch */
 		goto dispatch_request;
+	}
 
 	/*
 	 * at this point we are not running a batch. select the appropriate
@@ -299,7 +280,9 @@ dispatch_find_request:
 	/*
 	 * we are not running a batch, find best request for selected data_dir
 	 */
-	if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
+	if (!dd->next_rq
+	    || rq_data_dir(dd->next_rq) != data_dir
+	    || deadline_check_fifo(dd, data_dir)) {
 		/*
 		 * A deadline has expired, the last request was in the other
 		 * direction, or we have run out of higher-sectored requests.
@@ -311,7 +294,7 @@ dispatch_find_request:
 		 * The last req was the same dir and we have a next request in
 		 * sort order. No expired requests so continue on from here.
 		 */
-		rq = dd->next_rq[data_dir];
+		rq = dd->next_rq;
 	}
 
 	dd->batching = 0;

Thread overview: 14+ messages
2009-04-22 21:07 Reduce latencies for synchronous writes and high I/O priority requests in deadline IO scheduler Corrado Zoccolo
2009-04-23 11:18 ` Paolo Ciarrocchi
2009-04-23 11:28 ` Jens Axboe
2009-04-23 15:57   ` Corrado Zoccolo
2009-04-23 11:52 ` Aaron Carroll
2009-04-23 12:13   ` Jens Axboe
2009-04-23 16:10   ` Corrado Zoccolo
2009-04-23 23:30     ` Aaron Carroll
2009-04-24  6:13       ` Corrado Zoccolo
2009-04-24  6:39     ` Jens Axboe
2009-04-24 16:07       ` Corrado Zoccolo [this message]
2009-04-24 21:37         ` Corrado Zoccolo
2009-04-26 12:43           ` Corrado Zoccolo
2009-05-01 19:30             ` Corrado Zoccolo
