* Reduce latencies for synchronous writes and high I/O priority requests in deadline IO scheduler
@ 2009-04-22 21:07 Corrado Zoccolo
  2009-04-23 11:18 ` Paolo Ciarrocchi
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Corrado Zoccolo @ 2009-04-22 21:07 UTC (permalink / raw)
  To: jens.axboe, Linux-Kernel

[-- Attachment #1: Type: text/plain, Size: 10194 bytes --]

Hi,
the deadline I/O scheduler currently classifies all I/O requests into only 2
classes: reads (always considered high priority) and writes (always
lower).
The attached patch, intended to reduce latencies for synchronous writes
and high I/O priority requests, introduces more priority levels:
* real-time reads: highest priority and shortest deadline; can starve
the other levels
* synchronous operations (either best-effort reads or RT/BE writes):
mid priority; starvation of the lower level is prevented as usual
* asynchronous operations (async writes and all IDLE class requests):
lowest priority and longest deadline

The patch also introduces some new heuristics:
* for non-rotational devices, reads (within a given priority level)
are issued in FIFO order, to improve the latency perceived by readers
* minimum batch timespan (time quantum): partners with fifo_batch to
improve throughput by sending more consecutive requests together. A
given number of requests will not always take the same time (due to
the amount of seeking needed), so fifo_batch must be tuned for the
worst case, while in the best case longer batches would give a
throughput boost.
* the batch start request is chosen fifo_batch/3 requests before the
expired one, to improve fairness for requests with lower start sectors,
which otherwise have a higher probability of missing a deadline than
mid-sector requests.

I ran a few performance comparisons:
* HDD, ext3 partition with data=writeback, tiotest with 32 threads,
each writing 80MB of data

** deadline-original
Tiotest results for 32 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2560 MBs |  103.0 s |  24.848 MB/s |  10.6 %  | 522.2 % |
| Random Write  125 MBs |   98.8 s |   1.265 MB/s |  -1.6 %  |  16.1 % |
| Read         2560 MBs |  166.2 s |  15.400 MB/s |   4.2 %  |  82.7 % |
| Random Read   125 MBs |  193.3 s |   0.647 MB/s |  -0.8 %  |  14.5 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        4.122 ms |    17922.920 ms |  0.07980 |   0.00061 |
| Random Write |        0.599 ms |     1245.200 ms |  0.00000 |   0.00000 |
| Read         |        8.032 ms |     1125.759 ms |  0.00000 |   0.00000 |
| Random Read  |      181.968 ms |      972.657 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |       10.044 ms |    17922.920 ms |  0.03804 |   0.00029 |
`--------------+-----------------+-----------------+----------+-----------'

** cfq (2.6.30-rc2)
Tiotest results for 32 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2560 MBs |  132.4 s |  19.342 MB/s |   8.5 %  | 400.4 % |
| Random Write  125 MBs |  107.8 s |   1.159 MB/s |  -1.6 %  |  16.8 % |
| Read         2560 MBs |  107.6 s |  23.788 MB/s |   5.4 %  |  95.7 % |
| Random Read   125 MBs |  158.4 s |   0.789 MB/s |   0.9 %  |   7.7 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        5.362 ms |    21081.012 ms |  0.09811 |   0.00244 |
| Random Write |       23.310 ms |    31865.095 ms |  0.13437 |   0.06250 |
| Read         |        5.048 ms |     3694.001 ms |  0.15167 |   0.00000 |
| Random Read  |      146.523 ms |     2880.409 ms |  0.52187 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        8.916 ms |    31865.095 ms |  0.13435 |   0.00262 |
`--------------+-----------------+-----------------+----------+-----------'

** deadline-patched
Tiotest results for 32 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2560 MBs |  105.3 s |  24.301 MB/s |  10.5 %  | 514.8 % |
| Random Write  125 MBs |   95.9 s |   1.304 MB/s |  -1.8 %  |  17.3 % |
| Read         2560 MBs |  165.1 s |  15.507 MB/s |   2.7 %  |  61.9 % |
| Random Read   125 MBs |  110.6 s |   1.130 MB/s |   0.8 %  |  12.2 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        4.131 ms |    17456.831 ms |  0.08041 |   0.00275 |
| Random Write |        2.780 ms |     5073.180 ms |  0.07500 |   0.00000 |
| Read         |        7.748 ms |      936.499 ms |  0.00000 |   0.00000 |
| Random Read  |      104.849 ms |      695.192 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        8.168 ms |    17456.831 ms |  0.04008 |   0.00131 |
`--------------+-----------------+-----------------+----------+-----------'

* SD card, nilfs2 partition, tiotest with 16 threads, each writing 80MB of data
** cfq (2.6.30-rc2)
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        1280 MBs |  217.8 s |   5.878 MB/s |   3.7 %  |  92.2 % |
| Random Write   62 MBs |   18.2 s |   3.432 MB/s |  -2.3 %  |  28.7 % |
| Read         1280 MBs |  114.7 s |  11.156 MB/s |   7.3 %  |  76.6 % |
| Random Read    62 MBs |    3.4 s |  18.615 MB/s |  -5.4 %  | 274.2 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        9.943 ms |    10223.581 ms |  0.14252 |   0.00488 |
| Random Write |       12.287 ms |     5097.196 ms |  0.25625 |   0.00000 |
| Read         |        5.352 ms |     1550.162 ms |  0.00000 |   0.00000 |
| Random Read  |        3.051 ms |     1507.837 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        7.649 ms |    10223.581 ms |  0.07391 |   0.00233 |
`--------------+-----------------+-----------------+----------+-----------'

** deadline-patched:
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        1280 MBs |  220.9 s |   5.794 MB/s |   4.0 %  |  93.9 % |
| Random Write   62 MBs |   20.5 s |   3.044 MB/s |  -2.2 %  |  24.9 % |
| Read         1280 MBs |  113.2 s |  11.304 MB/s |   6.8 %  |  72.8 % |
| Random Read    62 MBs |    2.9 s |  21.896 MB/s |   5.1 %  | 293.8 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |       10.078 ms |    13303.036 ms |  0.14160 |   0.00031 |
| Random Write |       14.350 ms |     5265.088 ms |  0.40000 |   0.00000 |
| Read         |        5.455 ms |      434.495 ms |  0.00000 |   0.00000 |
| Random Read  |        2.685 ms |       12.652 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        7.801 ms |    13303.036 ms |  0.07682 |   0.00015 |
`--------------+-----------------+-----------------+----------+-----------'

* fsync-tester results, on HDD, empty ext3 partition, mounted with
data=writeback
** deadline-original:
fsync time: 0.7963
fsync time: 4.5914
fsync time: 4.2347
fsync time: 1.1670
fsync time: 0.8164
fsync time: 1.9783
fsync time: 4.9726
fsync time: 2.4929
fsync time: 2.5448
fsync time: 3.9627
** cfq (2.6.30-rc2)
fsync time: 0.0288
fsync time: 0.0528
fsync time: 0.0299
fsync time: 0.0397
fsync time: 0.5720
fsync time: 0.0409
fsync time: 0.0876
fsync time: 0.0294
fsync time: 0.0485
** deadline-patched
fsync time: 0.0772
fsync time: 0.0381
fsync time: 0.0604
fsync time: 0.2923
fsync time: 0.2488
fsync time: 0.0924
fsync time: 0.0144
fsync time: 1.4824
fsync time: 0.0789
fsync time: 0.0565
fsync time: 0.0550
fsync time: 0.0421
** deadline-patched, ionice -c1:
fsync time: 0.2569
fsync time: 0.0500
fsync time: 0.0681
fsync time: 0.2863
fsync time: 0.0140
fsync time: 0.0171
fsync time: 0.1198
fsync time: 0.0530
fsync time: 0.0503
fsync time: 0.0462
fsync time: 0.0484
fsync time: 0.0328
fsync time: 0.0562
fsync time: 0.0451
fsync time: 0.0576
fsync time: 0.0444
fsync time: 0.0469
fsync time: 0.0368
fsync time: 0.2865

Corrado

-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------

[-- Attachment #2: deadline-iosched.c.patch --]
[-- Type: application/octet-stream, Size: 17479 bytes --]

diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index c4d991d..5222b61 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -17,11 +17,12 @@
 /*
  * See Documentation/block/deadline-iosched.txt
  */
-static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
-static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
-static const int writes_starved = 2;    /* max times reads can starve a write */
-static const int fifo_batch = 16;       /* # of sequential requests treated as one
-				     by the above parameters. For throughput. */
+static const int rt_sync_expire = HZ / 8;  /* max time before a real-time sync operation (e.g. read for RT class process) is submitted. */
+static const int sync_expire = HZ / 2;     /* max time before a sync operation (e.g. read) is submitted. */
+static const int async_expire = 5 * HZ;    /* ditto for async operations (e.g. writes), these limits are SOFT! */
+static const int async_starved = 2;        /* max times SYNC can starve ASYNC requests */
+static const int fifo_batch = 16;          /* min # of sequential requests treated as one by the above parameters. For throughput. */
+static const int time_quantum  = HZ / 10;  /* min duration for a batch */
 
 struct deadline_data {
 	/*
@@ -31,27 +32,33 @@ struct deadline_data {
 	/*
 	 * requests (deadline_rq s) are present on both sort_list and fifo_list
 	 */
-	struct rb_root sort_list[2];	
-	struct list_head fifo_list[2];
+	struct rb_root sort_list[2]; /* READ, WRITE */
+	struct list_head fifo_list[3]; /* 0=ASYNC (or IDLE), 1=SYNC (or RT ASYNC), 2=RT SYNC */
 
 	/*
-	 * next in sort order. read, write or both are NULL
+	 * next in sort order.
 	 */
-	struct request *next_rq[2];
+	struct request *next_rq;
 	unsigned int batching;		/* number of sequential requests made */
-	sector_t last_sector;		/* head position */
 	unsigned int starved;		/* times reads have starved writes */
 
 	/*
 	 * settings that change how the i/o scheduler behaves
 	 */
-	int fifo_expire[2];
+	int fifo_expire[3];
+	int time_quantum;
 	int fifo_batch;
-	int writes_starved;
+	int async_starved;
 	int front_merges;
+
+	/*
+	  current batch data & stats
+	 */
+	int cur_batch_prio;
+	unsigned long cur_batch_start;
 };
 
-static void deadline_move_request(struct deadline_data *, struct request *);
+static void deadline_move_request(struct deadline_data *, struct request *, int nonrot);
 
 static inline struct rb_root *
 deadline_rb_root(struct deadline_data *dd, struct request *rq)
@@ -63,7 +70,7 @@ deadline_rb_root(struct deadline_data *dd, struct request *rq)
  * get the request after `rq' in sector-sorted order
  */
 static inline struct request *
-deadline_latter_request(struct request *rq)
+deadline_next_request(struct request *rq)
 {
 	struct rb_node *node = rb_next(&rq->rb_node);
 
@@ -73,27 +80,118 @@ deadline_latter_request(struct request *rq)
 	return NULL;
 }
 
+/*
+ * get the request before `rq' in sector-sorted order
+ */
+static inline struct request *
+deadline_prev_request(struct request *rq)
+{
+	struct rb_node *node = rb_prev(&rq->rb_node);
+
+	if (node)
+		return rb_entry_rq(node);
+
+	return NULL;
+}
+
 static void
-deadline_add_rq_rb(struct deadline_data *dd, struct request *rq)
+deadline_add_rq_rb(struct deadline_data *dd, struct request *rq, int nonrot)
 {
 	struct rb_root *root = deadline_rb_root(dd, rq);
 	struct request *__alias;
 
 	while (unlikely(__alias = elv_rb_add(root, rq)))
-		deadline_move_request(dd, __alias);
+		deadline_move_request(dd, __alias, nonrot);
 }
 
 static inline void
 deadline_del_rq_rb(struct deadline_data *dd, struct request *rq)
 {
-	const int data_dir = rq_data_dir(rq);
-
-	if (dd->next_rq[data_dir] == rq)
-		dd->next_rq[data_dir] = deadline_latter_request(rq);
+	if (dd->next_rq == rq)
+		dd->next_rq = deadline_next_request(rq);
 
 	elv_rb_del(deadline_rb_root(dd, rq), rq);
 }
 
+static void
+list_add_timesorted(struct list_head *q, struct request *rq)
+{
+	struct list_head *entry;
+	int stop_flags = REQ_SOFTBARRIER | REQ_HARDBARRIER | REQ_STARTED;
+	list_for_each_prev(entry, q) {
+		struct request *pos = list_entry_rq(entry);
+		if (pos->cmd_flags & stop_flags)
+			break;
+		if (rq_fifo_time(rq) > rq_fifo_time(pos))
+			break;
+		if (rq_fifo_time(rq) == rq_fifo_time(pos) &&
+		    rq->sector >= pos->sector)
+			break;
+	}
+	list_add(&rq->queuelist, entry);
+}
+
+static int ioprio_lub(unsigned short aprio, unsigned short bprio)
+{
+	unsigned short aclass = IOPRIO_PRIO_CLASS(aprio);
+	unsigned short bclass = IOPRIO_PRIO_CLASS(bprio);
+
+	if (aclass == IOPRIO_CLASS_NONE)
+		return bprio;
+	if (bclass == IOPRIO_CLASS_NONE)
+		return aprio;
+
+	if (aclass == bclass)
+		return min(aprio, bprio);
+	if (aclass > bclass)
+		return bprio;
+	else
+		return aprio;
+}
+
+static void
+deadline_merge_prio_data(struct request_queue *q, struct request *rq)
+{
+	struct task_struct *tsk = current;
+	struct io_context *ioc = get_io_context(GFP_ATOMIC,q->node);
+	int ioprio_class = IOPRIO_CLASS_NONE;
+	int ioprio = IOPRIO_NORM;
+
+	if(ioc) {
+		ioprio_class = task_ioprio_class(ioc);
+	}
+
+	switch (ioprio_class) {
+	default:
+		printk(KERN_ERR "deadline: bad prio %x\n", ioprio_class);
+	case IOPRIO_CLASS_NONE:
+		/*
+		 * no prio set, inherit CPU scheduling settings
+		 */
+		ioprio = task_nice_ioprio(tsk);
+		ioprio_class = task_nice_ioclass(tsk);
+		break;
+	case IOPRIO_CLASS_RT:
+	case IOPRIO_CLASS_BE:
+		ioprio = task_ioprio(ioc);
+		break;
+	case IOPRIO_CLASS_IDLE:
+		ioprio = 7;
+		break;
+	}
+
+	ioprio=IOPRIO_PRIO_VALUE(ioprio_class,ioprio);
+	rq->ioprio=ioprio_lub(rq->ioprio,ioprio);
+}
+
+static int
+deadline_compute_request_priority(struct request *req)
+{
+	unsigned short ioprio_class=IOPRIO_PRIO_CLASS(req_get_ioprio(req));
+	return (ioprio_class!=IOPRIO_CLASS_IDLE)*
+		(!!rq_is_sync(req) + (rq_data_dir(req)==READ)*(ioprio_class==IOPRIO_CLASS_RT));
+}
+
 /*
  * add rq to rbtree and fifo
  */
@@ -101,15 +199,17 @@ static void
 deadline_add_request(struct request_queue *q, struct request *rq)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-	const int data_dir = rq_data_dir(rq);
 
-	deadline_add_rq_rb(dd, rq);
+	deadline_merge_prio_data(q,rq);
+	deadline_add_rq_rb(dd, rq, blk_queue_nonrot(q));
 
 	/*
-	 * set expire time and add to fifo list
+	 * set request creation time and add to fifo list
 	 */
-	rq_set_fifo_time(rq, jiffies + dd->fifo_expire[data_dir]);
-	list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
+
+	rq_set_fifo_time(rq, jiffies);
+	
+	list_add_timesorted(&dd->fifo_list[deadline_compute_request_priority(rq)],rq);
 }
 
 /*
@@ -157,14 +257,16 @@ static void deadline_merged_request(struct request_queue *q,
 				    struct request *req, int type)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-
 	/*
 	 * if the merge was a front merge, we need to reposition request
 	 */
 	if (type == ELEVATOR_FRONT_MERGE) {
 		elv_rb_del(deadline_rb_root(dd, req), req);
-		deadline_add_rq_rb(dd, req);
+		deadline_add_rq_rb(dd, req, blk_queue_nonrot(q));
 	}
+
+	deadline_merge_prio_data(q,req);
+
 }
 
 static void
@@ -172,7 +274,7 @@ deadline_merged_requests(struct request_queue *q, struct request *req,
 			 struct request *next)
 {
 	/*
-	 * if next expires before rq, assign its expire time to rq
+	 * request that cannot idle. if next expires before rq, assign its expire time to rq
 	 * and move into next position (next will be deleted) in fifo
 	 */
 	if (!list_empty(&req->queuelist) && !list_empty(&next->queuelist)) {
@@ -204,15 +306,33 @@ deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
  * move an entry to dispatch queue
  */
 static void
-deadline_move_request(struct deadline_data *dd, struct request *rq)
+deadline_move_request(struct deadline_data *dd, struct request *rq, int nonrot)
 {
 	const int data_dir = rq_data_dir(rq);
+	dd->next_rq = NULL;
+
+	if(data_dir != READ || !nonrot) {
+		int max_search = dd->fifo_batch;
+		/* for rot devices, or writes on non-rot, requests are dispatched in disk order */
+		dd->next_rq = rq;
+		/* try to get requests of at least the same priority as current one */
+		while(max_search-- && (dd->next_rq = deadline_next_request(dd->next_rq)) && dd->cur_batch_prio>deadline_compute_request_priority(dd->next_rq));
+		if(!max_search || !dd->next_rq) { // did not get a next of the same priority, demote batch to lower, and continue in disk order
+			dd->next_rq = deadline_next_request(rq);
+			if(dd->next_rq) dd->cur_batch_prio = deadline_compute_request_priority(dd->next_rq);
+		}
 
-	dd->next_rq[READ] = NULL;
-	dd->next_rq[WRITE] = NULL;
-	dd->next_rq[data_dir] = deadline_latter_request(rq);
-
-	dd->last_sector = rq_end_sector(rq);
+	} else { /* nonrot && data_dir==READ : requests are dispatched in deadline order */
+		struct list_head *entry;
+		list_for_each(entry, &dd->fifo_list[dd->cur_batch_prio]) {
+			struct request *pos = list_entry_rq(entry);
+			if(pos==rq) continue;
+			if(rq_data_dir(pos)==data_dir) { /* find same direction (always READ) */
+				dd->next_rq = pos;
+				break;
+			}
+		}
+	}
 
 	/*
 	 * take it off the sort and fifo list, move
@@ -222,20 +342,16 @@ deadline_move_request(struct deadline_data *dd, struct request *rq)
 }
 
 /*
- * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
- * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
+ * deadline_check_fifo returns 0 if there are no expired requests on the fifo for given priority,
+ * 1 otherwise. Requires !list_empty(&dd->fifo_list[prio])
  */
-static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
+static inline int deadline_check_request(struct deadline_data *dd, unsigned prio) 
 {
-	struct request *rq = rq_entry_fifo(dd->fifo_list[ddir].next);
-
+	BUG_ON(list_empty(&dd->fifo_list[prio]));
 	/*
-	 * rq is expired!
+	 * deadline is expired!
 	 */
-	if (time_after(jiffies, rq_fifo_time(rq)))
-		return 1;
-
-	return 0;
+	return time_after(jiffies, dd->fifo_expire[prio] + rq_fifo_time(rq_entry_fifo(dd->fifo_list[prio].next)));
 }
 
 /*
@@ -245,36 +361,31 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 static int deadline_dispatch_requests(struct request_queue *q, int force)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-	const int reads = !list_empty(&dd->fifo_list[READ]);
-	const int writes = !list_empty(&dd->fifo_list[WRITE]);
-	struct request *rq;
-	int data_dir;
+	int rt_ = !list_empty(&dd->fifo_list[2]);
+	int sync_ = !list_empty(&dd->fifo_list[1]);
+	int async_ = !list_empty(&dd->fifo_list[0]);
+	struct request *rq = dd->next_rq;
+	int request_prio = dd->cur_batch_prio;
 
-	/*
-	 * batches are currently reads XOR writes
-	 */
-	if (dd->next_rq[WRITE])
-		rq = dd->next_rq[WRITE];
-	else
-		rq = dd->next_rq[READ];
-
-	if (rq && dd->batching < dd->fifo_batch)
+	if (rq && (dd->batching < dd->fifo_batch || jiffies-dd->cur_batch_start < dd->time_quantum)) {
 		/* we have a next request are still entitled to batch */
 		goto dispatch_request;
+	}
 
 	/*
 	 * at this point we are not running a batch. select the appropriate
 	 * data direction (read / write)
 	 */
 
-	if (reads) {
-		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
-
-		if (writes && (dd->starved++ >= dd->writes_starved))
-			goto dispatch_writes;
-
-		data_dir = READ;
+	if (rt_) {
+		request_prio = 2;
+		goto dispatch_find_request;
+	}
 
+	if (sync_) {
+		if (async_ && (dd->starved++ >= dd->async_starved))
+			goto dispatch_async;
+		request_prio = 1;
 		goto dispatch_find_request;
 	}
 
@@ -282,37 +393,44 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	 * there are either no reads or writes have been starved
 	 */
 
-	if (writes) {
-dispatch_writes:
-		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
-
+	if (async_) {
+dispatch_async:
 		dd->starved = 0;
-
-		data_dir = WRITE;
-
+		request_prio = 0;
 		goto dispatch_find_request;
 	}
 
+	dd->cur_batch_start=jiffies;
+	dd->batching = 0;
 	return 0;
 
 dispatch_find_request:
+
 	/*
-	 * we are not running a batch, find best request for selected data_dir
+	 * we are not running a batch, find best request for selected request_prio
 	 */
-	if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
+	if (!dd->next_rq ||
+	    dd->cur_batch_prio < request_prio ||
+	    deadline_check_request(dd, request_prio)) {
 		/*
-		 * A deadline has expired, the last request was in the other
-		 * direction, or we have run out of higher-sectored requests.
-		 * Start again from the request with the earliest expiry time.
+		 * A deadline has expired, the previous batch had a lower priority,
+		 * or we have run out of higher-sectored requests.
+		 * Start again (a bit before) the request with the earliest expiry time.
 		 */
-		rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+		struct request * nrq = rq_entry_fifo(dd->fifo_list[request_prio].next);
+		int batch = dd->fifo_batch/3;
+		rq=nrq;
+		while(batch-- && (nrq = deadline_prev_request(nrq)))
+			if(request_prio<=deadline_compute_request_priority(nrq)) rq = nrq;
 	} else {
 		/*
-		 * The last req was the same dir and we have a next request in
+		 * The last batch was same or higher priority and we have a next request in
 		 * sort order. No expired requests so continue on from here.
 		 */
-		rq = dd->next_rq[data_dir];
+		rq = dd->next_rq;
 	}
+	dd->cur_batch_prio = request_prio;
+	dd->cur_batch_start = jiffies;
 
 	dd->batching = 0;
 
@@ -320,26 +438,29 @@ dispatch_request:
 	/*
 	 * rq is the selected appropriate request.
 	 */
-	dd->batching++;
-	deadline_move_request(dd, rq);
 
+	dd->batching++;
+	deadline_move_request(dd, rq, blk_queue_nonrot(q));
 	return 1;
 }
 
 static int deadline_queue_empty(struct request_queue *q)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-
-	return list_empty(&dd->fifo_list[WRITE])
-		&& list_empty(&dd->fifo_list[READ]);
+	return list_empty(&dd->fifo_list[0])
+		&& list_empty(&dd->fifo_list[1])
+		&& list_empty(&dd->fifo_list[2]);
 }
 
+
+
 static void deadline_exit_queue(struct elevator_queue *e)
 {
 	struct deadline_data *dd = e->elevator_data;
 
-	BUG_ON(!list_empty(&dd->fifo_list[READ]));
-	BUG_ON(!list_empty(&dd->fifo_list[WRITE]));
+	BUG_ON(!list_empty(&dd->fifo_list[0]));
+	BUG_ON(!list_empty(&dd->fifo_list[1]));
+	BUG_ON(!list_empty(&dd->fifo_list[2]));
 
 	kfree(dd);
 }
@@ -355,13 +476,16 @@ static void *deadline_init_queue(struct request_queue *q)
 	if (!dd)
 		return NULL;
 
-	INIT_LIST_HEAD(&dd->fifo_list[READ]);
-	INIT_LIST_HEAD(&dd->fifo_list[WRITE]);
+	INIT_LIST_HEAD(&dd->fifo_list[0]);
+	INIT_LIST_HEAD(&dd->fifo_list[1]);
+	INIT_LIST_HEAD(&dd->fifo_list[2]);
 	dd->sort_list[READ] = RB_ROOT;
 	dd->sort_list[WRITE] = RB_ROOT;
-	dd->fifo_expire[READ] = read_expire;
-	dd->fifo_expire[WRITE] = write_expire;
-	dd->writes_starved = writes_starved;
+	dd->fifo_expire[0] = async_expire;
+	dd->fifo_expire[1] = sync_expire;
+	dd->fifo_expire[2] = rt_sync_expire;
+	dd->time_quantum = time_quantum;
+	dd->async_starved = async_starved;
 	dd->front_merges = 1;
 	dd->fifo_batch = fifo_batch;
 	return dd;
@@ -395,9 +519,12 @@ static ssize_t __FUNC(struct elevator_queue *e, char *page)		\
 		__data = jiffies_to_msecs(__data);			\
 	return deadline_var_show(__data, (page));			\
 }
-SHOW_FUNCTION(deadline_read_expire_show, dd->fifo_expire[READ], 1);
-SHOW_FUNCTION(deadline_write_expire_show, dd->fifo_expire[WRITE], 1);
-SHOW_FUNCTION(deadline_writes_starved_show, dd->writes_starved, 0);
+SHOW_FUNCTION(deadline_async_expire_show, dd->fifo_expire[0], 1);
+SHOW_FUNCTION(deadline_sync_expire_show, dd->fifo_expire[1], 1);
+SHOW_FUNCTION(deadline_rt_sync_expire_show, dd->fifo_expire[2], 1);
+SHOW_FUNCTION(deadline_time_quantum_show, dd->time_quantum, 1);
+
+SHOW_FUNCTION(deadline_async_starved_show, dd->async_starved, 0);
 SHOW_FUNCTION(deadline_front_merges_show, dd->front_merges, 0);
 SHOW_FUNCTION(deadline_fifo_batch_show, dd->fifo_batch, 0);
 #undef SHOW_FUNCTION
@@ -418,9 +545,12 @@ static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)
 		*(__PTR) = __data;					\
 	return ret;							\
 }
-STORE_FUNCTION(deadline_read_expire_store, &dd->fifo_expire[READ], 0, INT_MAX, 1);
-STORE_FUNCTION(deadline_write_expire_store, &dd->fifo_expire[WRITE], 0, INT_MAX, 1);
-STORE_FUNCTION(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX, 0);
+STORE_FUNCTION(deadline_async_expire_store, &dd->fifo_expire[0], 0, INT_MAX, 1);
+STORE_FUNCTION(deadline_sync_expire_store, &dd->fifo_expire[1], 0, INT_MAX, 1);
+STORE_FUNCTION(deadline_rt_sync_expire_store, &dd->fifo_expire[2], 0, INT_MAX, 1);
+STORE_FUNCTION(deadline_time_quantum_store, &dd->time_quantum, 0, INT_MAX, 1);
+
+STORE_FUNCTION(deadline_async_starved_store, &dd->async_starved, INT_MIN, INT_MAX, 0);
 STORE_FUNCTION(deadline_front_merges_store, &dd->front_merges, 0, 1, 0);
 STORE_FUNCTION(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX, 0);
 #undef STORE_FUNCTION
@@ -430,9 +560,11 @@ STORE_FUNCTION(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX, 0);
 				      deadline_##name##_store)
 
 static struct elv_fs_entry deadline_attrs[] = {
-	DD_ATTR(read_expire),
-	DD_ATTR(write_expire),
-	DD_ATTR(writes_starved),
+	DD_ATTR(async_expire),
+	DD_ATTR(sync_expire),
+	DD_ATTR(rt_sync_expire),
+	DD_ATTR(time_quantum),
+	DD_ATTR(async_starved),
 	DD_ATTR(front_merges),
 	DD_ATTR(fifo_batch),
 	__ATTR_NULL

Thread overview: 14+ messages
2009-04-22 21:07 Reduce latencies for synchronous writes and high I/O priority requests in deadline IO scheduler Corrado Zoccolo
2009-04-23 11:18 ` Paolo Ciarrocchi
2009-04-23 11:28 ` Jens Axboe
2009-04-23 15:57   ` Corrado Zoccolo
2009-04-23 11:52 ` Aaron Carroll
2009-04-23 12:13   ` Jens Axboe
2009-04-23 16:10   ` Corrado Zoccolo
2009-04-23 23:30     ` Aaron Carroll
2009-04-24  6:13       ` Corrado Zoccolo
2009-04-24  6:39     ` Jens Axboe
2009-04-24 16:07       ` Corrado Zoccolo
2009-04-24 21:37         ` Corrado Zoccolo
2009-04-26 12:43           ` Corrado Zoccolo
2009-05-01 19:30             ` Corrado Zoccolo
