From: Corrado Zoccolo <czoccolo@gmail.com>
To: jens.axboe@oracle.com, Linux-Kernel <linux-kernel@vger.kernel.org>
Subject: Reduce latencies for synchronous writes and high I/O priority requests in deadline IO scheduler
Date: Wed, 22 Apr 2009 23:07:32 +0200	[thread overview]
Message-ID: <4e5e476b0904221407v7f43c058l8fc61198a2e4bb6e@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 10194 bytes --]

Hi,
the deadline I/O scheduler currently classifies all I/O requests into only 2
classes: reads (always considered high priority) and writes (always
lower).
The attached patch, intended to reduce latencies for synchronous writes
and high I/O priority requests, introduces more levels of priority:
* real-time reads: highest priority and shortest deadline, can starve
other levels
* synchronous operations (either best-effort reads or RT/BE writes):
mid priority, starvation of lower levels is prevented as usual
* asynchronous operations (async writes and all IDLE class requests):
lowest priority and longest deadline

The patch also introduces some new heuristics:
* for non-rotational devices, reads (within a given priority level)
are issued in FIFO order, to improve the latency perceived by readers
* minimum batch timespan (time quantum): partners with fifo_batch to
improve throughput by sending more consecutive requests together. A
given number of requests will not always take the same time (due to the
amount of seeking needed), therefore fifo_batch must be tuned for worst
cases, while in best cases longer batches would give a throughput
boost.
* the batch start request is chosen fifo_batch/3 requests before the
expired one, to improve fairness for requests with lower start sectors,
which otherwise have a higher probability of missing a deadline than
mid-sector requests.

I ran a few performance comparisons:
* HDD, ext3 partition with data=writeback, tiotest with 32 threads,
each writing 80MB of data

** deadline-original
Tiotest results for 32 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2560 MBs |  103.0 s |  24.848 MB/s |  10.6 %  | 522.2 % |
| Random Write  125 MBs |   98.8 s |   1.265 MB/s |  -1.6 %  |  16.1 % |
| Read         2560 MBs |  166.2 s |  15.400 MB/s |   4.2 %  |  82.7 % |
| Random Read   125 MBs |  193.3 s |   0.647 MB/s |  -0.8 %  |  14.5 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        4.122 ms |    17922.920 ms |  0.07980 |   0.00061 |
| Random Write |        0.599 ms |     1245.200 ms |  0.00000 |   0.00000 |
| Read         |        8.032 ms |     1125.759 ms |  0.00000 |   0.00000 |
| Random Read  |      181.968 ms |      972.657 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |       10.044 ms |    17922.920 ms |  0.03804 |   0.00029 |
`--------------+-----------------+-----------------+----------+-----------'

** cfq (2.6.30-rc2)
Tiotest results for 32 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2560 MBs |  132.4 s |  19.342 MB/s |   8.5 %  | 400.4 % |
| Random Write  125 MBs |  107.8 s |   1.159 MB/s |  -1.6 %  |  16.8 % |
| Read         2560 MBs |  107.6 s |  23.788 MB/s |   5.4 %  |  95.7 % |
| Random Read   125 MBs |  158.4 s |   0.789 MB/s |   0.9 %  |   7.7 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        5.362 ms |    21081.012 ms |  0.09811 |   0.00244 |
| Random Write |       23.310 ms |    31865.095 ms |  0.13437 |   0.06250 |
| Read         |        5.048 ms |     3694.001 ms |  0.15167 |   0.00000 |
| Random Read  |      146.523 ms |     2880.409 ms |  0.52187 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        8.916 ms |    31865.095 ms |  0.13435 |   0.00262 |
`--------------+-----------------+-----------------+----------+-----------'

** deadline-patched
Tiotest results for 32 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        2560 MBs |  105.3 s |  24.301 MB/s |  10.5 %  | 514.8 % |
| Random Write  125 MBs |   95.9 s |   1.304 MB/s |  -1.8 %  |  17.3 % |
| Read         2560 MBs |  165.1 s |  15.507 MB/s |   2.7 %  |  61.9 % |
| Random Read   125 MBs |  110.6 s |   1.130 MB/s |   0.8 %  |  12.2 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        4.131 ms |    17456.831 ms |  0.08041 |   0.00275 |
| Random Write |        2.780 ms |     5073.180 ms |  0.07500 |   0.00000 |
| Read         |        7.748 ms |      936.499 ms |  0.00000 |   0.00000 |
| Random Read  |      104.849 ms |      695.192 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        8.168 ms |    17456.831 ms |  0.04008 |   0.00131 |
`--------------+-----------------+-----------------+----------+-----------'

* SD card, nilfs2 partition, tiotest with 16 threads, each writing 80MB of data
** cfq (2.6.30-rc2)
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        1280 MBs |  217.8 s |   5.878 MB/s |   3.7 %  |  92.2 % |
| Random Write   62 MBs |   18.2 s |   3.432 MB/s |  -2.3 %  |  28.7 % |
| Read         1280 MBs |  114.7 s |  11.156 MB/s |   7.3 %  |  76.6 % |
| Random Read    62 MBs |    3.4 s |  18.615 MB/s |  -5.4 %  | 274.2 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |        9.943 ms |    10223.581 ms |  0.14252 |   0.00488 |
| Random Write |       12.287 ms |     5097.196 ms |  0.25625 |   0.00000 |
| Read         |        5.352 ms |     1550.162 ms |  0.00000 |   0.00000 |
| Random Read  |        3.051 ms |     1507.837 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        7.649 ms |    10223.581 ms |  0.07391 |   0.00233 |
`--------------+-----------------+-----------------+----------+-----------'

** deadline-patched:
Tiotest results for 16 concurrent io threads:
,----------------------------------------------------------------------.
| Item                  | Time     | Rate         | Usr CPU  | Sys CPU |
+-----------------------+----------+--------------+----------+---------+
| Write        1280 MBs |  220.9 s |   5.794 MB/s |   4.0 %  |  93.9 % |
| Random Write   62 MBs |   20.5 s |   3.044 MB/s |  -2.2 %  |  24.9 % |
| Read         1280 MBs |  113.2 s |  11.304 MB/s |   6.8 %  |  72.8 % |
| Random Read    62 MBs |    2.9 s |  21.896 MB/s |   5.1 %  | 293.8 % |
`----------------------------------------------------------------------'
Tiotest latency results:
,-------------------------------------------------------------------------.
| Item         | Average latency | Maximum latency | % >2 sec | % >10 sec |
+--------------+-----------------+-----------------+----------+-----------+
| Write        |       10.078 ms |    13303.036 ms |  0.14160 |   0.00031 |
| Random Write |       14.350 ms |     5265.088 ms |  0.40000 |   0.00000 |
| Read         |        5.455 ms |      434.495 ms |  0.00000 |   0.00000 |
| Random Read  |        2.685 ms |       12.652 ms |  0.00000 |   0.00000 |
|--------------+-----------------+-----------------+----------+-----------|
| Total        |        7.801 ms |    13303.036 ms |  0.07682 |   0.00015 |
`--------------+-----------------+-----------------+----------+-----------'

* fsync-tester results on HDD, empty ext3 partition, mounted with
data=writeback
** deadline-original:
fsync time: 0.7963
fsync time: 4.5914
fsync time: 4.2347
fsync time: 1.1670
fsync time: 0.8164
fsync time: 1.9783
fsync time: 4.9726
fsync time: 2.4929
fsync time: 2.5448
fsync time: 3.9627
** cfq (2.6.30-rc2)
fsync time: 0.0288
fsync time: 0.0528
fsync time: 0.0299
fsync time: 0.0397
fsync time: 0.5720
fsync time: 0.0409
fsync time: 0.0876
fsync time: 0.0294
fsync time: 0.0485
** deadline-patched
fsync time: 0.0772
fsync time: 0.0381
fsync time: 0.0604
fsync time: 0.2923
fsync time: 0.2488
fsync time: 0.0924
fsync time: 0.0144
fsync time: 1.4824
fsync time: 0.0789
fsync time: 0.0565
fsync time: 0.0550
fsync time: 0.0421
** deadline-patched, ionice -c1:
fsync time: 0.2569
fsync time: 0.0500
fsync time: 0.0681
fsync time: 0.2863
fsync time: 0.0140
fsync time: 0.0171
fsync time: 0.1198
fsync time: 0.0530
fsync time: 0.0503
fsync time: 0.0462
fsync time: 0.0484
fsync time: 0.0328
fsync time: 0.0562
fsync time: 0.0451
fsync time: 0.0576
fsync time: 0.0444
fsync time: 0.0469
fsync time: 0.0368
fsync time: 0.2865

Corrado

-- 
__________________________________________________________________________

dott. Corrado Zoccolo                          mailto:czoccolo@gmail.com
PhD - Department of Computer Science - University of Pisa, Italy
--------------------------------------------------------------------------

[-- Attachment #2: deadline-iosched.c.patch --]
[-- Type: application/octet-stream, Size: 17479 bytes --]

diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c
index c4d991d..5222b61 100644
--- a/block/deadline-iosched.c
+++ b/block/deadline-iosched.c
@@ -17,11 +17,12 @@
 /*
  * See Documentation/block/deadline-iosched.txt
  */
-static const int read_expire = HZ / 2;  /* max time before a read is submitted. */
-static const int write_expire = 5 * HZ; /* ditto for writes, these limits are SOFT! */
-static const int writes_starved = 2;    /* max times reads can starve a write */
-static const int fifo_batch = 16;       /* # of sequential requests treated as one
-				     by the above parameters. For throughput. */
+static const int rt_sync_expire = HZ / 8;  /* max time before a real-time sync operation (e.g. read for RT class process) is submitted. */
+static const int sync_expire = HZ / 2;     /* max time before a sync operation (e.g. read) is submitted. */
+static const int async_expire = 5 * HZ;    /* ditto for async operations (e.g. writes), these limits are SOFT! */
+static const int async_starved = 2;        /* max times SYNC can starve ASYNC requests */
+static const int fifo_batch = 16;          /* min # of sequential requests treated as one by the above parameters. For throughput. */
+static const int time_quantum  = HZ / 10;  /* min duration for a batch */
 
 struct deadline_data {
 	/*
@@ -31,27 +32,33 @@ struct deadline_data {
 	/*
 	 * requests (deadline_rq s) are present on both sort_list and fifo_list
 	 */
-	struct rb_root sort_list[2];	
-	struct list_head fifo_list[2];
+	struct rb_root sort_list[2]; /* READ, WRITE */
+	struct list_head fifo_list[3]; /* 0=ASYNC (or IDLE), 1=SYNC (or RT ASYNC), 2=RT SYNC */
 
 	/*
-	 * next in sort order. read, write or both are NULL
+	 * next in sort order.
 	 */
-	struct request *next_rq[2];
+	struct request *next_rq;
 	unsigned int batching;		/* number of sequential requests made */
-	sector_t last_sector;		/* head position */
 	unsigned int starved;		/* times reads have starved writes */
 
 	/*
 	 * settings that change how the i/o scheduler behaves
 	 */
-	int fifo_expire[2];
+	int fifo_expire[3];
+	int time_quantum;
 	int fifo_batch;
-	int writes_starved;
+	int async_starved;
 	int front_merges;
+
+	/*
+	  current batch data & stats
+	 */
+	int cur_batch_prio;
+	unsigned long cur_batch_start;
 };
 
-static void deadline_move_request(struct deadline_data *, struct request *);
+static void deadline_move_request(struct deadline_data *, struct request *, int nonrot);
 
 static inline struct rb_root *
 deadline_rb_root(struct deadline_data *dd, struct request *rq)
@@ -63,7 +70,7 @@ deadline_rb_root(struct deadline_data *dd, struct request *rq)
  * get the request after `rq' in sector-sorted order
  */
 static inline struct request *
-deadline_latter_request(struct request *rq)
+deadline_next_request(struct request *rq)
 {
 	struct rb_node *node = rb_next(&rq->rb_node);
 
@@ -73,27 +80,118 @@ deadline_latter_request(struct request *rq)
 	return NULL;
 }
 
+/*
+ * get the request before `rq' in sector-sorted order
+ */
+static inline struct request *
+deadline_prev_request(struct request *rq)
+{
+	struct rb_node *node = rb_prev(&rq->rb_node);
+
+	if (node)
+		return rb_entry_rq(node);
+
+	return NULL;
+}
+
 static void
-deadline_add_rq_rb(struct deadline_data *dd, struct request *rq)
+deadline_add_rq_rb(struct deadline_data *dd, struct request *rq, int nonrot)
 {
 	struct rb_root *root = deadline_rb_root(dd, rq);
 	struct request *__alias;
 
 	while (unlikely(__alias = elv_rb_add(root, rq)))
-		deadline_move_request(dd, __alias);
+		deadline_move_request(dd, __alias, nonrot);
 }
 
 static inline void
 deadline_del_rq_rb(struct deadline_data *dd, struct request *rq)
 {
-	const int data_dir = rq_data_dir(rq);
-
-	if (dd->next_rq[data_dir] == rq)
-		dd->next_rq[data_dir] = deadline_latter_request(rq);
+	if (dd->next_rq == rq)
+		dd->next_rq = deadline_next_request(rq);
 
 	elv_rb_del(deadline_rb_root(dd, rq), rq);
 }
 
+static void
+list_add_timesorted(struct list_head *q, struct request *rq)
+{
+	struct list_head *entry;
+	int stop_flags = REQ_SOFTBARRIER | REQ_HARDBARRIER | REQ_STARTED;
+	list_for_each_prev(entry, q) {
+		struct request *pos = list_entry_rq(entry);
+		if (pos->cmd_flags & stop_flags)
+			break;
+		if (rq_fifo_time(rq) > rq_fifo_time(pos))
+			break;
+		if (rq_fifo_time(rq) == rq_fifo_time(pos) &&
+		    rq->sector >= pos->sector)
+			break;
+	}
+	list_add(&rq->queuelist, entry);
+}
+
+static int ioprio_lub(unsigned short aprio, unsigned short bprio)
+{
+	unsigned short aclass = IOPRIO_PRIO_CLASS(aprio);
+	unsigned short bclass = IOPRIO_PRIO_CLASS(bprio);
+
+	if (aclass == IOPRIO_CLASS_NONE)
+		return bprio;
+	if (bclass == IOPRIO_CLASS_NONE)
+		return aprio;
+
+	if (aclass == bclass)
+		return min(aprio, bprio);
+	if (aclass > bclass)
+		return bprio;
+	else
+		return aprio;
+}
+
+static void
+deadline_merge_prio_data(struct request_queue *q, struct request *rq)
+{
+	struct task_struct *tsk = current;
+	struct io_context *ioc = get_io_context(GFP_ATOMIC,q->node);
+	int ioprio_class = IOPRIO_CLASS_NONE;
+	int ioprio = IOPRIO_NORM;
+
+	if(ioc) {
+		ioprio_class = task_ioprio_class(ioc);
+	}
+
+	switch (ioprio_class) {
+	default:
+		printk(KERN_ERR "deadline: bad prio %x\n", ioprio_class);
+	case IOPRIO_CLASS_NONE:
+		/*
+		 * no prio set, inherit CPU scheduling settings
+		 */
+		ioprio = task_nice_ioprio(tsk);
+		ioprio_class = task_nice_ioclass(tsk);
+		break;
+	case IOPRIO_CLASS_RT:
+	case IOPRIO_CLASS_BE:
+		ioprio = task_ioprio(ioc);
+		break;
+	case IOPRIO_CLASS_IDLE:
+		ioprio = 7;
+		break;
+	}
+
+	ioprio=IOPRIO_PRIO_VALUE(ioprio_class,ioprio);
+	rq->ioprio=ioprio_lub(rq->ioprio,ioprio);
+}
+
+static int
+deadline_compute_request_priority(struct request *req)
+{
+	unsigned short ioprio_class=IOPRIO_PRIO_CLASS(req_get_ioprio(req));
+	return (ioprio_class!=IOPRIO_CLASS_IDLE)*
+		(!!rq_is_sync(req) + (rq_data_dir(req)==READ)*(ioprio_class==IOPRIO_CLASS_RT));
+}
+
 /*
  * add rq to rbtree and fifo
  */
@@ -101,15 +199,17 @@ static void
 deadline_add_request(struct request_queue *q, struct request *rq)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-	const int data_dir = rq_data_dir(rq);
 
-	deadline_add_rq_rb(dd, rq);
+	deadline_merge_prio_data(q,rq);
+	deadline_add_rq_rb(dd, rq, blk_queue_nonrot(q));
 
 	/*
-	 * set expire time and add to fifo list
+	 * set request creation time and add to fifo list
 	 */
-	rq_set_fifo_time(rq, jiffies + dd->fifo_expire[data_dir]);
-	list_add_tail(&rq->queuelist, &dd->fifo_list[data_dir]);
+
+	rq_set_fifo_time(rq, jiffies);
+	
+	list_add_timesorted(&dd->fifo_list[deadline_compute_request_priority(rq)],rq);
 }
 
 /*
@@ -157,14 +257,16 @@ static void deadline_merged_request(struct request_queue *q,
 				    struct request *req, int type)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-
 	/*
 	 * if the merge was a front merge, we need to reposition request
 	 */
 	if (type == ELEVATOR_FRONT_MERGE) {
 		elv_rb_del(deadline_rb_root(dd, req), req);
-		deadline_add_rq_rb(dd, req);
+		deadline_add_rq_rb(dd, req, blk_queue_nonrot(q));
 	}
+
+	deadline_merge_prio_data(q,req);
+
 }
 
 static void
@@ -172,7 +274,7 @@ deadline_merged_requests(struct request_queue *q, struct request *req,
 			 struct request *next)
 {
 	/*
-	 * if next expires before rq, assign its expire time to rq
+	 * request that cannot idle. if next expires before rq, assign its expire time to rq
 	 * and move into next position (next will be deleted) in fifo
 	 */
 	if (!list_empty(&req->queuelist) && !list_empty(&next->queuelist)) {
@@ -204,15 +306,33 @@ deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
  * move an entry to dispatch queue
  */
 static void
-deadline_move_request(struct deadline_data *dd, struct request *rq)
+deadline_move_request(struct deadline_data *dd, struct request *rq, int nonrot)
 {
 	const int data_dir = rq_data_dir(rq);
+	dd->next_rq = NULL;
+
+	if(data_dir != READ || !nonrot) {
+		int max_search = dd->fifo_batch;
+		/* for rot devices, or writes on non-rot, requests are dispatched in disk order */
+		dd->next_rq = rq;
+		/* try to get requests of at least the same priority as current one */
+		while(max_search-- && (dd->next_rq = deadline_next_request(dd->next_rq)) && dd->cur_batch_prio>deadline_compute_request_priority(dd->next_rq));
+		if(!max_search || !dd->next_rq) { // did not get a next of the same priority, demote batch to lower, and continue in disk order
+			dd->next_rq = deadline_next_request(rq);
+			if(dd->next_rq) dd->cur_batch_prio = deadline_compute_request_priority(dd->next_rq);
+		}
 
-	dd->next_rq[READ] = NULL;
-	dd->next_rq[WRITE] = NULL;
-	dd->next_rq[data_dir] = deadline_latter_request(rq);
-
-	dd->last_sector = rq_end_sector(rq);
+	} else { /* nonrot && data_dir==READ : requests are dispatched in deadline order */
+		struct list_head *entry;
+		list_for_each(entry, &dd->fifo_list[dd->cur_batch_prio]) {
+			struct request *pos = list_entry_rq(entry);
+			if(pos==rq) continue;
+			if(rq_data_dir(pos)==data_dir) { /* find same direction (always READ) */
+				dd->next_rq = pos;
+				break;
+			}
+		}
+	}
 
 	/*
 	 * take it off the sort and fifo list, move
@@ -222,20 +342,16 @@ deadline_move_request(struct deadline_data *dd, struct request *rq)
 }
 
 /*
- * deadline_check_fifo returns 0 if there are no expired requests on the fifo,
- * 1 otherwise. Requires !list_empty(&dd->fifo_list[data_dir])
+ * deadline_check_fifo returns 0 if there are no expired requests on the fifo for given priority,
+ * 1 otherwise. Requires !list_empty(&dd->fifo_list[prio])
  */
-static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
+static inline int deadline_check_request(struct deadline_data *dd, unsigned prio) 
 {
-	struct request *rq = rq_entry_fifo(dd->fifo_list[ddir].next);
-
+	BUG_ON(list_empty(&dd->fifo_list[prio]));
 	/*
-	 * rq is expired!
+	 * deadline is expired!
 	 */
-	if (time_after(jiffies, rq_fifo_time(rq)))
-		return 1;
-
-	return 0;
+	return time_after(jiffies, dd->fifo_expire[prio] + rq_fifo_time(rq_entry_fifo(dd->fifo_list[prio].next)));
 }
 
 /*
@@ -245,36 +361,31 @@ static inline int deadline_check_fifo(struct deadline_data *dd, int ddir)
 static int deadline_dispatch_requests(struct request_queue *q, int force)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-	const int reads = !list_empty(&dd->fifo_list[READ]);
-	const int writes = !list_empty(&dd->fifo_list[WRITE]);
-	struct request *rq;
-	int data_dir;
+	int rt_ = !list_empty(&dd->fifo_list[2]);
+	int sync_ = !list_empty(&dd->fifo_list[1]);
+	int async_ = !list_empty(&dd->fifo_list[0]);
+	struct request *rq = dd->next_rq;
+	int request_prio = dd->cur_batch_prio;
 
-	/*
-	 * batches are currently reads XOR writes
-	 */
-	if (dd->next_rq[WRITE])
-		rq = dd->next_rq[WRITE];
-	else
-		rq = dd->next_rq[READ];
-
-	if (rq && dd->batching < dd->fifo_batch)
+	if (rq && (dd->batching < dd->fifo_batch || jiffies-dd->cur_batch_start < dd->time_quantum)) {
 		/* we have a next request are still entitled to batch */
 		goto dispatch_request;
+	}
 
 	/*
 	 * at this point we are not running a batch. select the appropriate
 	 * data direction (read / write)
 	 */
 
-	if (reads) {
-		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[READ]));
-
-		if (writes && (dd->starved++ >= dd->writes_starved))
-			goto dispatch_writes;
-
-		data_dir = READ;
+	if (rt_) {
+		request_prio = 2;
+		goto dispatch_find_request;
+	}
 
+	if (sync_) {
+		if (async_ && (dd->starved++ >= dd->async_starved))
+			goto dispatch_async;
+		request_prio = 1;
 		goto dispatch_find_request;
 	}
 
@@ -282,37 +393,44 @@ static int deadline_dispatch_requests(struct request_queue *q, int force)
 	 * there are either no reads or writes have been starved
 	 */
 
-	if (writes) {
-dispatch_writes:
-		BUG_ON(RB_EMPTY_ROOT(&dd->sort_list[WRITE]));
-
+	if (async_) {
+dispatch_async:
 		dd->starved = 0;
-
-		data_dir = WRITE;
-
+		request_prio = 0;
 		goto dispatch_find_request;
 	}
 
+	dd->cur_batch_start=jiffies;
+	dd->batching = 0;
 	return 0;
 
 dispatch_find_request:
+
 	/*
-	 * we are not running a batch, find best request for selected data_dir
+	 * we are not running a batch, find best request for selected request_prio
 	 */
-	if (deadline_check_fifo(dd, data_dir) || !dd->next_rq[data_dir]) {
+	if (!dd->next_rq ||
+	    dd->cur_batch_prio < request_prio ||
+	    deadline_check_request(dd, request_prio)) {
 		/*
-		 * A deadline has expired, the last request was in the other
-		 * direction, or we have run out of higher-sectored requests.
-		 * Start again from the request with the earliest expiry time.
+		 * A deadline has expired, the previous batch had a lower priority,
+		 * or we have run out of higher-sectored requests.
+		 * Start again (a bit before) the request with the earliest expiry time.
 		 */
-		rq = rq_entry_fifo(dd->fifo_list[data_dir].next);
+		struct request * nrq = rq_entry_fifo(dd->fifo_list[request_prio].next);
+		int batch = dd->fifo_batch/3;
+		rq=nrq;
+		while(batch-- && (nrq = deadline_prev_request(nrq)))
+			if(request_prio<=deadline_compute_request_priority(nrq)) rq = nrq;
 	} else {
 		/*
-		 * The last req was the same dir and we have a next request in
+		 * The last batch was same or higher priority and we have a next request in
 		 * sort order. No expired requests so continue on from here.
 		 */
-		rq = dd->next_rq[data_dir];
+		rq = dd->next_rq;
 	}
+	dd->cur_batch_prio = request_prio;
+	dd->cur_batch_start = jiffies;
 
 	dd->batching = 0;
 
@@ -320,26 +438,29 @@ dispatch_request:
 	/*
 	 * rq is the selected appropriate request.
 	 */
-	dd->batching++;
-	deadline_move_request(dd, rq);
 
+	dd->batching++;
+	deadline_move_request(dd, rq, blk_queue_nonrot(q));
 	return 1;
 }
 
 static int deadline_queue_empty(struct request_queue *q)
 {
 	struct deadline_data *dd = q->elevator->elevator_data;
-
-	return list_empty(&dd->fifo_list[WRITE])
-		&& list_empty(&dd->fifo_list[READ]);
+	return list_empty(&dd->fifo_list[0])
+		&& list_empty(&dd->fifo_list[1])
+		&& list_empty(&dd->fifo_list[2]);
 }
 
+
+
 static void deadline_exit_queue(struct elevator_queue *e)
 {
 	struct deadline_data *dd = e->elevator_data;
 
-	BUG_ON(!list_empty(&dd->fifo_list[READ]));
-	BUG_ON(!list_empty(&dd->fifo_list[WRITE]));
+	BUG_ON(!list_empty(&dd->fifo_list[0]));
+	BUG_ON(!list_empty(&dd->fifo_list[1]));
+	BUG_ON(!list_empty(&dd->fifo_list[2]));
 
 	kfree(dd);
 }
@@ -355,13 +476,16 @@ static void *deadline_init_queue(struct request_queue *q)
 	if (!dd)
 		return NULL;
 
-	INIT_LIST_HEAD(&dd->fifo_list[READ]);
-	INIT_LIST_HEAD(&dd->fifo_list[WRITE]);
+	INIT_LIST_HEAD(&dd->fifo_list[0]);
+	INIT_LIST_HEAD(&dd->fifo_list[1]);
+	INIT_LIST_HEAD(&dd->fifo_list[2]);
 	dd->sort_list[READ] = RB_ROOT;
 	dd->sort_list[WRITE] = RB_ROOT;
-	dd->fifo_expire[READ] = read_expire;
-	dd->fifo_expire[WRITE] = write_expire;
-	dd->writes_starved = writes_starved;
+	dd->fifo_expire[0] = async_expire;
+	dd->fifo_expire[1] = sync_expire;
+	dd->fifo_expire[2] = rt_sync_expire;
+	dd->time_quantum = time_quantum;
+	dd->async_starved = async_starved;
 	dd->front_merges = 1;
 	dd->fifo_batch = fifo_batch;
 	return dd;
@@ -395,9 +519,12 @@ static ssize_t __FUNC(struct elevator_queue *e, char *page)		\
 		__data = jiffies_to_msecs(__data);			\
 	return deadline_var_show(__data, (page));			\
 }
-SHOW_FUNCTION(deadline_read_expire_show, dd->fifo_expire[READ], 1);
-SHOW_FUNCTION(deadline_write_expire_show, dd->fifo_expire[WRITE], 1);
-SHOW_FUNCTION(deadline_writes_starved_show, dd->writes_starved, 0);
+SHOW_FUNCTION(deadline_async_expire_show, dd->fifo_expire[0], 1);
+SHOW_FUNCTION(deadline_sync_expire_show, dd->fifo_expire[1], 1);
+SHOW_FUNCTION(deadline_rt_sync_expire_show, dd->fifo_expire[2], 1);
+SHOW_FUNCTION(deadline_time_quantum_show, dd->time_quantum, 1);
+
+SHOW_FUNCTION(deadline_async_starved_show, dd->async_starved, 0);
 SHOW_FUNCTION(deadline_front_merges_show, dd->front_merges, 0);
 SHOW_FUNCTION(deadline_fifo_batch_show, dd->fifo_batch, 0);
 #undef SHOW_FUNCTION
@@ -418,9 +545,12 @@ static ssize_t __FUNC(struct elevator_queue *e, const char *page, size_t count)
 		*(__PTR) = __data;					\
 	return ret;							\
 }
-STORE_FUNCTION(deadline_read_expire_store, &dd->fifo_expire[READ], 0, INT_MAX, 1);
-STORE_FUNCTION(deadline_write_expire_store, &dd->fifo_expire[WRITE], 0, INT_MAX, 1);
-STORE_FUNCTION(deadline_writes_starved_store, &dd->writes_starved, INT_MIN, INT_MAX, 0);
+STORE_FUNCTION(deadline_async_expire_store, &dd->fifo_expire[0], 0, INT_MAX, 1);
+STORE_FUNCTION(deadline_sync_expire_store, &dd->fifo_expire[1], 0, INT_MAX, 1);
+STORE_FUNCTION(deadline_rt_sync_expire_store, &dd->fifo_expire[2], 0, INT_MAX, 1);
+STORE_FUNCTION(deadline_time_quantum_store, &dd->time_quantum, 0, INT_MAX, 1);
+
+STORE_FUNCTION(deadline_async_starved_store, &dd->async_starved, INT_MIN, INT_MAX, 0);
 STORE_FUNCTION(deadline_front_merges_store, &dd->front_merges, 0, 1, 0);
 STORE_FUNCTION(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX, 0);
 #undef STORE_FUNCTION
@@ -430,9 +560,11 @@ STORE_FUNCTION(deadline_fifo_batch_store, &dd->fifo_batch, 0, INT_MAX, 0);
 				      deadline_##name##_store)
 
 static struct elv_fs_entry deadline_attrs[] = {
-	DD_ATTR(read_expire),
-	DD_ATTR(write_expire),
-	DD_ATTR(writes_starved),
+	DD_ATTR(async_expire),
+	DD_ATTR(sync_expire),
+	DD_ATTR(rt_sync_expire),
+	DD_ATTR(time_quantum),
+	DD_ATTR(async_starved),
 	DD_ATTR(front_merges),
 	DD_ATTR(fifo_batch),
 	__ATTR_NULL

Thread overview: 14+ messages
2009-04-22 21:07 Corrado Zoccolo [this message]
2009-04-23 11:18 ` Reduce latencies for synchronous writes and high I/O priority requests in deadline IO scheduler Paolo Ciarrocchi
2009-04-23 11:28 ` Jens Axboe
2009-04-23 15:57   ` Corrado Zoccolo
2009-04-23 11:52 ` Aaron Carroll
2009-04-23 12:13   ` Jens Axboe
2009-04-23 16:10   ` Corrado Zoccolo
2009-04-23 23:30     ` Aaron Carroll
2009-04-24  6:13       ` Corrado Zoccolo
2009-04-24  6:39     ` Jens Axboe
2009-04-24 16:07       ` Corrado Zoccolo
2009-04-24 21:37         ` Corrado Zoccolo
2009-04-26 12:43           ` Corrado Zoccolo
2009-05-01 19:30             ` Corrado Zoccolo
