All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] Correct sorting problem in cfq_service_tree_add
@ 2009-11-24 13:11 Alan D. Brunelle
  2009-11-24 13:13 ` [PATCH 1/1] " Alan D. Brunelle
  2009-11-24 14:03 ` [PATCH 0/1] " Corrado Zoccolo
  0 siblings, 2 replies; 5+ messages in thread
From: Alan D. Brunelle @ 2009-11-24 13:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe

Found this whilst reviewing the CFQ I/O scheduler code: Currently, this
routine only sorts using the I/O priority class - it does not properly
sort prioritized queues within a specific class. The patch changes the
sort to utilize the full I/O priority (class & priority).

A simple test shows the problem & fixed results: on a 16-way box, for
each of 12 attached disks I started up 17 processes (one process at each
possible class/priority). Each process operated on a separate file in
the file system. I then did two types of tests: (a) direct/synchronous
and (b) direct/asynchronous w/ an 80/20 read/write split.

I then tabulated the overall I/O performed per task: (first column is
priority class (1==RT, 2==BE, 3==IDLE), second column is the I/O
priority (0==highest), then two groupings of read/write data moved
(total KiBs over a span of 120 seconds):

Synchronous:
         2.6.32-rc8     2.6.32-rc8+patch
        Read    Write     Read    Write
     ----------------   ----------------
1 0 |  311164  310760 |  424260  424116 | 
1 1 |  129712  129792 |  390208  393232 | 
1 2 |   72312   71284 |     448     420 | 
1 3 |   40364   41052 |      28      20 | 
1 4 |   26788   26352 |      28      24 | 
1 5 |   16936   16940 |      52      32 | 
1 6 |   11196   11140 |      28      20 | 
1 7 |    6476    6648 |      20      28 | 

2 0 |      24      24 |      40       8 | 
2 1 |      24      24 |      12      36 | 
2 2 |      20      28 |      20      28 | 
2 3 |      28      20 |      24      24 | 
2 4 |      28      20 |      28      20 | 
2 5 |      28      20 |      20      28 | 
2 6 |      24      24 |      20      28 | 
2 7 |      24      24 |      36      12 | 

3   |      36      12 |      28      20 |
     ----------------   ----------------
Sum    615184  614164    815300  818096

You can see that due to the "random" nature of the unpatched kernel
lower priority real-time processes get elevated I/O amounts. With the
patched kernel, real-time priorities 0&1 get the vast majority of the
available throughput (as expected). [Basically: priority 0 & 1
flip-flop: when priority 0's I/O finishes, priority 1's gets inserted
then priority 0 comes back with another I/O quick enough (most of the
time) and bumps all the other queues out of the way.]

More I/O is performed with the patched kernel (most likely) because
there is much less thrashing/seeking on the disk.

Asynchronous:
         2.6.32-rc8     2.6.32-rc8+patch
        Read    Write     Read    Write
     ----------------   ----------------
1 0 | 1969220 1967036 | 2272660 2266220 | 
1 1 |   65880   66216 |   71552   71424 | 
1 2 |   30760   30808 |    3532    3508 | 
1 3 |   17352   17336 |    2996    3148 | 
1 4 |   11496   11288 |    3028    3116 | 
1 5 |    7836    8036 |    3008    3136 | 
1 6 |    5432    5448 |    2992    3152 | 
1 7 |    3692    3860 |    3068    3076 | 

2 0 |    3172    2972 |    3052    3092 | 
2 1 |    3100    3044 |    3000    3144 | 
2 2 |    3140    3004 |    3056    3088 | 
2 3 |    3108    3036 |    3084    3060 | 
2 4 |    3116    3028 |    2968    3176 | 
2 5 |    3068    3076 |    3096    3048 | 
2 6 |    2884    3260 |    3084    3060 | 
2 7 |    3112    3032 |    3208    2936 | 

3   |    3172    2972 |    2968    3176 | 
     ----------------   ----------------
Sum   2139540 2137452   2390352  2384560

With Asynch I/O priority 0 gets the vast (vast!) majority of the
bandwidth because it is issuing more I/Os in one go (128 asynch I/Os at
a time).


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] Correct sorting problem in cfq_service_tree_add
  2009-11-24 13:11 [PATCH 0/1] Correct sorting problem in cfq_service_tree_add Alan D. Brunelle
@ 2009-11-24 13:13 ` Alan D. Brunelle
  2009-11-24 14:03 ` [PATCH 0/1] " Corrado Zoccolo
  1 sibling, 0 replies; 5+ messages in thread
From: Alan D. Brunelle @ 2009-11-24 13:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jens Axboe

Sort order was being incorrectly calculated using just the class, this
patch includes the priority within the classes when deciding sort order.

Note: IOPRIO_CLASS_NONE classes are converted to IOPRIO_CLASS_BE before
getting to this function, hence the WARN_ON in the added function
cfq_class_prio.

Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
---
 block/cfq-iosched.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index aa1e953..7b9ca4d 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -226,6 +226,12 @@ CFQ_CFQQ_FNS(coop);
 CFQ_CFQQ_FNS(coop_preempt);
 #undef CFQ_CFQQ_FNS
 
+static inline int cfq_class_prio(struct cfq_queue *cfqq)
+{
+	WARN_ON(cfqq->ioprio_class == IOPRIO_CLASS_NONE);
+	return IOPRIO_PRIO_VALUE(cfqq->ioprio_class, cfqq->ioprio);
+}
+
 #define cfq_log_cfqq(cfqd, cfqq, fmt, args...)	\
 	blk_add_trace_msg((cfqd)->queue, "cfq%d " fmt, (cfqq)->pid, ##args)
 #define cfq_log(cfqd, fmt, args...)	\
@@ -536,30 +542,29 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	p = &cfqd->service_tree.rb.rb_node;
 	while (*p) {
 		struct rb_node **n;
+		int cfqq_prio, __cfqq_prio;
 
 		parent = *p;
 		__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
 
+		cfqq_prio = cfq_class_prio(cfqq);
+		__cfqq_prio = cfq_class_prio(__cfqq);
+
 		/*
 		 * sort RT queues first, we always want to give
 		 * preference to them. IDLE queues goes to the back.
 		 * after that, sort on the next service time.
+		 * (Lower priority values represent higher priorities.)
 		 */
-		if (cfq_class_rt(cfqq) > cfq_class_rt(__cfqq))
+		if (cfqq_prio < __cfqq_prio)
 			n = &(*p)->rb_left;
-		else if (cfq_class_rt(cfqq) < cfq_class_rt(__cfqq))
-			n = &(*p)->rb_right;
-		else if (cfq_class_idle(cfqq) < cfq_class_idle(__cfqq))
+		else if (cfqq_prio == __cfqq_prio &&
+			 time_before(rb_key, __cfqq->rb_key))
 			n = &(*p)->rb_left;
-		else if (cfq_class_idle(cfqq) > cfq_class_idle(__cfqq))
-			n = &(*p)->rb_right;
-		else if (time_before(rb_key, __cfqq->rb_key))
-			n = &(*p)->rb_left;
-		else
+		else {
 			n = &(*p)->rb_right;
-
-		if (n == &(*p)->rb_right)
 			left = 0;
+		}
 
 		p = n;
 	}
-- 
1.6.3.3




^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/1] Correct sorting problem in cfq_service_tree_add
  2009-11-24 13:11 [PATCH 0/1] Correct sorting problem in cfq_service_tree_add Alan D. Brunelle
  2009-11-24 13:13 ` [PATCH 1/1] " Alan D. Brunelle
@ 2009-11-24 14:03 ` Corrado Zoccolo
  2009-11-24 14:49   ` Alan D. Brunelle
  1 sibling, 1 reply; 5+ messages in thread
From: Corrado Zoccolo @ 2009-11-24 14:03 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel, Jens Axboe

On Tue, Nov 24, 2009 at 2:11 PM, Alan D. Brunelle <Alan.Brunelle@hp.com> wrote:
> Found this whilst reviewing the CFQ I/O scheduler code: Currently, this
> routine only sorts using the I/O priority class - it does not properly
> sort prioritized queues within a specific class. The patch changes the
> sort to utilize the full I/O priority (class & priority).

This changes mixes the interpretation of classes and levels within class.
In the original code, those different things have different meanings:
* priority class decides who can use the disk
* priority level within a class determines how much of the disk time
each queue will obtain
In your case. instead, you completely remove the second meaning, and
provide a larger number of levels to just decide the first.

>
> A simple test shows the problem & fixed results: on a 16-way box, for
> each of 12 attached disks I started up 17 processes (one process at each
> possible class/priority). Each process operated on a separate file in
> the file system. I then did two types of tests: (a) direct/synchronous
> and (b) direct/asynchronous w/ an 80/20 read/write split.
>
> I then tabulated the overall I/O performed per task: (first column is
> priority class (1==RT, 2==BE, 3==IDLE), second column is the I/O
> priority (0==highest), then two groupings of read/write data moved
> (total KiBs over a span of 120 seconds):
>
> Synchronous:
>         2.6.32-rc8     2.6.32-rc8+patch
>        Read    Write     Read    Write
>     ----------------   ----------------
> 1 0 |  311164  310760 |  424260  424116 |
> 1 1 |  129712  129792 |  390208  393232 |
> 1 2 |   72312   71284 |     448     420 |
> 1 3 |   40364   41052 |      28      20 |
> 1 4 |   26788   26352 |      28      24 |
> 1 5 |   16936   16940 |      52      32 |
> 1 6 |   11196   11140 |      28      20 |
> 1 7 |    6476    6648 |      20      28 |

The numbers for the patched kernel are bad.
All priority levels > 2 are starved. They can complete an amount of
I/O comparable with lower priority class:
> 2 0 |      24      24 |      40       8 |
> 2 1 |      24      24 |      12      36 |
> 2 2 |      20      28 |      20      28 |
> 2 3 |      28      20 |      24      24 |
> 2 4 |      28      20 |      28      20 |
> 2 5 |      28      20 |      20      28 |
> 2 6 |      24      24 |      20      28 |
> 2 7 |      24      24 |      36      12 |
>
> 3   |      36      12 |      28      20 |
>     ----------------   ----------------
> Sum    615184  614164    815300  818096
>
This is not the intended behaviour, and you don't need 14 priority
levels to get only one use the disk.

Cheers,
Corrado

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/1] Correct sorting problem in cfq_service_tree_add
  2009-11-24 14:03 ` [PATCH 0/1] " Corrado Zoccolo
@ 2009-11-24 14:49   ` Alan D. Brunelle
  2009-11-24 15:07     ` Corrado Zoccolo
  0 siblings, 1 reply; 5+ messages in thread
From: Alan D. Brunelle @ 2009-11-24 14:49 UTC (permalink / raw)
  To: Corrado Zoccolo; +Cc: linux-kernel, Jens Axboe

On Tue, 2009-11-24 at 15:03 +0100, Corrado Zoccolo wrote:
> On Tue, Nov 24, 2009 at 2:11 PM, Alan D. Brunelle <Alan.Brunelle@hp.com> wrote:
> > Found this whilst reviewing the CFQ I/O scheduler code: Currently, this
> > routine only sorts using the I/O priority class - it does not properly
> > sort prioritized queues within a specific class. The patch changes the
> > sort to utilize the full I/O priority (class & priority).
> 
> This changes mixes the interpretation of classes and levels within class.
> In the original code, those different things have different meanings:
> * priority class decides who can use the disk
> * priority level within a class determines how much of the disk time
> each queue will obtain
> In your case. instead, you completely remove the second meaning, and
> provide a larger number of levels to just decide the first.

Having read the ioprio.txt I had thought that the priorities within a
class should still be honored and that the time slice calculations in
cfq_prio_slice would be left as is. "ioprio" is probably the wrong field
name in the code (and text) then, as it is not meant as a priority but
as a time slice indicator?! The text in ioprio.txt and in the man page
for ionice are very inconsistent here: For example, the ionice man page
states: "This [best effort] class takes a priority argument from 0-7,
with lower number being higher priority. Programs running at the same
best effort priority are served in a round-robin fashion." Which implies
a secondary sort-order for priority within a class. Of course, both
ioprio.txt and the ionice man page also talk about class levels in a way
that may indicate it is not priority based. Hm...

Regards,
Alan


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/1] Correct sorting problem in cfq_service_tree_add
  2009-11-24 14:49   ` Alan D. Brunelle
@ 2009-11-24 15:07     ` Corrado Zoccolo
  0 siblings, 0 replies; 5+ messages in thread
From: Corrado Zoccolo @ 2009-11-24 15:07 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel, Jens Axboe

Hi Alan,
On Tue, Nov 24, 2009 at 3:49 PM, Alan D. Brunelle <Alan.Brunelle@hp.com> wrote:
>
> Having read the ioprio.txt I had thought that the priorities within a
> class should still be honored and that the time slice calculations in
> cfq_prio_slice would be left as is. "ioprio" is probably the wrong field
> name in the code (and text) then, as it is not meant as a priority but
> as a time slice indicator?! The text in ioprio.txt and in the man page
> for ionice are very inconsistent here: For example, the ionice man page
> states: "This [best effort] class takes a priority argument from 0-7,
> with lower number being higher priority. Programs running at the same
> best effort priority are served in a round-robin fashion." Which implies
> a secondary sort-order for priority within a class. Of course, both
> ioprio.txt and the ionice man page also talk about class levels in a way
> that may indicate it is not priority based. Hm...

ionice man page is usually not in sync with the kernel. It should
really just point at the kernel doc, since it is just an interface to
set up parameters, that will then be interpreted by the kernel (or
list the different interpretations for those at each kernel
incarnation, but this is infeasible).

ioprio.txt in my kernel tree says quite explicitly that RT can starve
BE, and IDLE is starved by anyone.
Moreover, it says:
RT: Within the RT class, there are 8 levels of class data that
determine exactly how much time this process needs the disk for on
each service.
BE: The class data determines how much io bandwidth the process will
get, it's directly mappable to the cpu nice levels just more coarsely
implemented. 0 is the highest BE prio level, 7 is the lowest.

This is still a bit confusing, since it says disk time for RT and bw
for BE, while the implementation is the same for both, time based.

BTW, if you want to review CFQ, you should look at
http://git.kernel.dk/?p=linux-2.6-block.git;a=shortlog;h=refs/heads/for-2.6.33,
where cfq development for next kernel version is taking place.

Thanks
Corrado

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-11-24 15:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-24 13:11 [PATCH 0/1] Correct sorting problem in cfq_service_tree_add Alan D. Brunelle
2009-11-24 13:13 ` [PATCH 1/1] " Alan D. Brunelle
2009-11-24 14:03 ` [PATCH 0/1] " Corrado Zoccolo
2009-11-24 14:49   ` Alan D. Brunelle
2009-11-24 15:07     ` Corrado Zoccolo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.