* [PATCH] Time sliced cfq with basic io priorities
@ 2004-12-13 12:50 Jens Axboe
  2004-12-13 13:09 ` Jens Axboe
From: Jens Axboe @ 2004-12-13 12:50 UTC (permalink / raw)
  To: Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 2541 bytes --]

Hi,

I added basic io priority support to the time sliced cfq base. Right now
this is just a proof of concept; the interface for setting/querying io
prio will change. There are 8 basic io priorities now, 0 being the highest
prio and 7 the lowest. The scheduling class is best effort; in the future
there will be a realtime class as well (and hence the need to change
sys_ioprio_set etc). If a process hasn't set its io priority explicitly,
its io priority is determined from the process nice level. A CPU nice level
of 0 yields io priority 4, cpu nice -20 gives you 0, and finally cpu nice
19 gives you an io priority of 7. Values in-between are
appropriately scaled. If a process sets its io priority explicitly, that
value is used from then on.
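
The scaling could be expressed roughly like this (just a sketch that
matches the numbers above, not necessarily the exact code in the patch):

#include <stdio.h>

/*
 * Sketch of the nice -> io priority scaling described above; nice
 * spans -20..19, io priorities 0..7, so:
 *
 *	nice -20 -> prio 0, nice 0 -> prio 4, nice 19 -> prio 7
 */
static int nice_to_ioprio(int nice)
{
	return (nice + 20) / 5;
}

int main(void)
{
	int nice;

	for (nice = -20; nice <= 19; nice++)
		printf("nice %3d -> io priority %d\n", nice, nice_to_ioprio(nice));

	return 0;
}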

A test run with 7 readers at various priorities:

thread1 (read): err=0, prio=0 maxl=634msec, run=30012msec, bw=5884KiB/sec
thread2 (read): err=0, prio=1 maxl=650msec, run=30041msec, bw=5102KiB/sec
thread3 (read): err=0, prio=1 maxl=646msec, run=30057msec, bw=5062KiB/sec
thread4 (read): err=0, prio=3 maxl=687msec, run=30079msec, bw=3551KiB/sec
thread5 (read): err=0, prio=6 maxl=750msec, run=30208msec, bw=1253KiB/sec
thread6 (read): err=0, prio=3 maxl=690msec, run=30100msec, bw=3562KiB/sec
thread7 (read): err=0, prio=4 maxl=758msec, run=30181msec, bw=2631KiB/sec
Run status:
   READ: io=775MiB, aggrb=26927, minl=634, maxl=758, minb=1253, maxb=5884, mint=30012msec, maxt=30208msec

Note that aggregate bandwidth stays the same as without io priorities.
Only io scheduling cares about the io priority currently; request
allocation policy, queue congestion, etc. don't yet.

I have attached a sample ionice.c file, so that you can do:

# ionice -n3 some_process

which will run that process at io priority 3.

Other changes:

- Disable TCQ in the hardware/driver by default. Can be changed (as
  always) with the max_depth setting. If you do that, don't expect
  fairness or priorities to work as well.

- Import thinktime stats from AS. We use this to determine when to
  preempt a queue during its idle window (a rough sketch of the idea
  follows after this list).

- Kill the find_best_crq setting. It was on by default before, and if it
  didn't work well that would be a bug to fix, not a reason for a tunable.

- Add the ability for a given process to preempt another process' slice.

- Allow the idle window to slide if there are no other potential queues we
  could service requests from.

- Various little cleanups and optimizations.
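
To illustrate the thinktime idea from the list above - this is only a
rough sketch, with made-up names and constants, not code from the patch:
we keep a decaying average of how long a process "thinks" between
completing one request and issuing the next, and it is only worth idling
for that process (instead of letting another queue preempt the slice) if
that average fits inside the idle window.

/*
 * Sketch only - hypothetical names and constants, not the patch code.
 * Keep a decaying average of the time between one request completing
 * and the process issuing its next one.
 */
struct thinktime {
	unsigned long samples;
	unsigned long total;
	unsigned long mean;
};

static void thinktime_update(struct thinktime *tt, unsigned long ttime)
{
	/* weight history 7:1 against the new sample */
	tt->samples = (7 * tt->samples + 256) / 8;
	tt->total = (7 * tt->total + 256 * ttime) / 8;
	tt->mean = tt->total / tt->samples;
}

/*
 * If the process typically thinks longer than the idle window, idling
 * for it is a waste of disk time and another queue may as well preempt
 * the slice.
 */
static int worth_idling(struct thinktime *tt, unsigned long idle_window)
{
	return tt->mean <= idle_window;
}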

2.6.10-rc2-mm4 patch:

http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc2-mm4/cfq-time-slices-10-2.6.10-rc2-mm4.gz

-- 
Jens Axboe


[-- Attachment #2: ionice.c --]
[-- Type: text/plain, Size: 1154 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <getopt.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <asm/unistd.h>

extern int sys_ioprio_set(int);
extern int sys_ioprio_get(void);

#if defined(__i386__)
#define __NR_ioprio_set		295
#define __NR_ioprio_get		296
#elif defined(__ppc__)
#define __NR_ioprio_set		278
#define __NR_ioprio_get		279
#elif defined(__x86_64__)
#define __NR_ioprio_set		254
#define __NR_ioprio_get		255
#elif defined(__ia64__)
#define __NR_ioprio_set		1274
#define __NR_ioprio_get		1275
#else
#error "Unsupported arch"
#endif

_syscall1(int, ioprio_set, int, ioprio);
_syscall0(int, ioprio_get);

int main(int argc, char *argv[])
{
	int ioprio = 2, set = 0;
	int c;

	while ((c = getopt(argc, argv, "+n:")) != EOF) {
		switch (c) {
		case 'n':
			ioprio = strtol(optarg, NULL, 10);
			set = 1;
			break;
		}
	}

	if (!set) {
		/* no -n given: just query and print the current io priority */
		int ioprio = ioprio_get();

		if (ioprio == -1)
			perror("ioprio_get");
		else
			printf("%d\n", ioprio);
	} else if (argv[optind]) {
		if (ioprio_set(ioprio) == -1) {
			perror("ioprio_set");
			return 1;
		}
		execvp(argv[optind], &argv[optind]);
		/* execvp() only returns on failure */
		perror("execvp");
		return 1;
	}

	return 0;
}


* Re: [PATCH] Time sliced cfq with basic io priorities
  2004-12-13 12:50 [PATCH] Time sliced cfq with basic io priorities Jens Axboe
@ 2004-12-13 13:09 ` Jens Axboe
  2004-12-13 17:57   ` Jens Axboe
From: Jens Axboe @ 2004-12-13 13:09 UTC (permalink / raw)
  To: Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 323 bytes --]

On Mon, Dec 13 2004, Jens Axboe wrote:
> 2.6.10-rc2-mm4 patch:

So 2.6.10-rc3-mm1 is out I notice, here's a patch for that:

http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3-mm1/cfq-time-slices-10-2.6.10-rc3-mm1.gz

And an updated ionice.c attached, the syscall numbers changed.

-- 
Jens Axboe


[-- Attachment #2: ionice.c --]
[-- Type: text/plain, Size: 1154 bytes --]

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <getopt.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <asm/unistd.h>

extern int sys_ioprio_set(int);
extern int sys_ioprio_get(void);

#if defined(__i386__)
#define __NR_ioprio_set		294
#define __NR_ioprio_get		295
#elif defined(__ppc__)
#define __NR_ioprio_set		277
#define __NR_ioprio_get		278
#elif defined(__x86_64__)
#define __NR_ioprio_set		254
#define __NR_ioprio_get		255
#elif defined(__ia64__)
#define __NR_ioprio_set		1274
#define __NR_ioprio_get		1275
#else
#error "Unsupported arch"
#endif

_syscall1(int, ioprio_set, int, ioprio);
_syscall0(int, ioprio_get);

int main(int argc, char *argv[])
{
	int ioprio = 2, set = 0;
	int c;

	while ((c = getopt(argc, argv, "+n:")) != EOF) {
		switch (c) {
		case 'n':
			ioprio = strtol(optarg, NULL, 10);
			set = 1;
			break;
		}
	}

	if (!set) {
		/* no -n given: just query and print the current io priority */
		int ioprio = ioprio_get();

		if (ioprio == -1)
			perror("ioprio_get");
		else
			printf("%d\n", ioprio);
	} else if (argv[optind]) {
		if (ioprio_set(ioprio) == -1) {
			perror("ioprio_set");
			return 1;
		}
		execvp(argv[optind], &argv[optind]);
		/* execvp() only returns on failure */
		perror("execvp");
		return 1;
	}

	return 0;
}


* Re: [PATCH] Time sliced cfq with basic io priorities
  2004-12-13 13:09 ` Jens Axboe
@ 2004-12-13 17:57   ` Jens Axboe
  2004-12-14 13:37     ` Jens Axboe
From: Jens Axboe @ 2004-12-13 17:57 UTC (permalink / raw)
  To: Linux Kernel

On Mon, Dec 13 2004, Jens Axboe wrote:
> On Mon, Dec 13 2004, Jens Axboe wrote:
> > 2.6.10-rc2-mm4 patch:
> 
> So 2.6.10-rc3-mm1 is out I notice, here's a patch for that:
> 
> http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3-mm1/cfq-time-slices-10-2.6.10-rc3-mm1.gz
> 
> And an updated ionice.c attached, the syscall numbers changed.

Posted -11 for -mm and -BK as well. Changes:

- Preemption fairness fixes

- Enable preemption

For 2.6.10-rc3-mm1:

http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3-mm1/cfq-time-slices-11-2.6.10-rc3-mm1.gz

For 2.6-BK:

http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3/cfq-time-slices-11.gz

Note that the syscall numbers are different yet again; I will
consolidate these in the next release. For now, find your sys_ioprio_set/get
numbers in include/asm-<your arch>/unistd.h and change ionice for your
arch appropriately (if in doubt, just mail me).
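
For example, the top of ionice.c would just need the two defines changed
(the numbers below are made up - substitute the ones from your own
unistd.h):

#if defined(__i386__)
/* made-up example values - use whatever your unistd.h says */
#define __NR_ioprio_set		289
#define __NR_ioprio_get		290
#endif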

-- 
Jens Axboe



* Re: [PATCH] Time sliced cfq with basic io priorities
  2004-12-13 17:57   ` Jens Axboe
@ 2004-12-14 13:37     ` Jens Axboe
  2004-12-14 21:31       ` Paul E. McKenney
From: Jens Axboe @ 2004-12-14 13:37 UTC (permalink / raw)
  To: Linux Kernel

Hi,

Version -12 has been uploaded. Changes:

- Small optimization to choose next request logic

- An idle queue that exited would waste time for the next process

- Request allocation changes. Should get a smooth stream for writes now,
  not as bursty as before. Also simplified the may_queue/check_waiters
  logic, rely more on the regular block rq allocation congestion and
  don't waste sys time doing multiple wakeups.

- Fix compilation on x86_64

No io priority specific fixes, the above are all to improve the cfq time
slicing.

For 2.6.10-rc3-mm1:

http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3-mm1/cfq-time-slices-12-2.6.10-rc3-mm1.gz

For 2.6-BK:

http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3/cfq-time-slices-12.gz

-- 
Jens Axboe



* Re: [PATCH] Time sliced cfq with basic io priorities
  2004-12-14 13:37     ` Jens Axboe
@ 2004-12-14 21:31       ` Paul E. McKenney
  2004-12-15  6:36         ` Jens Axboe
From: Paul E. McKenney @ 2004-12-14 21:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel

On Tue, Dec 14, 2004 at 02:37:25PM +0100, Jens Axboe wrote:
> Hi,
> 
> Version -12 has been uploaded. Changes:
> 
> - Small optimization to choose next request logic
> 
> - An idle queue that exited would waste time for the next process
> 
> - Request allocation changes. Should get a smooth stream for writes now,
>   not as bursty as before. Also simplified the may_queue/check_waiters
>   logic, rely more on the regular block rq allocation congestion and
>   don't waste sys time doing multiple wakeups.
> 
> - Fix compilation on x86_64
> 
> No io priority specific fixes, the above are all to improve the cfq time
> slicing.
> 
> For 2.6.10-rc3-mm1:
> 
> http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3-mm1/cfq-time-slices-12-2.6.10-rc3-mm1.gz
> 
> For 2.6-BK:
> 
> http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3/cfq-time-slices-12.gz

OK...  I confess, I am confused...

I see the comment stating that only one thread updates, hence no need
for locking.  But I can't find the readers!  There is a section of
code under rcu_read_lock(), but this same function updates the list
as well.  If there really is only one updater, then the rcu_read_lock()
is not needed, because rcu_read_lock() is only required to protect against
concurrent deletion.

Either way, in cfq_exit_io_context(), the list_for_each_safe_rcu() should
be able to be simply list_for_each_safe(), since this is apparently the
sole updater thread, so no concurrent updates are possible.

If only one task is referencing the list at all, no need for RCU or for
any other synchronization mechanism.  If multiple threads are referencing
the list, I cannot find any pure readers.  If multiple threads are updating
the list, I don't see how they are excluding each other.
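
To make the distinction concrete, here is a hypothetical example
(nothing to do with the cfq code itself):

#include <linux/list.h>
#include <linux/rcupdate.h>

/* hypothetical stand-in structure, just for illustration */
struct cic {
	struct list_head list;
};

/*
 * RCU pays off when lockless readers run concurrently with an updater:
 * readers traverse under rcu_read_lock() while the updater uses the
 * _rcu list primitives and defers freeing.
 */
static int count_entries(struct list_head *head)
{
	struct cic *c;
	int n = 0;

	rcu_read_lock();
	list_for_each_entry_rcu(c, head, list)
		n++;		/* pure read-side access only */
	rcu_read_unlock();

	return n;
}

/*
 * If a single task is the only reader and the only updater, there is
 * nobody to race with and the plain list primitives suffice.
 */
static void sole_owner_teardown(struct list_head *head)
{
	struct cic *c, *next;

	list_for_each_entry_safe(c, next, head, list)
		list_del(&c->list);
}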

Any enlightenment available?  I most definitely need a clue here...

						Thanx, Paul

> -- 
> Jens Axboe
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 


* Re: [PATCH] Time sliced cfq with basic io priorities
  2004-12-14 21:31       ` Paul E. McKenney
@ 2004-12-15  6:36         ` Jens Axboe
  2004-12-15 15:18           ` Paul E. McKenney
From: Jens Axboe @ 2004-12-15  6:36 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Linux Kernel

On Tue, Dec 14 2004, Paul E. McKenney wrote:
> On Tue, Dec 14, 2004 at 02:37:25PM +0100, Jens Axboe wrote:
> > Hi,
> > 
> > Version -12 has been uploaded. Changes:
> > 
> > - Small optimization to choose next request logic
> > 
> > - An idle queue that exited would waste time for the next process
> > 
> > - Request allocation changes. Should get a smooth stream for writes now,
> >   not as bursty as before. Also simplified the may_queue/check_waiters
> >   logic, rely more on the regular block rq allocation congestion and
> >   don't waste sys time doing multiple wakeups.
> > 
> > - Fix compilation on x86_64
> > 
> > No io priority specific fixes, the above are all to improve the cfq time
> > slicing.
> > 
> > For 2.6.10-rc3-mm1:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3-mm1/cfq-time-slices-12-2.6.10-rc3-mm1.gz
> > 
> > For 2.6-BK:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/axboe/patches/v2.6/2.6.10-rc3/cfq-time-slices-12.gz
> 
> OK...  I confess, I am confused...
> 
> I see the comment stating that only one thread updates, hence no need
> for locking.  But I can't find the readers!  There is a section of
> code under rcu_read_lock(), but this same function updates the list
> as well.  If there really is only one updater, then the rcu_read_lock()
> is not needed, because rcu_read_lock() is only required to protect against
> concurrent deletion.
> 
> Either way, in cfq_exit_io_context(), the list_for_each_safe_rcu() should
> be able to be simply list_for_each_safe(), since this is apparently the
> sole updater thread, so no concurrent updates are possible.
> 
> If only one task is referencing the list at all, no need for RCU or for
> any other synchronization mechanism.  If multiple threads are referencing
> the list, I cannot find any pure readers.  If multiple threads are updating
> the list, I don't see how they are excluding each other.
> 
> Any enlightenment available?  I most definitely need a clue here...

No, you are about right :-)

The RCU stuff can go again, because I moved everything to happen under
the same task. The section under rcu_read_lock() is the reader; it just
also moves the hot entry to the front later on, which does indeed mean
it would be buggy if there were concurrent updaters. So that's why it's
in a state of being a little messy right now.

A note on the list itself - a task has a cfq_io_context per queue it's
doing io against, and it needs to be looked up when the process
queues io. The task sets this up itself on first io and tears it down
on exit. So only the task itself ever updates or searches this list.
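
In spirit it looks something like the below (hypothetical names, heavily
simplified - not the actual patch code):

#include <linux/list.h>
#include <linux/slab.h>

/* hypothetical, simplified stand-ins for the real structures */
struct cfq_io_context {
	struct list_head list;
	void *queue;			/* the request queue this context is for */
};

struct io_context {
	struct list_head cic_list;	/* only ever touched by the owning task */
};

/* on io submission: find (and front) or create the per-queue context */
static struct cfq_io_context *cic_lookup(struct io_context *ioc, void *queue)
{
	struct cfq_io_context *cic;

	list_for_each_entry(cic, &ioc->cic_list, list) {
		if (cic->queue == queue) {
			/* move-to-front; safe, the task is the sole updater */
			list_move(&cic->list, &ioc->cic_list);
			return cic;
		}
	}

	cic = kmalloc(sizeof(*cic), GFP_ATOMIC);
	if (cic) {
		cic->queue = queue;
		list_add(&cic->list, &ioc->cic_list);
	}

	return cic;
}

/* at task exit: tear everything down, again single threaded */
static void cic_exit(struct io_context *ioc)
{
	struct cfq_io_context *cic, *next;

	list_for_each_entry_safe(cic, next, &ioc->cic_list, list) {
		list_del(&cic->list);
		kfree(cic);
	}
}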

-- 
Jens Axboe



* Re: [PATCH] Time sliced cfq with basic io priorities
  2004-12-15  6:36         ` Jens Axboe
@ 2004-12-15 15:18           ` Paul E. McKenney
From: Paul E. McKenney @ 2004-12-15 15:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linux Kernel

On Wed, Dec 15, 2004 at 07:36:28AM +0100, Jens Axboe wrote:
> On Tue, Dec 14 2004, Paul E. McKenney wrote:
> > If only one task is referencing the list at all, no need for RCU or for
> > any other synchronization mechanism.  If multiple threads are referencing
> > the list, I cannot find any pure readers.  If multiple threads are updating
> > the list, I don't see how they are excluding each other.
> > 
> > Any enlightenment available?  I most definitely need a clue here...
> 
> No, you are about right :-)
> 
> The RCU stuff can go again, because I moved everything to happen under
> the same task. The section under rcu_read_lock() is the reader; it just
> also moves the hot entry to the front later on, which does indeed mean
> it would be buggy if there were concurrent updaters. So that's why it's
> in a state of being a little messy right now.
> 
> A note on the list itself - a task has a cfq_io_context per queue it's
> doing io against, and it needs to be looked up when the process
> queues io. The task sets this up itself on first io and tears it down
> on exit. So only the task itself ever updates or searches this list.

Whew!!!  I feel much better!

							Thanx, Paul

