linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] maximize dispatching in block throttle
@ 2010-11-26 14:46 Hillf Danton
  2010-11-30 14:57 ` Vivek Goyal
  0 siblings, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2010-11-26 14:46 UTC (permalink / raw)
  To: linux-kernel

When dispatching bio, the quantum is divided into read/write budgets,
and dispatching for write could not exceed the write budget even if
the read budget is not exhausted, either dispatching for read.

It is changed to exhaust the quantum, if possible, in this work for
dispatching bio.

Though it is hard to understand that 50/50 division is not selected,
the difference between divisions could impact little on dispatching as
much as quantum allows then.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
---

--- a/block/blk-throttle.c	2010-11-01 19:54:12.000000000 +0800
+++ b/block/blk-throttle.c	2010-11-26 21:49:00.000000000 +0800
@@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
 	unsigned int max_nr_reads = throtl_grp_quantum*3/4;
 	unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
 	struct bio *bio;
+	int read_throttled = 0, write_throttled = 0;

 	/* Try to dispatch 75% READS and 25% WRITES */
-
+ try_read:
 	while ((bio = bio_list_peek(&tg->bio_lists[READ]))
-		&& tg_may_dispatch(td, tg, bio, NULL)) {
+		&& ! read_throttled) {
+		if (! tg_may_dispatch(td, tg, bio, NULL)) {
+			read_throttled = 1;
+			break;
+		}

 		tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
 		nr_reads++;
@@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
 		if (nr_reads >= max_nr_reads)
 			break;
 	}
-
+	if (! bio)
+		read_throttled = 1;
+ try_write:
 	while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
-		&& tg_may_dispatch(td, tg, bio, NULL)) {
+		&& ! write_throttled) {
+		if (! tg_may_dispatch(td, tg, bio, NULL)) {
+			write_throttled = 1;
+			break;
+		}

 		tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
 		nr_writes++;
@@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
 		if (nr_writes >= max_nr_writes)
 			break;
 	}
+	if (! bio)
+		write_throttled = 1;
+
+	if (write_throttled && read_throttled)
+		goto out;

+	if (! (throtl_grp_quantum > nr_writes + nr_reads))
+		goto out;
+		
+	if (read_throttled) {
+		max_nr_writes = throtl_grp_quantum - nr_reads;
+		goto try_write;
+	} else {
+		max_nr_reads = throtl_grp_quantum - nr_writes;
+		goto try_read;
+	}
+ out:
 	return nr_reads + nr_writes;
 }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-11-26 14:46 [PATCH] maximize dispatching in block throttle Hillf Danton
@ 2010-11-30 14:57 ` Vivek Goyal
  2010-12-01 13:30   ` Hillf Danton
  0 siblings, 1 reply; 11+ messages in thread
From: Vivek Goyal @ 2010-11-30 14:57 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel

On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
> When dispatching bio, the quantum is divided into read/write budgets,
> and dispatching for write could not exceed the write budget even if
> the read budget is not exhausted, either dispatching for read.
> 
> It is changed to exhaust the quantum, if possible, in this work for
> dispatching bio.
> 
> Though it is hard to understand that 50/50 division is not selected,
> the difference between divisions could impact little on dispatching as
> much as quantum allows then.
> 
> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> ---

Hi Hillf,

Even if there are not enough READS/WRITES to consume the quantum, I don't
think that it changes anyting much. The next dispatch round will be
scheduled almost immediately (If there are bios which are ready to
be dispatched). Look at throtl_schedule_next_dispatch().

Have you noticed some issues/improvements with this patch?

Generally READS are more latency sensitive as compared to WRITE hence
I thought of dispatching more READS per quantum.

Thanks
Vivek

> 
> --- a/block/blk-throttle.c	2010-11-01 19:54:12.000000000 +0800
> +++ b/block/blk-throttle.c	2010-11-26 21:49:00.000000000 +0800
> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
>  	unsigned int max_nr_reads = throtl_grp_quantum*3/4;
>  	unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
>  	struct bio *bio;
> +	int read_throttled = 0, write_throttled = 0;
> 
>  	/* Try to dispatch 75% READS and 25% WRITES */
> -
> + try_read:
>  	while ((bio = bio_list_peek(&tg->bio_lists[READ]))
> -		&& tg_may_dispatch(td, tg, bio, NULL)) {
> +		&& ! read_throttled) {
> +		if (! tg_may_dispatch(td, tg, bio, NULL)) {
> +			read_throttled = 1;
> +			break;
> +		}
> 
>  		tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>  		nr_reads++;
> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
>  		if (nr_reads >= max_nr_reads)
>  			break;
>  	}
> -
> +	if (! bio)
> +		read_throttled = 1;
> + try_write:
>  	while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
> -		&& tg_may_dispatch(td, tg, bio, NULL)) {
> +		&& ! write_throttled) {
> +		if (! tg_may_dispatch(td, tg, bio, NULL)) {
> +			write_throttled = 1;
> +			break;
> +		}
> 
>  		tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>  		nr_writes++;
> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
>  		if (nr_writes >= max_nr_writes)
>  			break;
>  	}
> +	if (! bio)
> +		write_throttled = 1;
> +
> +	if (write_throttled && read_throttled)
> +		goto out;
> 
> +	if (! (throtl_grp_quantum > nr_writes + nr_reads))
> +		goto out;
> +		
> +	if (read_throttled) {
> +		max_nr_writes = throtl_grp_quantum - nr_reads;
> +		goto try_write;
> +	} else {
> +		max_nr_reads = throtl_grp_quantum - nr_writes;
> +		goto try_read;
> +	}
> + out:
>  	return nr_reads + nr_writes;
>  }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-11-30 14:57 ` Vivek Goyal
@ 2010-12-01 13:30   ` Hillf Danton
  2010-12-01 14:41     ` Vivek Goyal
  0 siblings, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2010-12-01 13:30 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel

On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
>> When dispatching bio, the quantum is divided into read/write budgets,
>> and dispatching for write could not exceed the write budget even if
>> the read budget is not exhausted, either dispatching for read.
>>
>> It is changed to exhaust the quantum, if possible, in this work for
>> dispatching bio.
>>
>> Though it is hard to understand that 50/50 division is not selected,
>> the difference between divisions could impact little on dispatching as
>> much as quantum allows then.
>>
>> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> ---
>
> Hi Hillf,
>
> Even if there are not enough READS/WRITES to consume the quantum, I don't
> think that it changes anyting much. The next dispatch round will be

Why not exhaust quantum this round directly if not throttled?

> scheduled almost immediately (If there are bios which are ready to
> be dispatched). Look at throtl_schedule_next_dispatch().
>
> Have you noticed some issues/improvements with this patch?

Originally I wanted to get 75/25 division parameterized through sysfs or proc,
but I changed mind since exhausting quantum is simpler and much applicable.

As you see, the application environments change from one user to another,
even though are more latency sensitive.

It looks nicer, I think, to provide both the default division and the
methods to
change for users to play their games.

Thanks
Hillf

>
> Generally READS are more latency sensitive as compared to WRITE hence
> I thought of dispatching more READS per quantum.
>
> Thanks
> Vivek
>
>>
>> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
>> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
>> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
>>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
>>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
>>       struct bio *bio;
>> +     int read_throttled = 0, write_throttled = 0;
>>
>>       /* Try to dispatch 75% READS and 25% WRITES */
>> -
>> + try_read:
>>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
>> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> +             && ! read_throttled) {
>> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> +                     read_throttled = 1;
>> +                     break;
>> +             }
>>
>>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>>               nr_reads++;
>> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
>>               if (nr_reads >= max_nr_reads)
>>                       break;
>>       }
>> -
>> +     if (! bio)
>> +             read_throttled = 1;
>> + try_write:
>>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
>> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> +             && ! write_throttled) {
>> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> +                     write_throttled = 1;
>> +                     break;
>> +             }
>>
>>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>>               nr_writes++;
>> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
>>               if (nr_writes >= max_nr_writes)
>>                       break;
>>       }
>> +     if (! bio)
>> +             write_throttled = 1;
>> +
>> +     if (write_throttled && read_throttled)
>> +             goto out;
>>
>> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
>> +             goto out;
>> +
>> +     if (read_throttled) {
>> +             max_nr_writes = throtl_grp_quantum - nr_reads;
>> +             goto try_write;
>> +     } else {
>> +             max_nr_reads = throtl_grp_quantum - nr_writes;
>> +             goto try_read;
>> +     }
>> + out:
>>       return nr_reads + nr_writes;
>>  }
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-01 13:30   ` Hillf Danton
@ 2010-12-01 14:41     ` Vivek Goyal
  2010-12-01 14:56       ` Hillf Danton
  2010-12-03 14:26       ` Hillf Danton
  0 siblings, 2 replies; 11+ messages in thread
From: Vivek Goyal @ 2010-12-01 14:41 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel

On Wed, Dec 01, 2010 at 09:30:00PM +0800, Hillf Danton wrote:
> On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
> >> When dispatching bio, the quantum is divided into read/write budgets,
> >> and dispatching for write could not exceed the write budget even if
> >> the read budget is not exhausted, either dispatching for read.
> >>
> >> It is changed to exhaust the quantum, if possible, in this work for
> >> dispatching bio.
> >>
> >> Though it is hard to understand that 50/50 division is not selected,
> >> the difference between divisions could impact little on dispatching as
> >> much as quantum allows then.
> >>
> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> >> ---
> >
> > Hi Hillf,
> >
> > Even if there are not enough READS/WRITES to consume the quantum, I don't
> > think that it changes anyting much. The next dispatch round will be
> 
> Why not exhaust quantum this round directly if not throttled?
> 

Yes we can do that. I am wondering what do we gain by putting extra code
in?

> > scheduled almost immediately (If there are bios which are ready to
> > be dispatched). Look at throtl_schedule_next_dispatch().
> >
> > Have you noticed some issues/improvements with this patch?
> 
> Originally I wanted to get 75/25 division parameterized through sysfs or proc,
> but I changed mind since exhausting quantum is simpler and much applicable.

But these are two different things isn't it?

We can introduce some sysfs tunables (if need be) so that a user can decide
the division of READS/WRITES.

> 
> As you see, the application environments change from one user to another,
> even though are more latency sensitive.
> 
> It looks nicer, I think, to provide both the default division and the
> methods to
> change for users to play their games.

Yep, tunables can help here and introducing tunables is easy. At the same
time I prefer to intoroduce tunables only on need basis otherwise it
runs the risk of too many tunables which are not being used.

So, conceptually this patch looks fine to me. Jeff Moyer also raised the
same question of why not fill up the quantum if READ/WRITE are not using
their quota. So I think it does not hurt to take this patch in.

Acked-by: Vivek Goyal <vgoyal@redhat.com> 

Thanks
Vivek

> 
> Thanks
> Hillf
> 
> >
> > Generally READS are more latency sensitive as compared to WRITE hence
> > I thought of dispatching more READS per quantum.
> >
> > Thanks
> > Vivek
> >
> >>
> >> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
> >> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
> >> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
> >>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
> >>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
> >>       struct bio *bio;
> >> +     int read_throttled = 0, write_throttled = 0;
> >>
> >>       /* Try to dispatch 75% READS and 25% WRITES */
> >> -
> >> + try_read:
> >>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
> >> +             && ! read_throttled) {
> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
> >> +                     read_throttled = 1;
> >> +                     break;
> >> +             }
> >>
> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
> >>               nr_reads++;
> >> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
> >>               if (nr_reads >= max_nr_reads)
> >>                       break;
> >>       }
> >> -
> >> +     if (! bio)
> >> +             read_throttled = 1;
> >> + try_write:
> >>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
> >> +             && ! write_throttled) {
> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
> >> +                     write_throttled = 1;
> >> +                     break;
> >> +             }
> >>
> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
> >>               nr_writes++;
> >> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
> >>               if (nr_writes >= max_nr_writes)
> >>                       break;
> >>       }
> >> +     if (! bio)
> >> +             write_throttled = 1;
> >> +
> >> +     if (write_throttled && read_throttled)
> >> +             goto out;
> >>
> >> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
> >> +             goto out;
> >> +
> >> +     if (read_throttled) {
> >> +             max_nr_writes = throtl_grp_quantum - nr_reads;
> >> +             goto try_write;
> >> +     } else {
> >> +             max_nr_reads = throtl_grp_quantum - nr_writes;
> >> +             goto try_read;
> >> +     }
> >> + out:
> >>       return nr_reads + nr_writes;
> >>  }
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> Please read the FAQ at  http://www.tux.org/lkml/
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-01 14:41     ` Vivek Goyal
@ 2010-12-01 14:56       ` Hillf Danton
  2010-12-03 14:26       ` Hillf Danton
  1 sibling, 0 replies; 11+ messages in thread
From: Hillf Danton @ 2010-12-01 14:56 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel

On Wed, Dec 1, 2010 at 10:41 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Dec 01, 2010 at 09:30:00PM +0800, Hillf Danton wrote:
>> On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
>> >> When dispatching bio, the quantum is divided into read/write budgets,
>> >> and dispatching for write could not exceed the write budget even if
>> >> the read budget is not exhausted, either dispatching for read.
>> >>
>> >> It is changed to exhaust the quantum, if possible, in this work for
>> >> dispatching bio.
>> >>
>> >> Though it is hard to understand that 50/50 division is not selected,
>> >> the difference between divisions could impact little on dispatching as
>> >> much as quantum allows then.
>> >>
>> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> >> ---
>> >
>> > Hi Hillf,
>> >
>> > Even if there are not enough READS/WRITES to consume the quantum, I don't
>> > think that it changes anyting much. The next dispatch round will be
>>
>> Why not exhaust quantum this round directly if not throttled?
>>
>
> Yes we can do that. I am wondering what do we gain by putting extra code
> in?
>
>> > scheduled almost immediately (If there are bios which are ready to
>> > be dispatched). Look at throtl_schedule_next_dispatch().
>> >
>> > Have you noticed some issues/improvements with this patch?
>>
>> Originally I wanted to get 75/25 division parameterized through sysfs or proc,
>> but I changed mind since exhausting quantum is simpler and much applicable.
>
> But these are two different things isn't it?

Yes with no doubt.

>
> We can introduce some sysfs tunables (if need be) so that a user can decide
> the division of READS/WRITES.
>
>>
>> As you see, the application environments change from one user to another,
>> even though are more latency sensitive.
>>
>> It looks nicer, I think, to provide both the default division and the
>> methods to
>> change for users to play their games.
>
> Yep, tunables can help here and introducing tunables is easy. At the same

Would you please, Vivek, add the tunable, since I could not work it
out in simple way?
Just show the patch, I really need it.

Thanks
Hillf

> time I prefer to intoroduce tunables only on need basis otherwise it
> runs the risk of too many tunables which are not being used.
>
> So, conceptually this patch looks fine to me. Jeff Moyer also raised the
> same question of why not fill up the quantum if READ/WRITE are not using
> their quota. So I think it does not hurt to take this patch in.
>
> Acked-by: Vivek Goyal <vgoyal@redhat.com>
>
> Thanks
> Vivek
>
>>
>> Thanks
>> Hillf
>>
>> >
>> > Generally READS are more latency sensitive as compared to WRITE hence
>> > I thought of dispatching more READS per quantum.
>> >
>> > Thanks
>> > Vivek
>> >
>> >>
>> >> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
>> >> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
>> >> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
>> >>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
>> >>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
>> >>       struct bio *bio;
>> >> +     int read_throttled = 0, write_throttled = 0;
>> >>
>> >>       /* Try to dispatch 75% READS and 25% WRITES */
>> >> -
>> >> + try_read:
>> >>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
>> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +             && ! read_throttled) {
>> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +                     read_throttled = 1;
>> >> +                     break;
>> >> +             }
>> >>
>> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >>               nr_reads++;
>> >> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
>> >>               if (nr_reads >= max_nr_reads)
>> >>                       break;
>> >>       }
>> >> -
>> >> +     if (! bio)
>> >> +             read_throttled = 1;
>> >> + try_write:
>> >>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
>> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +             && ! write_throttled) {
>> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +                     write_throttled = 1;
>> >> +                     break;
>> >> +             }
>> >>
>> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >>               nr_writes++;
>> >> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
>> >>               if (nr_writes >= max_nr_writes)
>> >>                       break;
>> >>       }
>> >> +     if (! bio)
>> >> +             write_throttled = 1;
>> >> +
>> >> +     if (write_throttled && read_throttled)
>> >> +             goto out;
>> >>
>> >> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
>> >> +             goto out;
>> >> +
>> >> +     if (read_throttled) {
>> >> +             max_nr_writes = throtl_grp_quantum - nr_reads;
>> >> +             goto try_write;
>> >> +     } else {
>> >> +             max_nr_reads = throtl_grp_quantum - nr_writes;
>> >> +             goto try_read;
>> >> +     }
>> >> + out:
>> >>       return nr_reads + nr_writes;
>> >>  }
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> Please read the FAQ at  http://www.tux.org/lkml/
>> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-01 14:41     ` Vivek Goyal
  2010-12-01 14:56       ` Hillf Danton
@ 2010-12-03 14:26       ` Hillf Danton
  2010-12-03 14:32         ` Vivek Goyal
  1 sibling, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2010-12-03 14:26 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel

On Wed, Dec 1, 2010 at 10:41 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Dec 01, 2010 at 09:30:00PM +0800, Hillf Danton wrote:
>> On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
>> >> When dispatching bio, the quantum is divided into read/write budgets,
>> >> and dispatching for write could not exceed the write budget even if
>> >> the read budget is not exhausted, either dispatching for read.
>> >>
>> >> It is changed to exhaust the quantum, if possible, in this work for
>> >> dispatching bio.
>> >>
>> >> Though it is hard to understand that 50/50 division is not selected,
>> >> the difference between divisions could impact little on dispatching as
>> >> much as quantum allows then.
>> >>
>> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> >> ---
>> >
>> > Hi Hillf,
>> >
>> > Even if there are not enough READS/WRITES to consume the quantum, I don't
>> > think that it changes anyting much. The next dispatch round will be
>>
>> Why not exhaust quantum this round directly if not throttled?
>>
>
> Yes we can do that. I am wondering what do we gain by putting extra code
> in?
>
>> > scheduled almost immediately (If there are bios which are ready to
>> > be dispatched). Look at throtl_schedule_next_dispatch().
>> >
>> > Have you noticed some issues/improvements with this patch?
>>
>> Originally I wanted to get 75/25 division parameterized through sysfs or proc,
>> but I changed mind since exhausting quantum is simpler and much applicable.
>
> But these are two different things isn't it?
>
> We can introduce some sysfs tunables (if need be) so that a user can decide
> the division of READS/WRITES.
>
>>
>> As you see, the application environments change from one user to another,
>> even though are more latency sensitive.
>>
>> It looks nicer, I think, to provide both the default division and the
>> methods to
>> change for users to play their games.
>
> Yep, tunables can help here and introducing tunables is easy. At the same

If it is too hard, please change to what is more valuable.
Hillf

> time I prefer to intoroduce tunables only on need basis otherwise it
> runs the risk of too many tunables which are not being used.
>
> So, conceptually this patch looks fine to me. Jeff Moyer also raised the
> same question of why not fill up the quantum if READ/WRITE are not using
> their quota. So I think it does not hurt to take this patch in.
>
> Acked-by: Vivek Goyal <vgoyal@redhat.com>
>
> Thanks
> Vivek
>
>>
>> Thanks
>> Hillf
>>
>> >
>> > Generally READS are more latency sensitive as compared to WRITE hence
>> > I thought of dispatching more READS per quantum.
>> >
>> > Thanks
>> > Vivek
>> >
>> >>
>> >> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
>> >> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
>> >> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
>> >>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
>> >>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
>> >>       struct bio *bio;
>> >> +     int read_throttled = 0, write_throttled = 0;
>> >>
>> >>       /* Try to dispatch 75% READS and 25% WRITES */
>> >> -
>> >> + try_read:
>> >>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
>> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +             && ! read_throttled) {
>> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +                     read_throttled = 1;
>> >> +                     break;
>> >> +             }
>> >>
>> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >>               nr_reads++;
>> >> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
>> >>               if (nr_reads >= max_nr_reads)
>> >>                       break;
>> >>       }
>> >> -
>> >> +     if (! bio)
>> >> +             read_throttled = 1;
>> >> + try_write:
>> >>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
>> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +             && ! write_throttled) {
>> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +                     write_throttled = 1;
>> >> +                     break;
>> >> +             }
>> >>
>> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >>               nr_writes++;
>> >> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
>> >>               if (nr_writes >= max_nr_writes)
>> >>                       break;
>> >>       }
>> >> +     if (! bio)
>> >> +             write_throttled = 1;
>> >> +
>> >> +     if (write_throttled && read_throttled)
>> >> +             goto out;
>> >>
>> >> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
>> >> +             goto out;
>> >> +
>> >> +     if (read_throttled) {
>> >> +             max_nr_writes = throtl_grp_quantum - nr_reads;
>> >> +             goto try_write;
>> >> +     } else {
>> >> +             max_nr_reads = throtl_grp_quantum - nr_writes;
>> >> +             goto try_read;
>> >> +     }
>> >> + out:
>> >>       return nr_reads + nr_writes;
>> >>  }
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> Please read the FAQ at  http://www.tux.org/lkml/
>> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-03 14:26       ` Hillf Danton
@ 2010-12-03 14:32         ` Vivek Goyal
  2010-12-03 14:39           ` Hillf Danton
  2010-12-04 13:36           ` Hillf Danton
  0 siblings, 2 replies; 11+ messages in thread
From: Vivek Goyal @ 2010-12-03 14:32 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel

On Fri, Dec 03, 2010 at 10:26:50PM +0800, Hillf Danton wrote:
> On Wed, Dec 1, 2010 at 10:41 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Wed, Dec 01, 2010 at 09:30:00PM +0800, Hillf Danton wrote:
> >> On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
> >> >> When dispatching bio, the quantum is divided into read/write budgets,
> >> >> and dispatching for write could not exceed the write budget even if
> >> >> the read budget is not exhausted, either dispatching for read.
> >> >>
> >> >> It is changed to exhaust the quantum, if possible, in this work for
> >> >> dispatching bio.
> >> >>
> >> >> Though it is hard to understand that 50/50 division is not selected,
> >> >> the difference between divisions could impact little on dispatching as
> >> >> much as quantum allows then.
> >> >>
> >> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> >> >> ---
> >> >
> >> > Hi Hillf,
> >> >
> >> > Even if there are not enough READS/WRITES to consume the quantum, I don't
> >> > think that it changes anyting much. The next dispatch round will be
> >>
> >> Why not exhaust quantum this round directly if not throttled?
> >>
> >
> > Yes we can do that. I am wondering what do we gain by putting extra code
> > in?
> >
> >> > scheduled almost immediately (If there are bios which are ready to
> >> > be dispatched). Look at throtl_schedule_next_dispatch().
> >> >
> >> > Have you noticed some issues/improvements with this patch?
> >>
> >> Originally I wanted to get 75/25 division parameterized through sysfs or proc,
> >> but I changed mind since exhausting quantum is simpler and much applicable.
> >
> > But these are two different things isn't it?
> >
> > We can introduce some sysfs tunables (if need be) so that a user can decide
> > the division of READS/WRITES.
> >
> >>
> >> As you see, the application environments change from one user to another,
> >> even though are more latency sensitive.
> >>
> >> It looks nicer, I think, to provide both the default division and the
> >> methods to
> >> change for users to play their games.
> >
> > Yep, tunables can help here and introducing tunables is easy. At the same
> 
> If it is too hard, please change to what is more valuable.
> Hillf

It should not be too hard. IO schedulers already create
/sys/block/<dev>/queue/iosched/ dir and we can create <dev>/queue/throttle/
dir and export throttle related tunables there.

What I am not clear about is that what problem are you running into.
Tunbales and your patch to fill the quantum are two different things. So
how can both the things solve your problem. One of these should.

So if you can explain the issue you are facing with some more details, it
will help. I am not sure how your patch of filling the quantum with
requests of other direction is helping you?

Vivek

> 
> > time I prefer to intoroduce tunables only on need basis otherwise it
> > runs the risk of too many tunables which are not being used.
> >
> > So, conceptually this patch looks fine to me. Jeff Moyer also raised the
> > same question of why not fill up the quantum if READ/WRITE are not using
> > their quota. So I think it does not hurt to take this patch in.
> >
> > Acked-by: Vivek Goyal <vgoyal@redhat.com>
> >
> > Thanks
> > Vivek
> >
> >>
> >> Thanks
> >> Hillf
> >>
> >> >
> >> > Generally READS are more latency sensitive as compared to WRITE hence
> >> > I thought of dispatching more READS per quantum.
> >> >
> >> > Thanks
> >> > Vivek
> >> >
> >> >>
> >> >> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
> >> >> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
> >> >> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
> >> >>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
> >> >>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
> >> >>       struct bio *bio;
> >> >> +     int read_throttled = 0, write_throttled = 0;
> >> >>
> >> >>       /* Try to dispatch 75% READS and 25% WRITES */
> >> >> -
> >> >> + try_read:
> >> >>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
> >> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
> >> >> +             && ! read_throttled) {
> >> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
> >> >> +                     read_throttled = 1;
> >> >> +                     break;
> >> >> +             }
> >> >>
> >> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
> >> >>               nr_reads++;
> >> >> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
> >> >>               if (nr_reads >= max_nr_reads)
> >> >>                       break;
> >> >>       }
> >> >> -
> >> >> +     if (! bio)
> >> >> +             read_throttled = 1;
> >> >> + try_write:
> >> >>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
> >> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
> >> >> +             && ! write_throttled) {
> >> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
> >> >> +                     write_throttled = 1;
> >> >> +                     break;
> >> >> +             }
> >> >>
> >> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
> >> >>               nr_writes++;
> >> >> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
> >> >>               if (nr_writes >= max_nr_writes)
> >> >>                       break;
> >> >>       }
> >> >> +     if (! bio)
> >> >> +             write_throttled = 1;
> >> >> +
> >> >> +     if (write_throttled && read_throttled)
> >> >> +             goto out;
> >> >>
> >> >> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
> >> >> +             goto out;
> >> >> +
> >> >> +     if (read_throttled) {
> >> >> +             max_nr_writes = throtl_grp_quantum - nr_reads;
> >> >> +             goto try_write;
> >> >> +     } else {
> >> >> +             max_nr_reads = throtl_grp_quantum - nr_writes;
> >> >> +             goto try_read;
> >> >> +     }
> >> >> + out:
> >> >>       return nr_reads + nr_writes;
> >> >>  }
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> Please read the FAQ at  http://www.tux.org/lkml/
> >> >
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-03 14:32         ` Vivek Goyal
@ 2010-12-03 14:39           ` Hillf Danton
  2010-12-04 13:36           ` Hillf Danton
  1 sibling, 0 replies; 11+ messages in thread
From: Hillf Danton @ 2010-12-03 14:39 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel

On Fri, Dec 3, 2010 at 10:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Fri, Dec 03, 2010 at 10:26:50PM +0800, Hillf Danton wrote:
>> On Wed, Dec 1, 2010 at 10:41 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Wed, Dec 01, 2010 at 09:30:00PM +0800, Hillf Danton wrote:
>> >> On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> >> > On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
>> >> >> When dispatching bio, the quantum is divided into read/write budgets,
>> >> >> and dispatching for write could not exceed the write budget even if
>> >> >> the read budget is not exhausted, either dispatching for read.
>> >> >>
>> >> >> It is changed to exhaust the quantum, if possible, in this work for
>> >> >> dispatching bio.
>> >> >>
>> >> >> Though it is hard to understand that 50/50 division is not selected,
>> >> >> the difference between divisions could impact little on dispatching as
>> >> >> much as quantum allows then.
>> >> >>
>> >> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> >> >> ---
>> >> >
>> >> > Hi Hillf,
>> >> >
>> >> > Even if there are not enough READS/WRITES to consume the quantum, I don't
>> >> > think that it changes anyting much. The next dispatch round will be
>> >>
>> >> Why not exhaust quantum this round directly if not throttled?
>> >>
>> >
>> > Yes we can do that. I am wondering what do we gain by putting extra code
>> > in?
>> >
>> >> > scheduled almost immediately (If there are bios which are ready to
>> >> > be dispatched). Look at throtl_schedule_next_dispatch().
>> >> >
>> >> > Have you noticed some issues/improvements with this patch?
>> >>
>> >> Originally I wanted to get 75/25 division parameterized through sysfs or proc,
>> >> but I changed mind since exhausting quantum is simpler and much applicable.
>> >
>> > But these are two different things isn't it?
>> >
>> > We can introduce some sysfs tunables (if need be) so that a user can decide
>> > the division of READS/WRITES.
>> >
>> >>
>> >> As you see, the application environments change from one user to another,
>> >> even though are more latency sensitive.
>> >>
>> >> It looks nicer, I think, to provide both the default division and the
>> >> methods to
>> >> change for users to play their games.
>> >
>> > Yep, tunables can help here and introducing tunables is easy. At the same
>>
>> If it is too hard, please change to what is more valuable.
>> Hillf
>
> It should not be too hard. IO schedulers already create
> /sys/block/<dev>/queue/iosched/ dir and we can create <dev>/queue/throttle/
> dir and export throttle related tunables there.

I will try along this direction, thanks.

>
> What I am not clear about is that what problem are you running into.
> Tunbales and your patch to fill the quantum are two different things. So
> how can both the things solve your problem. One of these should.

I could not work it out over one week. That is all.

Thanks again for sharing so much on the tunable.

Good weekend

Hillf

>
> So if you can explain the issue you are facing with some more details, it
> will help. I am not sure how your patch of filling the quantum with
> requests of other direction is helping you?
>
> Vivek
>
>>
>> > time I prefer to intoroduce tunables only on need basis otherwise it
>> > runs the risk of too many tunables which are not being used.
>> >
>> > So, conceptually this patch looks fine to me. Jeff Moyer also raised the
>> > same question of why not fill up the quantum if READ/WRITE are not using
>> > their quota. So I think it does not hurt to take this patch in.
>> >
>> > Acked-by: Vivek Goyal <vgoyal@redhat.com>
>> >
>> > Thanks
>> > Vivek
>> >
>> >>
>> >> Thanks
>> >> Hillf
>> >>
>> >> >
>> >> > Generally READS are more latency sensitive as compared to WRITE hence
>> >> > I thought of dispatching more READS per quantum.
>> >> >
>> >> > Thanks
>> >> > Vivek
>> >> >
>> >> >>
>> >> >> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
>> >> >> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
>> >> >> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
>> >> >>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
>> >> >>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
>> >> >>       struct bio *bio;
>> >> >> +     int read_throttled = 0, write_throttled = 0;
>> >> >>
>> >> >>       /* Try to dispatch 75% READS and 25% WRITES */
>> >> >> -
>> >> >> + try_read:
>> >> >>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
>> >> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> >> +             && ! read_throttled) {
>> >> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> >> +                     read_throttled = 1;
>> >> >> +                     break;
>> >> >> +             }
>> >> >>
>> >> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >> >>               nr_reads++;
>> >> >> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
>> >> >>               if (nr_reads >= max_nr_reads)
>> >> >>                       break;
>> >> >>       }
>> >> >> -
>> >> >> +     if (! bio)
>> >> >> +             read_throttled = 1;
>> >> >> + try_write:
>> >> >>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
>> >> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> >> +             && ! write_throttled) {
>> >> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> >> +                     write_throttled = 1;
>> >> >> +                     break;
>> >> >> +             }
>> >> >>
>> >> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >> >>               nr_writes++;
>> >> >> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
>> >> >>               if (nr_writes >= max_nr_writes)
>> >> >>                       break;
>> >> >>       }
>> >> >> +     if (! bio)
>> >> >> +             write_throttled = 1;
>> >> >> +
>> >> >> +     if (write_throttled && read_throttled)
>> >> >> +             goto out;
>> >> >>
>> >> >> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
>> >> >> +             goto out;
>> >> >> +
>> >> >> +     if (read_throttled) {
>> >> >> +             max_nr_writes = throtl_grp_quantum - nr_reads;
>> >> >> +             goto try_write;
>> >> >> +     } else {
>> >> >> +             max_nr_reads = throtl_grp_quantum - nr_writes;
>> >> >> +             goto try_read;
>> >> >> +     }
>> >> >> + out:
>> >> >>       return nr_reads + nr_writes;
>> >> >>  }
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >> Please read the FAQ at  http://www.tux.org/lkml/
>> >> >
>> >
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-03 14:32         ` Vivek Goyal
  2010-12-03 14:39           ` Hillf Danton
@ 2010-12-04 13:36           ` Hillf Danton
  2010-12-06 14:54             ` Vivek Goyal
  1 sibling, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2010-12-04 13:36 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel

On Fri, Dec 3, 2010 at 10:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> It should not be too hard. IO schedulers already create
> /sys/block/<dev>/queue/iosched/ dir and we can create <dev>/queue/throttle/
> dir and export throttle related tunables there.

Evening, Vivek.

I worked the framework for the tunable out.

If in right direction, I will complete it soon.

Thanks
Hillf



--- a/block/blk-throttle.c	2010-11-01 19:54:12.000000000 +0800
+++ b/block/blk-throttle.c	2010-12-04 21:24:58.000000000 +0800
@@ -98,6 +98,27 @@ struct throtl_data
 	struct delayed_work throtl_work;

 	atomic_t limits_changed;
+
+	/*
+	 * following is sysfs stuff about queue throttle
+	 */
+	struct kobject	kobj;
+
+	struct mutex	sysfs_lock;
+
+	/* Max dispatch from a group in 1 round */
+	int grp_quantum;
+
+	/* Total max dispatch from all groups in one round */
+	int q_quantum;
+
+	/* Throttling is performed over 100ms slice and after that
+	 * slice is renewed
+	 */
+	unsigned long time_slice;
+
+	/* read percentage of dispatch from a group in 1 round */
+	int read_percentage;
 };

 enum tg_state_flags {
@@ -644,11 +665,19 @@ static int throtl_dispatch_tg(struct thr
 				struct bio_list *bl)
 {
 	unsigned int nr_reads = 0, nr_writes = 0;
-	unsigned int max_nr_reads = throtl_grp_quantum*3/4;
-	unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
+	unsigned int max_nr_reads, max_nr_writes;
 	struct bio *bio;

-	/* Try to dispatch 75% READS and 25% WRITES */
+	max_nr_reads = td->read_percentage * td->grp_quantum /100;
+	if (! max_nr_reads)
+		max_nr_reads = 1;
+	/*
+	 * both are not computed stricktly here to throttle I/O
+	 */
+	if (max_nr_reads < td->grp_quantum)
+		max_nr_writes = td->grp_quantum - max_nr_reads;
+	else
+		max_nr_writes = 1;

 	while ((bio = bio_list_peek(&tg->bio_lists[READ]))
 		&& tg_may_dispatch(td, tg, bio, NULL)) {
@@ -1025,10 +1054,169 @@ out:
 	return 0;
 }

+/*
+ * sysfs stuff
+ */
+
+struct throttle_sysfs_entry {
+	struct attribute attr;
+	ssize_t (*show)(struct throtl_data *, char *);
+	ssize_t (*store)(struct throtl_data *, const char *, size_t);
+};
+
+static ssize_t
+throttle_show_time_slice(struct throtl_data *td, char *page)
+{
+	unsigned int msecs = jiffies_to_msecs(td->time_slice);
+	return sprintf(page, "%lu\n", msecs);
+}
+static ssize_t
+throttle_store_time_slice(struct throtl_data *td,
+				const char *page, size_t len)
+{
+	char *p = (char *) page;
+	unsigned long msecs = simple_strtoul(p, &p, 10);
+
+	td->time_slice = msecs_to_jiffies(msecs);
+	return len;
+}
+static struct throttle_sysfs_entry  throttle_time_slice_entry = {
+	.attr = { .name = "time_slice", .mode = S_IRUGO | S_IWUSR },
+	.show = throttle_show_time_slice,
+	.store = throttle_store_time_slice,
+};
+
+static ssize_t
+throttle_show_grp_quantum(struct throtl_data *td, char *page)
+{
+	return sprintf(page, "%d\n", td->grp_quantum);
+}
+static ssize_t
+throttle_store_grp_quantum(struct throtl_data *td,
+				const char *page, size_t len)
+{
+	char *p = (char *) page;
+	unsigned long v = simple_strtoul(p, &p, 10);
+
+	td->grp_quantum = (int) v;
+	return len;
+}
+static struct throttle_sysfs_entry  throttle_grp_quantum_entry = {
+	.attr = { .name = "grp_quantum", .mode = S_IRUGO | S_IWUSR },
+	.show = throttle_show_grp_quantum,
+	.store = throttle_store_grp_quantum,
+};
+
+static ssize_t
+throttle_show_q_quantum(struct throtl_data *td, char *page)
+{
+	return sprintf(page, "%d\n", td->q_quantum);
+}
+static ssize_t
+throttle_store_q_quantum(struct throtl_data *td,
+				const char *page, size_t len)
+{
+	char *p = (char *) page;
+	unsigned long v = simple_strtoul(p, &p, 10);
+
+	td->q_quantum = (int) v;
+	return len;
+}
+static struct throttle_sysfs_entry  throttle_q_quantum_entry = {
+	.attr = { .name = "q_quantum", .mode = S_IRUGO | S_IWUSR },
+	.show = throttle_show_q_quantum,
+	.store = throttle_store_q_quantum,
+};
+
+static ssize_t
+throttle_show_read_percentage(struct throtl_data *td, char *page)
+{
+	return sprintf(page, "%d\n", td->read_percentage);
+}
+static ssize_t
+throttle_store_read_percentage(struct throtl_data *td,
+				const char *page, size_t len)
+{
+	char *p = (char *) page;
+	unsigned long v = simple_strtoul(p, &p, 10);
+
+	if (v > 99)
+		v = 99;
+	else if (v < 1)
+		v = 1;
+	td->read_percentage = (int) v;
+	return len;
+}
+static struct throttle_sysfs_entry  throttle_read_percentage_entry = {
+	.attr = { .name = "read_percentage", .mode = S_IRUGO | S_IWUSR },
+	.show = throttle_show_read_percentage,
+	.store = throttle_store_read_percentage,
+};
+
+static struct attribute *throttle_attrs[] = {
+	&throttle_grp_quantum_entry.attr,
+	&throttle_q_quantum_entry.attr,
+	&throttle_time_slice_entry.attr,
+	&throttle_read_percentage_entry.attr,
+	NULL,
+};
+
+static ssize_t
+throttle_attr_store(struct kobject *kobj, struct attribute *attr,
+		    const char *page, size_t length)
+{
+	struct throttle_sysfs_entry *entry =
+		container_of(attr, struct throttle_sysfs_entry, attr);
+	struct throtl_data *td =
+		container_of(kobj, struct throtl_data, kobj);
+	ssize_t rtn;
+
+	if (! entry->store)
+		return -EIO;
+	mutex_lock(&td->sysfs_lock);
+	rtn = entry->store(td, page, length);
+	mutex_unlock(&td->sysfs_lock);
+	return rtn;
+}
+
+static ssize_t
+throttle_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
+{
+	struct throttle_sysfs_entry *entry =
+		container_of(attr, struct throttle_sysfs_entry, attr);
+	struct throtl_data *td =
+		container_of(kobj, struct throtl_data, kobj);
+	ssize_t rtn;
+
+	if (! entry->show)
+		return -EIO;
+	mutex_lock(&td->sysfs_lock);
+	rtn = entry->show(td, page);
+	mutex_unlock(&td->sysfs_lock);
+	return rtn;
+}
+
+static const struct sysfs_ops  throttle_sysfs_ops = {
+	.show	= throttle_attr_show,
+	.store	= throttle_attr_store,
+};
+
+static void throttle_release(struct kobject *kobj)
+{
+}
+
+static struct kobj_type blk_throttle_ktype = {
+	.sysfs_ops	= &throttle_sysfs_ops,
+	.default_attrs	= throttle_attrs,
+	.release	= throttle_release,
+};
+
+
 int blk_throtl_init(struct request_queue *q)
 {
 	struct throtl_data *td;
 	struct throtl_grp *tg;
+	int rtn;

 	td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
 	if (!td)
@@ -1049,6 +1237,20 @@ int blk_throtl_init(struct request_queue
 	tg->bps[0] = tg->bps[1] = -1;
 	tg->iops[0] = tg->iops[1] = -1;

+	mutex_init(&td->sysfs_lock);
+	td->grp_quantum = throtl_grp_quantum;
+	td->q_quantum = throtl_quantum;
+	td->time_slice = throtl_slice;
+	/* Try to dispatch 75% READS and 25% WRITES by default */
+	td->read_percentage = 75;
+	kobject_init(&td->kobj, &blk_throttle_ktype);
+	rtn = kobject_add(&td->kobj, kobject_get(&q->kobj), "%s", "throttle");
+	if (rtn < 0) {
+		kfree(td);
+		return rtn;
+	}
+	kobject_uevent(&td->kobj, KOBJ_ADD);
+
 	/*
 	 * Set root group reference to 2. One reference will be dropped when
 	 * all groups on tg_list are being deleted during queue exit. Other
@@ -1111,6 +1313,9 @@ void blk_throtl_exit(struct request_queu
 	 * it.
 	 */
 	throtl_shutdown_timer_wq(q);
+	kobject_uevent(&td->kobj, KOBJ_REMOVE);
+	kobject_del(&td->kobj);
+	kobject_put(&q->kobj);
 	throtl_td_free(td);
 }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-04 13:36           ` Hillf Danton
@ 2010-12-06 14:54             ` Vivek Goyal
  2010-12-07 13:22               ` Hillf Danton
  0 siblings, 1 reply; 11+ messages in thread
From: Vivek Goyal @ 2010-12-06 14:54 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel

On Sat, Dec 04, 2010 at 09:36:40PM +0800, Hillf Danton wrote:
> On Fri, Dec 3, 2010 at 10:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > It should not be too hard. IO schedulers already create
> > /sys/block/<dev>/queue/iosched/ dir and we can create <dev>/queue/throttle/
> > dir and export throttle related tunables there.
> 
> Evening, Vivek.
> 
> I worked the framework for the tunable out.
> 
> If in right direction, I will complete it soon.

Hillf, 

You still have not answered my questions in previous mail. 

- What's the problem are you facing and how filling the quantum to the
  capacity is helping you.

- Tunable and filling the quantum are two different things. If filling
  the quantum solved your problem, then how tunable is going to solve
  same problem.

I don't want to introduce tunables if they are really not put to use 
by somebody. So before we move in this direction, lets first answer
above questions.

Thanks
Vivek
 
> 
> Thanks
> Hillf
> 
> 
> 
> --- a/block/blk-throttle.c	2010-11-01 19:54:12.000000000 +0800
> +++ b/block/blk-throttle.c	2010-12-04 21:24:58.000000000 +0800
> @@ -98,6 +98,27 @@ struct throtl_data
>  	struct delayed_work throtl_work;
> 
>  	atomic_t limits_changed;
> +
> +	/*
> +	 * following is sysfs stuff about queue throttle
> +	 */
> +	struct kobject	kobj;
> +
> +	struct mutex	sysfs_lock;
> +
> +	/* Max dispatch from a group in 1 round */
> +	int grp_quantum;
> +
> +	/* Total max dispatch from all groups in one round */
> +	int q_quantum;
> +
> +	/* Throttling is performed over 100ms slice and after that
> +	 * slice is renewed
> +	 */
> +	unsigned long time_slice;
> +
> +	/* read percentage of dispatch from a group in 1 round */
> +	int read_percentage;
>  };
> 
>  enum tg_state_flags {
> @@ -644,11 +665,19 @@ static int throtl_dispatch_tg(struct thr
>  				struct bio_list *bl)
>  {
>  	unsigned int nr_reads = 0, nr_writes = 0;
> -	unsigned int max_nr_reads = throtl_grp_quantum*3/4;
> -	unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
> +	unsigned int max_nr_reads, max_nr_writes;
>  	struct bio *bio;
> 
> -	/* Try to dispatch 75% READS and 25% WRITES */
> +	max_nr_reads = td->read_percentage * td->grp_quantum /100;
> +	if (! max_nr_reads)
> +		max_nr_reads = 1;
> +	/*
> +	 * both are not computed stricktly here to throttle I/O
> +	 */
> +	if (max_nr_reads < td->grp_quantum)
> +		max_nr_writes = td->grp_quantum - max_nr_reads;
> +	else
> +		max_nr_writes = 1;
> 
>  	while ((bio = bio_list_peek(&tg->bio_lists[READ]))
>  		&& tg_may_dispatch(td, tg, bio, NULL)) {
> @@ -1025,10 +1054,169 @@ out:
>  	return 0;
>  }
> 
> +/*
> + * sysfs stuff
> + */
> +
> +struct throttle_sysfs_entry {
> +	struct attribute attr;
> +	ssize_t (*show)(struct throtl_data *, char *);
> +	ssize_t (*store)(struct throtl_data *, const char *, size_t);
> +};
> +
> +static ssize_t
> +throttle_show_time_slice(struct throtl_data *td, char *page)
> +{
> +	unsigned int msecs = jiffies_to_msecs(td->time_slice);
> +	return sprintf(page, "%lu\n", msecs);
> +}
> +static ssize_t
> +throttle_store_time_slice(struct throtl_data *td,
> +				const char *page, size_t len)
> +{
> +	char *p = (char *) page;
> +	unsigned long msecs = simple_strtoul(p, &p, 10);
> +
> +	td->time_slice = msecs_to_jiffies(msecs);
> +	return len;
> +}
> +static struct throttle_sysfs_entry  throttle_time_slice_entry = {
> +	.attr = { .name = "time_slice", .mode = S_IRUGO | S_IWUSR },
> +	.show = throttle_show_time_slice,
> +	.store = throttle_store_time_slice,
> +};
> +
> +static ssize_t
> +throttle_show_grp_quantum(struct throtl_data *td, char *page)
> +{
> +	return sprintf(page, "%d\n", td->grp_quantum);
> +}
> +static ssize_t
> +throttle_store_grp_quantum(struct throtl_data *td,
> +				const char *page, size_t len)
> +{
> +	char *p = (char *) page;
> +	unsigned long v = simple_strtoul(p, &p, 10);
> +
> +	td->grp_quantum = (int) v;
> +	return len;
> +}
> +static struct throttle_sysfs_entry  throttle_grp_quantum_entry = {
> +	.attr = { .name = "grp_quantum", .mode = S_IRUGO | S_IWUSR },
> +	.show = throttle_show_grp_quantum,
> +	.store = throttle_store_grp_quantum,
> +};
> +
> +static ssize_t
> +throttle_show_q_quantum(struct throtl_data *td, char *page)
> +{
> +	return sprintf(page, "%d\n", td->q_quantum);
> +}
> +static ssize_t
> +throttle_store_q_quantum(struct throtl_data *td,
> +				const char *page, size_t len)
> +{
> +	char *p = (char *) page;
> +	unsigned long v = simple_strtoul(p, &p, 10);
> +
> +	td->q_quantum = (int) v;
> +	return len;
> +}
> +static struct throttle_sysfs_entry  throttle_q_quantum_entry = {
> +	.attr = { .name = "q_quantum", .mode = S_IRUGO | S_IWUSR },
> +	.show = throttle_show_q_quantum,
> +	.store = throttle_store_q_quantum,
> +};
> +
> +static ssize_t
> +throttle_show_read_percentage(struct throtl_data *td, char *page)
> +{
> +	return sprintf(page, "%d\n", td->read_percentage);
> +}
> +static ssize_t
> +throttle_store_read_percentage(struct throtl_data *td,
> +				const char *page, size_t len)
> +{
> +	char *p = (char *) page;
> +	unsigned long v = simple_strtoul(p, &p, 10);
> +
> +	if (v > 99)
> +		v = 99;
> +	else if (v < 1)
> +		v = 1;
> +	td->read_percentage = (int) v;
> +	return len;
> +}
> +static struct throttle_sysfs_entry  throttle_read_percentage_entry = {
> +	.attr = { .name = "read_percentage", .mode = S_IRUGO | S_IWUSR },
> +	.show = throttle_show_read_percentage,
> +	.store = throttle_store_read_percentage,
> +};
> +
> +static struct attribute *throttle_attrs[] = {
> +	&throttle_grp_quantum_entry.attr,
> +	&throttle_q_quantum_entry.attr,
> +	&throttle_time_slice_entry.attr,
> +	&throttle_read_percentage_entry.attr,
> +	NULL,
> +};
> +
> +static ssize_t
> +throttle_attr_store(struct kobject *kobj, struct attribute *attr,
> +		    const char *page, size_t length)
> +{
> +	struct throttle_sysfs_entry *entry =
> +		container_of(attr, struct throttle_sysfs_entry, attr);
> +	struct throtl_data *td =
> +		container_of(kobj, struct throtl_data, kobj);
> +	ssize_t rtn;
> +
> +	if (! entry->store)
> +		return -EIO;
> +	mutex_lock(&td->sysfs_lock);
> +	rtn = entry->store(td, page, length);
> +	mutex_unlock(&td->sysfs_lock);
> +	return rtn;
> +}
> +
> +static ssize_t
> +throttle_attr_show(struct kobject *kobj, struct attribute *attr, char *page)
> +{
> +	struct throttle_sysfs_entry *entry =
> +		container_of(attr, struct throttle_sysfs_entry, attr);
> +	struct throtl_data *td =
> +		container_of(kobj, struct throtl_data, kobj);
> +	ssize_t rtn;
> +
> +	if (! entry->show)
> +		return -EIO;
> +	mutex_lock(&td->sysfs_lock);
> +	rtn = entry->show(td, page);
> +	mutex_unlock(&td->sysfs_lock);
> +	return rtn;
> +}
> +
> +static const struct sysfs_ops  throttle_sysfs_ops = {
> +	.show	= throttle_attr_show,
> +	.store	= throttle_attr_store,
> +};
> +
> +static void throttle_release(struct kobject *kobj)
> +{
> +}
> +
> +static struct kobj_type blk_throttle_ktype = {
> +	.sysfs_ops	= &throttle_sysfs_ops,
> +	.default_attrs	= throttle_attrs,
> +	.release	= throttle_release,
> +};
> +
> +
>  int blk_throtl_init(struct request_queue *q)
>  {
>  	struct throtl_data *td;
>  	struct throtl_grp *tg;
> +	int rtn;
> 
>  	td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node);
>  	if (!td)
> @@ -1049,6 +1237,20 @@ int blk_throtl_init(struct request_queue
>  	tg->bps[0] = tg->bps[1] = -1;
>  	tg->iops[0] = tg->iops[1] = -1;
> 
> +	mutex_init(&td->sysfs_lock);
> +	td->grp_quantum = throtl_grp_quantum;
> +	td->q_quantum = throtl_quantum;
> +	td->time_slice = throtl_slice;
> +	/* Try to dispatch 75% READS and 25% WRITES by default */
> +	td->read_percentage = 75;
> +	kobject_init(&td->kobj, &blk_throttle_ktype);
> +	rtn = kobject_add(&td->kobj, kobject_get(&q->kobj), "%s", "throttle");
> +	if (rtn < 0) {
> +		kfree(td);
> +		return rtn;
> +	}
> +	kobject_uevent(&td->kobj, KOBJ_ADD);
> +
>  	/*
>  	 * Set root group reference to 2. One reference will be dropped when
>  	 * all groups on tg_list are being deleted during queue exit. Other
> @@ -1111,6 +1313,9 @@ void blk_throtl_exit(struct request_queu
>  	 * it.
>  	 */
>  	throtl_shutdown_timer_wq(q);
> +	kobject_uevent(&td->kobj, KOBJ_REMOVE);
> +	kobject_del(&td->kobj);
> +	kobject_put(&q->kobj);
>  	throtl_td_free(td);
>  }

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] maximize dispatching in block throttle
  2010-12-06 14:54             ` Vivek Goyal
@ 2010-12-07 13:22               ` Hillf Danton
  0 siblings, 0 replies; 11+ messages in thread
From: Hillf Danton @ 2010-12-07 13:22 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel

On Mon, Dec 6, 2010 at 10:54 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Sat, Dec 04, 2010 at 09:36:40PM +0800, Hillf Danton wrote:
>> On Fri, Dec 3, 2010 at 10:32 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > It should not be too hard. IO schedulers already create
>> > /sys/block/<dev>/queue/iosched/ dir and we can create <dev>/queue/throttle/
>> > dir and export throttle related tunables there.
>>
>> Evening, Vivek.
>>
>> I worked the framework for the tunable out.
>>
>> If in right direction, I will complete it soon.
>
> Hillf,
>
> You still have not answered my questions in previous mail.

Thank you, Vivek, for patience first.

>
> - What's the problem are you facing and how filling the quantum to the
>  capacity is helping you.

The community in which we live holds multiple elements, you see, and
it is not rare that readers could have different sights from the
writer of the artwork concerned. As the quantum defined, dispatching
should be as natural and smoothly as definition permits. And it
happened that the same concern is raised by other readers.

There are only two chances left in block throttle, a new concept in
block dir, for its readers. I was moved by the art of rb_tree at first
peek, and even tried to find the mm leakage of throttle group but
failed :-0)

>
> - Tunable and filling the quantum are two different things. If filling
>  the quantum solved your problem, then how tunable is going to solve
>  same problem.

Tunable looks necessary here since the maintainer of block offers
sysfs methods to tune his elevators, even in case of the CFQ. That is
the cool way to solve problems, I think.

>
> I don't want to introduce tunables if they are really not put to use
> by somebody. So before we move in this direction, lets first answer
> above questions.

The naive framework for the tunable is prepared for playing my games,
and shared if others concern, not ready for block throttle yet
currently, but it should be in right direction first. And it is honor
to ask for confirm from the author, right?

Cheers
Hillf

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-12-07 13:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-26 14:46 [PATCH] maximize dispatching in block throttle Hillf Danton
2010-11-30 14:57 ` Vivek Goyal
2010-12-01 13:30   ` Hillf Danton
2010-12-01 14:41     ` Vivek Goyal
2010-12-01 14:56       ` Hillf Danton
2010-12-03 14:26       ` Hillf Danton
2010-12-03 14:32         ` Vivek Goyal
2010-12-03 14:39           ` Hillf Danton
2010-12-04 13:36           ` Hillf Danton
2010-12-06 14:54             ` Vivek Goyal
2010-12-07 13:22               ` Hillf Danton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).