Re: [PATCH] maximize dispatching in block throttle

From: Hillf Danton <dhillf@gmail.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] maximize dispatching in block throttle
Date: Fri, 3 Dec 2010 22:26:50 +0800	[thread overview]
Message-ID: <AANLkTinGChRaNUeH+AzS+6Y3Au+0BR1Mz97W8aDSoyQf@mail.gmail.com> (raw)
In-Reply-To: <20101201144128.GC4672@redhat.com>

On Wed, Dec 1, 2010 at 10:41 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Dec 01, 2010 at 09:30:00PM +0800, Hillf Danton wrote:
>> On Tue, Nov 30, 2010 at 10:57 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
>> > On Fri, Nov 26, 2010 at 10:46:01PM +0800, Hillf Danton wrote:
>> >> When dispatching bio, the quantum is divided into read/write budgets,
>> >> and dispatching for write could not exceed the write budget even if
>> >> the read budget is not exhausted, either dispatching for read.
>> >>
>> >> It is changed to exhaust the quantum, if possible, in this work for
>> >> dispatching bio.
>> >>
>> >> Though it is hard to understand that 50/50 division is not selected,
>> >> the difference between divisions could impact little on dispatching as
>> >> much as quantum allows then.
>> >>
>> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> >> ---
>> >
>> > Hi Hillf,
>> >
>> > Even if there are not enough READS/WRITES to consume the quantum, I don't
>> > think that it changes anyting much. The next dispatch round will be
>>
>> Why not exhaust quantum this round directly if not throttled?
>>
>
> Yes we can do that. I am wondering what do we gain by putting extra code
> in?
>
>> > scheduled almost immediately (If there are bios which are ready to
>> > be dispatched). Look at throtl_schedule_next_dispatch().
>> >
>> > Have you noticed some issues/improvements with this patch?
>>
>> Originally I wanted to get 75/25 division parameterized through sysfs or proc,
>> but I changed mind since exhausting quantum is simpler and much applicable.
>
> But these are two different things isn't it?
>
> We can introduce some sysfs tunables (if need be) so that a user can decide
> the division of READS/WRITES.
>
>>
>> As you see, the application environments change from one user to another,
>> even though are more latency sensitive.
>>
>> It looks nicer, I think, to provide both the default division and the
>> methods to
>> change for users to play their games.
>
> Yep, tunables can help here and introducing tunables is easy. At the same

If it is too hard, please change to what is more valuable.
Hillf

> time I prefer to intoroduce tunables only on need basis otherwise it
> runs the risk of too many tunables which are not being used.
>
> So, conceptually this patch looks fine to me. Jeff Moyer also raised the
> same question of why not fill up the quantum if READ/WRITE are not using
> their quota. So I think it does not hurt to take this patch in.
>
> Acked-by: Vivek Goyal <vgoyal@redhat.com>
>
> Thanks
> Vivek
>
>>
>> Thanks
>> Hillf
>>
>> >
>> > Generally READS are more latency sensitive as compared to WRITE hence
>> > I thought of dispatching more READS per quantum.
>> >
>> > Thanks
>> > Vivek
>> >
>> >>
>> >> --- a/block/blk-throttle.c    2010-11-01 19:54:12.000000000 +0800
>> >> +++ b/block/blk-throttle.c    2010-11-26 21:49:00.000000000 +0800
>> >> @@ -647,11 +647,16 @@ static int throtl_dispatch_tg(struct thr
>> >>       unsigned int max_nr_reads = throtl_grp_quantum*3/4;
>> >>       unsigned int max_nr_writes = throtl_grp_quantum - nr_reads;
>> >>       struct bio *bio;
>> >> +     int read_throttled = 0, write_throttled = 0;
>> >>
>> >>       /* Try to dispatch 75% READS and 25% WRITES */
>> >> -
>> >> + try_read:
>> >>       while ((bio = bio_list_peek(&tg->bio_lists[READ]))
>> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +             && ! read_throttled) {
>> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +                     read_throttled = 1;
>> >> +                     break;
>> >> +             }
>> >>
>> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >>               nr_reads++;
>> >> @@ -659,9 +664,15 @@ static int throtl_dispatch_tg(struct thr
>> >>               if (nr_reads >= max_nr_reads)
>> >>                       break;
>> >>       }
>> >> -
>> >> +     if (! bio)
>> >> +             read_throttled = 1;
>> >> + try_write:
>> >>       while ((bio = bio_list_peek(&tg->bio_lists[WRITE]))
>> >> -             && tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +             && ! write_throttled) {
>> >> +             if (! tg_may_dispatch(td, tg, bio, NULL)) {
>> >> +                     write_throttled = 1;
>> >> +                     break;
>> >> +             }
>> >>
>> >>               tg_dispatch_one_bio(td, tg, bio_data_dir(bio), bl);
>> >>               nr_writes++;
>> >> @@ -669,7 +680,23 @@ static int throtl_dispatch_tg(struct thr
>> >>               if (nr_writes >= max_nr_writes)
>> >>                       break;
>> >>       }
>> >> +     if (! bio)
>> >> +             write_throttled = 1;
>> >> +
>> >> +     if (write_throttled && read_throttled)
>> >> +             goto out;
>> >>
>> >> +     if (! (throtl_grp_quantum > nr_writes + nr_reads))
>> >> +             goto out;
>> >> +
>> >> +     if (read_throttled) {
>> >> +             max_nr_writes = throtl_grp_quantum - nr_reads;
>> >> +             goto try_write;
>> >> +     } else {
>> >> +             max_nr_reads = throtl_grp_quantum - nr_writes;
>> >> +             goto try_read;
>> >> +     }
>> >> + out:
>> >>       return nr_reads + nr_writes;
>> >>  }
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> Please read the FAQ at  http://www.tux.org/lkml/
>> >
>