From: Shaohua Li <shaohua.li@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"jens.axboe@oracle.com" <jens.axboe@oracle.com>,
	"Zhang, Yanmin" <yanmin.zhang@intel.com>
Subject: Re: [RFC]cfq-iosched: quantum check tweak
Date: Mon, 11 Jan 2010 10:34:09 +0800
Message-ID: <20100111023409.GE22362@sli10-desk.sh.intel.com>
In-Reply-To: <20100108205948.GH22219@redhat.com>

On Sat, Jan 09, 2010 at 04:59:48AM +0800, Vivek Goyal wrote:
> On Fri, Jan 08, 2010 at 09:35:33PM +0100, Corrado Zoccolo wrote:
> > On Fri, Jan 8, 2010 at 6:15 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > > On Thu, Jan 07, 2010 at 10:44:27PM +0100, Corrado Zoccolo wrote:
> > >> Hi Shaohua,
> > >>
> > >> On Thu, Jan 7, 2010 at 3:04 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > >> > On Mon, 2009-12-28 at 17:02 +0800, Corrado Zoccolo wrote:
> > >> >> Hi Shaohua,
> > >> >> On Mon, Dec 28, 2009 at 4:35 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > >> >> > On Fri, Dec 25, 2009 at 05:44:40PM +0800, Corrado Zoccolo wrote:
> > >> >> >> On Fri, Dec 25, 2009 at 10:10 AM, Shaohua Li <shaohua.li@intel.com> wrote:
> > >> >> >> > Currently a queue can only dispatch up to 4 requests if there are other queues.
> > >> >> >> > This isn't optimal; the device can handle more requests, for example AHCI can
> > >> >> >> > handle 31 requests. I can understand that the limit is there for fairness, but we
> > >> >> >> > could do some tweaks:
> > >> >> >> > 1. if the queue still has a lot of slice left, it sounds like we could ignore the limit
> > >> >> >> ok. You can even scale the limit proportionally to the remaining slice
> > >> >> >> (see below).
> > >> >> > I can't understand the meaning of the scaling below. cfq_slice_used_soon() means the
> > >> >> > dispatched requests can finish before the slice is used up, so other queues will not be
> > >> >> > impacted. I thought/hoped one cfq_slice_idle is enough time to finish the
> > >> >> > dispatched requests.
> > >> >> cfq_slice_idle is 8ms, that is the average time to complete 1 request
> > >> >> on most disks. If you have more requests dispatched on a
> > >> >> NCQ-rotational disk (non-RAID), it will take more time. Probably a
> > >> >> linear formula is not the most accurate, but still more accurate than
> > >> >> taking just 1 cfq_slice_idle. If you can experiment a bit, you could
> > >> >> also try:
> > >> >>  cfq_slice_idle * ilog2(nr_dispatched+1)
> > >> >>  cfq_slice_idle * (1<<(ilog2(nr_dispatched+1)>>1))
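For illustration, the candidate estimates above can be compared with a small
standalone program (a sketch only, not kernel code; it assumes the 8 ms
slice_idle figure mentioned above and reimplements ilog2() in userspace):

/*
 * Standalone sketch (not the actual kernel code): compare the three
 * candidate estimates of the time needed by already-dispatched
 * requests, assuming slice_idle = 8 ms as discussed above.
 */
#include <stdio.h>

static unsigned int ilog2_u(unsigned int v)	/* floor(log2(v)), v >= 1 */
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

int main(void)
{
	const unsigned int slice_idle = 8;	/* ms, CFQ default */
	unsigned int d;

	printf("dispatched  linear  ilog2  sqrt-like  (all in ms)\n");
	for (d = 1; d <= 32; d *= 2)
		printf("%10u  %6u  %5u  %9u\n",
		       d,
		       slice_idle * d,
		       slice_idle * ilog2_u(d + 1),
		       slice_idle * (1u << (ilog2_u(d + 1) >> 1)));
	return 0;
}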
> > >> >>
> > >> >> >
> > >> >> >> > 2. we could keep the check only when cfq_latency is on. Users who don't care
> > >> >> >> > about latency should be happy to have the device pipeline fully loaded.
> > >> >> >> I wouldn't overload low_latency with this meaning. You can obtain the
> > >> >> >> same by setting the quantum to 32.
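(At runtime that would be the iosched sysfs tunable, e.g.
echo 32 > /sys/block/<dev>/queue/iosched/quantum, assuming the usual CFQ
sysfs layout.)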
> > >> >> > Since this impacts fairness, I naturally thought we could use low_latency. I'll remove
> > >> >> > the check in the next post.
> > >> >> Great.
> > >> >> >> > I have a test of random direct I/O with two threads, each issuing 32 requests at a time:
> > >> >> >> > without the patch: 78 MB/s
> > >> >> >> > with tweak 1: 138 MB/s
> > >> >> >> > with both tweaks and latency disabled: 156 MB/s
> > >> >> >>
> > >> >> >> Please also test with competing seq/random(depth1)/async workloads,
> > >> >> >> and measure the introduced latencies as well.
> > >> >> > depth 1 should be OK; since only one request is outstanding at a time, it should not
> > >> >> > require more requests from the io scheduler.
> > >> >> I mean have a run with, at the same time:
> > >> >> * one seq reader,
> > >> >> * h random readers with depth 1 (non-aio)
> > >> >> * one async seq writer
> > >> >> * k random readers with large depth.
> > >> >> In this way, you can see if the changes you introduce to boost your
> > >> >> workload affect more realistic scenarios, in which various workloads
> > >> >> are mixed.
> > >> >> I explicitly add the depth-1 random readers, since they are scheduled
> > >> >> differently than the large (>4) depth ones.
> > >> > I tried a fio script which does what you describe, but the data
> > >> > isn't stable, especially the write speed; the other kinds of I/O are
> > >> > stable. Applying the patch below doesn't make things worse (the write
> > >> > speed still isn't stable, other I/O is stable), so I can't say whether the
> > >> > patch passes the test, but the latency reported by fio appears unchanged.
> > >> > I adopted the slice_idle * dispatched approach, which I thought should be safe.
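For reference, a fio job file approximating that mix might look roughly like
the following (purely illustrative: the directory, sizes, runtime and job
counts are made-up placeholders, not the script actually used here):

# Hypothetical job file, not the one used for the numbers in this thread.
[global]
directory=/mnt/test
size=1g
bs=4k
runtime=60
time_based=1
group_reporting=1

[seq-reader]
rw=read
bs=64k
direct=1

[rand-readers-depth1]
rw=randread
direct=1
# "h" random readers, depth 1 (plain sync engine)
numjobs=2

[async-seq-writer]
rw=write
bs=64k
# buffered writes, so they reach CFQ as async writeback
direct=0

[rand-readers-deep]
rw=randread
direct=1
ioengine=libaio
# "k" random readers with large depth
iodepth=32
numjobs=2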
> > >>
> > >> I'm doing some tests right now on a single NCQ rotational disk, and
> > >> the average service time when submitting with a high depth is halved
> > >> w.r.t. depth 1, so I think you could also test with the formula:
> > >> slice_idle * dispatched / 2. It should give a performance boost,
> > >> without a noticeable impact on latency.
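(Worked example with the 8 ms slice_idle figure from above: at 8 requests
already dispatched, slice_idle * dispatched / 2 = 8 * 8 / 2 = 32 ms of
estimated drain time, versus 64 ms with the plain linear formula.)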
> > >>
> > >
> > > But I guess the right comparison here would be how service times vary when we
> > > push queue depths from 4 to higher (as done by this patch).
> > 
> > I think here we want to determine the average cost of a request, when
> > there are many submitted.
> > 
> > > Were you
> > > running deep seeky queues or sequential queues? I'm curious to know whether
> > > service times were reduced even in the case of deep seeky queues on this single
> > > disk.
> > 
> > Seeky queues. The seeks were rather small (not more than 1/64 of the
> > whole disk), but already meaningful for comparison.
> > 
> > >
> > > I think this patch breaks the meaning of cfq_quantum? Now we can allow
> > > dispatch of more requests from the same queue. I had kind of liked the
> > > idea of respecting cfq_quantum, especially since it can help in testing. With
> > > this patch cfq_quantum will more or less lose its meaning.
> > cfq_quantum will still be enforced at the end of the slice, so its
> > meaning of how many requests can still be pending when you finish your
> > slice is preserved.
> 
> Not always; it will depend on how accurate your approximation of service
> time is. If the per-request completion time is more than the approximation (in
> this case slice_idle), then you will end up with more requests in the dispatch
> queue from one cfqq at the time of slice expiry.
We have used slice_idle for a long time with no complaints, so I assume the
approximation of service time is good.
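Roughly, the idea under discussion amounts to something like the sketch below
(illustrative only, with made-up names; the real patch works on
struct cfq_queue/cfq_data and jiffies):

#include <stdbool.h>
#include <stdio.h>

/*
 * Sketch of the dispatch gate being discussed (not the actual patch;
 * all names and the linear estimate are illustrative): beyond
 * cfq_quantum, keep dispatching only while the requests already in
 * flight are expected to drain before the queue's slice ends.
 */
static bool may_dispatch_beyond_quantum(unsigned long now,        /* ms */
					unsigned long slice_end,  /* ms */
					unsigned int dispatched,
					unsigned int slice_idle)  /* ms */
{
	/* estimated time for the in-flight requests to complete */
	unsigned long estimate = (unsigned long)slice_idle * dispatched;

	return now + estimate < slice_end;
}

int main(void)
{
	/* 20 ms into a 100 ms slice, 10 requests in flight, 8 ms each:
	 * 20 + 80 >= 100, so no further dispatch beyond the quantum */
	printf("%d\n", may_dispatch_beyond_quantum(20, 100, 10, 8));
	return 0;
}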

> > 
> > One can argue, instead, that this reduces the effectiveness of
> > preemption on NCQ disks a bit.
> > However, I don't think preemption is the solution for low latency,
> > while cfq_quantum reduction is.
> > With this change in place, we could change the default cfq_quantum to
> > a smaller number (ideally 1), to have the lowest number of leftovers when
> > the slice finishes, while still driving deep queues at the beginning
> > of the slice.
> 
> I think using cfq_quantum as a hard limit might be a better idea, as it gives
> more predictable control, instead of treating it as a soft limit and trying
> to meet it at slice expiry based on our approximation of
> predicted completion time.
The current patch has such a hard limit too (100ms/8ms = 12 for sync I/O and
40ms/8ms = 5 for async I/O).
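(In other words, with the slice values used above, slice_sync = 100 ms,
slice_async = 40 ms and slice_idle = 8 ms, the implied ceilings are
slice_sync / slice_idle = 100 / 8 = 12 requests for sync queues, with integer
division, and slice_async / slice_idle = 40 / 8 = 5 for async queues.)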
 
> > This needs thorough testing, though. Maybe it is better to delay those
> > changes to 2.6.34...
> 
> Agreed. This should be tested more thoroughly and should be a candidate for
> 2.6.34.
Sure, this needs a lot of testing.

Thanks,
Shaohua

Thread overview: 24+ messages
2009-12-25  9:10 [RFC]cfq-iosched: quantum check tweak Shaohua Li
2009-12-25  9:44 ` Corrado Zoccolo
2009-12-28  3:35   ` Shaohua Li
2009-12-28  9:02     ` Corrado Zoccolo
2010-01-07  2:04       ` Shaohua Li
2010-01-07 21:44         ` Corrado Zoccolo
2010-01-08  0:57           ` Shaohua Li
2010-01-08 20:22             ` Corrado Zoccolo
2010-01-11  1:49               ` Shaohua Li
2010-01-11  2:01               ` Shaohua Li
2010-01-08 17:15           ` Vivek Goyal
2010-01-08 17:40             ` Vivek Goyal
2010-01-08 20:35             ` Corrado Zoccolo
2010-01-08 20:59               ` Vivek Goyal
2010-01-11  2:34                 ` Shaohua Li [this message]
2010-01-11 17:03                   ` Vivek Goyal
2010-01-12  3:07                     ` Shaohua Li
2010-01-12 15:48                       ` Vivek Goyal
2010-01-13  8:17                         ` Shaohua Li
2010-01-13 11:18                           ` Vivek Goyal
2010-01-14  4:16                             ` Shaohua Li
2010-01-14 11:31                               ` Vivek Goyal
2010-01-14 13:49                                 ` Jens Axboe
2010-01-15  3:20                                 ` Li, Shaohua
