From: Vivek Goyal <vgoyal@redhat.com>
To: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Kirill Afonshin <kirill_nnov@mail.ru>,
	Jeff Moyer <jmoyer@redhat.com>,
	Jens Axboe <jens.axboe@oracle.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Shaohua Li <shaohua.li@intel.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Subject: Re: [PATCH] cfq-iosched: non-rot devices do not need read queue merging
Date: Fri, 8 Jan 2010 13:53:39 -0500
Message-ID: <20100108185339.GF22219@redhat.com>
In-Reply-To: <4e5e476b1001071216k2da28c4awc91c5d0c89013035@mail.gmail.com>

On Thu, Jan 07, 2010 at 09:16:30PM +0100, Corrado Zoccolo wrote:
> On Thu, Jan 7, 2010 at 7:37 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Thu, Jan 07, 2010 at 06:00:54PM +0100, Corrado Zoccolo wrote:
> >> On Thu, Jan 7, 2010 at 3:36 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> >> > Hi Corrado,
> >> >
> >> > How does idle time value relate to flash card being slower for writes? If
> >> > flash card is slow and we choose to idle on queue (because of direct
> >> > writes), idle time value does not even kick in. We just continue to remain
> >> > on same cfqq and don't do dispatch from next cfqq.
> >> >
> >> > Idle time value will matter only if there was delay from cpu side or from
> >> > workload side in issuing next request after completion of previous one.
> >> >
> >> > Thanks
> >> > Vivek
> >> Hi Vivek,
> >> for me, the optimal idle value should approximate the cost of
> >> switching to another queue.
> >> So, for reads, if we are waiting for more than 1 ms, then we are
> >> wasting bandwidth.
> >> But if we switch from reads to writes (since the reader thought
> >> slightly more than 1ms), and the write is really slow, we can have a
> >> really long latency before the reader can complete its new request.
> >
> > What workload do you have where the reader is thinking for more than 1ms?
> My representative workload is booting my netbook. I found that if I
> let cfq autotune to a lower slice idle, boot slows down, and bootchart
> clearly shows that I/O wait increases and I/O bandwidth decreases.
> This tells me that the writes are getting into the picture earlier
> than with 8ms idle, and causing a regression.
> Note that the reader doesn't need to be one. I could have a set of
> readers, and I want to switch between them in 1ms, but idle up to 10ms
> or more before switching to async writes.

OK, so the case is booting your netbook, where write cost is high. In
this particular case you prefer to delay writes a bit to reduce read
latency, and the writes can catch up a little later.

> >
> > To me one issue probably is that for sync queues we drive shallow (1-2)
> > queue depths and this can be an issue on high end storage where there
> > can be multiple disks behind the array and this sync queue is just
> > not keeping the array fully utilized. Buffered sequential reads mitigate
> > this issue to some extent, as the request size is big.
> I think for sequential queues, you should tune your readahead to hit
> all the disks of the raid. In that case, idling makes sense, because
> all the disks will now be ready to serve the new request immediately.
> 
> >
> > Idling on the queue helps in providing differentiated service for higher
> > priority queue and also helps to get more out of disk on rotational media
> > with single disk. But I suspect that on big arrays, this idling on sync
> > queues and not driving deeper queue depths might hurt.
> We should have some numbers to support. In all tests I saw, setting
> slice idle to 0 causes regression also on decently sized arrays, at
> least when the number of concurrent processes is big enough that 2 of
> them likely will make requests to the same disk (and by the birthday
> paradox, this can be a quite small number, even with very large
> arrays: e.g. with 365-disk raids, 23 concurrent processes have 50%
> probability of colliding on the same disk at every single request
> sent).
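
The birthday-paradox figure quoted above can be checked with a short
calculation. This is only a sketch under simplifying assumptions that the
thread does not state: each process has exactly one outstanding request, and
requests land on disks uniformly and independently. The function names are
made up for illustration.

```python
def collision_probability(num_disks: int, num_processes: int) -> float:
    """Probability that at least two processes target the same disk.

    Assumes one outstanding request per process, spread uniformly and
    independently over the disks (an assumption, not a measured workload).
    """
    if num_processes > num_disks:
        return 1.0  # pigeonhole: a collision is certain
    p_no_collision = 1.0
    for i in range(num_processes):
        # The i-th process must avoid the i disks already taken.
        p_no_collision *= (num_disks - i) / num_disks
    return 1.0 - p_no_collision


def min_processes_for(num_disks: int, threshold: float = 0.5) -> int:
    """Smallest number of processes whose collision probability reaches threshold."""
    n = 1
    while collision_probability(num_disks, n) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(round(collision_probability(365, 23), 4))
    print(min_processes_for(365))
```

Under these assumptions, 23 processes on a 365-disk array is the smallest
count at which the collision probability passes 50%, which matches the
number quoted above.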

I will do some tests and see if there are cases where driving shallower
depths hurts.

Vivek

> 
> >
> > So if we had a way to detect that we got a big storage array underneath,
> > maybe we can get more throughput by not idling at all. But we will also
> > lose the service differentiation between various ioprio queues. I guess
> > your patches of monitoring service times might be useful here.
> It might, but we need to identify hardware on which not idling is
> beneficial, measure its behaviour and see which measurable parameter
> can clearly distinguish it from other hardware where idling is
> required. If we are speaking of raid of rotational disks, seek time
> (which I was measuring) is not a good parameter, because it can be
> still high.
> >
> >> So the optimal choice would be to have two different idle times, one
> >> for switch between readers, and one when switching from readers to
> >> writers.
> >
> > Sounds like read and write batches. With your workload type, we are already
> > doing it. Idle per service tree. At least it solves the problem for
> > sync-noidle queues where we don't idle between read queues but do idle
> > between read and buffered write (async queues).
> >
> In fact those changes improved my netbook boot time a lot, and I'm not
> even using sreadahead. But if autotuning reduces the slice idle, then
> I see again the huge penalty of small writes.
> 
> > In my testing so far, I have not encountered the workloads where readers
> > are thinking a lot. Think time has been very small.
> Sometimes real workloads have more variable think times than our
> synthetic benchmarks.
> 
> >
> > Thanks
> > Vivek
> >
> Thanks,
> Corrado
