From: Shaohua Li <shaohua.li@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, axboe@kernel.dk, jmoyer@redhat.com,
	zhu.yanhai@gmail.com
Subject: Re: [RFC 0/3]block: An IOPS based ioscheduler
Date: Tue, 17 Jan 2012 09:06:28 +0800
Message-ID: <1326762388.22361.613.camel@sli10-conroe>
In-Reply-To: <20120116082927.GF3174@redhat.com>

On Mon, 2012-01-16 at 03:29 -0500, Vivek Goyal wrote:
> On Mon, Jan 16, 2012 at 03:55:41PM +0800, Shaohua Li wrote:
> > On Mon, 2012-01-16 at 02:11 -0500, Vivek Goyal wrote:
> > > On Mon, Jan 16, 2012 at 12:36:30PM +0800, Shaohua Li wrote:
> > > > On Sun, 2012-01-15 at 17:45 -0500, Vivek Goyal wrote:
> > > > > On Mon, Jan 09, 2012 at 09:09:35AM +0800, Shaohua Li wrote:
> > > > > 
> > > > > [..]
> > > > > > > You need to present raw numbers and give us some idea of how close
> > > > > > > those numbers are to raw hardware capability for us to have any idea
> > > > > > > what improvements these numbers actually demonstrate.
> > > > > > Yes, your guess is right. The hardware has limitations: 12 SSDs exceed
> > > > > > the JBOD's capability for both throughput and IOPS, which is why only the
> > > > > > mixed read/write workload shows an impact. I'll use fewer SSDs in later
> > > > > > tests, which will demonstrate the performance better. I'll report both
> > > > > > raw numbers and fiops/cfq numbers later.
> > > > > 
> > > > > If the fiops numbers are better, please explain why. If you cut down
> > > > > on idling, it is obvious that you will get higher throughput on these
> > > > > flash devices. CFQ already disables queue idling for non-rotational NCQ
> > > > > devices. If the higher throughput comes from driving deeper queue
> > > > > depths, then CFQ can do that too, just by changing the quantum and
> > > > > disabling idling.
> > > > It's because of the quantum. Sure, you can increase the quantum and CFQ
> > > > performance will improve, but you will find CFQ becomes very unfair then.
> > > 
> > > Why does increasing the quantum lead to CFQ being unfair? In terms of
> > > time it still tries to be fair.
> > We can dispatch a lot of requests to an NCQ SSD within a very small time
> > interval, and the disk can complete them just as quickly; the times
> > involved are much smaller than 1 jiffy. Increasing the quantum lets a task
> > dispatch requests faster and makes the time accounting worse, because with
> > a small quantum the task has to wait to dispatch. You can easily verify
> > this with a simple fio test.
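
For example, something along these lines (a rough sketch; /dev/sdX and the
sizes/runtime are placeholders): under a fair scheduler the two jobs below
should see IOPS scaled by their ioprio, but with a large quantum they come
out nearly identical.

  ; rough sketch -- /dev/sdX, sizes and runtime are placeholders
  [global]
  filename=/dev/sdX
  ioengine=libaio
  direct=1
  rw=randread
  bs=4k
  iodepth=32
  runtime=60
  time_based

  ; best-effort class, level 0 = highest priority
  [high-prio]
  prioclass=2
  prio=0

  ; best-effort class, level 7 = lowest priority
  [low-prio]
  prioclass=2
  prio=7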
> > 
> > > That's a separate issue: with NCQ, accurate time measurement is not
> > > possible, because requests from multiple queues are in the driver/disk
> > > at the same time. So accounting in terms of iops per queue might make
> > > sense.
> > yes.
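
A simplified sketch of that difference (hypothetical names, not the actual
CFQ or FIOPS code): serve the queue with the smallest accumulated cost, but
charge the cost per dispatched request rather than per wall-clock slice, so
sub-jiffy completions are still accounted exactly.

  struct sched_queue {
          unsigned long long vios;  /* accumulated virtual I/O cost */
          unsigned int ioprio;      /* 0 (highest) to 7 (lowest) */
  };

  /* IOPS-based charge: a fixed cost per request, scaled by priority.
   * A time-based scheduler would add elapsed slice time here instead,
   * which is meaningless when requests complete in under one jiffy. */
  static void charge_request(struct sched_queue *q, unsigned int base_cost)
  {
          q->vios += base_cost * (q->ioprio + 1);
  }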
> > 
> > > > > So I really don't understand what you are doing fundamentally
> > > > > differently in the FIOPS ioscheduler.
> > > > > 
> > > > > The only thing I can think of is more accurate accounting per queue,
> > > > > in terms of number of IOs instead of time. That can serve to improve
> > > > > fairness a bit for certain workloads. In practice, I think it might
> > > > > not matter much.
> > > > If the quantum is big, CFQ will have better performance, but it
> > > > effectively falls back to noop, with no fairness at all. Fairness is
> > > > important and is why we introduced CFQ.
> > > 
> > > It is not exactly noop. It still preempts writes and prioritizes reads
> > > and direct writes. 
> > Sure, I mostly mean fairness here.
> > 
> > > Also, what's the real-life workload where you face issues using, say,
> > > deadline with this flash-based storage?
> > Deadline doesn't provide fairness. It's mainly cgroup workloads; workloads
> > with different ioprio have issues too, but I don't know which real
> > workloads use ioprio.
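
For completeness, ioprio is normally set per process with ionice(1) or the
ioprio_set() syscall; for example (some_command is a placeholder):

  # best-effort class, lowest priority level
  ionice -c 2 -n 7 some_command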
> 
> Personally I have not run into any workload which drives deep queue depths
> constantly for a very long time. I had to run fio to create such
> scenarios.
> 
> Not driving deep queue depths will lead to expiration of the queue
> (otherwise idling will kill performance on these fast devices). And without
> idling, most of the slice and accounting logic does not help: a queue
> dispatches some requests and expires, irrespective of what time slice you
> had allocated it based on ioprio.
That's true. If the workload doesn't drive deep queue depths, no accounting
scheme helps for NCQ disks, as far as I have tried. Idling is the only way
to make the accounting correct, but it hurts performance too much.
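
For the record, the two CFQ knobs in question are per-device sysfs tunables
(sdX is a placeholder):

  # let each queue dispatch more per round, and turn idling off
  echo 32 > /sys/block/sdX/queue/iosched/quantum
  echo 0  > /sys/block/sdX/queue/iosched/slice_idle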

> That's why I am insisting that any move in this direction should be driven
> by some real workload instead of just coming up with synthetic workloads.
I believe Yanhai from Taobao (cc-ed) has a real workload where he found CFQ
performance suffers a lot.

Thanks,
Shaohua



Thread overview: 29+ messages
2012-01-04  6:53 [RFC 0/3]block: An IOPS based ioscheduler Shaohua Li
2012-01-04  6:53 ` [RFC 1/3]block: seperate CFQ io context management code Shaohua Li
2012-01-04  8:19   ` Namhyung Kim
2012-01-04  6:53 ` [RFC 2/3]block: FIOPS ioscheduler core Shaohua Li
2012-01-06  6:05   ` Namjae Jeon
2012-01-07  1:06   ` Zhu Yanhai
2012-01-04  6:53 ` [RFC 3/3]block: fiops read/write request scale Shaohua Li
2012-01-04  7:19 ` [RFC 0/3]block: An IOPS based ioscheduler Dave Chinner
2012-01-05  6:50   ` Shaohua Li
2012-01-06  5:12     ` Shaohua Li
2012-01-06  9:10       ` Namhyung Kim
2012-01-06 14:37       ` Jan Kara
2012-01-09  1:26         ` Shaohua Li
2012-01-15 22:32           ` Vivek Goyal
2012-01-08 22:16       ` Dave Chinner
2012-01-09  1:09         ` Shaohua Li
2012-01-15 22:45           ` Vivek Goyal
2012-01-16  4:36             ` Shaohua Li
2012-01-16  7:11               ` Vivek Goyal
2012-01-16  7:55                 ` Shaohua Li
2012-01-16  8:29                   ` Vivek Goyal
2012-01-17  1:06                     ` Shaohua Li [this message]
2012-01-17  9:02                       ` Vivek Goyal
2012-01-18  1:20                         ` Shaohua Li
2012-01-18 13:04                           ` Vivek Goyal
2012-01-19  1:21                             ` Shaohua Li
2012-01-15 22:28       ` Vivek Goyal
2012-01-06  9:41 ` Zhu Yanhai
2012-01-15 22:24 ` Vivek Goyal
