From: Shaohua Li <shaohua.li@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, axboe@kernel.dk, jmoyer@redhat.com
Subject: Re: [RFC 0/3]block: An IOPS based ioscheduler
Date: Mon, 16 Jan 2012 15:55:41 +0800
Message-ID: <1326700541.22361.607.camel@sli10-conroe>
In-Reply-To: <20120116071132.GE3174@redhat.com>

On Mon, 2012-01-16 at 02:11 -0500, Vivek Goyal wrote:
> On Mon, Jan 16, 2012 at 12:36:30PM +0800, Shaohua Li wrote:
> > On Sun, 2012-01-15 at 17:45 -0500, Vivek Goyal wrote:
> > > On Mon, Jan 09, 2012 at 09:09:35AM +0800, Shaohua Li wrote:
> > > 
> > > [..]
> > > > > You need to present raw numbers and give us some idea of how close
> > > > > those numbers are to raw hardware capability for us to have any idea
> > > > > what improvements these numbers actually demonstrate.
> > > > > Yes, your guess is right. The hardware has limitations. 12 SSDs
> > > > > exceed the JBOD's capability, for both throughput and IOPS, which
> > > > > is why only the read/write mixed workload shows an impact. I'll
> > > > > use fewer SSDs in later tests, which will demonstrate the
> > > > > performance better. I'll report both raw numbers and fiops/cfq
> > > > > numbers later.
> > > 
> > > If fiops numbers are better, please explain why those numbers are
> > > better. If you cut down on idling, it is obvious that you will get
> > > higher throughput on these flash devices. CFQ does disable queue
> > > idling for non-rotational NCQ devices. If higher throughput is due to
> > > driving deeper queue depths, then CFQ can do that too, just by
> > > changing quantum and disabling idling.
> > it's because of quantum. Surely you can change the quantum, and CFQ
> > performance will increase, but you will find CFQ is very unfair then.
> 
> Why does increasing quantum lead to CFQ being unfair? In terms of time
> it still tries to be fair.
We can dispatch a lot of requests to an NCQ SSD within a very small time
interval, and the disk can finish a lot of requests in a small time
interval too. That time is much smaller than 1 jiffy. Increasing quantum
lets a task dispatch requests faster and makes the accounting worse,
because with a small quantum the task has to wait before it can dispatch.
You can easily verify this with a simple fio test.
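
To make the jiffy-granularity point concrete, here is a minimal sketch in
Python. It is a toy model, not CFQ code. The jiffy length (HZ=1000), the
20 us per-request service time, the one-jiffy minimum charge and the helper
names are all assumptions chosen for illustration. It shows two tasks whose
slices both finish in well under a jiffy, so jiffy-based time accounting
charges them the same even though one issued eight times as many requests.

```python
JIFFY_US = 1000   # assumed: HZ=1000, so one jiffy is 1000 microseconds
SERVICE_US = 20   # assumed: per-request service time on the SSD


def charge_jiffies(elapsed_us):
    """Charge a slice in whole jiffies, with an assumed floor of one jiffy."""
    return max(elapsed_us // JIFFY_US, 1)


def run_slices(requests_per_slice, slices):
    """Return (jiffies charged, requests issued) over a number of slices."""
    charged = 0
    ios = 0
    for _ in range(slices):
        elapsed_us = requests_per_slice * SERVICE_US
        charged += charge_jiffies(elapsed_us)
        ios += requests_per_slice
    return charged, ios


# A large quantum lets the heavy task issue 32 requests per slice,
# while the light task issues 4 per slice.
heavy = run_slices(32, 100)
light = run_slices(4, 100)
print("heavy: jiffies=%d ios=%d" % heavy)
print("light: jiffies=%d ios=%d" % light)
```

In this model both tasks are charged the same number of jiffies, so a
time-based scheduler sees them as equal, while the per-task request counts
differ by a factor of eight.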

> It is a different matter that with NCQ, correct time measurement is
> not possible when requests from multiple queues are in the driver/disk
> at the same time. So accounting in terms of IOPS per queue might make
> sense.
yes.
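
The measurement problem can be sketched with a small Python model. The
in-flight intervals below are made-up numbers, and the function names are
hypothetical. When requests from two queues overlap in the device, the sum
of per-queue in-flight time exceeds the wall-clock time the device was
actually busy, so there is no obvious way to split that time between the
queues. Per-queue request counts stay unambiguous.

```python
def naive_time_per_queue(inflight):
    """Sum each queue's own in-flight intervals, ignoring overlap."""
    totals = {}
    for queue, start_us, end_us in inflight:
        totals[queue] = totals.get(queue, 0) + (end_us - start_us)
    return totals


def wall_busy_time(inflight):
    """Length of the union of all in-flight intervals (device busy time)."""
    intervals = sorted((s, e) for _, s, e in inflight)
    busy = 0
    cur_start, cur_end = intervals[0]
    for s, e in intervals[1:]:
        if s > cur_end:
            # Gap in device activity: close the current busy period.
            busy += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    return busy + (cur_end - cur_start)


def ios_per_queue(inflight):
    """Count completed requests per queue."""
    counts = {}
    for queue, _, _ in inflight:
        counts[queue] = counts.get(queue, 0) + 1
    return counts


# (queue, dispatch time in us, completion time in us); illustrative values
inflight = [("A", 0, 100), ("B", 10, 90), ("A", 20, 120), ("B", 30, 110)]

per_queue = naive_time_per_queue(inflight)
busy = wall_busy_time(inflight)
counts = ios_per_queue(inflight)
print(per_queue, busy, counts)
```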

> > > So I really don't understand what you are doing fundamentally
> > > differently in the FIOPS ioscheduler.
> > > 
> > > The only thing I can think of is more accurate accounting per queue
> > > in terms of number of IOs instead of time, which can just serve to
> > > improve fairness a bit for certain workloads. In practice, I think it
> > > might not matter much.
> > If quantum is big, CFQ will have better performance, but it actually
> > falls back to Noop, with no fairness at all. Fairness is important and
> > is why we introduced CFQ.
> 
> It is not exactly noop. It still preempts writes and prioritizes reads
> and direct writes. 
Sure, I mostly mean fairness here.

> Also, what's the real-life workload where you face issues using, say,
> deadline with this flash-based storage?
deadline doesn't provide fairness. The issue is mainly with cgroup
workloads. Workloads with different ioprio have issues too, but I don't
know which real workloads use ioprio.

> > 
> > In summary, CFQ can't be both fair and perform well. FIOPS is trying to
> > be fair and have good performance. I don't think any time based
> > accounting can achieve that goal for NCQ and SSD (even the cfq cgroup
> > code has an iops mode, so I suppose you already know this well).
> > 
> > Surely you can change CFQ to make it IOPS based, but this will mess up
> > the code a lot, and FIOPS shares a lot of code with CFQ. So I'd like to
> > have a separate ioscheduler which is IOPS based.
> 
> I think writing a separate IO scheduler just to do accounting in IOPS
> while retaining the rest of the CFQ code is not a very good idea.
> Modifying the CFQ code to be able to deal with both time based and IOPS
> accounting might turn out to be simpler.
changing CFQ works, but I really want to avoid having something like
if (iops) 
   xxx
else
   xxx
I plan to add scales for read/write, request size, etc., because read
and write costs differ and requests of different sizes have different
costs on an SSD. This can be added to CFQ too, with pain. That said, I
don't completely object to making CFQ support IOPS accounting, but my
feeling is that a separate ioscheduler is cleaner.
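
As a rough illustration of scaled IOPS accounting, here is a minimal
Python sketch. It is not the FIOPS patch code. The scale values, the 4 KiB
base size, the `Queue` class, the `vios` field name and the
pick-the-smallest rule are all assumptions made for this example. Each
request is charged a cost that depends on its direction and size, and the
queue with the smallest accumulated cost dispatches next.

```python
READ_SCALE = 1     # hypothetical cost multiplier for reads
WRITE_SCALE = 4    # hypothetical cost multiplier for writes
BASE_SIZE = 4096   # hypothetical size unit: one 4 KiB block


def request_cost(is_write, size_bytes):
    """Cost of one request, scaled by direction and size (rounded up)."""
    scale = WRITE_SCALE if is_write else READ_SCALE
    size_units = max(1, -(-size_bytes // BASE_SIZE))  # ceiling division
    return scale * size_units


class Queue:
    """A per-task queue that always issues the same kind of request."""

    def __init__(self, name, is_write, size_bytes):
        self.name = name
        self.is_write = is_write
        self.size_bytes = size_bytes
        self.vios = 0        # accumulated virtual IO cost
        self.dispatched = 0  # number of requests dispatched


def dispatch(queues, rounds):
    """Each round, dispatch one request from the queue with the least vios."""
    for _ in range(rounds):
        q = min(queues, key=lambda queue: queue.vios)
        q.vios += request_cost(q.is_write, q.size_bytes)
        q.dispatched += 1


reader = Queue("reader", is_write=False, size_bytes=4096)
writer = Queue("writer", is_write=True, size_bytes=4096)
dispatch([reader, writer], 100)
print(reader.dispatched, writer.dispatched, reader.vios, writer.vios)
```

With these assumed scales the reader gets four dispatches for each write,
and both queues end up with the same accumulated cost, which is the sense
in which the model treats them as being served fairly.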

Thanks,
Shaohua


Thread overview: 29+ messages
2012-01-04  6:53 [RFC 0/3]block: An IOPS based ioscheduler Shaohua Li
2012-01-04  6:53 ` [RFC 1/3]block: seperate CFQ io context management code Shaohua Li
2012-01-04  8:19   ` Namhyung Kim
2012-01-04  6:53 ` [RFC 2/3]block: FIOPS ioscheduler core Shaohua Li
2012-01-06  6:05   ` Namjae Jeon
2012-01-07  1:06   ` Zhu Yanhai
2012-01-04  6:53 ` [RFC 3/3]block: fiops read/write request scale Shaohua Li
2012-01-04  7:19 ` [RFC 0/3]block: An IOPS based ioscheduler Dave Chinner
2012-01-05  6:50   ` Shaohua Li
2012-01-06  5:12     ` Shaohua Li
2012-01-06  9:10       ` Namhyung Kim
2012-01-06 14:37       ` Jan Kara
2012-01-09  1:26         ` Shaohua Li
2012-01-15 22:32           ` Vivek Goyal
2012-01-08 22:16       ` Dave Chinner
2012-01-09  1:09         ` Shaohua Li
2012-01-15 22:45           ` Vivek Goyal
2012-01-16  4:36             ` Shaohua Li
2012-01-16  7:11               ` Vivek Goyal
2012-01-16  7:55                 ` Shaohua Li [this message]
2012-01-16  8:29                   ` Vivek Goyal
2012-01-17  1:06                     ` Shaohua Li
2012-01-17  9:02                       ` Vivek Goyal
2012-01-18  1:20                         ` Shaohua Li
2012-01-18 13:04                           ` Vivek Goyal
2012-01-19  1:21                             ` Shaohua Li
2012-01-15 22:28       ` Vivek Goyal
2012-01-06  9:41 ` Zhu Yanhai
2012-01-15 22:24 ` Vivek Goyal
