From: Shaohua Li <shaohua.li@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, axboe@kernel.dk, jmoyer@redhat.com,
	zhu.yanhai@gmail.com
Subject: Re: [RFC 0/3]block: An IOPS based ioscheduler
Date: Thu, 19 Jan 2012 09:21:06 +0800
Message-ID: <1326936066.22361.646.camel@sli10-conroe>
In-Reply-To: <20120118130425.GA30204@redhat.com>

On Wed, 2012-01-18 at 08:04 -0500, Vivek Goyal wrote:
> On Wed, Jan 18, 2012 at 09:20:37AM +0800, Shaohua Li wrote:
> 
> [..]
> > > I think trying to make CFQ work (or trying to come up with a
> > > CFQ-like IOPS scheduler) on these fast devices might not lead us
> > > anywhere.
> > If only performance mattered, I'd rather use noop for SSDs. But there
> > is a requirement for cgroup support (and maybe ioprio) to give
> > different tasks different bandwidth.
> 
> Sure, but the issue is that we need to idle in an effort to prioritize
> a task, and idling kills performance. So you can implement something,
> but I doubt it is going to be very useful on fast hardware.
I don't idle in fiops. If the workload's iodepth is low, this causes a
fairness problem, and I just leave it be: there is no way to make a
low-iodepth workload fair without sacrificing performance. CFQ on SSDs
has the same problem.
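
To illustrate what "no idling" means for fairness, here is a minimal
userspace model. It is not the real fiops code; the "vios" bookkeeping,
the task names and all the numbers are made up purely for illustration.

/*
 * Minimal model of an IOPS-based, non-idling scheduler.  Each task
 * accumulates a virtual IO count ("vios"); dispatch always picks the
 * queued task with the smallest vios and never waits for an empty
 * task to submit more requests.
 */
#include <stdio.h>

struct task {
        const char *name;
        unsigned long vios;     /* virtual IOs charged so far */
        int queued;             /* requests currently queued */
};

/* pick the queued task with the smallest vios; NULL if nothing queued */
static struct task *pick_next(struct task *tasks, int n)
{
        struct task *best = NULL;
        int i;

        for (i = 0; i < n; i++)
                if (tasks[i].queued && (!best || tasks[i].vios < best->vios))
                        best = &tasks[i];
        return best;    /* no idling: nothing queued means nothing dispatched */
}

int main(void)
{
        struct task tasks[2] = {
                { "deep-iodepth", 0, 8 },       /* always has requests pending */
                { "low-iodepth",  0, 1 },       /* submits only occasionally */
        };
        int tick;

        for (tick = 0; tick < 16; tick++) {
                struct task *t = pick_next(tasks, 2);

                if (t) {
                        t->vios++;      /* charge one virtual IO per dispatch */
                        t->queued--;
                }
                /* model the workloads: task 0 refills its queue at once,
                 * task 1 submits a single request every few ticks */
                tasks[0].queued = 8;
                if (tick % 4 == 0)
                        tasks[1].queued = 1;
        }
        printf("%s: %lu IOs, %s: %lu IOs\n",
               tasks[0].name, tasks[0].vios, tasks[1].name, tasks[1].vios);
        return 0;
}

In this model the always-busy task ends up with several times the
dispatches of the occasional submitter, which is exactly the fairness
gap for low-iodepth workloads I am talking about.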

> Another issue is that flash-based storage can drive really deep queue
> depths. If that's the case, then the ioscheduler alone can't solve the
> prioritization issue (unless the ioscheduler refuses to drive deep
> queue depths and kills performance). We need some kind of cooperation
> from the device (like the device understanding the notion of io
> priority), so that the device can prioritize requests and one need not
> idle. That way, we might be able to get service differentiation while
> keeping reasonable throughput.
SSDs are still like normal disks in terms of queue depth. I don't know
the iodepth of PCIe flash cards. If the queue depth of such a card is
very big (I suppose there is a limit, because beyond a critical point
increasing the queue depth doesn't increase device performance any
further), we definitely need to reconsider this.

It would be great if the device let the ioscheduler know more info. In
my mind, I hope the device could dynamically adjust its queue depth.
For example, on my SSD, if the request size is 4k, I get maximum
throughput with a queue depth of 32; if the request size is 128k, a
queue depth of just 2 is enough to reach peak throughput. If the device
could stop fetching requests once two 128k requests are pending, that
would solve some fairness issues for low-iodepth workloads.
Unfortunately, device capacity for different request sizes is highly
vendor dependent. The fiops request size scaling tries to address this,
but I still haven't found a good scaling model for it yet.
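
Just to make the idea concrete, the kind of scale I am thinking about
looks like the sketch below. It is not the code from patch 3/3; the 4k
base and the per-doubling increment are made-up numbers, and as said
above, the right curve is vendor dependent.

/*
 * Sketch of a request-size cost scale.  The point is only that one
 * 128k request should be charged more vios than one 4k request,
 * because far fewer large requests are needed to saturate the device.
 */
#include <stdio.h>

#define VIOS_BASE_SIZE  4096UL  /* assume a 4k request costs 1 vios */

static unsigned long vios_cost(unsigned long bytes)
{
        unsigned long cost = 1;
        unsigned long s;

        /* +1 per doubling over 4k, i.e. roughly log2 scaling, so the
         * cost grows sub-linearly with request size */
        for (s = VIOS_BASE_SIZE; s < bytes; s <<= 1)
                cost++;
        return cost;
}

int main(void)
{
        printf("4k cost:   %lu\n", vios_cost(4096));    /* 1 */
        printf("128k cost: %lu\n", vios_cost(131072));  /* 6 */
        return 0;
}

With these made-up numbers a 128k request costs 6x a 4k one; whether
that ratio matches any real device is exactly the open question.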

I suppose the device can't do good prioritization either if the
workload's iodepth is low. If there are just a few requests pending,
nobody (device or ioscheduler) can make the right call, because there
isn't enough information.

Thanks,
Shaohua



Thread overview: 29+ messages
2012-01-04  6:53 [RFC 0/3]block: An IOPS based ioscheduler Shaohua Li
2012-01-04  6:53 ` [RFC 1/3]block: seperate CFQ io context management code Shaohua Li
2012-01-04  8:19   ` Namhyung Kim
2012-01-04  6:53 ` [RFC 2/3]block: FIOPS ioscheduler core Shaohua Li
2012-01-06  6:05   ` Namjae Jeon
2012-01-07  1:06   ` Zhu Yanhai
2012-01-04  6:53 ` [RFC 3/3]block: fiops read/write request scale Shaohua Li
2012-01-04  7:19 ` [RFC 0/3]block: An IOPS based ioscheduler Dave Chinner
2012-01-05  6:50   ` Shaohua Li
2012-01-06  5:12     ` Shaohua Li
2012-01-06  9:10       ` Namhyung Kim
2012-01-06 14:37       ` Jan Kara
2012-01-09  1:26         ` Shaohua Li
2012-01-15 22:32           ` Vivek Goyal
2012-01-08 22:16       ` Dave Chinner
2012-01-09  1:09         ` Shaohua Li
2012-01-15 22:45           ` Vivek Goyal
2012-01-16  4:36             ` Shaohua Li
2012-01-16  7:11               ` Vivek Goyal
2012-01-16  7:55                 ` Shaohua Li
2012-01-16  8:29                   ` Vivek Goyal
2012-01-17  1:06                     ` Shaohua Li
2012-01-17  9:02                       ` Vivek Goyal
2012-01-18  1:20                         ` Shaohua Li
2012-01-18 13:04                           ` Vivek Goyal
2012-01-19  1:21                             ` Shaohua Li [this message]
2012-01-15 22:28       ` Vivek Goyal
2012-01-06  9:41 ` Zhu Yanhai
2012-01-15 22:24 ` Vivek Goyal
