From: Kai Krakow <hurikhan77@gmail.com>
To: linux-bcache@vger.kernel.org
Subject: Re: Reasoning of exposing queue/rotational=0
Date: Fri, 5 May 2017 20:04:54 +0200
Message-ID: <20170505200454.48777837@jupiter.sol.kaishome.de>
In-Reply-To: f99de33e-f90e-1492-7561-5a63d6814a9e@coly.li

On Sat, 6 May 2017 00:11:13 +0800,
Coly Li <i@coly.li> wrote:

> On 2017/5/5 5:24 AM, Kai Krakow wrote:
> > Hello!
> > 
> > What's the reasoning for exposing bcache devices as being
> > non-rotational? Currently, it fools btrfs into using the ssd allocation
> > scheme on the underlying hard disks, which isn't really what I
> > expected to get. So I used a udev rule to change this:
> > 
> > ACTION=="add|change", KERNEL=="bcache*", ATTR{queue/rotational}="1"
> > 
> > Wouldn't it make more sense to set this to the same value as the
> > underlying backing device by default?
> > 
> > Because in reality, the bcache device is still what the backing device
> > is: a rotational medium. A cache doesn't make it non-rotational.
> > 
> > Thoughts?  
> 
> It depends on the hit ratio. If a non-rotational device is used as
> cache, and the hit ratio is high enough, the cached device just
> responds like a non-rotational device.
> 
> But yes, I feel your opinion makes sense, in the btrfs case. How
> about a policy like this:
> 
> 
> cache-device-rotational  backing-device-rotational  export-rotational
> Y                        Y                          Y
> Y                        N                          N
> N                        Y                          N
> N                        N                          N

This probably makes the most sense, although it won't fix my particular
situation, because I have:

 cdev    bdev    bcache
  N   &&  Y   ==  N
 
But I'd like to have bcache == Y
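
The proposed table boils down to a logical AND: the bcache device is
exported as rotational only if both the cache device and the backing
device are rotational. A minimal sketch of that policy (the function name
`export_rotational` is hypothetical; this is not bcache code) shows why it
still yields N for an SSD cache in front of an HDD:

```python
def export_rotational(cache_rotational: bool, backing_rotational: bool) -> bool:
    """Proposed policy: export rotational=1 only if cache AND backing are rotational."""
    return cache_rotational and backing_rotational


if __name__ == "__main__":
    yn = {True: "Y", False: "N"}
    # Reproduce the proposed table row by row: cdev, bdev, exported value.
    for cdev in (True, False):
        for bdev in (True, False):
            print(yn[cdev], yn[bdev], yn[export_rotational(cdev, bdev)])
    # My setup: SSD cache (N) in front of an HDD (Y).
    print("mine:", yn[export_rotational(False, True)])
```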

My hit rate is around 70-85% (500GB cache on 2TB of data), so your
reasoning makes sense, too: if 80% of accesses hit the cache, the device
behaves like a non-rotational one for 80% of all accesses.
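
Note that the 80% figure counts accesses, not time. A rough
back-of-the-envelope sketch weights each access by its latency; the
latencies used here (0.1 ms for an SSD hit, 10 ms for an HDD access on a
miss) are assumed round numbers for illustration, not measurements from my
system, and the function names are hypothetical:

```python
def mean_access_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Expected per-access latency: hits served by the cache, misses by the HDD."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms


def miss_time_share(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Fraction of total IO time spent waiting on cache misses."""
    return (1.0 - hit_rate) * miss_ms / mean_access_ms(hit_rate, hit_ms, miss_ms)


if __name__ == "__main__":
    # Assumed latencies, for illustration only.
    SSD_MS, HDD_MS = 0.1, 10.0
    for h in (0.70, 0.80, 0.85):
        print(f"hit rate {h:.0%}: {mean_access_ms(h, SSD_MS, HDD_MS):.2f} ms/access, "
              f"{miss_time_share(h, SSD_MS, HDD_MS):.0%} of IO time spent on misses")
```

Under these assumed latencies, an 80% hit rate gives about 2.08 ms per
access, and roughly 96% of that time is spent on the misses served by the
rotational backing device.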

But the bcache device itself is only an intermediate layer; in particular,
we cannot set an IO scheduler for it, as that is left to the lower layers.
Those layers correctly expose the rotational flag, and that is where I set
deadline for the SSD and cfq for the HDD. I also experimented with cfq and
slice_idle=0 on the SSD, but deadline gave better results.
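
A minimal sketch of that per-device setup: read each underlying disk's
`queue/rotational` flag from sysfs and pick deadline for non-rotational
devices and cfq for rotational ones. The function names and the
`sysfs_root` parameter are hypothetical (the parameter only makes it
testable against a fake directory tree). The scheduler names are the legacy
ones I use; on current kernels, which only have the blk-mq schedulers, the
closest equivalents would be `mq-deadline` and `bfq`.

```python
from pathlib import Path


def is_rotational(dev: str, sysfs_root: Path = Path("/sys/block")) -> bool:
    """Read <sysfs_root>/<dev>/queue/rotational ("1" = rotational, "0" = not)."""
    return (sysfs_root / dev / "queue" / "rotational").read_text().strip() == "1"


def pick_scheduler(rotational: bool) -> str:
    """My preference: cfq for HDDs, deadline for SSDs."""
    return "cfq" if rotational else "deadline"


def apply_scheduler(dev: str, sysfs_root: Path = Path("/sys/block")) -> str:
    """Write the chosen scheduler to queue/scheduler (needs root on a real system)."""
    sched = pick_scheduler(is_rotational(dev, sysfs_root))
    (sysfs_root / dev / "queue" / "scheduler").write_text(sched)
    return sched


if __name__ == "__main__":
    import sys

    # Pass the underlying disks (e.g. sda sdb), never the bcache device itself.
    for dev in sys.argv[1:]:
        print(dev, "->", apply_scheduler(dev))
```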

Given that, what else could the rotational flag be used for? Currently
btrfs uses it to select an allocation scheme, and I can imagine that
other filesystems do, too. Does the kernel base any decision on this
flag, anything other than allocation decisions?
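
For the btrfs case, my understanding from the btrfs documentation of the
`ssd` mount option is that SSD mode is turned on automatically only when
none of the filesystem's devices is rotational, unless `nossd` is given.
A simplified model of that rule (hypothetical function, not the actual
kernel code) shows why a bcache device reporting `rotational=0` flips btrfs
into SSD mode even though every write ends up on an HDD:

```python
from typing import Sequence


def btrfs_auto_ssd(rotational_flags: Sequence[bool], nossd: bool = False) -> bool:
    """Simplified model: SSD mode is auto-enabled only if no device is rotational."""
    # One rotational member device is enough to keep SSD mode off;
    # an explicit "nossd" also keeps it off.
    return not nossd and not any(rotational_flags)


if __name__ == "__main__":
    # bcache on top of an HDD currently reports rotational=0 ...
    print("bcache as exported today:", btrfs_auto_ssd([False]))
    # ... versus the same device with my udev rule forcing rotational=1.
    print("bcache with udev override:", btrfs_auto_ssd([True]))
```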

As for allocation decisions: it makes no sense to pretend SSD allocation
through bcache, as bcache block allocation is translated to the real
device and has nothing to do with the actual physical layout of the
backing device. So why pretend it is non-rotational?

Also consider discarding the cache: the device would then clearly be
rotational until the cache hit rate builds up again.

Also, I don't think applications should misinterpret bcache as
non-rotational and optimize their workloads for it, because bcache is a
caching layer: its whole purpose is to optimize those workloads itself.
Doing otherwise could work against what bcache tries to achieve; e.g.,
doing lots of random IO because we pretend to be non-rotational would
push my precious data out of the cache for no reason. Bcache is there to
turn random IO into sequential IO, but not just "because it can".
Applications should still optimize for rotational media even when
running on bcache.

Without further clues, it makes the most sense to me to set
bcache.rotational = bdev.rotational.
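
A minimal userspace sketch of that rule, which follows the actual backing
device instead of forcing "1" like the static udev rule quoted at the top.
It assumes the backing device is the only entry under
`/sys/block/bcacheN/slaves/` (an assumption about how bcache registers its
holders in sysfs; verify on your kernel). The function names and the
`sysfs_root` parameter are hypothetical. Partitions have no `queue/`
directory of their own, so for a partition it reads the parent disk's flag.

```python
from pathlib import Path


def _queue_dir(slave: Path) -> Path:
    """Whole disks have queue/; for a partition, use its parent disk's queue/."""
    q = slave / "queue"
    return q if q.is_dir() else slave.resolve().parent / "queue"


def mirror_rotational(bcache: str, sysfs_root: Path = Path("/sys/block")) -> str:
    """Copy the backing device's queue/rotational onto the bcache device.

    Hypothetical helper, not part of bcache. Returns the value written.
    """
    slaves = sorted((sysfs_root / bcache / "slaves").iterdir())
    if len(slaves) != 1:
        raise RuntimeError(f"expected one backing device, found {len(slaves)}")
    value = (_queue_dir(slaves[0]) / "rotational").read_text().strip()
    (sysfs_root / bcache / "queue" / "rotational").write_text(value)
    return value


if __name__ == "__main__":
    import sys

    # e.g. bcache0; needs root on a real system.
    for name in sys.argv[1:]:
        print(name, "rotational =", mirror_rotational(name))
```

With an HDD as backing device this writes "1"; an SSD-backed bcache
device would keep "0".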

I rather suspect that bcache does not set this flag explicitly, so it
stays 0. I think the same applies to iSCSI and other network block
devices, which also pretend to be non-rotational although in reality
they probably aren't. The difference is that those should probably
explicitly not use an IO scheduler, as scheduling is best left to the
host system (as in virtual guests) or to the device itself (as with
enterprise RAID controllers, which do their own IO scheduling). But
"rotational" is not at all a property by which we should automatically
select a default IO scheduler. That choice should be left to layers that
know more exactly what a device is, e.g. udev or the administrator.

> That is, a bcache device is exposed as non-rotational device only when
> all devices of cache devices and backing devices are all rotational.

I didn't really get that sentence... One of the two occurrences of
"rotational" in your sentence seems to be wrong. ;-)


-- 
Regards,
Kai

Replies to list-only preferred.
