* Ignore O_SYNC for rbd cache
@ 2012-10-10 11:30 Andrey Korolyov
  2012-10-10 16:23 ` Sage Weil
  0 siblings, 1 reply; 4+ messages in thread
From: Andrey Korolyov @ 2012-10-10 11:30 UTC (permalink / raw)
  To: ceph-devel

Hi,

Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k
MTU, default CUBIC, CFQ, LSI SAS 2108 w/ wb cache) show quite
fantastic performance - on both reads and writes Ceph utilizes the
disk bandwidth almost completely, reaching about 0.9 of the
theoretical limit of the sum of all bandwidths once the replication
level is taken into account. The only thing that may bring down
overall performance is O_SYNC|O_DIRECT writes, which will be issued
by almost every database server in its default setup. Assuming the
database config may be untouchable, and that I can somehow build a
hardware setup so reliable that it will never lose power, should Ceph
have an option to ignore these flags? Maybe there are other
real-world cases for including such an option, or am I very wrong to
even think about fooling the client application in this way?
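
For illustration, this is roughly the write path such a database uses
for its log - a minimal sketch of my own, not taken from any particular
database; the file name and block size are just placeholders:

  #define _GNU_SOURCE           /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>

  int main(void)
  {
      /* O_SYNC: every write returns only after the data is on stable
       * storage.  O_DIRECT: bypass the page cache, so the buffer and
       * length must be block-aligned. */
      int fd = open("wal.log", O_WRONLY | O_CREAT | O_SYNC | O_DIRECT, 0600);
      if (fd < 0) { perror("open"); return 1; }

      void *buf;
      if (posix_memalign(&buf, 4096, 4096) != 0) {
          fprintf(stderr, "posix_memalign failed\n");
          return 1;
      }
      memset(buf, 'x', 4096);

      /* This is where the latency shows up with a distributed backend:
       * each write must be acknowledged as durable before the caller
       * can issue the next one. */
      if (write(fd, buf, 4096) != 4096) { perror("write"); return 1; }

      close(fd);
      free(buf);
      return 0;
  }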

Thank you for any suggestion!


* Re: Ignore O_SYNC for rbd cache
  2012-10-10 11:30 Ignore O_SYNC for rbd cache Andrey Korolyov
@ 2012-10-10 16:23 ` Sage Weil
  2012-10-10 16:29   ` Josh Durgin
  2012-10-12 16:54   ` Tommi Virtanen
  0 siblings, 2 replies; 4+ messages in thread
From: Sage Weil @ 2012-10-10 16:23 UTC (permalink / raw)
  To: Andrey Korolyov; +Cc: ceph-devel

On Wed, 10 Oct 2012, Andrey Korolyov wrote:
> Hi,
> 
> Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k
> MTU, default CUBIC, CFQ, LSI SAS 2108 w/ wb cache) show quite
> fantastic performance - on both reads and writes Ceph utilizes the
> disk bandwidth almost completely, reaching about 0.9 of the
> theoretical limit of the sum of all bandwidths once the replication
> level is taken into account. The only thing that may bring down
> overall performance is O_SYNC|O_DIRECT writes, which will be issued
> by almost every database server in its default setup. Assuming the
> database config may be untouchable, and that I can somehow build a
> hardware setup so reliable that it will never lose power, should Ceph
> have an option to ignore these flags? Maybe there are other
> real-world cases for including such an option, or am I very wrong to
> even think about fooling the client application in this way?

I certainly wouldn't recommend it, but there are probably use cases where 
it makes sense (i.e., the data isn't as important as the performance).  
Any such option would probably be called

 rbd async flush danger danger = true

and would trigger a flush but not wait for it, or perhaps

 rbd ignore flush danger danger = true

which would not honor flush at all. 

This would jeopardize the integrity of the file system living on the RBD 
image; file systems rely on flush to order their commits, and playing fast 
and loose with that can lead to any number of corruptions.  The only 
silver lining is that in the not-so-distant past (3-4 years ago) this was 
poorly supported by the block layer and file systems alike, and ext3 
didn't crash and burn quite as often as you might have expected.
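
As a toy illustration of the ordering problem (my own sketch, not real 
file system code; fsync() here stands in for the block-layer flush that 
such an option would be ignoring):

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("journal.img", O_WRONLY | O_CREAT | O_TRUNC, 0600);
      if (fd < 0) { perror("open"); return 1; }

      const char journal[] = "metadata updates for transaction 42";
      const char commit[]  = "COMMIT 42";

      /* 1. Write the journal blocks describing the transaction. */
      if (write(fd, journal, sizeof journal) < 0) { perror("write"); return 1; }

      /* 2. Flush: the journal blocks must be durable... */
      if (fsync(fd) < 0) { perror("fsync"); return 1; }

      /* 3. ...before the commit record that makes them valid is written. */
      if (write(fd, commit, sizeof commit) < 0) { perror("write"); return 1; }
      if (fsync(fd) < 0) { perror("fsync"); return 1; }

      /* If the flush in step 2 is acknowledged but never performed, a
       * crash can leave the commit record on disk while the journal
       * blocks it refers to are not - exactly the kind of corruption
       * described above. */
      close(fd);
      return 0;
  }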

Anyway, not something I would recommend, certainly not for a generic VM 
platform.  Maybe it makes sense if you have a specific performance-sensitive 
application you can afford to let crash and burn...

sage


* Re: Ignore O_SYNC for rbd cache
  2012-10-10 16:23 ` Sage Weil
@ 2012-10-10 16:29   ` Josh Durgin
  2012-10-12 16:54   ` Tommi Virtanen
  1 sibling, 0 replies; 4+ messages in thread
From: Josh Durgin @ 2012-10-10 16:29 UTC (permalink / raw)
  To: Sage Weil; +Cc: Andrey Korolyov, ceph-devel

On 10/10/2012 09:23 AM, Sage Weil wrote:
> On Wed, 10 Oct 2012, Andrey Korolyov wrote:
>> Hi,
>>
>> Recent tests on my test rack with a 20G IB interconnect (IPoIB, 64k
>> MTU, default CUBIC, CFQ, LSI SAS 2108 w/ wb cache) show quite
>> fantastic performance - on both reads and writes Ceph utilizes the
>> disk bandwidth almost completely, reaching about 0.9 of the
>> theoretical limit of the sum of all bandwidths once the replication
>> level is taken into account. The only thing that may bring down
>> overall performance is O_SYNC|O_DIRECT writes, which will be issued
>> by almost every database server in its default setup. Assuming the
>> database config may be untouchable, and that I can somehow build a
>> hardware setup so reliable that it will never lose power, should Ceph
>> have an option to ignore these flags? Maybe there are other
>> real-world cases for including such an option, or am I very wrong to
>> even think about fooling the client application in this way?
>
> I certainly wouldn't recommend it, but there are probably use cases where
> it makes sense (i.e., the data isn't as important as the performance).
> Any such option would probably be called
>
>   rbd async flush danger danger = true
>
> and would trigger a flush but not wait for it, or perhaps
>
>   rbd ignore flush danger danger = true
>
> which would not honor flush at all.

qemu already has a cache=unsafe option which does exactly that.
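
For example (the pool and image name here are just placeholders),
something like

 -drive format=raw,file=rbd:rbd/myimage,cache=unsafe

gives you the ignore-flush behavior without any new rbd option.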

> This would jeopardize the integrity of the file system living on the RBD
> image; file systems rely on flush to order their commits, and playing fast
> and loose with that can lead to any number of corruptions.  The only
> silver lining is that in the not-so-distant past (3-4 years ago) this was
> poorly supported by the block layer and file systems alike, and ext3
> didn't crash and burn quite as often as you might have expected.
>
> Anyway, not something I would recommend, certainly not for a generic VM
> platform.  Maybe it makes sense if you have a specific performance-sensitive
> application you can afford to let crash and burn...
>
> sage



* Re: Ignore O_SYNC for rbd cache
  2012-10-10 16:23 ` Sage Weil
  2012-10-10 16:29   ` Josh Durgin
@ 2012-10-12 16:54   ` Tommi Virtanen
  1 sibling, 0 replies; 4+ messages in thread
From: Tommi Virtanen @ 2012-10-12 16:54 UTC (permalink / raw)
  To: Sage Weil; +Cc: Andrey Korolyov, ceph-devel

On Wed, Oct 10, 2012 at 9:23 AM, Sage Weil <sage@inktank.com> wrote:
> I certainly wouldn't recommend it, but there are probably use cases where
> it makes sense (i.e., the data isn't as important as the performance).

This would make a lot of sense for e.g. service orchestration-style
setups where you run an elastic pool of webapps. The persistent
storage is the database, not the local disk, but you might still e.g.
spool uploads to local disk first, or have a local cache a la varnish.
Crashing a machine in such a setup tends to mean deleting the image,
not trying to recover it.

Also, for anyone running virtualized MapReduce worker nodes: CephFS
plugged in as the FS, compute wanting local storage for the temporary
files, but crashes just mean the task is restarted elsewhere.

(Now, in both of the above, you might ask: why not just use a local disk
for this, why use RBD? Because a lot of people are interested in
running diskless compute servers, or ones booting off a minimal
SSD/SD card, with just the base OS and no VM images stored locally.
That helps tremendously with density, especially on low-power platforms
like ARM.)
