* rbd write rate
From: huang jun @ 2011-05-26  1:06 UTC
  To: ceph-devel

Hi all,
I have another rbd question. When I use rbd to write a file to the OSDs
(my configuration is 4 OSDs, 1 MON, 1 MDS, Linux kernel 2.6.37.6),
the average write rate is about 5MB/s.
When I look at the log in /var/log/kernel.log, I find that the
client gets requests from the request_queue
with write sizes between 1 page and 31 pages.
I think this is what causes the low write rate. Am I right?
If not, please give me some advice.

thanks!


* Re: rbd write rate
From: Sage Weil @ 2011-05-26 16:42 UTC
  To: huang jun; +Cc: ceph-devel

On Thu, 26 May 2011, huang jun wrote:
> Hi all,
> I have another rbd question. When I use rbd to write a file to the OSDs
> (my configuration is 4 OSDs, 1 MON, 1 MDS, Linux kernel 2.6.37.6),
> the average write rate is about 5MB/s.
> When I look at the log in /var/log/kernel.log, I find that the
> client gets requests from the request_queue
> with write sizes between 1 page and 31 pages.
> I think this is what causes the low write rate. Am I right?

What are you using to measure throughput?

The request size is determined by something in the block layer above RBD; 
I typically see 128k reads/writes.  I'm not sure offhand if/how that is 
adjusted.
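
As a rough sketch of where that per-request ceiling comes from (these are 
real block layer helpers, but the values are only illustrative, not 
necessarily what rbd.c registers), a block driver caps its request size 
when it sets up its queue:

#include <linux/blkdev.h>

static void my_driver_set_queue_limits(struct request_queue *q)
{
        /* Largest single request the driver will accept, in 512-byte
         * sectors; 256 sectors is 128k, the request size typically
         * seen above rbd. */
        blk_queue_max_hw_sectors(q, 256);

        /* Cap on scatter/gather segments per request; with 4k pages
         * this also bounds how big a request can grow. */
        blk_queue_max_segments(q, 32);
}

The block layer only merges bios into a request up to those limits, so the 
sizes you see in the log will never exceed them, no matter how large the 
writes submitted from above are.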

There is a larger problem with RBD performance in general that frequently 
comes up and we haven't had time to address.  Overall, read and write 
latency is comparable to that of a standard disk (although it can vary 
depending on the hardware you're using for the rados cluster).  On average 
RBD write latencies are probably a bit higher, although with the right 
hardware they can be much lower.  The big difference, though, is that a 
normal disk has a write cache of several megabytes and acknowledges writes 
before they are stable.  Modern sane file systems issue flush commands at 
critical points to ensure that previous writes really hit disk.  RBD does 
no such thing; every write goes all the way to disk (on all replicas) 
before it is acknowledged.  For many (most?) workloads this makes the 
storage appear very slow, even though the overall throughput may be much 
higher.
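
The same distinction in userspace terms (this is only an analogy, not rbd 
code): a plain write() behaves like the disk's write cache and returns as 
soon as the data is buffered, while fsync() is the flush that waits for 
stability.  rbd today effectively behaves as if every write carried its 
own fsync:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char buf[4096];
        int fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        memset(buf, 'x', sizeof(buf));

        /* Returns once the data is buffered, the way a disk acks
         * from its volatile write cache. */
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
                perror("write");

        /* The flush at a "critical point": blocks until the data
         * written above is actually stable. */
        if (fsync(fd) < 0)
                perror("fsync");

        close(fd);
        return 0;
}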

We think the solution is to give the rbd layer a tunable that caps the 
number of bytes that can be acknowledged before they are actually stable, 
more or less simulating a write cache, so that it behaves more like a 
disk.  There are open issues for this in the tracker 
for both librbd (for qemu) and the kernel implementation, but we haven't 
had time to look at it yet.  Anyone on the list who is interested in this 
is more than welcome to take a stab at it!
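
A minimal sketch of the bookkeeping such a tunable might need (every name 
here is made up; locking, flush handling, and error paths are left out):

#include <linux/stddef.h>
#include <linux/types.h>

/* Hypothetical per-device state; nothing like this is in the tree yet. */
struct rbd_wcache {
        u64 dirty_bytes;        /* acked to the client, not yet stable */
        u64 dirty_max;          /* the tunable; 0 = current behaviour  */
};

/* At submission time: may this write be acknowledged right away, the way
 * a disk with a write cache would? */
static bool rbd_ack_early(struct rbd_wcache *wc, u64 len)
{
        if (wc->dirty_max == 0 || wc->dirty_bytes + len > wc->dirty_max)
                return false;           /* wait for the rados commit */
        wc->dirty_bytes += len;
        return true;
}

/* From the rados completion callback, for a write that was acked early,
 * once it is stable on all replicas. */
static void rbd_write_stable(struct rbd_wcache *wc, u64 len)
{
        wc->dirty_bytes -= len;
}

A flush request from above would then have to wait for dirty_bytes to 
drain back to zero, which is what keeps journaling file systems safe even 
with the early acknowledgements.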

sage


