* rbd write rate
@ 2011-05-26 1:06 huang jun
From: huang jun @ 2011-05-26 1:06 UTC (permalink / raw)
To: ceph-devel
hi, all
I have another rbd question. When I use rbd to write a file to the OSDs,
my configuration is: 4 OSDs, 1 MON, 1 MDS; the Linux kernel version is 2.6.37.6,
and the average write rate is about 5 MB/s.
When I look into the log in /var/log/kernel.log, I find that the
client gets items from the request_queue
with write sizes between 1 page and 31 pages.
I think this causes the low write rate. Am I right?
If not, please give me some advice.
Thanks!
* Re: rbd write rate
From: Sage Weil @ 2011-05-26 16:42 UTC (permalink / raw)
To: huang jun; +Cc: ceph-devel
On Thu, 26 May 2011, huang jun wrote:
> hi, all
> I have another rbd question. When I use rbd to write a file to the OSDs,
> my configuration is: 4 OSDs, 1 MON, 1 MDS; the Linux kernel version is 2.6.37.6,
> and the average write rate is about 5 MB/s.
> When I look into the log in /var/log/kernel.log, I find that the
> client gets items from the request_queue
> with write sizes between 1 page and 31 pages.
> I think this causes the low write rate. Am I right?
What are you using to measure throughput?
The request size is determined by something in the block layer above RBD;
I typically see 128k reads/writes. I'm not sure offhand if/how that is
adjusted.
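For what it's worth, 31 pages at the common 4 KiB page size lands just shy of that 128k, so the sizes in the kernel log are consistent with a ~128k block-layer request limit (a quick arithmetic sketch; the 4 KiB page size is an assumption about your kernel config):

```python
PAGE_SIZE = 4096              # assuming the common 4 KiB page size

# A 31-page request is just under 128 KiB, so the 1..31-page
# sizes in the kernel log match a ~128k block-layer request cap.
max_request = 31 * PAGE_SIZE
print(max_request)            # prints 126976 (bytes)
print(max_request / 1024)     # prints 124.0 (KiB)
```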
There is a larger problem with RBD performance in general that frequently
comes up and we haven't had time to address. Overall, read and write
latency is comparable to that of a standard disk (although it can vary
depending on the hardware you're using for the rados cluster). On average
RBD write latencies are probably a bit higher, although with the right
hardware they can be much lower. The big difference, though, is that a
normal disk has a write cache of several megabytes and acknowledges writes
before they are stable. Modern sane file systems issue flush commands at
critical points to ensure that previous writes really hit disk. RBD does
no such thing; every write goes all the way to disk (on all replicas)
before it is acknowledged. For many (most?) workloads this makes the
storage appear very slow, even though the overall throughput may be much
higher.
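As a back-of-the-envelope sketch (the latency figure here is an assumption for illustration, not a measurement): with one synchronous ~128 KiB request outstanding at a time, a ~25 ms round trip to stable storage on all replicas works out to roughly the 5 MB/s you're seeing.

```python
# Rough model (assumed numbers): if every write must reach all
# replicas' disks before it is acknowledged, throughput with a
# single outstanding request is request_size / latency.
request_size = 128 * 1024      # bytes, typical block-layer request
latency = 0.025                # seconds per synchronous write (assumed)

throughput = request_size / latency / 1e6   # MB/s
print(round(throughput, 1))                 # prints 5.2
```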
We think the solution is to make the rbd layer have some tunable that puts
a cap on the number of written bytes that will be acknowledged before
they are actually written, more or less simulating a write cache, and make
it behave more like a disk. There are open issues for this in the tracker
for both librbd (for qemu) and the kernel implementation, but we haven't
had time to look at it yet. Anyone on the list who is interested in this
is more than welcome to take a stab at it!
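Roughly, such a tunable might behave like this (a hypothetical sketch, not the actual rbd code; the class and names are invented for illustration):

```python
# Sketch of the proposed tunable: acknowledge writes immediately
# until `cap` dirty bytes are outstanding, then fall back to waiting
# for stable writes, roughly simulating a disk's write cache.
class WriteCache:
    def __init__(self, cap):
        self.cap = cap        # max bytes acked before they are stable
        self.dirty = 0        # bytes acked early, not yet on disk

    def write(self, nbytes):
        if self.dirty + nbytes <= self.cap:
            self.dirty += nbytes
            return "ack-early"        # behaves like a cached disk write
        return "ack-on-stable"        # wait for all replicas, as today

    def flush(self):
        # a flush/barrier must wait until everything is stable
        self.dirty = 0

cache = WriteCache(cap=4 * 1024 * 1024)   # e.g. a 4 MB "write cache"
print(cache.write(1024 * 1024))           # prints ack-early
```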
sage