* 4x write amplification?
@ 2013-07-10  2:08 Li Wang
  2013-07-10 17:43 ` Gregory Farnum
  0 siblings, 1 reply; 3+ messages in thread
From: Li Wang @ 2013-07-10  2:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: Sage Weil

Hi,
   We ran a simple throughput test on Ceph with 2 OSD nodes configured
with a one-replica policy. On each OSD node, the throughput measured by
'dd' run locally is 117 MB/s, so in theory the two OSDs could provide
200+ MB/s. However, using 'iozone' from clients we only get a peak
throughput of around 40 MB/s. Is that because each write incurs
2 replicas * 2 journal writes? That is, the journal and the replica
each double the traffic, resulting in a total of 4x write
amplification. Is that true, or is our understanding wrong? Any
performance tuning hints?
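
As a back-of-envelope check (assuming the journal shares a disk with
the data on each OSD node), a short Python sketch of what 4x
amplification would imply for the numbers above:

  # If every client byte is written four times in total
  # (journal + data on the primary OSD, journal + data on the replica),
  # the client-visible ceiling is the aggregate raw disk bandwidth / 4.
  osd_count = 2
  per_disk_bw = 117.0      # MB/s, local 'dd' result per node
  amplification = 2 * 2    # 2 copies * (journal write + data write)

  ceiling = osd_count * per_disk_bw / amplification
  print("expected client ceiling: ~%.0f MB/s" % ceiling)   # ~58 MB/s
  print("observed iozone peak:    ~40 MB/s")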

Cheers,
Li Wang




* Re: 4x write amplification?
  2013-07-10  2:08 4x write amplification? Li Wang
@ 2013-07-10 17:43 ` Gregory Farnum
  2013-07-11  8:23   ` Oleg Krasnianskiy
  0 siblings, 1 reply; 3+ messages in thread
From: Gregory Farnum @ 2013-07-10 17:43 UTC (permalink / raw)
  To: Li Wang; +Cc: ceph-devel, Sage Weil

On Tue, Jul 9, 2013 at 7:08 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> Hi,
>   We ran a simple throughput test on Ceph with 2 OSD nodes configured
> with a one-replica policy. On each OSD node, the throughput measured by
> 'dd' run locally is 117 MB/s, so in theory the two OSDs could provide
> 200+ MB/s. However, using 'iozone' from clients we only get a peak
> throughput of around 40 MB/s. Is that because each write incurs
> 2 replicas * 2 journal writes? That is, the journal and the replica
> each double the traffic, resulting in a total of 4x write
> amplification. Is that true, or is our understanding wrong? Any
> performance tuning hints?

Well, that write amplification is certainly happening. However, I'd
expect to get a better ratio out of the disk than that. Couple things
to check:
1) is your benchmark dispatching multiple requests at a time, or just
one? (the latency on a single request is going to make throughput
numbers come out badly.)
2) How do your disks handle two simultaneous write streams? (Most
disks should be fine, but sometimes they struggle.)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
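
One way to check point (1) from the client side is to keep several
asynchronous writes in flight instead of issuing one object write at a
time. A minimal sketch with the librados Python bindings (the pool
name, object size, and queue depth below are placeholders, not
recommendations):

  import time
  import rados

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()
  ioctx = cluster.open_ioctx('data')       # assumed pool name

  payload = b'x' * (4 * 1024 * 1024)       # 4 MB per object
  num_objects = 64
  depth = 8                                # writes kept in flight

  start = time.time()
  in_flight = []
  for i in range(num_objects):
      c = ioctx.aio_write('bench_obj_%d' % i, payload)
      in_flight.append(c)
      if len(in_flight) >= depth:          # simple queue-depth throttle
          in_flight.pop(0).wait_for_complete()
  for c in in_flight:
      c.wait_for_complete()
  elapsed = time.time() - start

  print('%.1f MB/s' % (num_objects * len(payload) / 1e6 / elapsed))
  ioctx.close()
  cluster.shutdown()

If throughput rises noticeably as the queue depth grows, per-request
latency rather than raw disk bandwidth is what limits the single-stream
numbers.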


* Re: 4x write amplification?
  2013-07-10 17:43 ` Gregory Farnum
@ 2013-07-11  8:23   ` Oleg Krasnianskiy
  0 siblings, 0 replies; 3+ messages in thread
From: Oleg Krasnianskiy @ 2013-07-11  8:23 UTC (permalink / raw)
  To: Gregory Farnum; +Cc: Li Wang, ceph-devel, Sage Weil

2013/7/10 Gregory Farnum <greg@inktank.com>:
> On Tue, Jul 9, 2013 at 7:08 PM, Li Wang <liwang@ubuntukylin.com> wrote:
>> Hi,
>>   We ran a simple throughput test on Ceph with 2 OSD nodes configured
>> with a one-replica policy. On each OSD node, the throughput measured by
>> 'dd' run locally is 117 MB/s, so in theory the two OSDs could provide
>> 200+ MB/s. However, using 'iozone' from clients we only get a peak
>> throughput of around 40 MB/s. Is that because each write incurs
>> 2 replicas * 2 journal writes? That is, the journal and the replica
>> each double the traffic, resulting in a total of 4x write
>> amplification. Is that true, or is our understanding wrong? Any
>> performance tuning hints?
>
> Well, that write amplification is certainly happening. However, I'd
> expect to get a better ratio out of the disk than that. Couple things
> to check:
> 1) is your benchmark dispatching multiple requests at a time, or just
> one? (the latency on a single request is going to make throughput
> numbers come out badly.)
> 2) How do your disks handle two simultaneous write streams? (Most
> disks should be fine, but sometimes they struggle.)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

I have the same problem:
2 machines with several OSDs each, default replica count (2), btrfs on
the OSD partitions, default journal location (a file on the OSD
partition), journal size 2 GB, Ceph 0.65, and a 1 Gbit network between
the machines that is shared with the client.

A dd benchmark shows ~110 MB/s on an OSD partition.

Object writes via librados: 1 object - 43 MB/s, 3 parallel objects - 66 MB/s.
At the same time I monitor the OSD partitions with iostat; they are
loaded at 85-110 MB/s.

If I shut down one node, writes go to only one partition on one
machine. The results are the same: the disk is loaded at 85-110 MB/s
and I get the same throughput on the client - 49 MB/s for a single
object and 66 MB/s for 3 parallel objects.
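
If the journal-doubling explanation above is right, the single-node case
is a useful sanity check: each client byte should hit that partition
twice (once for the journal file, once for the data), so the on-disk
write rate should be roughly 2x the client rate. A rough Python sketch
with the figures above (treating the iostat numbers as a range rather
than pairing them with specific runs):

  client_mb_s = (49.0, 66.0)   # single object, 3 parallel objects
  disk_mb_s   = (85.0, 110.0)  # iostat range on the loaded partition

  lo = min(disk_mb_s) / max(client_mb_s)   # ~1.3x
  hi = max(disk_mb_s) / min(client_mb_s)   # ~2.2x
  print("on-disk vs client write ratio: %.1fx - %.1fx" % (lo, hi))
  # Straddles the ~2x expected for journal + data on the same partition,
  # which would also explain why the disk looks saturated while the
  # client sees roughly half of its bandwidth.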

We are deciding whether to use Ceph in production, and we are stuck
because we do not fully understand this behavior.

