From: Alexandre DERUMIER <aderumier@odiso.com>
To: Andreas Joachim Peters <Andreas.Joachim.Peters@cern.ch>
Cc: ceph-devel@vger.kernel.org
Subject: Re: CEPH IOPS Baseline Measurements with MemStore
Date: Thu, 19 Jun 2014 11:21:39 +0200 (CEST)
Message-ID: <103e426f-570e-44be-a663-52518b0c87e0@mailpro>
In-Reply-To: <3472A07E6605974CBC9BC573F1BC02E4AE7433A2@CERNXCHG44.cern.ch>

Hi,

Thanks for your benchmark!

>>If you have some ideas for parameters to tune or see some mistakes in this measurement - let me know! 

>>1) Default Logging has an important impact on the IOPS & latency [0.1-0.2ms] 
How do you enable/disable this logging in ceph.conf?
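
On my side I usually try to silence the chatty subsystems with something like this in ceph.conf (a sketch - I don't know which subsystems you disabled exactly; the 0/0 value turns off both the log level and the in-memory gather level):

    [global]
        # disable per-subsystem debug logging (log level / memory level)
        debug ms = 0/0
        debug osd = 0/0
        debug filestore = 0/0
        debug journal = 0/0
        debug monc = 0/0
        debug auth = 0/0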


>>2) OSD implementation without journaling does not scale linear with concurrent IOs - need several OSDs to scale IOPS - lock contention/threading model? 
That's quite possible. I have seen a lot of benchmarks with SSDs, and the OSD daemon was always the bottleneck: more OSDs, more scaling.

>>3) a writing OSD never fills more than 4 cores 
>>4) a reading OSD never fills more than 5 cores 

maybe "osd op threads"  could improve this ?
default is 2 (don't known if with hyperthreading it's going on 4cores instead 2 ?)
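
Something like this in ceph.conf would be the thing to try (the value 8 is just an arbitrary example above the default):

    [osd]
        # more threads servicing client operations (firefly default: 2)
        osd op threads = 8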


----- Original Message ----- 

From: "Andreas Joachim Peters" <Andreas.Joachim.Peters@cern.ch> 
To: ceph-devel@vger.kernel.org 
Sent: Thursday, 19 June 2014 11:05:18 
Subject: CEPH IOPS Baseline Measurements with MemStore 

Hi, 

I ran some benchmarks using the firefly branch built with GCC 4.9. The hardware is a dual-socket box with 6-core Intel(R) Xeon(R) E5-2630L 0 CPUs @ 2.00GHz, hyperthreading enabled, and 256 GB of memory (kernel 2.6.32-431.17.1.el6.x86_64). 

In my tests I ran two OSD configurations on a single box: 

[A] 4 OSDs running with MemStore 
[B] 1 OSD running with MemStore 

I use a pool with 'size=1' and write and read 1-byte objects, all via localhost. 
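
The setup boils down to something like the following (the pool name and PG count here are illustrative, not the exact values used):

    # ceph.conf: run the OSDs on the in-memory object store
    [osd]
        osd objectstore = memstore

    # create a test pool with a single replica
    ceph osd pool create bench 128 128
    ceph osd pool set bench size 1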

The local RTT reported by ping is 15 microseconds; the RTT measured with ZMQ is 100 microseconds (10 kHz synchronous 1-byte messages). 
The equivalent synchronous rate measured with another file I/O daemon (XRootD) that we use at CERN is 9.9 kHz (31-byte messages). 
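
The ZMQ number comes from a simple synchronous ping-pong; a minimal sketch of such a measurement (pyzmq, arbitrary port, not necessarily the exact tool used) is:

    # rtt_zmq.py -- synchronous 1-byte REQ/REP round trips over localhost
    import sys
    import time
    import zmq

    N = 100000
    ADDR = "tcp://127.0.0.1:5555"  # arbitrary port

    ctx = zmq.Context()
    if sys.argv[1:] == ["server"]:
        sock = ctx.socket(zmq.REP)
        sock.bind(ADDR)
        while True:
            sock.send(sock.recv())  # echo each 1-byte message back
    else:
        sock = ctx.socket(zmq.REQ)
        sock.connect(ADDR)
        t0 = time.time()
        for _ in range(N):
            sock.send(b"x")
            sock.recv()
        dt = time.time() - t0
        print("RTT %.1f us, rate %.1f kHz" % (dt / N * 1e6, N / dt / 1e3))

Run the server first ('python rtt_zmq.py server'), then the client in a second shell.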

------------------------------------------------------------------------------------------------------------------------- 
4 OSDs 
------------------------------------------------------------------------------------------------------------------------- 

{1} [A] 
******* 
I measure IOPS with 1-byte objects for separate write and read operations, disabling logging of any subsystem (example 'rados bench' invocations follow the table): 

Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#] 
================================================ 
Write : 01.7       : 0.50         : 1 
Write : 11.2       : 0.88         : 10 
Write : 11.8       : 1.69         : 10 x 2 [2 rados bench processes] 
Write : 11.2       : 3.57         : 10 x 4 [4 rados bench processes] 
Read  : 02.6       : 0.33         : 1 
Read  : 22.4       : 0.43         : 10 
Read  : 40.0       : 0.97         : 20 x 2 [2 rados bench processes] 
Read  : 46.0       : 0.88         : 10 x 4 [4 rados bench processes] 
Read  : 40.0       : 1.60         : 20 x 4 [4 rados bench processes] 
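
For reference, the invocations are along these lines (the 'bench' pool name and the 60-second duration are illustrative; -b sets the object size, -t the number of concurrent operations):

    # 1-byte writes, 10 concurrent ops, keep the objects for the read pass
    rados bench -p bench 60 write -b 1 -t 10 --no-cleanup

    # sequential reads of the previously written objects, 10 concurrent ops
    rados bench -p bench 60 seq -t 10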

{2} [A] 
******* 
I measure IOPS with the CEPH firefly branch as is (default logging): 

Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#] 
================================================ 
Write : 01.2       : 0.78         : 1 
Write : 09.1       : 1.00         : 10 
Read  : 01.8       : 0.50         : 1 
Read  : 14.0       : 1.00         : 10 
Read  : 18.0       : 2.00         : 20 x 2 [2 rados bench processes] 
Read  : 18.0       : 2.20         : 10 x 4 [4 rados bench processes] 

------------------------------------------------------------------------------------------------------------------------- 
1 OSD 
------------------------------------------------------------------------------------------------------------------------- 

{1} [B] (subsys logging disabled, 1 OSD) 
******* 
Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#] 
================================================ 
Write : 02.0       : 0.46         : 1 
Write : 10.0       : 0.95         : 10 
Write : 11.1       : 1.74         : 20 
Write : 12.0       : 1.80         : 10 x 2 [2 rados bench processes] 
Write : 10.8       : 3.60         : 10 x 4 [4 rados bench processes] 
Read  : 03.6       : 0.27         : 1 
Read  : 16.9       : 0.50         : 10 
Read  : 28.0       : 0.70         : 10 x 2 [2 rados bench processes] 
Read  : 29.6       : 1.37         : 20 x 2 [2 rados bench processes] 
Read  : 27.2       : 1.50         : 10 x 4 [4 rados bench processes] 

{2} [B] (default logging, 1 OSD) 
******* 
Type  : IOPS [kHz] : Latency [ms] : ConcurIO [#] 
================================================ 
Write : 01.4       : 0.68         : 1 
Write : 04.0       : 2.35         : 10 
Write : 04.0       : 4.69         : 10 x 2 [2 rados bench processes] 

I also played with the OSD thread number (no change) and used an in-memory filesystem + journaling (filestore backend). Here the {1} [A] result is 1.4 kHz write with 1 IO in flight, and the peak write performance, putting many IOs in flight across several rados bench processes, is 2.3 kHz! 
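
That filestore-on-RAM configuration is essentially the following (paths and sizes are illustrative):

    # back the OSD data directory (and thus the journal) with tmpfs
    mount -t tmpfs -o size=32g tmpfs /var/lib/ceph/osd/ceph-0

    # ceph.conf: use the default filestore backend with its journal
    [osd]
        osd objectstore = filestore
        osd journal size = 1024   # MB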


Some summarizing remarks: 

1) Default logging has a significant impact on IOPS & latency [0.1-0.2 ms] 
2) The OSD implementation without journaling does not scale linearly with concurrent IOs - one needs several OSDs to scale IOPS - lock contention/threading model? 
3) A writing OSD never fills more than 4 cores 
4) A reading OSD never fills more than 5 cores 
5) Running 'rados bench' on a remote machine gives similar or slightly worse results (up to -20%) 
6) CEPH delivering 20k read IOPS uses 4 cores on the server side, while identical operations with a higher payload (XRootD) use one core for 3x the performance (60k IOPS) 
7) I can scale the other IO daemon (XRootD) to use 10 cores and deliver 300,000 IOPS on the same box. 

Looking forward to SSDs and volatile-memory backend stores, I see room for improvement in the OSD/communication layer. 

If you have some ideas for parameters to tune or see some mistakes in this measurement - let me know! 

Cheers, Andreas. 

-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 