* Memstore performance improvements v0.90 vs v0.87
@ 2015-01-14  7:05 Blinick, Stephen L
  2015-01-14 22:32 ` Blinick, Stephen L
  0 siblings, 1 reply; 25+ messages in thread
From: Blinick, Stephen L @ 2015-01-14  7:05 UTC (permalink / raw)
  To: Ceph Development

In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPMs and re-ran the simple local-node memstore test I've run on v0.80 - v0.87.  It's a single Memstore OSD and a single Rados Bench client running locally on the same node, increasing the queue depth while measuring latency and IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement from the OpWQ sharding changes that came in after Firefly).

These are just very early results, but I'm seeing a very large improvement in latency and throughput with v0.90 on RHEL7.  Next I'm working to get LTTng installed and working on RHEL7 to determine where the improvement is.  On previous releases these measurements have been roughly the same as with a real (fast) backend (i.e. NVMe flash), and I will verify that here as well.  Just wondering if anyone else has measured similar improvements?
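
For anyone wanting to reproduce something similar, the run looks roughly like this (the pool name, runtime and exact invocation below are illustrative, not the precise settings used):

  # ceph.conf snippet: back the OSD with the in-memory MemStore backend
  [osd]
          osd objectstore = memstore

  # 4K-object write and sequential-read passes at increasing client concurrency
  for t in 1 2 4 8 16 32 64; do
      rados bench -p rbd 30 write -b 4096 -t $t --no-cleanup
      rados bench -p rbd 30 seq -t $t
  done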


100% Reads or Writes, 4K Objects, Rados Bench

========================
V0.87: Ubuntu 14.04 LTS

*Writes*
#Thr	IOPS	Latency(ms)
1	618.80		1.61
2	1401.70		1.42
4	3962.73		1.00
8	7354.37		1.10
16	7654.67		2.10
32	7320.33		4.37
64	7424.27		8.62

*Reads*
#thr	IOPS	Latency(ms)
1	837.57		1.19
2	1950.00		1.02
4	6494.03		0.61
8	7243.53		1.10
16	7473.73		2.14
32	7682.80		4.16
64	7727.10		8.28


========================
V0.90: RHEL7

*Writes*
#Thr	IOPS	Latency(ms)
1	2558.53		0.39
2	6014.67		0.33
4	10061.33	0.40
8	14169.60	0.56
16	14355.63	1.11
32	14150.30	2.26
64	15283.33	4.19

*Reads*
#Thr	IOPS	Latency(ms)
1	4535.63		0.22
2	9969.73		0.20
4	17049.43	0.23
8	19909.70	0.40
16	20320.80	0.79
32	19827.93	1.61
64	22371.17	2.86


* RE: Memstore performance improvements v0.90 vs v0.87
  2015-01-14  7:05 Memstore performance improvements v0.90 vs v0.87 Blinick, Stephen L
@ 2015-01-14 22:32 ` Blinick, Stephen L
  2015-01-14 22:43   ` Mark Nelson
  2015-01-14 22:44   ` Somnath Roy
  0 siblings, 2 replies; 25+ messages in thread
From: Blinick, Stephen L @ 2015-01-14 22:32 UTC (permalink / raw)
  To: Ceph Development

I went back and grabbed v0.87 and built it on RHEL7 as well, and performance is similarly improved (i.e. much better than on Ubuntu).  I've also run it on a few systems (dual-socket 10-core E5 v2, dual-socket 6-core E5 v3).  So it's related to my switch to RHEL7, and not to the code changes between v0.87 and v0.90.  Will post when I get more data.

Thanks,

Stephen

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
Sent: Wednesday, January 14, 2015 12:06 AM
To: Ceph Development
Subject: Memstore performance improvements v0.90 vs v0.87

In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly). 

These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?


100% Reads or Writes, 4K Objects, Rados Bench

========================
V0.87: Ubuntu 14.04LTS

*Writes*
#Thr	IOPS	Latency(ms)
1	618.80		1.61
2	1401.70		1.42
4	3962.73		1.00
8	7354.37		1.10
16	7654.67		2.10
32	7320.33		4.37
64	7424.27		8.62

*Reads*
#thr	IOPS	Latency(ms)
1	837.57		1.19
2	1950.00		1.02
4	6494.03		0.61
8	7243.53		1.10
16	7473.73		2.14
32	7682.80		4.16
64	7727.10		8.28


========================
V0.90:  RHEL7

*Writes*
#Thr	IOPS	Latency(ms)
1	2558.53		0.39
2	6014.67		0.33
4	10061.33	0.40
8	14169.60	0.56
16	14355.63	1.11
32	14150.30	2.26
64	15283.33	4.19

*Reads*
#Thr	IOPS	Latency(ms)
1	4535.63		0.22
2	9969.73		0.20
4	17049.43	0.23
8	19909.70	0.40
16	20320.80	0.79
32	19827.93	1.61
64	22371.17	2.86


* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-14 22:32 ` Blinick, Stephen L
@ 2015-01-14 22:43   ` Mark Nelson
  2015-01-14 23:39     ` Blinick, Stephen L
  2015-01-14 22:44   ` Somnath Roy
  1 sibling, 1 reply; 25+ messages in thread
From: Mark Nelson @ 2015-01-14 22:43 UTC (permalink / raw)
  To: Blinick, Stephen L, Ceph Development

On 01/14/2015 04:32 PM, Blinick, Stephen L wrote:
> I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.

Stephen, you are practically writing press releases for the RHEL guys 
here! ;)

Mark

>
> Thanks,
>
> Stephen
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
> Sent: Wednesday, January 14, 2015 12:06 AM
> To: Ceph Development
> Subject: Memstore performance improvements v0.90 vs v0.87
>
> In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).
>
> These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?
>
>
> 100% Reads or Writes, 4K Objects, Rados Bench
>
> ========================
> V0.87: Ubuntu 14.04LTS
>
> *Writes*
> #Thr	IOPS	Latency(ms)
> 1	618.80		1.61
> 2	1401.70		1.42
> 4	3962.73		1.00
> 8	7354.37		1.10
> 16	7654.67		2.10
> 32	7320.33		4.37
> 64	7424.27		8.62
>
> *Reads*
> #thr	IOPS	Latency(ms)
> 1	837.57		1.19
> 2	1950.00		1.02
> 4	6494.03		0.61
> 8	7243.53		1.10
> 16	7473.73		2.14
> 32	7682.80		4.16
> 64	7727.10		8.28
>
>
> ========================
> V0.90:  RHEL7
>
> *Writes*
> #Thr	IOPS	Latency(ms)
> 1	2558.53		0.39
> 2	6014.67		0.33
> 4	10061.33	0.40
> 8	14169.60	0.56
> 16	14355.63	1.11
> 32	14150.30	2.26
> 64	15283.33	4.19
>
> *Reads*
> #Thr	IOPS	Latency(ms)
> 1	4535.63		0.22
> 2	9969.73		0.20
> 4	17049.43	0.23
> 8	19909.70	0.40
> 16	20320.80	0.79
> 32	19827.93	1.61
> 64	22371.17	2.86
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* RE: Memstore performance improvements v0.90 vs v0.87
  2015-01-14 22:32 ` Blinick, Stephen L
  2015-01-14 22:43   ` Mark Nelson
@ 2015-01-14 22:44   ` Somnath Roy
  2015-01-14 23:37     ` Blinick, Stephen L
  2015-01-15 10:43     ` Andreas Bluemle
  1 sibling, 2 replies; 25+ messages in thread
From: Somnath Roy @ 2015-01-14 22:44 UTC (permalink / raw)
  To: Blinick, Stephen L, Ceph Development

Stephen,
You may want to tweak the following parameters in your ceph.conf file and see whether they further improve your performance.

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcacher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_num_threads_per_shard = 2    # you may want to try 1 as well
osd_op_num_shards = 10              # depends on your CPU utilization
ms_nocrc = true
cephx_sign_messages = false
cephx_require_signatures = false
ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false

Also, run more clients (in your case, more rados bench instances) and see whether it scales (it should, until it saturates your CPU).
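
For example, a simple way to add client parallelism on a single node (client count, pool and runtime below are arbitrary) is to start several rados bench processes at once; each instance tags its objects with its own hostname and PID, so they should not collide:

  for i in 1 2 3 4; do
      rados bench -p rbd 60 write -b 4096 -t 16 --no-cleanup &
  done
  wait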

But your observation on RHEL7 vs Ubuntu 14.04 LTS is interesting!

Thanks & Regards
Somnath
-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
Sent: Wednesday, January 14, 2015 2:32 PM
To: Ceph Development
Subject: RE: Memstore performance improvements v0.90 vs v0.87

I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.

Thanks,

Stephen

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
Sent: Wednesday, January 14, 2015 12:06 AM
To: Ceph Development
Subject: Memstore performance improvements v0.90 vs v0.87

In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).

These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?


100% Reads or Writes, 4K Objects, Rados Bench

========================
V0.87: Ubuntu 14.04LTS

*Writes*
#Thr    IOPS    Latency(ms)
1       618.80          1.61
2       1401.70         1.42
4       3962.73         1.00
8       7354.37         1.10
16      7654.67         2.10
32      7320.33         4.37
64      7424.27         8.62

*Reads*
#thr    IOPS    Latency(ms)
1       837.57          1.19
2       1950.00         1.02
4       6494.03         0.61
8       7243.53         1.10
16      7473.73         2.14
32      7682.80         4.16
64      7727.10         8.28


========================
V0.90:  RHEL7

*Writes*
#Thr    IOPS    Latency(ms)
1       2558.53         0.39
2       6014.67         0.33
4       10061.33        0.40
8       14169.60        0.56
16      14355.63        1.11
32      14150.30        2.26
64      15283.33        4.19

*Reads*
#Thr    IOPS    Latency(ms)
1       4535.63         0.22
2       9969.73         0.20
4       17049.43        0.23
8       19909.70        0.40
16      20320.80        0.79
32      19827.93        1.61
64      22371.17        2.86



* RE: Memstore performance improvements v0.90 vs v0.87
  2015-01-14 22:44   ` Somnath Roy
@ 2015-01-14 23:37     ` Blinick, Stephen L
  2015-01-15 10:43     ` Andreas Bluemle
  1 sibling, 0 replies; 25+ messages in thread
From: Blinick, Stephen L @ 2015-01-14 23:37 UTC (permalink / raw)
  To: Somnath Roy, Ceph Development

Somnath -- thanks for the info.  I held out for a while, but eventually settled on running with debug output disabled as you've specified below.  I am not currently disabling CRCs & signatures on messages, however; I can see if that makes an improvement.

The latency reduction for single-client, single-stream IO is what seems most different, so I'm very interested to figure out what it is (and hoping it's not something I've simply got wrong).

As far as scaling goes, we did previously replicate your earlier results using 4 client systems with 2 smalliobench threads per system, even with higher latency, so I am interested to see how things scale here.

Thanks,

Stephen


-----Original Message-----
From: Somnath Roy [mailto:Somnath.Roy@sandisk.com] 
Sent: Wednesday, January 14, 2015 3:44 PM
To: Blinick, Stephen L; Ceph Development
Subject: RE: Memstore performance improvements v0.90 vs v0.87

Stephen,
You may want to tweak the following parameter(s) in your ceph.conf file and see if it is further improving your performance or not.

debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_num_threads_per_shard = 2 //You may want to try with 1 as well
osd_op_num_shards = 10    //Depends on your cpu util
ms_nocrc = true
cephx_sign_messages = false
cephx_require_signatures = false
ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false

Also, run more clients (in your case rados bench) and see if it is scaling or not (it should, till it saturates your cpu).

But, your observation on RHEL7 vs UBUNTU 14.04 LTS is interesting !

Thanks & Regards
Somnath
-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
Sent: Wednesday, January 14, 2015 2:32 PM
To: Ceph Development
Subject: RE: Memstore performance improvements v0.90 vs v0.87

I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.

Thanks,

Stephen

-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, Stephen L
Sent: Wednesday, January 14, 2015 12:06 AM
To: Ceph Development
Subject: Memstore performance improvements v0.90 vs v0.87

In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).

These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?


100% Reads or Writes, 4K Objects, Rados Bench

========================
V0.87: Ubuntu 14.04LTS

*Writes*
#Thr    IOPS    Latency(ms)
1       618.80          1.61
2       1401.70         1.42
4       3962.73         1.00
8       7354.37         1.10
16      7654.67         2.10
32      7320.33         4.37
64      7424.27         8.62

*Reads*
#thr    IOPS    Latency(ms)
1       837.57          1.19
2       1950.00         1.02
4       6494.03         0.61
8       7243.53         1.10
16      7473.73         2.14
32      7682.80         4.16
64      7727.10         8.28


========================
V0.90:  RHEL7

*Writes*
#Thr    IOPS    Latency(ms)
1       2558.53         0.39
2       6014.67         0.33
4       10061.33        0.40
8       14169.60        0.56
16      14355.63        1.11
32      14150.30        2.26
64      15283.33        4.19

*Reads*
#Thr    IOPS    Latency(ms)
1       4535.63         0.22
2       9969.73         0.20
4       17049.43        0.23
8       19909.70        0.40
16      20320.80        0.79
32      19827.93        1.61
64      22371.17        2.86



* RE: Memstore performance improvements v0.90 vs v0.87
  2015-01-14 22:43   ` Mark Nelson
@ 2015-01-14 23:39     ` Blinick, Stephen L
  2015-01-27 21:03       ` Mark Nelson
  0 siblings, 1 reply; 25+ messages in thread
From: Blinick, Stephen L @ 2015-01-14 23:39 UTC (permalink / raw)
  To: mnelson, Ceph Development

Haha :)  Well, my intuition still points to something I've configured wrong (or had wrong), but it will be interesting to see what it is.

-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com] 
Sent: Wednesday, January 14, 2015 3:43 PM
To: Blinick, Stephen L; Ceph Development
Subject: Re: Memstore performance improvements v0.90 vs v0.87

On 01/14/2015 04:32 PM, Blinick, Stephen L wrote:
> I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.

Stephen, you are practically writing press releases for the RHEL guys here! ;)

Mark

>
> Thanks,
>
> Stephen
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org 
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, 
> Stephen L
> Sent: Wednesday, January 14, 2015 12:06 AM
> To: Ceph Development
> Subject: Memstore performance improvements v0.90 vs v0.87
>
> In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).
>
> These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?
>
>
> 100% Reads or Writes, 4K Objects, Rados Bench
>
> ========================
> V0.87: Ubuntu 14.04LTS
>
> *Writes*
> #Thr	IOPS	Latency(ms)
> 1	618.80		1.61
> 2	1401.70		1.42
> 4	3962.73		1.00
> 8	7354.37		1.10
> 16	7654.67		2.10
> 32	7320.33		4.37
> 64	7424.27		8.62
>
> *Reads*
> #thr	IOPS	Latency(ms)
> 1	837.57		1.19
> 2	1950.00		1.02
> 4	6494.03		0.61
> 8	7243.53		1.10
> 16	7473.73		2.14
> 32	7682.80		4.16
> 64	7727.10		8.28
>
>
> ========================
> V0.90:  RHEL7
>
> *Writes*
> #Thr	IOPS	Latency(ms)
> 1	2558.53		0.39
> 2	6014.67		0.33
> 4	10061.33	0.40
> 8	14169.60	0.56
> 16	14355.63	1.11
> 32	14150.30	2.26
> 64	15283.33	4.19
>
> *Reads*
> #Thr	IOPS	Latency(ms)
> 1	4535.63		0.22
> 2	9969.73		0.20
> 4	17049.43	0.23
> 8	19909.70	0.40
> 16	20320.80	0.79
> 32	19827.93	1.61
> 64	22371.17	2.86
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to majordomo@vger.kernel.org More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html
>


* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-14 22:44   ` Somnath Roy
  2015-01-14 23:37     ` Blinick, Stephen L
@ 2015-01-15 10:43     ` Andreas Bluemle
  2015-01-15 17:09       ` Sage Weil
  2015-01-15 17:15       ` Sage Weil
  1 sibling, 2 replies; 25+ messages in thread
From: Andreas Bluemle @ 2015-01-15 10:43 UTC (permalink / raw)
  To: Somnath Roy; +Cc: Blinick, Stephen L, Ceph Development, Sage Weil

Hi,

I went from using v0.88 to v0.90 and can confirm that
performance is similar between these two versions.

I am using config settings similar to the values given by
Somnath.

There is one difference in my settings: where Somnath has disabled
message throttling both for the number of messages and the amount
of message data, I am using the settings:

  "osd_client_message_size_cap": "524288000"   (default)
  "osd_client_message_cap": "3000"             (default is 100)

With my test profile (small, i.e. 4k, random writes), the message
size throttle is no problem - but the message count throttle is
worth looking at: with my test profile I hit this throttle, and
this seems to depend on the number of placement groups (or rather
on the distribution achieved by a particular set of placement
groups).

I have configured my cluster (3 nodes, 12 OSDs) with 3 rbd pools:
   rbd with 768 pgs (default)
   rbd2 with 3000 pgs
   rbd3 with 769 pgs

I don't see any hits of the message count throttle on rbd2 or rbd3,
but on rbd I see the message count throttle hit about 3,750 times
during a test with a total of 262,144 client write requests within
a period of approx. 35 seconds. All of the hits of the message
count throttle happen on a single OSD; there is no hit of the
message count throttle on any of the other 11 OSDs.

Sage: yesterday you asked for a value of the message count
throttle at which my tests start to run smoothly - and I can't
give an answer. It depends on the distribution of I/O requests
achieved by the specific set of PGs - and vice versa, a different
pattern of I/O requests will change the behavior again.
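
(As an aside, one way to watch these throttle hits is via the OSD
admin socket perf counters; the osd id and the counter name below
are examples and may vary between releases:)

  # "get_or_fail_fail" / "wait" show how often the throttle was hit
  ceph daemon osd.10 perf dump | python -m json.tool | grep -A 12 throttle-osd_client_messages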



Regards

Andreas Bluemle

On Wed, 14 Jan 2015 22:44:01 +0000
Somnath Roy <Somnath.Roy@sandisk.com> wrote:

> Stephen,
> You may want to tweak the following parameter(s) in your ceph.conf
> file and see if it is further improving your performance or not.
> 
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
> osd_op_num_threads_per_shard = 2 //You may want to try with 1 as well
> osd_op_num_shards = 10    //Depends on your cpu util
> ms_nocrc = true
> cephx_sign_messages = false
> cephx_require_signatures = false
> ms_dispatch_throttle_bytes = 0
> throttler_perf_counter = false
> 
> [osd]
> osd_client_message_size_cap = 0
> osd_client_message_cap = 0
> osd_enable_op_tracker = false
> 
> Also, run more clients (in your case rados bench) and see if it is
> scaling or not (it should, till it saturates your cpu).
> 
> But, your observation on RHEL7 vs UBUNTU 14.04 LTS is interesting !
> 
> Thanks & Regards
> Somnath
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> Stephen L Sent: Wednesday, January 14, 2015 2:32 PM To: Ceph
> Development Subject: RE: Memstore performance improvements v0.90 vs
> v0.87
> 
> I went back and grabbed 87 and built it on RHEL7 as well, and
> performance is also similar (much better).  I've also run it on a few
> systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So,
> it's related to my switch to RHEL7, and not to the code changes
> between v0.90 and v0.87.     Will post when I get more data.
> 
> Thanks,
> 
> Stephen
> 
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org
> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> Stephen L Sent: Wednesday, January 14, 2015 12:06 AM To: Ceph
> Development Subject: Memstore performance improvements v0.90 vs v0.87
> 
> In the process of moving to a new cluster (RHEL7 based) I grabbed
> v0.90, compiled RPM's and re-ran the simple local-node memstore test
> I've run on .80 - .87.  It's a single Memstore OSD and a single Rados
> Bench client locally on the same node.  Increasing queue depth and
> measuring latency /IOPS.  So far, the measurements have been
> consistent across different hardware and code releases (with about a
> 30% improvement with the OpWQ Sharding changes that came in after
> Firefly).
> 
> These are just very early results, but I'm seeing a very large
> improvement in latency and throughput with v90 on RHEL7.   Next  I'm
> working to get lttng installed and working in RHEL7 to determine
> where the improvement is.   On previous levels, these measurements
> have been roughly the same using a real (fast) backend (i.e. NVMe
> flash), and I will verify here as well.   Just wondering if anyone
> else has measured similar improvements?
> 
> 
> 100% Reads or Writes, 4K Objects, Rados Bench
> 
> ========================
> V0.87: Ubuntu 14.04LTS
> 
> *Writes*
> #Thr    IOPS    Latency(ms)
> 1       618.80          1.61
> 2       1401.70         1.42
> 4       3962.73         1.00
> 8       7354.37         1.10
> 16      7654.67         2.10
> 32      7320.33         4.37
> 64      7424.27         8.62
> 
> *Reads*
> #thr    IOPS    Latency(ms)
> 1       837.57          1.19
> 2       1950.00         1.02
> 4       6494.03         0.61
> 8       7243.53         1.10
> 16      7473.73         2.14
> 32      7682.80         4.16
> 64      7727.10         8.28
> 
> 
> ========================
> V0.90:  RHEL7
> 
> *Writes*
> #Thr    IOPS    Latency(ms)
> 1       2558.53         0.39
> 2       6014.67         0.33
> 4       10061.33        0.40
> 8       14169.60        0.56
> 16      14355.63        1.11
> 32      14150.30        2.26
> 64      15283.33        4.19
> 
> *Reads*
> #Thr    IOPS    Latency(ms)
> 1       4535.63         0.22
> 2       9969.73         0.20
> 4       17049.43        0.23
> 8       19909.70        0.40
> 16      20320.80        0.79
> 32      19827.93        1.61
> 64      22371.17        2.86
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majordomo@vger.kernel.org More majordomo
> info at  http://vger.kernel.org/majordomo-info.html -- To unsubscribe
> from this list: send the line "unsubscribe ceph-devel" in the body of
> a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
> 
> ________________________________
> 
> PLEASE NOTE: The information contained in this electronic mail
> message is intended only for the use of the designated recipient(s)
> named above. If the reader of this message is not the intended
> recipient, you are hereby notified that you have received this
> message in error and that any review, dissemination, distribution, or
> copying of this message is strictly prohibited. If you have received
> this communication in error, please notify the sender by telephone or
> e-mail (as shown above) immediately and destroy any and all copies of
> this message in your possession (whether hard copies or
> electronically stored copies).
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> in the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 



-- 
Andreas Bluemle                     mailto:Andreas.Bluemle@itxperts.de
ITXperts GmbH                       http://www.itxperts.de
Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm


* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-15 10:43     ` Andreas Bluemle
@ 2015-01-15 17:09       ` Sage Weil
  2015-01-15 17:15       ` Sage Weil
  1 sibling, 0 replies; 25+ messages in thread
From: Sage Weil @ 2015-01-15 17:09 UTC (permalink / raw)
  To: Andreas Bluemle; +Cc: Somnath Roy, Blinick, Stephen L, Ceph Development

Hey,

The 0.91 release includes the 'pgmeta' changes that split out the PG 
metadata (info and logs) across collections and should avoid some lock 
contention in the FileStore.

Do you mind repeating this experiment with that version?

Thanks!
sage


On Thu, 15 Jan 2015, Andreas Bluemle wrote:

> Hi,
> 
> I went from using v0.88 to v0.90 and can confirm the
> that performance is similar between these two versions.
> 
> I am using config settings similar to the values given by
> Somnath.
> 
> There is one difference in my settings: where Somanth has disabled
> message throttling both for the number of messages and the amount
> of message data, I am using the settiings:
> 
>   "osd_client_message_size_cap": "524288000"   (default)
>   "osd_client_message_cap": "3000"             (default is 100)
> 
> With my test profile (small, i.e. 4k, random writes), the message
> size throttle is no problem - but the message count throttle
> is worth to look at:
> 
> with my test profile I hit this throttle - but this seems to depend
> on the number of placement groups (or the distribution achieved
> by different placement group set).
> 
> I have configured my cluster (3 nodes, 12 osds) with 3 rbd pools:
>    rbd with 768 pgs (default)
>    rbd2 with 3000 pgs
>    rbd3 with 769 pgs
> 
> I don't see any hits of message count throttle on rbd2 and rbd3 - but
> on rbd, I see the message throttle hits about 3.750 times during a
> test with a total of 262.144 client write request within a period
> of approx. 35 seconds. All of the hits of the message count throttle 
> happen on a single osd; there is no hit of the message count throttle
> on any of the other 11 osds. 
> 
> Sage: yesterday, you had been asking for a value of the message
> count throttle where my tests start to run smoothly - and I can't
> give an answer. It depends on the distribution of I/O requests
> achieved by the specific set of pg's - and vice versa, a different
> pattern of I/O requests will change the behavior again.
> 
> 
> 
> Regards
> 
> Andreas Bluemle
> 
> On Wed, 14 Jan 2015 22:44:01 +0000
> Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> 
> > Stephen,
> > You may want to tweak the following parameter(s) in your ceph.conf
> > file and see if it is further improving your performance or not.
> > 
> > debug_lockdep = 0/0
> > debug_context = 0/0
> > debug_crush = 0/0
> > debug_buffer = 0/0
> > debug_timer = 0/0
> > debug_filer = 0/0
> > debug_objecter = 0/0
> > debug_rados = 0/0
> > debug_rbd = 0/0
> > debug_journaler = 0/0
> > debug_objectcatcher = 0/0
> > debug_client = 0/0
> > debug_osd = 0/0
> > debug_optracker = 0/0
> > debug_objclass = 0/0
> > debug_filestore = 0/0
> > debug_journal = 0/0
> > debug_ms = 0/0
> > debug_monc = 0/0
> > debug_tp = 0/0
> > debug_auth = 0/0
> > debug_finisher = 0/0
> > debug_heartbeatmap = 0/0
> > debug_perfcounter = 0/0
> > debug_asok = 0/0
> > debug_throttle = 0/0
> > debug_mon = 0/0
> > debug_paxos = 0/0
> > debug_rgw = 0/0
> > osd_op_num_threads_per_shard = 2 //You may want to try with 1 as well
> > osd_op_num_shards = 10    //Depends on your cpu util
> > ms_nocrc = true
> > cephx_sign_messages = false
> > cephx_require_signatures = false
> > ms_dispatch_throttle_bytes = 0
> > throttler_perf_counter = false
> > 
> > [osd]
> > osd_client_message_size_cap = 0
> > osd_client_message_cap = 0
> > osd_enable_op_tracker = false
> > 
> > Also, run more clients (in your case rados bench) and see if it is
> > scaling or not (it should, till it saturates your cpu).
> > 
> > But, your observation on RHEL7 vs UBUNTU 14.04 LTS is interesting !
> > 
> > Thanks & Regards
> > Somnath
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> > Stephen L Sent: Wednesday, January 14, 2015 2:32 PM To: Ceph
> > Development Subject: RE: Memstore performance improvements v0.90 vs
> > v0.87
> > 
> > I went back and grabbed 87 and built it on RHEL7 as well, and
> > performance is also similar (much better).  I've also run it on a few
> > systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So,
> > it's related to my switch to RHEL7, and not to the code changes
> > between v0.90 and v0.87.     Will post when I get more data.
> > 
> > Thanks,
> > 
> > Stephen
> > 
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> > Stephen L Sent: Wednesday, January 14, 2015 12:06 AM To: Ceph
> > Development Subject: Memstore performance improvements v0.90 vs v0.87
> > 
> > In the process of moving to a new cluster (RHEL7 based) I grabbed
> > v0.90, compiled RPM's and re-ran the simple local-node memstore test
> > I've run on .80 - .87.  It's a single Memstore OSD and a single Rados
> > Bench client locally on the same node.  Increasing queue depth and
> > measuring latency /IOPS.  So far, the measurements have been
> > consistent across different hardware and code releases (with about a
> > 30% improvement with the OpWQ Sharding changes that came in after
> > Firefly).
> > 
> > These are just very early results, but I'm seeing a very large
> > improvement in latency and throughput with v90 on RHEL7.   Next  I'm
> > working to get lttng installed and working in RHEL7 to determine
> > where the improvement is.   On previous levels, these measurements
> > have been roughly the same using a real (fast) backend (i.e. NVMe
> > flash), and I will verify here as well.   Just wondering if anyone
> > else has measured similar improvements?
> > 
> > 
> > 100% Reads or Writes, 4K Objects, Rados Bench
> > 
> > ========================
> > V0.87: Ubuntu 14.04LTS
> > 
> > *Writes*
> > #Thr    IOPS    Latency(ms)
> > 1       618.80          1.61
> > 2       1401.70         1.42
> > 4       3962.73         1.00
> > 8       7354.37         1.10
> > 16      7654.67         2.10
> > 32      7320.33         4.37
> > 64      7424.27         8.62
> > 
> > *Reads*
> > #thr    IOPS    Latency(ms)
> > 1       837.57          1.19
> > 2       1950.00         1.02
> > 4       6494.03         0.61
> > 8       7243.53         1.10
> > 16      7473.73         2.14
> > 32      7682.80         4.16
> > 64      7727.10         8.28
> > 
> > 
> > ========================
> > V0.90:  RHEL7
> > 
> > *Writes*
> > #Thr    IOPS    Latency(ms)
> > 1       2558.53         0.39
> > 2       6014.67         0.33
> > 4       10061.33        0.40
> > 8       14169.60        0.56
> > 16      14355.63        1.11
> > 32      14150.30        2.26
> > 64      15283.33        4.19
> > 
> > *Reads*
> > #Thr    IOPS    Latency(ms)
> > 1       4535.63         0.22
> > 2       9969.73         0.20
> > 4       17049.43        0.23
> > 8       19909.70        0.40
> > 16      20320.80        0.79
> > 32      19827.93        1.61
> > 64      22371.17        2.86
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html -- To unsubscribe
> > from this list: send the line "unsubscribe ceph-devel" in the body of
> > a message to majordomo@vger.kernel.org More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
> > 
> > ________________________________
> > 
> > PLEASE NOTE: The information contained in this electronic mail
> > message is intended only for the use of the designated recipient(s)
> > named above. If the reader of this message is not the intended
> > recipient, you are hereby notified that you have received this
> > message in error and that any review, dissemination, distribution, or
> > copying of this message is strictly prohibited. If you have received
> > this communication in error, please notify the sender by telephone or
> > e-mail (as shown above) immediately and destroy any and all copies of
> > this message in your possession (whether hard copies or
> > electronically stored copies).
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
> 
> 
> -- 
> Andreas Bluemle                     mailto:Andreas.Bluemle@itxperts.de
> ITXperts GmbH                       http://www.itxperts.de
> Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
> D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910
> 
> Company details: http://www.itxperts.de/imprint.htm
> 
> 


* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-15 10:43     ` Andreas Bluemle
  2015-01-15 17:09       ` Sage Weil
@ 2015-01-15 17:15       ` Sage Weil
  2015-01-19  9:28         ` Andreas Bluemle
  1 sibling, 1 reply; 25+ messages in thread
From: Sage Weil @ 2015-01-15 17:15 UTC (permalink / raw)
  To: Andreas Bluemle; +Cc: Somnath Roy, Blinick, Stephen L, Ceph Development

On Thu, 15 Jan 2015, Andreas Bluemle wrote:
> Hi,
> 
> I went from using v0.88 to v0.90 and can confirm the
> that performance is similar between these two versions.
> 
> I am using config settings similar to the values given by
> Somnath.
> 
> There is one difference in my settings: where Somanth has disabled
> message throttling both for the number of messages and the amount
> of message data, I am using the settiings:
> 
>   "osd_client_message_size_cap": "524288000"   (default)
>   "osd_client_message_cap": "3000"             (default is 100)
> 
> With my test profile (small, i.e. 4k, random writes), the message
> size throttle is no problem - but the message count throttle
> is worth to look at:
> 
> with my test profile I hit this throttle - but this seems to depend
> on the number of placement groups (or the distribution achieved
> by different placement group set).
> 
> I have configured my cluster (3 nodes, 12 osds) with 3 rbd pools:
>    rbd with 768 pgs (default)
>    rbd2 with 3000 pgs
>    rbd3 with 769 pgs
> 
> I don't see any hits of message count throttle on rbd2 and rbd3 - but
> on rbd, I see the message throttle hits about 3.750 times during a
> test with a total of 262.144 client write request within a period
> of approx. 35 seconds. All of the hits of the message count throttle 
> happen on a single osd; there is no hit of the message count throttle
> on any of the other 11 osds. 

The difference between rbd and rbd3 is pretty surprising... it makes me 
think that the rbd distribution is just a bit unlucky.  Can you try this?

 ceph osd reweight-by-pg 110

or possibly 105 and see if that changes things?
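
(For reference, the resulting override weights show up in the REWEIGHT
column of ceph osd tree, and an individual OSD can be reset afterwards;
osd.1 below is just an example:)

 ceph osd tree              # adjusted values appear in the REWEIGHT column
 ceph osd reweight 1 1.0    # reset the override for osd.1 if needed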

> Sage: yesterday, you had been asking for a value of the message
> count throttle where my tests start to run smoothly - and I can't
> give an answer. It depends on the distribution of I/O requests
> achieved by the specific set of pg's - and vice versa, a different
> pattern of I/O requests will change the behavior again.

Understood.  This is actually good news, I think... it's not the throttle 
itself that's problematic but that the rbd distribution is too imbalanced.

Thanks!
sage




> 
> 
> 
> Regards
> 
> Andreas Bluemle
> 
> On Wed, 14 Jan 2015 22:44:01 +0000
> Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> 
> > Stephen,
> > You may want to tweak the following parameter(s) in your ceph.conf
> > file and see if it is further improving your performance or not.
> > 
> > debug_lockdep = 0/0
> > debug_context = 0/0
> > debug_crush = 0/0
> > debug_buffer = 0/0
> > debug_timer = 0/0
> > debug_filer = 0/0
> > debug_objecter = 0/0
> > debug_rados = 0/0
> > debug_rbd = 0/0
> > debug_journaler = 0/0
> > debug_objectcatcher = 0/0
> > debug_client = 0/0
> > debug_osd = 0/0
> > debug_optracker = 0/0
> > debug_objclass = 0/0
> > debug_filestore = 0/0
> > debug_journal = 0/0
> > debug_ms = 0/0
> > debug_monc = 0/0
> > debug_tp = 0/0
> > debug_auth = 0/0
> > debug_finisher = 0/0
> > debug_heartbeatmap = 0/0
> > debug_perfcounter = 0/0
> > debug_asok = 0/0
> > debug_throttle = 0/0
> > debug_mon = 0/0
> > debug_paxos = 0/0
> > debug_rgw = 0/0
> > osd_op_num_threads_per_shard = 2 //You may want to try with 1 as well
> > osd_op_num_shards = 10    //Depends on your cpu util
> > ms_nocrc = true
> > cephx_sign_messages = false
> > cephx_require_signatures = false
> > ms_dispatch_throttle_bytes = 0
> > throttler_perf_counter = false
> > 
> > [osd]
> > osd_client_message_size_cap = 0
> > osd_client_message_cap = 0
> > osd_enable_op_tracker = false
> > 
> > Also, run more clients (in your case rados bench) and see if it is
> > scaling or not (it should, till it saturates your cpu).
> > 
> > But, your observation on RHEL7 vs UBUNTU 14.04 LTS is interesting !
> > 
> > Thanks & Regards
> > Somnath
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> > Stephen L Sent: Wednesday, January 14, 2015 2:32 PM To: Ceph
> > Development Subject: RE: Memstore performance improvements v0.90 vs
> > v0.87
> > 
> > I went back and grabbed 87 and built it on RHEL7 as well, and
> > performance is also similar (much better).  I've also run it on a few
> > systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So,
> > it's related to my switch to RHEL7, and not to the code changes
> > between v0.90 and v0.87.     Will post when I get more data.
> > 
> > Thanks,
> > 
> > Stephen
> > 
> > -----Original Message-----
> > From: ceph-devel-owner@vger.kernel.org
> > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> > Stephen L Sent: Wednesday, January 14, 2015 12:06 AM To: Ceph
> > Development Subject: Memstore performance improvements v0.90 vs v0.87
> > 
> > In the process of moving to a new cluster (RHEL7 based) I grabbed
> > v0.90, compiled RPM's and re-ran the simple local-node memstore test
> > I've run on .80 - .87.  It's a single Memstore OSD and a single Rados
> > Bench client locally on the same node.  Increasing queue depth and
> > measuring latency /IOPS.  So far, the measurements have been
> > consistent across different hardware and code releases (with about a
> > 30% improvement with the OpWQ Sharding changes that came in after
> > Firefly).
> > 
> > These are just very early results, but I'm seeing a very large
> > improvement in latency and throughput with v90 on RHEL7.   Next  I'm
> > working to get lttng installed and working in RHEL7 to determine
> > where the improvement is.   On previous levels, these measurements
> > have been roughly the same using a real (fast) backend (i.e. NVMe
> > flash), and I will verify here as well.   Just wondering if anyone
> > else has measured similar improvements?
> > 
> > 
> > 100% Reads or Writes, 4K Objects, Rados Bench
> > 
> > ========================
> > V0.87: Ubuntu 14.04LTS
> > 
> > *Writes*
> > #Thr    IOPS    Latency(ms)
> > 1       618.80          1.61
> > 2       1401.70         1.42
> > 4       3962.73         1.00
> > 8       7354.37         1.10
> > 16      7654.67         2.10
> > 32      7320.33         4.37
> > 64      7424.27         8.62
> > 
> > *Reads*
> > #thr    IOPS    Latency(ms)
> > 1       837.57          1.19
> > 2       1950.00         1.02
> > 4       6494.03         0.61
> > 8       7243.53         1.10
> > 16      7473.73         2.14
> > 32      7682.80         4.16
> > 64      7727.10         8.28
> > 
> > 
> > ========================
> > V0.90:  RHEL7
> > 
> > *Writes*
> > #Thr    IOPS    Latency(ms)
> > 1       2558.53         0.39
> > 2       6014.67         0.33
> > 4       10061.33        0.40
> > 8       14169.60        0.56
> > 16      14355.63        1.11
> > 32      14150.30        2.26
> > 64      15283.33        4.19
> > 
> > *Reads*
> > #Thr    IOPS    Latency(ms)
> > 1       4535.63         0.22
> > 2       9969.73         0.20
> > 4       17049.43        0.23
> > 8       19909.70        0.40
> > 16      20320.80        0.79
> > 32      19827.93        1.61
> > 64      22371.17        2.86
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org More majordomo
> > info at  http://vger.kernel.org/majordomo-info.html -- To unsubscribe
> > from this list: send the line "unsubscribe ceph-devel" in the body of
> > a message to majordomo@vger.kernel.org More majordomo info at
> > http://vger.kernel.org/majordomo-info.html
> > 
> > ________________________________
> > 
> > PLEASE NOTE: The information contained in this electronic mail
> > message is intended only for the use of the designated recipient(s)
> > named above. If the reader of this message is not the intended
> > recipient, you are hereby notified that you have received this
> > message in error and that any review, dissemination, distribution, or
> > copying of this message is strictly prohibited. If you have received
> > this communication in error, please notify the sender by telephone or
> > e-mail (as shown above) immediately and destroy any and all copies of
> > this message in your possession (whether hard copies or
> > electronically stored copies).
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > in the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
> 
> 
> -- 
> Andreas Bluemle                     mailto:Andreas.Bluemle@itxperts.de
> ITXperts GmbH                       http://www.itxperts.de
> Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
> D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910
> 
> Company details: http://www.itxperts.de/imprint.htm
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-15 17:15       ` Sage Weil
@ 2015-01-19  9:28         ` Andreas Bluemle
  0 siblings, 0 replies; 25+ messages in thread
From: Andreas Bluemle @ 2015-01-19  9:28 UTC (permalink / raw)
  To: Sage Weil; +Cc: Somnath Roy, Blinick, Stephen L, Ceph Development

Hi Sage,

I tried both "ceph osd reweight-by-pg 110" and
"ceph osd reweight-by-pg 105". The first reweight shows:
   SUCCESSFUL reweight-by-pg:
       average 684.250000,  overload 752.675000.
       reweighted: osd.1 [1.000000 -> 0.901505]

The second reweight shows:
   SUCCESSFUL reweight-by-pg:
      average 684.250000, overload 718.462500.
      reweighted: osd.1 [0.901505 -> 0.853180],
                  osd.4 [1.000000 -> 0.941193],
                  osd.10 [1.000000 -> 0.945084],

So only the 2nd reweight directly affected osd.10
(osd.10 being the one on which the message count throttle
hits had concentrated).

Running my test profile showed some relief in the message count
throttle hits on pool rbd after each of the reweight commands.
However, I now encounter hits on the other pools.

So my guess is that it is the specific combination of my load
profile and the distribution of placement groups which causes the
message count throttle to be hit, and I was only lucky in some
sense when modifying the number of placement groups and achieving
a well-behaved distribution.


Regards

Andreas Bluemle
   
On Thu, 15 Jan 2015 09:15:32 -0800 (PST)
Sage Weil <sage@newdream.net> wrote:

> On Thu, 15 Jan 2015, Andreas Bluemle wrote:
> > Hi,
> > 
> > I went from using v0.88 to v0.90 and can confirm the
> > that performance is similar between these two versions.
> > 
> > I am using config settings similar to the values given by
> > Somnath.
> > 
> > There is one difference in my settings: where Somanth has disabled
> > message throttling both for the number of messages and the amount
> > of message data, I am using the settiings:
> > 
> >   "osd_client_message_size_cap": "524288000"   (default)
> >   "osd_client_message_cap": "3000"             (default is 100)
> > 
> > With my test profile (small, i.e. 4k, random writes), the message
> > size throttle is no problem - but the message count throttle
> > is worth to look at:
> > 
> > with my test profile I hit this throttle - but this seems to depend
> > on the number of placement groups (or the distribution achieved
> > by different placement group set).
> > 
> > I have configured my cluster (3 nodes, 12 osds) with 3 rbd pools:
> >    rbd with 768 pgs (default)
> >    rbd2 with 3000 pgs
> >    rbd3 with 769 pgs
> > 
> > I don't see any hits of message count throttle on rbd2 and rbd3 -
> > but on rbd, I see the message throttle hits about 3.750 times
> > during a test with a total of 262.144 client write request within a
> > period of approx. 35 seconds. All of the hits of the message count
> > throttle happen on a single osd; there is no hit of the message
> > count throttle on any of the other 11 osds. 
> 
> The difference between rbd and rbd3 is pretty surprising.. it makes
> me think that the rbd distribution is just a bit unlucky.  Can you
> try this?
> 
>  ceph osd reweight-by-pg 110
> 
> or possibly 105 and see if that changes things?
> 
> > Sage: yesterday, you had been asking for a value of the message
> > count throttle where my tests start to run smoothly - and I can't
> > give an answer. It depends on the distribution of I/O requests
> > achieved by the specific set of pg's - and vice versa, a different
> > pattern of I/O requests will change the behavior again.
> 
> Understood.  This is actually good news, I think.. it's not the
> throttle itself that's problematic but that the rbd distrbution is
> too inbalanced.
> 
> Thanks!
> sage
> 
> 
> 
> 
> > 
> > 
> > 
> > Regards
> > 
> > Andreas Bluemle
> > 
> > On Wed, 14 Jan 2015 22:44:01 +0000
> > Somnath Roy <Somnath.Roy@sandisk.com> wrote:
> > 
> > > Stephen,
> > > You may want to tweak the following parameter(s) in your ceph.conf
> > > file and see if it is further improving your performance or not.
> > > 
> > > debug_lockdep = 0/0
> > > debug_context = 0/0
> > > debug_crush = 0/0
> > > debug_buffer = 0/0
> > > debug_timer = 0/0
> > > debug_filer = 0/0
> > > debug_objecter = 0/0
> > > debug_rados = 0/0
> > > debug_rbd = 0/0
> > > debug_journaler = 0/0
> > > debug_objectcatcher = 0/0
> > > debug_client = 0/0
> > > debug_osd = 0/0
> > > debug_optracker = 0/0
> > > debug_objclass = 0/0
> > > debug_filestore = 0/0
> > > debug_journal = 0/0
> > > debug_ms = 0/0
> > > debug_monc = 0/0
> > > debug_tp = 0/0
> > > debug_auth = 0/0
> > > debug_finisher = 0/0
> > > debug_heartbeatmap = 0/0
> > > debug_perfcounter = 0/0
> > > debug_asok = 0/0
> > > debug_throttle = 0/0
> > > debug_mon = 0/0
> > > debug_paxos = 0/0
> > > debug_rgw = 0/0
> > > osd_op_num_threads_per_shard = 2 //You may want to try with 1 as
> > > well osd_op_num_shards = 10    //Depends on your cpu util
> > > ms_nocrc = true
> > > cephx_sign_messages = false
> > > cephx_require_signatures = false
> > > ms_dispatch_throttle_bytes = 0
> > > throttler_perf_counter = false
> > > 
> > > [osd]
> > > osd_client_message_size_cap = 0
> > > osd_client_message_cap = 0
> > > osd_enable_op_tracker = false
> > > 
> > > Also, run more clients (in your case rados bench) and see if it is
> > > scaling or not (it should, till it saturates your cpu).
> > > 
> > > But, your observation on RHEL7 vs UBUNTU 14.04 LTS is
> > > interesting !
> > > 
> > > Thanks & Regards
> > > Somnath
> > > -----Original Message-----
> > > From: ceph-devel-owner@vger.kernel.org
> > > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> > > Stephen L Sent: Wednesday, January 14, 2015 2:32 PM To: Ceph
> > > Development Subject: RE: Memstore performance improvements v0.90
> > > vs v0.87
> > > 
> > > I went back and grabbed 87 and built it on RHEL7 as well, and
> > > performance is also similar (much better).  I've also run it on a
> > > few systems (Dual socket 10-core E5v2,  Dual socket 6-core
> > > E5v3).  So, it's related to my switch to RHEL7, and not to the
> > > code changes between v0.90 and v0.87.     Will post when I get
> > > more data.
> > > 
> > > Thanks,
> > > 
> > > Stephen
> > > 
> > > -----Original Message-----
> > > From: ceph-devel-owner@vger.kernel.org
> > > [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
> > > Stephen L Sent: Wednesday, January 14, 2015 12:06 AM To: Ceph
> > > Development Subject: Memstore performance improvements v0.90 vs
> > > v0.87
> > > 
> > > In the process of moving to a new cluster (RHEL7 based) I grabbed
> > > v0.90, compiled RPM's and re-ran the simple local-node memstore
> > > test I've run on .80 - .87.  It's a single Memstore OSD and a
> > > single Rados Bench client locally on the same node.  Increasing
> > > queue depth and measuring latency /IOPS.  So far, the
> > > measurements have been consistent across different hardware and
> > > code releases (with about a 30% improvement with the OpWQ
> > > Sharding changes that came in after Firefly).
> > > 
> > > These are just very early results, but I'm seeing a very large
> > > improvement in latency and throughput with v90 on RHEL7.   Next
> > > I'm working to get lttng installed and working in RHEL7 to
> > > determine where the improvement is.   On previous levels, these
> > > measurements have been roughly the same using a real (fast)
> > > backend (i.e. NVMe flash), and I will verify here as well.   Just
> > > wondering if anyone else has measured similar improvements?
> > > 
> > > 
> > > 100% Reads or Writes, 4K Objects, Rados Bench
> > > 
> > > ========================
> > > V0.87: Ubuntu 14.04LTS
> > > 
> > > *Writes*
> > > #Thr    IOPS    Latency(ms)
> > > 1       618.80          1.61
> > > 2       1401.70         1.42
> > > 4       3962.73         1.00
> > > 8       7354.37         1.10
> > > 16      7654.67         2.10
> > > 32      7320.33         4.37
> > > 64      7424.27         8.62
> > > 
> > > *Reads*
> > > #thr    IOPS    Latency(ms)
> > > 1       837.57          1.19
> > > 2       1950.00         1.02
> > > 4       6494.03         0.61
> > > 8       7243.53         1.10
> > > 16      7473.73         2.14
> > > 32      7682.80         4.16
> > > 64      7727.10         8.28
> > > 
> > > 
> > > ========================
> > > V0.90:  RHEL7
> > > 
> > > *Writes*
> > > #Thr    IOPS    Latency(ms)
> > > 1       2558.53         0.39
> > > 2       6014.67         0.33
> > > 4       10061.33        0.40
> > > 8       14169.60        0.56
> > > 16      14355.63        1.11
> > > 32      14150.30        2.26
> > > 64      15283.33        4.19
> > > 
> > > *Reads*
> > > #Thr    IOPS    Latency(ms)
> > > 1       4535.63         0.22
> > > 2       9969.73         0.20
> > > 4       17049.43        0.23
> > > 8       19909.70        0.40
> > > 16      20320.80        0.79
> > > 32      19827.93        1.61
> > > 64      22371.17        2.86
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe
> > > ceph-devel" in the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at
> > > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from
> > > this list: send the line "unsubscribe ceph-devel" in the body of
> > > a message to majordomo@vger.kernel.org More majordomo info at
> > > http://vger.kernel.org/majordomo-info.html
> > > 
> > > ________________________________
> > > 
> > > PLEASE NOTE: The information contained in this electronic mail
> > > message is intended only for the use of the designated
> > > recipient(s) named above. If the reader of this message is not
> > > the intended recipient, you are hereby notified that you have
> > > received this message in error and that any review,
> > > dissemination, distribution, or copying of this message is
> > > strictly prohibited. If you have received this communication in
> > > error, please notify the sender by telephone or e-mail (as shown
> > > above) immediately and destroy any and all copies of this message
> > > in your possession (whether hard copies or electronically stored
> > > copies).
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe
> > > ceph-devel" in the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > > 
> > 
> > 
> > 
> > -- 
> > Andreas Bluemle
> > mailto:Andreas.Bluemle@itxperts.de ITXperts
> > GmbH                       http://www.itxperts.de Balanstrasse 73,
> > Geb. 08            Phone: (+49) 89 89044917 D-81541 Muenchen
> > (Germany)          Fax:   (+49) 89 89044910
> > 
> > Company details: http://www.itxperts.de/imprint.htm
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > ceph-devel" in the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> 
> 



-- 
Andreas Bluemle                     mailto:Andreas.Bluemle@itxperts.de
ITXperts GmbH                       http://www.itxperts.de
Balanstrasse 73, Geb. 08            Phone: (+49) 89 89044917
D-81541 Muenchen (Germany)          Fax:   (+49) 89 89044910

Company details: http://www.itxperts.de/imprint.htm

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-14 23:39     ` Blinick, Stephen L
@ 2015-01-27 21:03       ` Mark Nelson
  2015-01-28  1:23         ` Blinick, Stephen L
  2015-02-20  9:07         ` James Page
  0 siblings, 2 replies; 25+ messages in thread
From: Mark Nelson @ 2015-01-27 21:03 UTC (permalink / raw)
  To: Blinick, Stephen L, Ceph Development

Hi Stephen,

Took a little longer than I wanted it to, but I finally got some results 
looking at RHEL7 and Ubuntu 14.04 in our test lab.  This is with a 
recent master pull.

Tests are with rados bench to a single memstore OSD on localhost.
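
(For reference, roughly how the runs are driven -- a sketch, not the
exact harness; the pool name, runtime and object size below are
assumptions:)

#!/usr/bin/env python
# Rough sketch of the rados bench runs against the single memstore OSD.
# Pool name, runtime and object size are assumptions -- adjust to taste.
import subprocess

POOL = "memstore-test"    # pool backed by the lone memstore OSD
RUNTIME = 120             # seconds per run
THREADS = [1, 256]        # the single-op and 256-concurrent-op cases below

for t in THREADS:
    # 4K object writes; --no-cleanup keeps them around for the read pass
    subprocess.check_call(["rados", "bench", "-p", POOL, str(RUNTIME),
                           "write", "-t", str(t), "-b", "4096",
                           "--no-cleanup"])
    # sequential reads of the objects written above
    subprocess.check_call(["rados", "bench", "-p", POOL, str(RUNTIME),
                           "seq", "-t", str(t)])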

Single Op Avg Write Latency:

Ubuntu 14.04:            0.91ms
Ubuntu 14.04 (no debug): 0.67ms
RHEL 7:                  0.49ms
RHEL 7 (no debug):       0.31ms

Single Op Avg read Latency:

Ubuntu 14.04:            0.58ms
Ubuntu 14.04 (no debug): 0.33ms
RHEL 7:                  0.32ms
RHEL 7 (no debug):       0.17ms

I then checked avg network latency to localhost using ping for 120s:

Ubuntu 14.04: 0.025ms
RHEL 7:       0.015ms

So looking at your results, I see similar latency numbers, though not 
quite as dramatic (ie  Ubuntu isn't quite so bad).  I wanted to know if 
the latency would be hidden if enough IOs were thrown at the problem so 
I increased concurrent IOs to 256:

256 concurrent op Write IOPS:

Ubuntu 14.04:             7199 IOPS
Ubuntu 14.04 (no debug): 14613 IOPS
RHEL 7:                   7784 IOPS
RHEL 7 (no debug):       17907 IOPS

256 concurrent op Read IOPS:

Ubuntu 14.04:             9887 IOPS
Ubuntu 14.04 (no debug): 20489 IOPS
RHEL 7:                  10832 IOPS
RHEL 7 (no debug):       21257 IOPS

So on one hand I'm seeing an effect similar to what you saw, but once I 
throw enough concurrency at the problem it seems like other things take 
over as the bottleneck.  With default debug logging levels the latency 
difference is mostly masked, but with debugging off we see at least for 
writes a fairly substantial difference.

I collected some system utilization data during the tests and will go 
back and see if I can discover anything more with perf as well.  I think 
the two big takeaways at this point are:

1) There is definitely something interesting going on with Ubuntu vs 
RHEL (Maybe network related).
2) Our debug logging has become a major bottleneck in high IOPS 
scenarios (though we already kind of knew this).

Mark

On 01/14/2015 05:39 PM, Blinick, Stephen L wrote:
> Haha :)  Well, my intuition is still pointing to something I've configured wrong (or had wrong).. but it will be interesting to see what it is.
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, January 14, 2015 3:43 PM
> To: Blinick, Stephen L; Ceph Development
> Subject: Re: Memstore performance improvements v0.90 vs v0.87
>
> On 01/14/2015 04:32 PM, Blinick, Stephen L wrote:
>> I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.
>
> Stephen, you are practically writing press releases for the RHEL guys here! ;)
>
> Mark
>
>>
>> Thanks,
>>
>> Stephen
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
>> Stephen L
>> Sent: Wednesday, January 14, 2015 12:06 AM
>> To: Ceph Development
>> Subject: Memstore performance improvements v0.90 vs v0.87
>>
>> In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).
>>
>> These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?
>>
>>
>> 100% Reads or Writes, 4K Objects, Rados Bench
>>
>> ========================
>> V0.87: Ubuntu 14.04LTS
>>
>> *Writes*
>> #Thr	IOPS	Latency(ms)
>> 1	618.80		1.61
>> 2	1401.70		1.42
>> 4	3962.73		1.00
>> 8	7354.37		1.10
>> 16	7654.67		2.10
>> 32	7320.33		4.37
>> 64	7424.27		8.62
>>
>> *Reads*
>> #thr	IOPS	Latency(ms)
>> 1	837.57		1.19
>> 2	1950.00		1.02
>> 4	6494.03		0.61
>> 8	7243.53		1.10
>> 16	7473.73		2.14
>> 32	7682.80		4.16
>> 64	7727.10		8.28
>>
>>
>> ========================
>> V0.90:  RHEL7
>>
>> *Writes*
>> #Thr	IOPS	Latency(ms)
>> 1	2558.53		0.39
>> 2	6014.67		0.33
>> 4	10061.33	0.40
>> 8	14169.60	0.56
>> 16	14355.63	1.11
>> 32	14150.30	2.26
>> 64	15283.33	4.19
>>
>> *Reads*
>> #Thr	IOPS	Latency(ms)
>> 1	4535.63		0.22
>> 2	9969.73		0.20
>> 4	17049.43	0.23
>> 8	19909.70	0.40
>> 16	20320.80	0.79
>> 32	19827.93	1.61
>> 64	22371.17	2.86
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo
>> info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Memstore performance improvements v0.90 vs v0.87
  2015-01-27 21:03       ` Mark Nelson
@ 2015-01-28  1:23         ` Blinick, Stephen L
  2015-01-28 21:51           ` Mark Nelson
  2015-02-20  9:07         ` James Page
  1 sibling, 1 reply; 25+ messages in thread
From: Blinick, Stephen L @ 2015-01-28  1:23 UTC (permalink / raw)
  To: mnelson, Ceph Development

Hi Mark -- thanks for the detailed description!  Here's my latency #'s (local ping) on identical hardware:

Ubuntu 14.04LTS:  rtt min/avg/max/mdev    0.025/0.026/0.030/0.005 ms
RHEL7:            rtt min/avg/max/mdev    0.008/0.009/0.022/0.003 ms

So I am seeing a similar network stack latency difference.   Also, all the tests I did were with debug off (but with other things such as message signing, crc, etc. still enabled).  Maybe we could have a quick discussion on what settings are best to use when trying to get comparable numbers with memstore or all-flash setups.

As far as the high concurrency test goes, the peak IOPS will be reached at lower concurrency (somewhere around t=8, probably), and at that point (the 'knee' of the latency/throughput curve), there's a pretty substantial latency difference.     Once it gets to t=256 I imagine the latency was 10+ms for both platforms.  
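
(Quick sanity check on where that knee sits: offered concurrency is
roughly IOPS x latency, so pulling the numbers from the v0.90/RHEL7
write table in the original post -- a throwaway sketch:)

# Little's-law style check on the v0.90 / RHEL7 write table: in-flight
# ops ~= IOPS * latency (in seconds).  Values copied from that table.
rows = [
    (1,  2558.53, 0.39),
    (2,  6014.67, 0.33),
    (4,  10061.33, 0.40),
    (8,  14169.60, 0.56),
    (16, 14355.63, 1.11),
    (32, 14150.30, 2.26),
    (64, 15283.33, 4.19),
]
for threads, iops, lat_ms in rows:
    print("%2d threads ~ %4.1f ops in flight" % (threads, iops * lat_ms / 1000.0))
# The in-flight estimate tracks the thread count the whole way, while
# IOPS stop scaling past roughly t=8 -- that's where the knee sits.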

Since the last direct comparison was on older code, and given the mixing of libnss/cryptopp in the builds, I think I need to rerun the comparison (at least one last time!) between the two distros on a more recent version of code.

Thanks,

Stephen



-----Original Message-----
From: Mark Nelson [mailto:mark.nelson@inktank.com] 
Sent: Tuesday, January 27, 2015 2:03 PM
To: Blinick, Stephen L; Ceph Development
Subject: Re: Memstore performance improvements v0.90 vs v0.87

Hi Stephen,

Took a little longer than I wanted it to, but I finally got some results looking at RHEL7 and Ubuntu 14.04 in our test lab.  This is with a recent master pull.

Tests are with rados bench to a single memstore OSD on localhost.

Single Op Avg Write Latency:

Ubuntu 14.04:            0.91ms
Ubuntu 14.04 (no debug): 0.67ms
RHEL 7:                  0.49ms
RHEL 7 (no debug):       0.31ms

Single Op Avg read Latency:

Ubuntu 14.04:            0.58ms
Ubuntu 14.04 (no debug): 0.33ms
RHEL 7:                  0.32ms
RHEL 7 (no debug):       0.17ms

I then checked avg network latency to localhost using ping for 120s:

Ubuntu 14.04: 0.025ms
RHEL 7:       0.015ms

So looking at your results, I see similar latency numbers, though not quite as dramatic (ie  Ubuntu isn't quite so bad).  I wanted to know if the latency would be hidden if enough IOs were thrown at the problem so I increased concurrent IOs to 256:

256 concurrent op Write IOPS:

Ubuntu 14.04:             7199 IOPS
Ubuntu 14.04 (no debug): 14613 IOPS
RHEL 7:                   7784 IOPS
REHL 7 (no debug):       17907 IOPS

256 concurrent op Read IOPS:

Ubuntu 14.04:             9887 IOPS
Ubuntu 14.04 (no debug): 20489 IOPS
RHEL 7:                  10832 IOPS
REHL 7 (no debug):       21257 IOPS

So on one hand I'm seeing an effect similar to what you saw, but once I throw enough concurrency at the problem it seems like other things take over as the bottleneck.  With default debug logging levels the latency difference is mostly masked, but with debugging off we see at least for writes a fairly substantial difference.

I collected some system utilization data during the tests and will go back and see if I can discover anything more with perf as well.  I think the two big takeaways at this point are:

1) There is definitely something interesting going on with Ubuntu vs RHEL (Maybe network related).
2) Our debug logging has become a major bottleneck in high IOPS scenarios (though we already kind of knew this).

Mark

On 01/14/2015 05:39 PM, Blinick, Stephen L wrote:
> Haha :)  Well, my intuition is still pointing to something I've configured wrong (or had wrong).. but it will be interesting to see what it is.
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Wednesday, January 14, 2015 3:43 PM
> To: Blinick, Stephen L; Ceph Development
> Subject: Re: Memstore performance improvements v0.90 vs v0.87
>
> On 01/14/2015 04:32 PM, Blinick, Stephen L wrote:
>> I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.
>
> Stephen, you are practically writing press releases for the RHEL guys 
> here! ;)
>
> Mark
>
>>
>> Thanks,
>>
>> Stephen
>>
>> -----Original Message-----
>> From: ceph-devel-owner@vger.kernel.org 
>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick, 
>> Stephen L
>> Sent: Wednesday, January 14, 2015 12:06 AM
>> To: Ceph Development
>> Subject: Memstore performance improvements v0.90 vs v0.87
>>
>> In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).
>>
>> These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?
>>
>>
>> 100% Reads or Writes, 4K Objects, Rados Bench
>>
>> ========================
>> V0.87: Ubuntu 14.04LTS
>>
>> *Writes*
>> #Thr	IOPS	Latency(ms)
>> 1	618.80		1.61
>> 2	1401.70		1.42
>> 4	3962.73		1.00
>> 8	7354.37		1.10
>> 16	7654.67		2.10
>> 32	7320.33		4.37
>> 64	7424.27		8.62
>>
>> *Reads*
>> #thr	IOPS	Latency(ms)
>> 1	837.57		1.19
>> 2	1950.00		1.02
>> 4	6494.03		0.61
>> 8	7243.53		1.10
>> 16	7473.73		2.14
>> 32	7682.80		4.16
>> 64	7727.10		8.28
>>
>>
>> ========================
>> V0.90:  RHEL7
>>
>> *Writes*
>> #Thr	IOPS	Latency(ms)
>> 1	2558.53		0.39
>> 2	6014.67		0.33
>> 4	10061.33	0.40
>> 8	14169.60	0.56
>> 16	14355.63	1.11
>> 32	14150.30	2.26
>> 64	15283.33	4.19
>>
>> *Reads*
>> #Thr	IOPS	Latency(ms)
>> 1	4535.63		0.22
>> 2	9969.73		0.20
>> 4	17049.43	0.23
>> 8	19909.70	0.40
>> 16	20320.80	0.79
>> 32	19827.93	1.61
>> 64	22371.17	2.86
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-28  1:23         ` Blinick, Stephen L
@ 2015-01-28 21:51           ` Mark Nelson
  2015-01-29 12:51             ` James Page
  0 siblings, 1 reply; 25+ messages in thread
From: Mark Nelson @ 2015-01-28 21:51 UTC (permalink / raw)
  To: Blinick, Stephen L, Ceph Development

[-- Attachment #1: Type: text/plain, Size: 7498 bytes --]

Per Sage's suggestion in the perf meeting this morning I dumped sysctl 
-a on both systems and wrote a little script to compare an arbitrary 
number of sysctl output files.  It only lists settings that have 
different values and dumps out a csv.

So far it looks like the interesting differences are in:

scheduler
numa
ipv4 (and ipv6)
vm

Script is here:

https://github.com/ceph/ceph-tools/blob/master/cbt/tools/compare_sysctl.py
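
(The core of it is just a key/value diff over 'sysctl -a' dumps -- a
minimal sketch of the idea, not the script itself:)

# Minimal sketch: read N "sysctl -a" dumps, print a CSV row for every
# key whose value differs between them.  Parsing details are assumptions.
import sys

def load(path):
    settings = {}
    for line in open(path):
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

paths = sys.argv[1:]                  # e.g. ubuntu1404.sysctl rhel7.sysctl
dumps = [load(p) for p in paths]
print(", ".join('"%s"' % h for h in ["Attribute"] + paths))
for key in sorted(set(k for d in dumps for k in d)):
    values = [d.get(key, "") for d in dumps]
    if len(set(values)) > 1:
        print(", ".join('"%s"' % v for v in [key] + values))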

Mark

On 01/27/2015 07:23 PM, Blinick, Stephen L wrote:
> Hi Mark --thanks for the detailed description!  Here's my latency #'s (local ping) on identical hardware
>
> Ubuntu 14.04LTS:  rtt min/avg/max/mdev    0.025/0.026/0.030/0.005 ms
> RHEL7:                        rtt min/avg/max/mdev    0.008/0.009/0.022/0.003ms
>
> So I am seeing a similar network stack latency difference.   Also, all the tests I did were with 'debug off' (but with other things such as message signing, crc. ) .  Maybe we could have a quick discussion on what settings are best to use when trying to get comparable numbers with memstore or all-flash setups.
>
> As far as the high concurrency test goes, that peak # of IOPS will be reached at lower concurrency (something around like t=8 probably), and at that point (the 'knee' of the latency/throughput curve), there's a pretty substantial latency difference.     Once it gets to t=256 I imagine the latency was 10+ms for both platforms.
>
> Since the last direct comparison was on older code, and the mixing of libnss/cryptopp in the builds, I think I need to rerun the comparison(at least one last time!) between the two distro's on a more recent version of code.
>
> Thanks,
>
> Stephen
>
>
>
> -----Original Message-----
> From: Mark Nelson [mailto:mark.nelson@inktank.com]
> Sent: Tuesday, January 27, 2015 2:03 PM
> To: Blinick, Stephen L; Ceph Development
> Subject: Re: Memstore performance improvements v0.90 vs v0.87
>
> Hi Stephen,
>
> Took a little longer than I wanted it to, but I finally got some results looking at RHEL7 and Ubuntu 14.04 in our test lab.  This is with a recent master pull.
>
> Tests are with rados bench to a single memstore OSD on localhost.
>
> Single Op Avg Write Latency:
>
> Ubuntu 14.04:            0.91ms
> Ubuntu 14.04 (no debug): 0.67ms
> RHEL 7:                  0.49ms
> RHEL 7 (no debug):       0.31ms
>
> Single Op Avg read Latency:
>
> Ubuntu 14.04:            0.58ms
> Ubuntu 14.04 (no debug): 0.33ms
> RHEL 7:                  0.32ms
> RHEL 7 (no debug):       0.17ms
>
> I then checked avg network latency to localhost using ping for 120s:
>
> Ubuntu 14.04: 0.025ms
> RHEL 7:       0.015ms
>
> So looking at your results, I see similar latency numbers, though not quite as dramatic (ie  Ubuntu isn't quite so bad).  I wanted to know if the latency would be hidden if enough IOs were thrown at the problem so I increased concurrent IOs to 256:
>
> 256 concurrent op Write IOPS:
>
> Ubuntu 14.04:             7199 IOPS
> Ubuntu 14.04 (no debug): 14613 IOPS
> RHEL 7:                   7784 IOPS
> REHL 7 (no debug):       17907 IOPS
>
> 256 concurrent op Read IOPS:
>
> Ubuntu 14.04:             9887 IOPS
> Ubuntu 14.04 (no debug): 20489 IOPS
> RHEL 7:                  10832 IOPS
> REHL 7 (no debug):       21257 IOPS
>
> So on one hand I'm seeing an effect similar to what you saw, but once I throw enough concurrency at the problem it seems like other things take over as the bottleneck.  With default debug logging levels the latency difference is mostly masked, but with debugging off we see at least for writes a fairly substantial difference.
>
> I collected some system utilization data during the tests and will go back and see if I can discover anything more with perf as well.  I think the two big takeaways at this point are:
>
> 1) There is definitely something interesting going on with Ubuntu vs RHEL (Maybe network related).
> 2) Our debug logging has become a major bottleneck in high IOPS scenarios (though we already kind of knew this).
>
> Mark
>
> On 01/14/2015 05:39 PM, Blinick, Stephen L wrote:
>> Haha :)  Well, my intuition is still pointing to something I've configured wrong (or had wrong).. but it will be interesting to see what it is.
>>
>> -----Original Message-----
>> From: Mark Nelson [mailto:mark.nelson@inktank.com]
>> Sent: Wednesday, January 14, 2015 3:43 PM
>> To: Blinick, Stephen L; Ceph Development
>> Subject: Re: Memstore performance improvements v0.90 vs v0.87
>>
>> On 01/14/2015 04:32 PM, Blinick, Stephen L wrote:
>>> I went back and grabbed 87 and built it on RHEL7 as well, and performance is also similar (much better).  I've also run it on a few systems (Dual socket 10-core E5v2,  Dual socket 6-core E5v3).  So, it's related to my switch to RHEL7, and not to the code changes between v0.90 and v0.87.     Will post when I get more data.
>>
>> Stephen, you are practically writing press releases for the RHEL guys
>> here! ;)
>>
>> Mark
>>
>>>
>>> Thanks,
>>>
>>> Stephen
>>>
>>> -----Original Message-----
>>> From: ceph-devel-owner@vger.kernel.org
>>> [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Blinick,
>>> Stephen L
>>> Sent: Wednesday, January 14, 2015 12:06 AM
>>> To: Ceph Development
>>> Subject: Memstore performance improvements v0.90 vs v0.87
>>>
>>> In the process of moving to a new cluster (RHEL7 based) I grabbed v0.90, compiled RPM's and re-ran the simple local-node memstore test I've run on .80 - .87.  It's a single Memstore OSD and a single Rados Bench client locally on the same node.  Increasing queue depth and measuring latency /IOPS.  So far, the measurements have been consistent across different hardware and code releases (with about a 30% improvement with the OpWQ Sharding changes that came in after Firefly).
>>>
>>> These are just very early results, but I'm seeing a very large improvement in latency and throughput with v90 on RHEL7.   Next  I'm working to get lttng installed and working in RHEL7 to determine where the improvement is.   On previous levels, these measurements have been roughly the same using a real (fast) backend (i.e. NVMe flash), and I will verify here as well.   Just wondering if anyone else has measured similar improvements?
>>>
>>>
>>> 100% Reads or Writes, 4K Objects, Rados Bench
>>>
>>> ========================
>>> V0.87: Ubuntu 14.04LTS
>>>
>>> *Writes*
>>> #Thr	IOPS	Latency(ms)
>>> 1	618.80		1.61
>>> 2	1401.70		1.42
>>> 4	3962.73		1.00
>>> 8	7354.37		1.10
>>> 16	7654.67		2.10
>>> 32	7320.33		4.37
>>> 64	7424.27		8.62
>>>
>>> *Reads*
>>> #thr	IOPS	Latency(ms)
>>> 1	837.57		1.19
>>> 2	1950.00		1.02
>>> 4	6494.03		0.61
>>> 8	7243.53		1.10
>>> 16	7473.73		2.14
>>> 32	7682.80		4.16
>>> 64	7727.10		8.28
>>>
>>>
>>> ========================
>>> V0.90:  RHEL7
>>>
>>> *Writes*
>>> #Thr	IOPS	Latency(ms)
>>> 1	2558.53		0.39
>>> 2	6014.67		0.33
>>> 4	10061.33	0.40
>>> 8	14169.60	0.56
>>> 16	14355.63	1.11
>>> 32	14150.30	2.26
>>> 64	15283.33	4.19
>>>
>>> *Reads*
>>> #Thr	IOPS	Latency(ms)
>>> 1	4535.63		0.22
>>> 2	9969.73		0.20
>>> 4	17049.43	0.23
>>> 8	19909.70	0.40
>>> 16	20320.80	0.79
>>> 32	19827.93	1.61
>>> 64	22371.17	2.86
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>>> in the body of a message to majordomo@vger.kernel.org More majordomo
>>> info at  http://vger.kernel.org/majordomo-info.html
>>>

[-- Attachment #2: ubuntu_vs_rhel7_sysctl.csv --]
[-- Type: text/csv, Size: 12993 bytes --]

"Attribute", "ubuntu14.04", "rhel7",
"dev.cdrom.lock", "1", "0",
"dev.mac_hid.mouse_button2_keycode", "", "97",
"dev.mac_hid.mouse_button3_keycode", "", "100",
"dev.mac_hid.mouse_button_emulation", "", "0",
"dev.parport.default.spintime", "", "500",
"dev.parport.default.timeslice", "", "200",
"fs.binfmt_misc.status", "enabled", "",
"fs.dentry-state", "32974	0	45	0	0	0", "36429	0	45	0	0	0",
"fs.epoll.max_user_watches", "6735032", "6736384",
"fs.file-max", "3288305", "3269520",
"fs.file-nr", "1152	0	3288305", "928	0	3269520",
"fs.inode-nr", "26372	0", "30768	439",
"fs.inode-state", "26372	0	0	0	0	0	0", "30768	439	0	0	0	0	0",
"fs.nfs.nfs_congestion_kb", "", "183552",
"fs.nfs.nfs_mountpoint_timeout", "", "500",
"fs.quota.syncs", "578", "156",
"fscache.object_max_active", "", "12",
"fscache.operation_max_active", "", "6",
"kernel.auto_msgmni", "0", "1",
"kernel.blk_iopoll", "", "1",
"kernel.cap_last_cap", "37", "36",
"kernel.core_pattern", "/tmp/cbt/ceph/core.%e.%p.magna095.%t", "/tmp/cbt/ceph/core.%e.%p.magna038.%t",
"kernel.core_uses_pid", "1", "0",
"kernel.hostname", "magna095", "magna038",
"kernel.keys.persistent_keyring_expiry", "", "259200",
"kernel.keys.root_maxbytes", "25000000", "20000",
"kernel.keys.root_maxkeys", "1000000", "200",
"kernel.kptr_restrict", "0", "1",
"kernel.msgmni", "32000", "32768",
"kernel.ns_last_pid", "16372", "27058",
"kernel.numa_balancing_migrate_deferred", "", "16",
"kernel.numa_balancing_settle_count", "", "4",
"kernel.osrelease", "3.18.0-ceph-11305-g8260a4a", "3.13.0-37-generic",
"kernel.panic_on_warn", "0", "",
"kernel.perf_event_max_sample_rate", "50000", "25000",
"kernel.printk", "15	4	1	7", "7	4	1	7",
"kernel.prove_locking", "1", "",
"kernel.pty.nr", "1", "6",
"kernel.random.boot_id", "639e52d9-b28b-4fb3-a5f5-a4bae35fd70e", "6a391510-9d7a-4c81-b9ea-56b027ffb84f",
"kernel.random.entropy_avail", "855", "1179",
"kernel.random.uuid", "416f0246-3b7b-41de-bd99-cc9d9df2e407", "15065462-dd23-4d6a-a692-85befd46c6ac",
"kernel.sched_domain.cpu0.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu0.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu0.domain0.max_newidle_lb_cost", "1290", "",
"kernel.sched_domain.cpu0.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu0.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu0.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu0.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu0.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu0.domain1.max_newidle_lb_cost", "2496", "",
"kernel.sched_domain.cpu0.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu1.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu1.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu1.domain0.max_newidle_lb_cost", "12473", "",
"kernel.sched_domain.cpu1.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu1.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu1.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu1.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu1.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu1.domain1.max_newidle_lb_cost", "11413", "",
"kernel.sched_domain.cpu1.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu10.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu10.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu10.domain0.max_newidle_lb_cost", "4959", "",
"kernel.sched_domain.cpu10.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu10.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu10.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu10.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu10.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu10.domain1.max_newidle_lb_cost", "3940", "",
"kernel.sched_domain.cpu10.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu11.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu11.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu11.domain0.max_newidle_lb_cost", "1664", "",
"kernel.sched_domain.cpu11.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu11.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu11.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu11.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu11.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu11.domain1.max_newidle_lb_cost", "13069", "",
"kernel.sched_domain.cpu11.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu2.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu2.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu2.domain0.max_newidle_lb_cost", "2351", "",
"kernel.sched_domain.cpu2.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu2.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu2.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu2.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu2.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu2.domain1.max_newidle_lb_cost", "3835", "",
"kernel.sched_domain.cpu2.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu3.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu3.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu3.domain0.max_newidle_lb_cost", "1512", "",
"kernel.sched_domain.cpu3.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu3.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu3.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu3.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu3.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu3.domain1.max_newidle_lb_cost", "4388", "",
"kernel.sched_domain.cpu3.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu4.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu4.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu4.domain0.max_newidle_lb_cost", "1553", "",
"kernel.sched_domain.cpu4.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu4.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu4.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu4.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu4.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu4.domain1.max_newidle_lb_cost", "2772", "",
"kernel.sched_domain.cpu4.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu5.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu5.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu5.domain0.max_newidle_lb_cost", "9883", "",
"kernel.sched_domain.cpu5.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu5.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu5.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu5.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu5.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu5.domain1.max_newidle_lb_cost", "16125", "",
"kernel.sched_domain.cpu5.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu6.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu6.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu6.domain0.max_newidle_lb_cost", "3272", "",
"kernel.sched_domain.cpu6.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu6.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu6.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu6.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu6.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu6.domain1.max_newidle_lb_cost", "13899", "",
"kernel.sched_domain.cpu6.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu7.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu7.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu7.domain0.max_newidle_lb_cost", "1723", "",
"kernel.sched_domain.cpu7.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu7.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu7.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu7.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu7.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu7.domain1.max_newidle_lb_cost", "2655", "",
"kernel.sched_domain.cpu7.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu8.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu8.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu8.domain0.max_newidle_lb_cost", "5490", "",
"kernel.sched_domain.cpu8.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu8.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu8.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu8.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu8.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu8.domain1.max_newidle_lb_cost", "3843", "",
"kernel.sched_domain.cpu8.domain1.min_interval", "12", "1",
"kernel.sched_domain.cpu9.domain0.busy_factor", "32", "64",
"kernel.sched_domain.cpu9.domain0.max_interval", "4", "2",
"kernel.sched_domain.cpu9.domain0.max_newidle_lb_cost", "1204", "",
"kernel.sched_domain.cpu9.domain0.min_interval", "2", "1",
"kernel.sched_domain.cpu9.domain0.name", "SMT", "SIBLING",
"kernel.sched_domain.cpu9.domain1.busy_factor", "32", "64",
"kernel.sched_domain.cpu9.domain1.imbalance_pct", "117", "125",
"kernel.sched_domain.cpu9.domain1.max_interval", "24", "4",
"kernel.sched_domain.cpu9.domain1.max_newidle_lb_cost", "13307", "",
"kernel.sched_domain.cpu9.domain1.min_interval", "12", "1",
"kernel.sched_min_granularity_ns", "10000000", "3000000",
"kernel.sched_wakeup_granularity_ns", "15000000", "4000000",
"kernel.sem", "32000	1024000000	500	32000", "250	32000	32	128",
"kernel.shmall", "268435456", "2097152",
"kernel.shmmax", "4294967295", "33554432",
"kernel.softlockup_all_cpu_backtrace", "0", "",
"kernel.sysctl_writes_strict", "0", "",
"kernel.sysrq", "16", "1",
"kernel.tainted", "512", "0",
"kernel.threads-max", "256921", "513945",
"kernel.tracepoint_printk", "0", "",
"kernel.usermodehelper.bset", "4294967295	63", "4294967295	31",
"kernel.usermodehelper.inheritable", "4294967295	63", "4294967295	31",
"kernel.version", "#1 SMP Tue Jan 13 22:34:21 EST 2015", "#64~precise1-Ubuntu SMP Wed Sep 24 21:37:11 UTC 2014",
"net.core.netdev_rss_key", "23:d8:48:f9:3c:7c:4a:e7:93:6b:88:fc:3c:45:dc:2e:51:ed:ed:c3:71:f1:59:59:d3:e8:c0:1d:29:40:19:37:d4:01:4f:28:bd:dd:c1:2f:6d:fe:65:48:1e:3a:14:24:91:22:3c:ca", "",
"net.core.warnings", "0", "1",
"net.ipv4.conf.all.rp_filter", "0", "1",
"net.ipv4.conf.default.accept_source_route", "0", "1",
"net.ipv4.conf.eth0.accept_source_route", "0", "1",
"net.ipv4.conf.eth1.accept_source_route", "0", "1",
"net.ipv4.conf.lo.rp_filter", "0", "1",
"net.ipv4.fwmark_reflect", "0", "",
"net.ipv4.icmp_msgs_burst", "50", "",
"net.ipv4.icmp_msgs_per_sec", "1000", "",
"net.ipv4.igmp_qrv", "2", "",
"net.ipv4.ip_forward_use_pmtu", "0", "",
"net.ipv4.ipfrag_secret_interval", "0", "600",
"net.ipv4.neigh.default.base_reachable_time", "", "30",
"net.ipv4.neigh.default.retrans_time", "", "100",
"net.ipv4.neigh.eth0.base_reachable_time", "", "30",
"net.ipv4.neigh.eth0.retrans_time", "", "100",
"net.ipv4.neigh.eth1.base_reachable_time", "", "30",
"net.ipv4.neigh.eth1.retrans_time", "", "100",
"net.ipv4.neigh.lo.base_reachable_time", "", "30",
"net.ipv4.neigh.lo.retrans_time", "", "100",
"net.ipv4.tcp_autocorking", "1", "",
"net.ipv4.tcp_fwmark_accept", "0", "",
"net.ipv4.tcp_max_reordering", "300", "",
"net.ipv4.tcp_mem", "768387	1024517	1536774", "770916	1027891	1541832",
"net.ipv4.udp_mem", "768387	1024517	1536774", "770916	1027891	1541832",
"net.ipv6.anycast_src_echo_reply", "0", "",
"net.ipv6.auto_flowlabels", "0", "",
"net.ipv6.conf.all.accept_ra_from_local", "0", "",
"net.ipv6.conf.all.use_tempaddr", "0", "2",
"net.ipv6.conf.default.accept_ra_from_local", "0", "",
"net.ipv6.conf.default.use_tempaddr", "0", "2",
"net.ipv6.conf.eth0.accept_ra_from_local", "0", "",
"net.ipv6.conf.eth0.use_tempaddr", "0", "2",
"net.ipv6.conf.eth1.accept_ra_defrtr", "0", "1",
"net.ipv6.conf.eth1.accept_ra_from_local", "0", "",
"net.ipv6.conf.eth1.accept_ra_pinfo", "0", "1",
"net.ipv6.conf.eth1.accept_ra_rtr_pref", "0", "1",
"net.ipv6.conf.eth1.disable_ipv6", "1", "0",
"net.ipv6.conf.eth1.use_tempaddr", "0", "2",
"net.ipv6.conf.lo.accept_ra_from_local", "0", "",
"net.ipv6.conf.lo.use_tempaddr", "-1", "2",
"net.ipv6.flowlabel_consistency", "1", "",
"net.ipv6.fwmark_reflect", "0", "",
"net.ipv6.ip6frag_secret_interval", "0", "600",
"net.ipv6.mld_qrv", "2", "",
"net.ipv6.neigh.default.base_reachable_time", "", "30",
"net.ipv6.neigh.default.retrans_time", "", "250",
"net.ipv6.neigh.eth0.base_reachable_time", "", "30",
"net.ipv6.neigh.eth0.retrans_time", "", "250",
"net.ipv6.neigh.eth1.base_reachable_time", "", "30",
"net.ipv6.neigh.eth1.retrans_time", "", "250",
"net.ipv6.neigh.lo.base_reachable_time", "", "30",
"net.ipv6.neigh.lo.retrans_time", "", "250",
"net.iw_cm.default_backlog", "", "256",
"vm.dirty_ratio", "30", "20",
"vm.overcommit_kbytes", "0", "",
"vm.scan_unevictable_pages", "", "0",
"vm.swappiness", "30", "60",

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-28 21:51           ` Mark Nelson
@ 2015-01-29 12:51             ` James Page
  0 siblings, 0 replies; 25+ messages in thread
From: James Page @ 2015-01-29 12:51 UTC (permalink / raw)
  To: mnelson, Blinick, Stephen L, Ceph Development

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Mark

On 28/01/15 21:51, Mark Nelson wrote:
> Per Sage's suggestion in the perf meeting this morning I dumped
> sysctl -a on both systems and wrote a little script to compare an
> arbitrary number of sysctl output files.  It only lists settings
> that have different values and dumps out a csv.
> 
> So far it looks like the interesting differences are in:
> 
> scheduler numa ipv4 (and ipv6) vm
> 
> Script is here:
> 
> https://github.com/ceph/ceph-tools/blob/master/cbt/tools/compare_sysctl.py

Useful script - thanks!

I'd like to get the Ubuntu Server and Kernel teams investigating this
disparity as well - would it be possible for you to share the
configurations you are using in detail?

That way I can make sure we are looking at the same thing...

Cheers

James

- -- 
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJUyizjAAoJEL/srsug59jD9NgQAIKfj8w3LMYroO/GlXb3jjnu
6ponX2HTVDNaVqLaYtpevYDUbkgIIEOEcOtxxbORxGnWr7fcXzQtUcp7K0KLtU+r
/SwbEcsBRaS7im1/lqAlXOh67XSSzE0eSpqrWxCPKM6X0kmvD49G0fnTclmax/F6
beUm2Oz9UtRoB28mSIVMcUWbi2vsltbZx4khppu+1WpXqdlHr7fg9lKr2jeSbvXC
23nJ7IEd5zVdxaAxQtjjj9Gjk8TWCFICZIoHwvwOigvCnGihP3Ngnyabx7Gz9sJm
vnyibS+bXHPjIAdUGEtEanOt0lIO8TVbPZM4X3YDaCeEvaoLxqqE832EXShxQ76H
1fDHTu/OQeEyN3udYHit/xgEm3/nMg7Bwb1I4l5TtHgPq8mrXTGqI6zJv9Y1PZ/E
DhmgcTAEPx3/IWBiU1F55Ktgn6N7fIFCrTEDZ6tkuraeG5WJzqf4OEU7XP/8Z1bb
L4VrnBVgNrxDPJOO6u4YNDsacaO4rMCs2U1kQiGKruSXRKnw+RtowoIzIy8O0I6o
lspBJdLjSXXnP/MLVFifFkXyynz/uPIQbg76FA9mVIC3tc7yeXhjB/urrGbbZPuT
IsoVrlsb4IG9Ay4iRCE7yEw9gwqlEhCTtrY9WrhgA/23ihqn1+Hm/4h7ypP+AWmq
XmBsJ1VQyxHnsJgf9R+4
=Zfn0
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-01-27 21:03       ` Mark Nelson
  2015-01-28  1:23         ` Blinick, Stephen L
@ 2015-02-20  9:07         ` James Page
  2015-02-20  9:49           ` Blair Bethwaite
  2015-02-20 15:51           ` Mark Nelson
  1 sibling, 2 replies; 25+ messages in thread
From: James Page @ 2015-02-20  9:07 UTC (permalink / raw)
  To: Ceph Development
  Cc: mnelson, Blinick, Stephen L, Jay Vosburgh, Colin Ian King,
	Patricia Gaughen, Leann Ogasawara

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi All

The Ubuntu Kernel team have spent the last few weeks investigating the
apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've
focussed efforts in a few ways (see below).

All testing has been done using the latest Firefly release.

1) Base network latency

Jay Vosburgh looked at the base network latencies between RHEL 7 and
Ubuntu 14.04; under default install, RHEL actually had slightly worse
latency than Ubuntu due to the default enablement of a firewall;
disabling this brought latency back in line between the two distributions:

OS                      rtt min/avg/max/mdev
Ubuntu 14.04 (3.13)     0.013/0.016/0.018/0.005 ms
RHEL7 (3.10)            0.010/0.018/0.025/0.005 ms

...base network latency is pretty much the same.

This testing was performed on a matched pair of Dell Poweredge R610's,
configured with a single 4 core CPU and 8G of RAM.

2) Latency and performance in Ceph using Rados bench

Colin King spent a number of days testing and analysing results using
rados bench against a single node ceph deployment, configured with a
single memory backed OSD, to see if we could reproduce the disparities
reported.

He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS
with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release
kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel)
and 3.19-rc6 with 1, 16 and 128 client threads.  The data collected is
available at [0].

Each round of tests consisted of 15 runs, from which we computed
average latency, latency deviation and latency distribution:

> 120 second x 1 thread

Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging
at 0.044 and recent Ubuntu kernels at 0.036-0.037ms.  The older 3.10
kernel in RHEL 7 does have some slightly higher average latency.

> 120 second x 16 threads

Results all seem to cluster around 0.6-0.7ms.  3.19.0-rc6 had a couple
of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the
whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms.  RHEL
shows a far higher standard deviation, due to the bimodal latency
distribution, which from the casual observer may appear to be more
"jittery".

> 120 second x 128 threads

Later kernels turn out to have less standard deviation than RHEL 7,
which suggests less jitter in the stats than RHEL 7's 3.10 kernel.
With this many threads pounding the test, we get a wider spread of
latencies and it is hard to tell any kind of latency distribution
patterns with just 15 rounds because of the large amount of latency
jitter.  All systems show a latency of ~ 5ms.  Taking into
consideration the amount of jitter, we think these results do not make
much sense unless we repeat these tests with say 100 samples.
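
(For what it's worth, the per-round reduction is just the obvious mean
and standard deviation over the per-run averages -- a sketch, with
made-up sample values standing in for the 15 measured rounds:)

# Sketch of the per-round reduction: one average-latency sample per
# rados bench round in, mean and standard deviation out.  The values
# below are made up purely to illustrate the outlier effect described
# above; they are not measured data.
import math

def summarize(samples):
    n = len(samples)
    mean = sum(samples) / n
    stddev = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return mean, stddev

rounds_ms = [0.62, 0.68, 0.64, 0.66, 0.71, 0.63, 1.40, 0.65,
             0.69, 0.62, 0.67, 1.38, 0.66, 0.64, 0.70]
mean, stddev = summarize(rounds_ms)
print("mean %.2f ms, stddev %.2f ms over %d rounds"
      % (mean, stddev, len(rounds_ms)))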

3) Conclusion

We have not been able to show any major anomalies in Ceph on Ubuntu
compared to RHEL 7 when using memstore.  Our current hypothesis is that
one needs to run the OSD bench stressor many times to get a fair capture
of system latency stats.  The reason for this is:

* Latencies are very low with memstore, so any small jitter in
scheduling etc will show up as a large distortion (as shown by the large
standard deviations in the samples).

* When memstore is heavily utilized, memory pressure causes the system
to page heavily, so we are subject to paging delays that introduce
latency jitter.  Latency differences may be just down to where a random
page is in memory or in swap, and with memstore these may cause the
large perturbations we see when running just a single test.

* We needed to make *many* tens of measurements to get a representative
picture of average latency and the latency distributions.  Don't trust
the results from just one test.

* We ran the tests with a pool configured to 100 pgs and 100 pgps [1].
One can get different results with different placement group configs.

I've CC'ed both Colin and Jay on this mail - so if anyone has any
specific questions about the testing they can chime in with responses.

Regards

James

[0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods
[1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/

- -- 
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X
K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH
G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY
cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm
LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70
7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA
5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw
nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0
PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i
myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai
RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N
XPih7nLNvqDNw38IkkDN
=qcvG
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20  9:07         ` James Page
@ 2015-02-20  9:49           ` Blair Bethwaite
  2015-02-20 10:09             ` Haomai Wang
  2015-02-20 15:38             ` Mark Nelson
  2015-02-20 15:51           ` Mark Nelson
  1 sibling, 2 replies; 25+ messages in thread
From: Blair Bethwaite @ 2015-02-20  9:49 UTC (permalink / raw)
  To: James Page
  Cc: Ceph Development, mnelson, Blinick, Stephen L, Jay Vosburgh,
	Colin Ian King, Patricia Gaughen, Leann Ogasawara

Hi James,

Interesting results, but did you do any tests with a NUMA system? IIUC
the original report was from a dual socket setup, and that'd
presumably be the standard setup for most folks (both OSD server and
client side).

Cheers,

On 20 February 2015 at 20:07, James Page <james.page@ubuntu.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi All
>
> The Ubuntu Kernel team have spent the last few weeks investigating the
> apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've
> focussed efforts in a few ways (see below).
>
> All testing has been done using the latest Firefly release.
>
> 1) Base network latency
>
> Jay Vosburgh looked at the base network latencies between RHEL 7 and
> Ubuntu 14.04; under default install, RHEL actually had slightly worse
> latency than Ubuntu due to the default enablement of a firewall;
> disabling this brought latency back inline between the two distributions:
>
> OS                      rtt min/avg/max/mdev
> Ubuntu 14.04 (3.13)     0.013/0.016/0.018/0.005 ms
> RHEL7 (3.10)            0.010/0.018/0.025/0.005 ms
>
> ...base network latency is pretty much the same.
>
> This testing was performed on a matched pair of Dell Poweredge R610's,
> configured with a single 4 core CPU and 8G of RAM.
>
> 2) Latency and performance in Ceph using Rados bench
>
> Colin King spent a number of days testing and analysing results using
> rados bench against a single node ceph deployment, configured with a
> single memory backed OSD, to see if we could reproduce the disparities
> reported.
>
> He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS
> with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release
> kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel)
> and 3.19-rc6 with 1, 16 and 128 client threads.  The data collected is
> available at [0].
>
> Each round of tests consisted of 15 runs, from which we computed
> average latency, latency deviation and latency distribution:
>
>> 120 second x 1 thread
>
> Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging
> at 0.044 and recent Ubuntu kernels at 0.036-0.037ms.  The older 3.10
> kernel in RHEL 7 does have some slightly higher average latency.
>
>> 120 second x 16 threads
>
> Results all seem to cluster around 0.6-0.7ms.  3.19.0-rc6 had a couple
> of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the
> whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms.  RHEL
> shows a far higher standard deviation, due to the bimodal latency
> distribution, which from the casual observer may appear to be more
> "jittery".
>
>> 120 second x 128 threads
>
> Later kernels show up to have less standard deviation than RHEL 7, so
> that shows perhaps less jitter in the stats than RHEL 7's 3.10 kernel.
> With this many threads pounding the test, we get a wider spread of
> latencies and it is hard to tell any kind of latency distribution
> patterns with just 15 rounds because of the large amount of latency
> jitter.  All systems show a latency of ~ 5ms.  Taking into
> consideration the amount of jitter, we think these results do not make
> much sense unless we repeat these tests with say 100 samples.
>
> 3) Conclusion
>
> We’ve have not been able to show any major anomalies in Ceph on Ubuntu
> compared to RHEL 7 when using memstore.  Our current hypothesis is that
> one needs to run the OSD bench stressor many times to get a fair capture
> of system latency stats.  The reason for this is:
>
> * Latencies are very low with memstore, so any small jitter in
> scheduling etc will show up as a large distortion (as shown by the large
> standard deviations in the samples).
>
> * When memstore is heavily utilized, memory pressure causes the system
> to page heavily and so we are subject to the nature of perhaps delays on
> paging that cause some latency jitters.  Latency differences may be just
> down to where a random page is in memory or in swap, and with memstore
> these may cause the large perturbations we see when running just a
> single test.
>
> * We needed to make *many* tens of measurements to get a typical idea of
> average latency and the latency distributions. Don't trust the results
> from just one test
>
> * We ran the tests with a pool configured to 100 pgs and 100 pgps [1].
> One can get different results with different placement group configs.
>
> I've CC'ed both Colin and Jay on this mail - so if anyone has any
> specific questions about the testing they can chime in with responses.
>
> Regards
>
> James
>
> [0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods
> [1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X
> K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH
> G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY
> cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm
> LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70
> 7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA
> 5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw
> nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0
> PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i
> myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai
> RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N
> XPih7nLNvqDNw38IkkDN
> =qcvG
> -----END PGP SIGNATURE-----
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cheers,
~Blairo
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20  9:49           ` Blair Bethwaite
@ 2015-02-20 10:09             ` Haomai Wang
  2015-02-20 15:38             ` Mark Nelson
  1 sibling, 0 replies; 25+ messages in thread
From: Haomai Wang @ 2015-02-20 10:09 UTC (permalink / raw)
  To: Blair Bethwaite
  Cc: James Page, Ceph Development, mnelson, Blinick, Stephen L,
	Jay Vosburgh, Colin Ian King, Patricia Gaughen, Leann Ogasawara

Actually, I'm concerned about the correctness of benchmarking with
MemStore. AFAIR it may cause a lot of memory fragmentation and degrade
performance badly. Maybe setting "filestore_blackhole=true" would be
more precise?


On Fri, Feb 20, 2015 at 5:49 PM, Blair Bethwaite
<blair.bethwaite@gmail.com> wrote:
> Hi James,
>
> Interesting results, but did you do any tests with a NUMA system? IIUC
> the original report was from a dual socket setup, and that'd
> presumably be the standard setup for most folks (both OSD server and
> client side).
>
> Cheers,
>
> On 20 February 2015 at 20:07, James Page <james.page@ubuntu.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Hi All
>>
>> The Ubuntu Kernel team have spent the last few weeks investigating the
>> apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've
>> focussed efforts in a few ways (see below).
>>
>> All testing has been done using the latest Firefly release.
>>
>> 1) Base network latency
>>
>> Jay Vosburgh looked at the base network latencies between RHEL 7 and
>> Ubuntu 14.04; under default install, RHEL actually had slightly worse
>> latency than Ubuntu due to the default enablement of a firewall;
>> disabling this brought latency back inline between the two distributions:
>>
>> OS                      rtt min/avg/max/mdev
>> Ubuntu 14.04 (3.13)     0.013/0.016/0.018/0.005 ms
>> RHEL7 (3.10)            0.010/0.018/0.025/0.005 ms
>>
>> ...base network latency is pretty much the same.
>>
>> This testing was performed on a matched pair of Dell Poweredge R610's,
>> configured with a single 4 core CPU and 8G of RAM.
>>
>> 2) Latency and performance in Ceph using Rados bench
>>
>> Colin King spent a number of days testing and analysing results using
>> rados bench against a single node ceph deployment, configured with a
>> single memory backed OSD, to see if we could reproduce the disparities
>> reported.
>>
>> He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS
>> with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release
>> kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel)
>> and 3.19-rc6 with 1, 16 and 128 client threads.  The data collected is
>> available at [0].
>>
>> Each round of tests consisted of 15 runs, from which we computed
>> average latency, latency deviation and latency distribution:
>>
>>> 120 second x 1 thread
>>
>> Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging
>> at 0.044 and recent Ubuntu kernels at 0.036-0.037ms.  The older 3.10
>> kernel in RHEL 7 does have some slightly higher average latency.
>>
>>> 120 second x 16 threads
>>
>> Results all seem to cluster around 0.6-0.7ms.  3.19.0-rc6 had a couple
>> of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the
>> whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms.  RHEL
>> shows a far higher standard deviation, due to the bimodal latency
>> distribution, which from the casual observer may appear to be more
>> "jittery".
>>
>>> 120 second x 128 threads
>>
>> Later kernels show up to have less standard deviation than RHEL 7, so
>> that shows perhaps less jitter in the stats than RHEL 7's 3.10 kernel.
>> With this many threads pounding the test, we get a wider spread of
>> latencies and it is hard to tell any kind of latency distribution
>> patterns with just 15 rounds because of the large amount of latency
>> jitter.  All systems show a latency of ~ 5ms.  Taking into
>> consideration the amount of jitter, we think these results do not make
>> much sense unless we repeat these tests with say 100 samples.
>>
>> 3) Conclusion
>>
>> We have not been able to show any major anomalies in Ceph on Ubuntu
>> compared to RHEL 7 when using memstore.  Our current hypothesis is that
>> one needs to run the OSD bench stressor many times to get a fair capture
>> of system latency stats.  The reason for this is:
>>
>> * Latencies are very low with memstore, so any small jitter in
>> scheduling etc will show up as a large distortion (as shown by the large
>> standard deviations in the samples).
>>
>> * When memstore is heavily utilized, memory pressure causes the system
>> to page heavily and so we are subject to the nature of perhaps delays on
>> paging that cause some latency jitters.  Latency differences may be just
>> down to where a random page is in memory or in swap, and with memstore
>> these may cause the large perturbations we see when running just a
>> single test.
>>
>> * We needed to make *many* tens of measurements to get a typical idea of
>> average latency and the latency distributions. Don't trust the results
>> from just one test
>>
>> * We ran the tests with a pool configured to 100 pgs and 100 pgps [1].
>> One can get different results with different placement group configs.
>>
>> I've CC'ed both Colin and Jay on this mail - so if anyone has any
>> specific questions about the testing they can chime in with responses.
>>
>> Regards
>>
>> James
>>
>> [0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods
>> [1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
>>
>> - --
>> James Page
>> Ubuntu and Debian Developer
>> james.page@ubuntu.com
>> jamespage@debian.org
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1
>>
>> iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X
>> K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH
>> G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY
>> cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm
>> LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70
>> 7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA
>> 5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw
>> nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0
>> PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i
>> myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai
>> RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N
>> XPih7nLNvqDNw38IkkDN
>> =qcvG
>> -----END PGP SIGNATURE-----
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Cheers,
> ~Blairo
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20  9:49           ` Blair Bethwaite
  2015-02-20 10:09             ` Haomai Wang
@ 2015-02-20 15:38             ` Mark Nelson
       [not found]               ` <524687337.1545267.1424448115086.JavaMail.zimbra@oxygem.tv>
  1 sibling, 1 reply; 25+ messages in thread
From: Mark Nelson @ 2015-02-20 15:38 UTC (permalink / raw)
  To: Blair Bethwaite, James Page
  Cc: Ceph Development, Blinick, Stephen L, Jay Vosburgh,
	Colin Ian King, Patricia Gaughen, Leann Ogasawara

I think paying attention to NUMA is good advice.  One of the things that 
apparently changed in RHEL7 is that they are now doing automatic NUMA 
tuning:

http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/

It's possible that this could be having an effect on the results.
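
If someone wants to rule NUMA in or out by hand, a rough sketch (assuming
numactl is available, the OSD is osd.0, and the test pool is rbd; adjust to
taste) would be to pin both the OSD and the rados bench client to one socket
and re-run:

   # start the OSD and the client bound to NUMA node 0 only
   numactl --cpunodebind=0 --membind=0 ceph-osd -i 0 -f
   numactl --cpunodebind=0 --membind=0 rados -p rbd bench 120 write -t 16 -b 4096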

Mark

On 02/20/2015 03:49 AM, Blair Bethwaite wrote:
> Hi James,
>
> Interesting results, but did you do any tests with a NUMA system? IIUC
> the original report was from a dual socket setup, and that'd
> presumably be the standard setup for most folks (both OSD server and
> client side).
>
> Cheers,
>
> On 20 February 2015 at 20:07, James Page <james.page@ubuntu.com> wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> Hi All
>>
>> The Ubuntu Kernel team have spent the last few weeks investigating the
>> apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've
>> focussed efforts in a few ways (see below).
>>
>> All testing has been done using the latest Firefly release.
>>
>> 1) Base network latency
>>
>> Jay Vosburgh looked at the base network latencies between RHEL 7 and
>> Ubuntu 14.04; under default install, RHEL actually had slightly worse
>> latency than Ubuntu due to the default enablement of a firewall;
>> disabling this brought latency back inline between the two distributions:
>>
>> OS                      rtt min/avg/max/mdev
>> Ubuntu 14.04 (3.13)     0.013/0.016/0.018/0.005 ms
>> RHEL7 (3.10)            0.010/0.018/0.025/0.005 ms
>>
>> ...base network latency is pretty much the same.
>>
>> This testing was performed on a matched pair of Dell Poweredge R610's,
>> configured with a single 4 core CPU and 8G of RAM.
>>
>> 2) Latency and performance in Ceph using Rados bench
>>
>> Colin King spent a number of days testing and analysing results using
>> rados bench against a single node ceph deployment, configured with a
>> single memory backed OSD, to see if we could reproduce the disparities
>> reported.
>>
>> He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS
>> with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release
>> kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel)
>> and 3.19-rc6 with 1, 16 and 128 client threads.  The data collected is
>> available at [0].
>>
>> Each round of tests consisted of 15 runs, from which we computed
>> average latency, latency deviation and latency distribution:
>>
>>> 120 second x 1 thread
>>
>> Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging
>> at 0.044 and recent Ubuntu kernels at 0.036-0.037ms.  The older 3.10
>> kernel in RHEL 7 does have some slightly higher average latency.
>>
>>> 120 second x 16 threads
>>
>> Results all seem to cluster around 0.6-0.7ms.  3.19.0-rc6 had a couple
>> of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the
>> whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms.  RHEL
>> shows a far higher standard deviation, due to the bimodal latency
>> distribution, which from the casual observer may appear to be more
>> "jittery".
>>
>>> 120 second x 128 threads
>>
>> Later kernels show up to have less standard deviation than RHEL 7, so
>> that shows perhaps less jitter in the stats than RHEL 7's 3.10 kernel.
>> With this many threads pounding the test, we get a wider spread of
>> latencies and it is hard to tell any kind of latency distribution
>> patterns with just 15 rounds because of the large amount of latency
>> jitter.  All systems show a latency of ~ 5ms.  Taking into
>> consideration the amount of jitter, we think these results do not make
>> much sense unless we repeat these tests with say 100 samples.
>>
>> 3) Conclusion
>>
>> We have not been able to show any major anomalies in Ceph on Ubuntu
>> compared to RHEL 7 when using memstore.  Our current hypothesis is that
>> one needs to run the OSD bench stressor many times to get a fair capture
>> of system latency stats.  The reason for this is:
>>
>> * Latencies are very low with memstore, so any small jitter in
>> scheduling etc will show up as a large distortion (as shown by the large
>> standard deviations in the samples).
>>
>> * When memstore is heavily utilized, memory pressure causes the system
>> to page heavily and so we are subject to the nature of perhaps delays on
>> paging that cause some latency jitters.  Latency differences may be just
>> down to where a random page is in memory or in swap, and with memstore
>> these may cause the large perturbations we see when running just a
>> single test.
>>
>> * We needed to make *many* tens of measurements to get a typical idea of
>> average latency and the latency distributions. Don't trust the results
>> from just one test
>>
>> * We ran the tests with a pool configured to 100 pgs and 100 pgps [1].
>> One can get different results with different placement group configs.
>>
>> I've CC'ed both Colin and Jay on this mail - so if anyone has any
>> specific questions about the testing they can chime in with responses.
>>
>> Regards
>>
>> James
>>
>> [0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods
>> [1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
>>
>> - --
>> James Page
>> Ubuntu and Debian Developer
>> james.page@ubuntu.com
>> jamespage@debian.org
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1
>>
>> iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X
>> K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH
>> G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY
>> cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm
>> LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70
>> 7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA
>> 5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw
>> nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0
>> PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i
>> myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai
>> RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N
>> XPih7nLNvqDNw38IkkDN
>> =qcvG
>> -----END PGP SIGNATURE-----
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20  9:07         ` James Page
  2015-02-20  9:49           ` Blair Bethwaite
@ 2015-02-20 15:51           ` Mark Nelson
  2015-02-20 15:58             ` James Page
  1 sibling, 1 reply; 25+ messages in thread
From: Mark Nelson @ 2015-02-20 15:51 UTC (permalink / raw)
  To: James Page, Ceph Development
  Cc: Blinick, Stephen L, Jay Vosburgh, Colin Ian King,
	Patricia Gaughen, Leann Ogasawara



On 02/20/2015 03:07 AM, James Page wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Hi All
>
> The Ubuntu Kernel team have spent the last few weeks investigating the
> apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've
> focussed efforts in a few ways (see below).
>
> All testing has been done using the latest Firefly release.
>
> 1) Base network latency
>
> Jay Vosburgh looked at the base network latencies between RHEL 7 and
> Ubuntu 14.04; under default install, RHEL actually had slightly worse
> latency than Ubuntu due to the default enablement of a firewall;
> disabling this brought latency back inline between the two distributions:
>
> OS                      rtt min/avg/max/mdev
> Ubuntu 14.04 (3.13)     0.013/0.016/0.018/0.005 ms
> RHEL7 (3.10)            0.010/0.018/0.025/0.005 ms
>
> ...base network latency is pretty much the same.
>
> This testing was performed on a matched pair of Dell Poweredge R610's,
> configured with a single 4 core CPU and 8G of RAM.
>
> 2) Latency and performance in Ceph using Rados bench
>
> Colin King spent a number of days testing and analysing results using
> rados bench against a single node ceph deployment, configured with a
> single memory backed OSD, to see if we could reproduce the disparities
> reported.
>
> He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS
> with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release
> kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel)
> and 3.19-rc6 with 1, 16 and 128 client threads.  The data collected is
> available at [0].
>
> Each round of tests consisted of 15 runs, from which we computed
> average latency, latency deviation and latency distribution:
>
>> 120 second x 1 thread
>
> Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging
> at 0.044 and recent Ubuntu kernels at 0.036-0.037ms.  The older 3.10
> kernel in RHEL 7 does have some slightly higher average latency.
>
>> 120 second x 16 threads
>
> Results all seem to cluster around 0.6-0.7ms.  3.19.0-rc6 had a couple
> of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the
> whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms.  RHEL
> shows a far higher standard deviation, due to the bimodal latency
> distribution, which from the casual observer may appear to be more
> "jittery".
>
>> 120 second x 128 threads
>
> Later kernels show up to have less standard deviation than RHEL 7, so
> that shows perhaps less jitter in the stats than RHEL 7's 3.10 kernel.
> With this many threads pounding the test, we get a wider spread of
> latencies and it is hard to tell any kind of latency distribution
> patterns with just 15 rounds because of the large amount of latency
> jitter.  All systems show a latency of ~ 5ms.  Taking into
> consideration the amount of jitter, we think these results do not make
> much sense unless we repeat these tests with say 100 samples.
>
> 3) Conclusion
>
> We have not been able to show any major anomalies in Ceph on Ubuntu
> compared to RHEL 7 when using memstore.  Our current hypothesis is that
> one needs to run the OSD bench stressor many times to get a fair capture
> of system latency stats.  The reason for this is:
>
> * Latencies are very low with memstore, so any small jitter in
> scheduling etc will show up as a large distortion (as shown by the large
> standard deviations in the samples).
>
> * When memstore is heavily utilized, memory pressure causes the system
> to page heavily and so we are subject to the nature of perhaps delays on
> paging that cause some latency jitters.  Latency differences may be just
> down to where a random page is in memory or in swap, and with memstore
> these may cause the large perturbations we see when running just a
> single test.
>
> * We needed to make *many* tens of measurements to get a typical idea of
> average latency and the latency distributions. Don't trust the results
> from just one test
>
> * We ran the tests with a pool configured to 100 pgs and 100 pgps [1].
> One can get different results with different placement group configs.
>
> I've CC'ed both Colin and Jay on this mail - so if anyone has any
> specific questions about the testing they can chime in with responses.

Hi James,


Good testing!  Other than the NUMA questions in the other posts, is it 
possible the disparity we saw could be related to the network 
hardware/driver?  Our nodes are just using 1GbE Intel I350s.  I'm not 
sure what Stephen uses, but given that he's at Intel I suspect it 
would also be an Intel NIC.  Any chance you could post the specs of the 
boxes you guys tested on so we could try to look for other differences?

Also, was there any additional tuning done on either of the boxes (other 
than disabling RHEL's firewall)?
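
Something like the following from each box would probably be enough to
compare (just a sketch, assuming the public interface is eth0):

   lspci | grep -i ethernet     # NIC model
   ethtool -i eth0              # driver and firmware version
   lscpu | grep -i numa         # socket/NUMA layout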

Mark

>
> Regards
>
> James
>
> [0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods
> [1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page@ubuntu.com
> jamespage@debian.org
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
>
> iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X
> K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH
> G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY
> cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm
> LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70
> 7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA
> 5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw
> nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0
> PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i
> myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai
> RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N
> XPih7nLNvqDNw38IkkDN
> =qcvG
> -----END PGP SIGNATURE-----
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20 15:51           ` Mark Nelson
@ 2015-02-20 15:58             ` James Page
  0 siblings, 0 replies; 25+ messages in thread
From: James Page @ 2015-02-20 15:58 UTC (permalink / raw)
  To: Mark Nelson, Ceph Development
  Cc: Blinick, Stephen L, Jay Vosburgh, Colin Ian King,
	Patricia Gaughen, Leann Ogasawara

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Hi Mark

On 20/02/15 15:51, Mark Nelson wrote:
>> I've CC'ed both Colin and Jay on this mail - so if anyone has
>> any specific questions about the testing they can chime in with
>> responses.
> Good testing!  Other than the NUMA questions in the other posts, is
> it possible the disparity we saw could be related to the network 
> hardware/driver?  Our nodes are just using 1GbE Intel I350s.  I'm
> not sure what Stephen uses, but given that he's at Intel I
> suspect it would also be an Intel NIC.  Any chance you could post
> the specs of the boxes you guys tested on so we could try to look
> for other differences?

I know the Dell R610s have Broadcom cards:

  Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet

I need to check with Colin about the other machine we used.

> Also, was there any additional tuning done on either of the boxes
> (other than disabling RHEL's firewall)?

No - apart from disabling the firewall, that's the only change over the
default config we made.

- -- 
James Page
Ubuntu and Debian Developer
james.page@ubuntu.com
jamespage@debian.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCAAGBQJU51mpAAoJEL/srsug59jDTiMQANJMiLacrGRDcPojBm28a1dr
dGifSxO2Vz1EEzH1XpuWYvUf55XYkOubsWSqPtMdzUKVvjIkvedT7rwZWdUoAih4
5z46RvEpMFEdlorg9G09HiVZkCccYJSyN1OkxgLV215jAqbx2fCOKoNEYCZNTjw3
l1wUFR2DNazUvyzTUDx6U0oICqFUYrih/3V6Ip6XFzwc3YxTIRqPvqKxy/4EH/Q4
RFo+PAJmNQAI6mC8oOJ45FCZcGqajN8LOzaMjGQiUrJcB8mYKnXOceb23+UfgpSS
TjprLs2wTTRKerOqvVV8pJdEnz+JYET5gJ1WhkCwYf+dT8eJTmpW3nmI/EwmCLKK
dLnUd6GbIsfe30/rRGBnYRWKlHWxyXe3DM3BWA/GPExyhNPVDO17pZtvBt30PEJS
cQAt4rrndqy0iJwF7T1h+VV0Z9hQ3zpM/cHvJaqhksdcBiRyx/Ac8RC5Eqd0NZov
9wFplIKisci7Nx44tpsOozmOYO2IikwiP45RHdAutV/UOVcAVFLWSHXU2PcqhMSz
H0TuUmpSYA9MSkIgza+ueTsFITb/QEZxyCYpvB7Gzrn04g6qZySEjHbma/lKflwZ
JUyrQdnCX0nLMMBGjgeH/WmJ7xdlTgDb+W1Vii+KIPLAjRqt/C0V8kk5tqNygi0g
KTUDGfuO5UF8VS8Ulg+t
=MXKB
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
       [not found]               ` <524687337.1545267.1424448115086.JavaMail.zimbra@oxygem.tv>
@ 2015-02-20 16:03                 ` Alexandre DERUMIER
  2015-02-20 16:12                   ` Mark Nelson
  2015-02-20 18:38                   ` Stefan Priebe
  0 siblings, 2 replies; 25+ messages in thread
From: Alexandre DERUMIER @ 2015-02-20 16:03 UTC (permalink / raw)
  To: Mark Nelson
  Cc: Blair Bethwaite, James Page, ceph-devel, Stephen L Blinick,
	Jay Vosburgh, Colin Ian King, Patricia Gaughen, Leann Ogasawara

>>http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/ 
>>It's possible that this could be having an effect on the results. 

Isn't automatic NUMA balancing enabled by default since kernel 3.8?

It can be checked with:

cat /proc/sys/kernel/numa_balancing
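
and, for an A/B test, it can be flipped at runtime (a sketch, assuming root):

sysctl -w kernel.numa_balancing=0    # disable for a test run
sysctl -w kernel.numa_balancing=1    # back to the default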



----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "Blair Bethwaite" <blair.bethwaite@gmail.com>, "James Page" <james.page@ubuntu.com>
Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Jay Vosburgh" <jay.vosburgh@canonical.com>, "Colin Ian King" <colin.king@canonical.com>, "Patricia Gaughen" <patricia.gaughen@canonical.com>, "Leann Ogasawara" <leann.ogasawara@canonical.com>
Sent: Friday 20 February 2015 16:38:02
Subject: Re: Memstore performance improvements v0.90 vs v0.87

I think paying attention to NUMA is good advice. One of the things that 
apparently changed in RHEL7 is that they are now doing automatic NUMA 
tuning: 

http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/ 

It's possible that this could be having an effect on the results. 

Mark 

On 02/20/2015 03:49 AM, Blair Bethwaite wrote: 
> Hi James, 
> 
> Interesting results, but did you do any tests with a NUMA system? IIUC 
> the original report was from a dual socket setup, and that'd 
> presumably be the standard setup for most folks (both OSD server and 
> client side). 
> 
> Cheers, 
> 
> On 20 February 2015 at 20:07, James Page <james.page@ubuntu.com> wrote: 
>> -----BEGIN PGP SIGNED MESSAGE----- 
>> Hash: SHA256 
>> 
>> Hi All 
>> 
>> The Ubuntu Kernel team have spent the last few weeks investigating the 
>> apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've 
>> focussed efforts in a few ways (see below). 
>> 
>> All testing has been done using the latest Firefly release. 
>> 
>> 1) Base network latency 
>> 
>> Jay Vosburgh looked at the base network latencies between RHEL 7 and 
>> Ubuntu 14.04; under default install, RHEL actually had slightly worse 
>> latency than Ubuntu due to the default enablement of a firewall; 
>> disabling this brought latency back inline between the two distributions: 
>> 
>> OS rtt min/avg/max/mdev 
>> Ubuntu 14.04 (3.13) 0.013/0.016/0.018/0.005 ms 
>> RHEL7 (3.10) 0.010/0.018/0.025/0.005 ms 
>> 
>> ...base network latency is pretty much the same. 
>> 
>> This testing was performed on a matched pair of Dell Poweredge R610's, 
>> configured with a single 4 core CPU and 8G of RAM. 
>> 
>> 2) Latency and performance in Ceph using Rados bench 
>> 
>> Colin King spent a number of days testing and analysing results using 
>> rados bench against a single node ceph deployment, configured with a 
>> single memory backed OSD, to see if we could reproduce the disparities 
>> reported. 
>> 
>> He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS 
>> with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release 
>> kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel) 
>> and 3.19-rc6 with 1, 16 and 128 client threads. The data collected is 
>> available at [0]. 
>> 
>> Each round of tests consisted of 15 runs, from which we computed 
>> average latency, latency deviation and latency distribution: 
>> 
>>> 120 second x 1 thread 
>> 
>> Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging 
>> at 0.044 and recent Ubuntu kernels at 0.036-0.037ms. The older 3.10 
>> kernel in RHEL 7 does have some slightly higher average latency. 
>> 
>>> 120 second x 16 threads 
>> 
>> Results all seem to cluster around 0.6-0.7ms. 3.19.0-rc6 had a couple 
>> of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the 
>> whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms. RHEL 
>> shows a far higher standard deviation, due to the bimodal latency 
>> distribution, which from the casual observer may appear to be more 
>> "jittery". 
>> 
>>> 120 second x 128 threads 
>> 
>> Later kernels show up to have less standard deviation than RHEL 7, so 
>> that shows perhaps less jitter in the stats than RHEL 7's 3.10 kernel. 
>> With this many threads pounding the test, we get a wider spread of 
>> latencies and it is hard to tell any kind of latency distribution 
>> patterns with just 15 rounds because of the large amount of latency 
>> jitter. All systems show a latency of ~ 5ms. Taking into 
>> consideration the amount of jitter, we think these results do not make 
>> much sense unless we repeat these tests with say 100 samples. 
>> 
>> 3) Conclusion 
>> 
>> We have not been able to show any major anomalies in Ceph on Ubuntu
>> compared to RHEL 7 when using memstore. Our current hypothesis is that 
>> one needs to run the OSD bench stressor many times to get a fair capture 
>> of system latency stats. The reason for this is: 
>> 
>> * Latencies are very low with memstore, so any small jitter in 
>> scheduling etc will show up as a large distortion (as shown by the large 
>> standard deviations in the samples). 
>> 
>> * When memstore is heavily utilized, memory pressure causes the system 
>> to page heavily and so we are subject to the nature of perhaps delays on 
>> paging that cause some latency jitters. Latency differences may be just 
>> down to where a random page is in memory or in swap, and with memstore 
>> these may cause the large perturbations we see when running just a 
>> single test. 
>> 
>> * We needed to make *many* tens of measurements to get a typical idea of 
>> average latency and the latency distributions. Don't trust the results 
>> from just one test 
>> 
>> * We ran the tests with a pool configured to 100 pgs and 100 pgps [1]. 
>> One can get different results with different placement group configs. 
>> 
>> I've CC'ed both Colin and Jay on this mail - so if anyone has any 
>> specific questions about the testing they can chime in with responses. 
>> 
>> Regards 
>> 
>> James 
>> 
>> [0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods 
>> [1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/ 
>> 
>> - -- 
>> James Page 
>> Ubuntu and Debian Developer 
>> james.page@ubuntu.com 
>> jamespage@debian.org 
>> -----BEGIN PGP SIGNATURE----- 
>> Version: GnuPG v1 
>> 
>> iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X 
>> K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH 
>> G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY 
>> cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm 
>> LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70 
>> 7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA 
>> 5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw 
>> nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0 
>> PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i 
>> myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai 
>> RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N 
>> XPih7nLNvqDNw38IkkDN 
>> =qcvG 
>> -----END PGP SIGNATURE----- 
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
>> the body of a message to majordomo@vger.kernel.org 
>> More majordomo info at http://vger.kernel.org/majordomo-info.html 
> 
> 
> 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20 16:03                 ` Alexandre DERUMIER
@ 2015-02-20 16:12                   ` Mark Nelson
       [not found]                     ` <298703592.1573873.1424506210041.JavaMail.zimbra@oxygem.tv>
  2015-02-20 18:38                   ` Stefan Priebe
  1 sibling, 1 reply; 25+ messages in thread
From: Mark Nelson @ 2015-02-20 16:12 UTC (permalink / raw)
  To: Alexandre DERUMIER
  Cc: Blair Bethwaite, James Page, ceph-devel, Stephen L Blinick,
	Jay Vosburgh, Colin Ian King, Patricia Gaughen, Leann Ogasawara



On 02/20/2015 10:03 AM, Alexandre DERUMIER wrote:
>>> http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/
>>> It's possible that this could be having an effect on the results.
>
> Isn't auto numa balancing enabled by default since kernel 3.8 ?

No idea, I'm behind the times on what's been going on with numa tuning.

>
> it can be checked with
>
> cat /proc/sys/kernel/numa_balancing

At least at first glance it doesn't look like it's enabled on either our 
RHEL7 or Ubuntu nodes.  No numad running either.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
  2015-02-20 16:03                 ` Alexandre DERUMIER
  2015-02-20 16:12                   ` Mark Nelson
@ 2015-02-20 18:38                   ` Stefan Priebe
  1 sibling, 0 replies; 25+ messages in thread
From: Stefan Priebe @ 2015-02-20 18:38 UTC (permalink / raw)
  To: Alexandre DERUMIER, Mark Nelson
  Cc: Blair Bethwaite, James Page, ceph-devel, Stephen L Blinick,
	Jay Vosburgh, Colin Ian King, Patricia Gaughen, Leann Ogasawara

Am 20.02.2015 um 17:03 schrieb Alexandre DERUMIER:
>>> http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/
>>> It's possible that this could be having an effect on the results.
>
> Isn't auto numa balancing enabled by default since kernel 3.8 ?
>
> it can be checked with
>
> cat /proc/sys/kernel/numa_balancing

I have it disabled in the kernel due to many libc memory allocation failures 
when it was enabled.
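
For reference, a sketch of the usual ways to do that: build the kernel with
CONFIG_NUMA_BALANCING=n, or (if I remember the parameter right, please
double-check it) keep it built in and disable it on the kernel command line:

numa_balancing=disable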

Stefan
>
>
> ----- Original Message -----
> From: "Mark Nelson" <mnelson@redhat.com>
> To: "Blair Bethwaite" <blair.bethwaite@gmail.com>, "James Page" <james.page@ubuntu.com>
> Cc: "ceph-devel" <ceph-devel@vger.kernel.org>, "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Jay Vosburgh" <jay.vosburgh@canonical.com>, "Colin Ian King" <colin.king@canonical.com>, "Patricia Gaughen" <patricia.gaughen@canonical.com>, "Leann Ogasawara" <leann.ogasawara@canonical.com>
> Sent: Friday 20 February 2015 16:38:02
> Subject: Re: Memstore performance improvements v0.90 vs v0.87
>
> I think paying attention to NUMA is good advice. One of the things that
> apparently changed in RHEL7 is that they are now doing automatic NUMA
> tuning:
>
> http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/
>
> It's possible that this could be having an effect on the results.
>
> Mark
>
> On 02/20/2015 03:49 AM, Blair Bethwaite wrote:
>> Hi James,
>>
>> Interesting results, but did you do any tests with a NUMA system? IIUC
>> the original report was from a dual socket setup, and that'd
>> presumably be the standard setup for most folks (both OSD server and
>> client side).
>>
>> Cheers,
>>
>> On 20 February 2015 at 20:07, James Page <james.page@ubuntu.com> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA256
>>>
>>> Hi All
>>>
>>> The Ubuntu Kernel team have spent the last few weeks investigating the
>>> apparent performance disparity between RHEL 7 and Ubuntu 14.04; we've
>>> focussed efforts in a few ways (see below).
>>>
>>> All testing has been done using the latest Firefly release.
>>>
>>> 1) Base network latency
>>>
>>> Jay Vosburgh looked at the base network latencies between RHEL 7 and
>>> Ubuntu 14.04; under default install, RHEL actually had slightly worse
>>> latency than Ubuntu due to the default enablement of a firewall;
>>> disabling this brought latency back inline between the two distributions:
>>>
>>> OS rtt min/avg/max/mdev
>>> Ubuntu 14.04 (3.13) 0.013/0.016/0.018/0.005 ms
>>> RHEL7 (3.10) 0.010/0.018/0.025/0.005 ms
>>>
>>> ...base network latency is pretty much the same.
>>>
>>> This testing was performed on a matched pair of Dell Poweredge R610's,
>>> configured with a single 4 core CPU and 8G of RAM.
>>>
>>> 2) Latency and performance in Ceph using Rados bench
>>>
>>> Colin King spent a number of days testing and analysing results using
>>> rados bench against a single node ceph deployment, configured with a
>>> single memory backed OSD, to see if we could reproduce the disparities
>>> reported.
>>>
>>> He ran 120 second OSD benchmarks on RHEL 7 as well as Ubuntu 14.04 LTS
>>> with a selection of kernels including 3.10 vanilla, 3.13.0-44 (release
>>> kernel), 3.16.0-30 (utopic HWE kernel), 3.18.0-12 (vivid HWE kernel)
>>> and 3.19-rc6 with 1, 16 and 128 client threads. The data collected is
>>> available at [0].
>>>
>>> Each round of tests consisted of 15 runs, from which we computed
>>> average latency, latency deviation and latency distribution:
>>>
>>>> 120 second x 1 thread
>>>
>>> Results all seem to cluster around 0.04->0.05ms, with RHEL 7 averaging
>>> at 0.044 and recent Ubuntu kernels at 0.036-0.037ms. The older 3.10
>>> kernel in RHEL 7 does have some slightly higher average latency.
>>>
>>>> 120 second x 16 threads
>>>
>>> Results all seem to cluster around 0.6-0.7ms. 3.19.0-rc6 had a couple
>>> of 1.4ms outliers which pushed it out to be worse than RHEL 7. On the
>>> whole Ubuntu 3.10-3.18 kernels are better than RHEL 7 by ~0.1ms. RHEL
>>> shows a far higher standard deviation, due to the bimodal latency
>>> distribution, which from the casual observer may appear to be more
>>> "jittery".
>>>
>>>> 120 second x 128 threads
>>>
>>> Later kernels show up to have less standard deviation than RHEL 7, so
>>> that shows perhaps less jitter in the stats than RHEL 7's 3.10 kernel.
>>> With this many threads pounding the test, we get a wider spread of
>>> latencies and it is hard to tell any kind of latency distribution
>>> patterns with just 15 rounds because of the large amount of latency
>>> jitter. All systems show a latency of ~ 5ms. Taking into
>>> consideration the amount of jitter, we think these results do not make
>>> much sense unless we repeat these tests with say 100 samples.
>>>
>>> 3) Conclusion
>>>
>>> We have not been able to show any major anomalies in Ceph on Ubuntu
>>> compared to RHEL 7 when using memstore. Our current hypothesis is that
>>> one needs to run the OSD bench stressor many times to get a fair capture
>>> of system latency stats. The reason for this is:
>>>
>>> * Latencies are very low with memstore, so any small jitter in
>>> scheduling etc will show up as a large distortion (as shown by the large
>>> standard deviations in the samples).
>>>
>>> * When memstore is heavily utilized, memory pressure causes the system
>>> to page heavily and so we are subject to the nature of perhaps delays on
>>> paging that cause some latency jitters. Latency differences may be just
>>> down to where a random page is in memory or in swap, and with memstore
>>> these may cause the large perturbations we see when running just a
>>> single test.
>>>
>>> * We needed to make *many* tens of measurements to get a typical idea of
>>> average latency and the latency distributions. Don't trust the results
>>> from just one test
>>>
>>> * We ran the tests with a pool configured to 100 pgs and 100 pgps [1].
>>> One can get different results with different placement group configs.
>>>
>>> I've CC'ed both Colin and Jay on this mail - so if anyone has any
>>> specific questions about the testing they can chime in with responses.
>>>
>>> Regards
>>>
>>> James
>>>
>>> [0] http://kernel.ubuntu.com/~cking/.ceph/ceph-benchmarks.ods
>>> [1] http://ceph.com/docs/master/rados/configuration/pool-pg-config-ref/
>>>
>>> - --
>>> James Page
>>> Ubuntu and Debian Developer
>>> james.page@ubuntu.com
>>> jamespage@debian.org
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1
>>>
>>> iQIcBAEBCAAGBQJU5vlrAAoJEL/srsug59jDMvAQAIhSR4GFTXNc4RLpHtLT6h/X
>>> K5uyauKZGtL+wqtPKRfsXqbbUw9I5AZDifQuOEJ0APccLIPbgqxEN3d2uht/qygH
>>> G8q2Ax+M8OyZz07yqTitnD4JV3RmL8wNHUveWPLV0gs2TzBBYwP1ywExbRPed3PY
>>> cfDrszgkQszA/JwT5W5YNf1LZc+5VpOEFrTiLIaRzUDoxg7mm6Hwr3XT8OFjZhjm
>>> LSenKREHtrKKWoBh+OKTvuCUnHzEemK+CiwwRbNQ8l7xbp71wLyS08NpSB5C1y70
>>> 7uft+kP6XOGE9AKLvsdEL1PIXHfeKNonBEN5mO6nsXIW+MQzou01zHgDtne7AxDA
>>> 5OebQJfJtArmKt78WHuVg7h8gPcIRTRSW43LqJiADnIHL8fnZxj2v5yDiUQj7isw
>>> nYWXEJ3rR7mlVgydN34KQ7gpVWmGjhrVb8N01+zYOMAaTBnekldHdueEAXR07eU0
>>> PXiP9aOZiAxbEnDiJmreehjCuNFTagQqNeECRIHssSacfQXPxVljaImvuSfrxf8i
>>> myQLzftiObINTIHSN4TVDKMyveYrU2hILCKfYuxnSJh29j35wsRSeftjntOEyHai
>>> RDnrLD3fCPk4h3hCY6l60nqu9MQfbgdSB/FItvhiBGYqXvGb4+wuBeU9RT9SwG8N
>>> XPih7nLNvqDNw38IkkDN
>>> =qcvG
>>> -----END PGP SIGNATURE-----
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
       [not found]                     ` <298703592.1573873.1424506210041.JavaMail.zimbra@oxygem.tv>
@ 2015-02-21  8:10                       ` Alexandre DERUMIER
       [not found]                         ` <1429598219.1574757.1424509359439.JavaMail.zimbra@oxygem.tv>
  0 siblings, 1 reply; 25+ messages in thread
From: Alexandre DERUMIER @ 2015-02-21  8:10 UTC (permalink / raw)
  To: Mark Nelson
  Cc: Blair Bethwaite, James Page, ceph-devel, Stephen L Blinick,
	Jay Vosburgh, Colin Ian King, Patricia Gaughen, Leann Ogasawara

>>At least at first glance it doesn't look like it's enabled on either our 
>>RHEL7 or Ubuntu nodes. No numad running either. 

Did you disable automatic NUMA balancing manually?

Because, looking at the kernel configs of both RHEL7 and Ubuntu, it really seems to be enabled by default:

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_NUMA_BALANCING=y
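
It's easy to double-check on a running box (assuming the distro ships its
config under /boot):

grep NUMA_BALANCING /boot/config-$(uname -r)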


>>No numad running either. 

AFAIK, numad was used by RHEL6, and auto-numabalancing is used since RHEL7.




One other thing you could also check is whether transparent hugepages are enabled or not.
They're known to slow down some applications, databases for example. (So maybe memstore too)

#cat /sys/kernel/mm/transparent_hugepage/enabled
    always madvise [never]
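
and it can be switched off at runtime for a quick test (assuming root):

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag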



----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "Blair Bethwaite" <blair.bethwaite@gmail.com>, "James Page" <james.page@ubuntu.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Jay Vosburgh" <jay.vosburgh@canonical.com>, "Colin Ian King" <colin.king@canonical.com>, "Patricia Gaughen" <patricia.gaughen@canonical.com>, "Leann Ogasawara" <leann.ogasawara@canonical.com>
Sent: Friday 20 February 2015 17:12:46
Subject: Re: Memstore performance improvements v0.90 vs v0.87

On 02/20/2015 10:03 AM, Alexandre DERUMIER wrote: 
>>> http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/ 
>>> It's possible that this could be having an effect on the results. 
> 
> Isn't auto numa balancing enabled by default since kernel 3.8 ? 

No idea, I'm behind the times on what's been going on with numa tuning. 

> 
> it can be checked with 
> 
> cat /proc/sys/kernel/numa_balancing 

At least at first glance it doesn't look like it's enabled on either our 
RHEL7 or Ubuntu nodes. No numad running either. 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Memstore performance improvements v0.90 vs v0.87
       [not found]                         ` <1429598219.1574757.1424509359439.JavaMail.zimbra@oxygem.tv>
@ 2015-02-21  9:02                           ` Alexandre DERUMIER
  0 siblings, 0 replies; 25+ messages in thread
From: Alexandre DERUMIER @ 2015-02-21  9:02 UTC (permalink / raw)
  To: Mark Nelson
  Cc: Blair Bethwaite, James Page, ceph-devel, Stephen L Blinick,
	Jay Vosburgh, Colin Ian King, Patricia Gaughen, Leann Ogasawara

>>One other thing you could also check is whether transparent hugepages are enabled or not.
>>They're known to slow down some applications, databases for example. (So maybe memstore too)
>>
>>#cat /sys/kernel/mm/transparent_hugepage/enabled
>>    always madvise [never]

This is disabled by default if you use the network-latency tuned profile:

https://developerblog.redhat.com/2015/02/11/low-latency-performance-tuning-rhel-7/
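
Switching over and verifying is just (a sketch, assuming the tuned package is
installed):

tuned-adm profile network-latency
tuned-adm active
cat /sys/kernel/mm/transparent_hugepage/enabled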


----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "Mark Nelson" <mnelson@redhat.com>
Cc: "Blair Bethwaite" <blair.bethwaite@gmail.com>, "James Page" <james.page@ubuntu.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Jay Vosburgh" <jay.vosburgh@canonical.com>, "Colin Ian King" <colin.king@canonical.com>, "Patricia Gaughen" <patricia.gaughen@canonical.com>, "Leann Ogasawara" <leann.ogasawara@canonical.com>
Sent: Saturday 21 February 2015 09:10:29
Subject: Re: Memstore performance improvements v0.90 vs v0.87

>>At least at first glance it doesn't look like it's enabled on either our 
>>RHEL7 or Ubuntu nodes. No numad running either. 

Did you disable automatic NUMA balancing manually? 

Because, looking at the kernel configs of both RHEL7 and Ubuntu, it really seems to be enabled by default: 

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y 
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y 
CONFIG_NUMA_BALANCING=y 


>>No numad running either. 

AFAIK, numad was used by RHEL6, and auto-numabalancing is used since RHEL7. 




One other thing you could also check is whether transparent hugepages are enabled or not. 
They're known to slow down some applications, databases for example. (So maybe memstore too) 

#cat /sys/kernel/mm/transparent_hugepage/enabled 
always madvise [never] 



----- Original Message -----
From: "Mark Nelson" <mnelson@redhat.com>
To: "aderumier" <aderumier@odiso.com>
Cc: "Blair Bethwaite" <blair.bethwaite@gmail.com>, "James Page" <james.page@ubuntu.com>, "ceph-devel" <ceph-devel@vger.kernel.org>, "Stephen L Blinick" <stephen.l.blinick@intel.com>, "Jay Vosburgh" <jay.vosburgh@canonical.com>, "Colin Ian King" <colin.king@canonical.com>, "Patricia Gaughen" <patricia.gaughen@canonical.com>, "Leann Ogasawara" <leann.ogasawara@canonical.com> 
Sent: Friday 20 February 2015 17:12:46
Subject: Re: Memstore performance improvements v0.90 vs v0.87

On 02/20/2015 10:03 AM, Alexandre DERUMIER wrote: 
>>> http://rhelblog.redhat.com/2015/01/12/mysteries-of-numa-memory-management-revealed/ 
>>> It's possible that this could be having an effect on the results. 
> 
> Isn't auto numa balancing enabled by default since kernel 3.8 ? 

No idea, I'm behind the times on what's been going on with numa tuning. 

> 
> it can be checked with 
> 
> cat /proc/sys/kernel/numa_balancing 

At least at first glance it doesn't look like it's enabled on either our 
RHEL7 or Ubuntu nodes. No numad running either. 
-- 
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in 
the body of a message to majordomo@vger.kernel.org 
More majordomo info at http://vger.kernel.org/majordomo-info.html 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-02-21  9:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-14  7:05 Memstore performance improvements v0.90 vs v0.87 Blinick, Stephen L
2015-01-14 22:32 ` Blinick, Stephen L
2015-01-14 22:43   ` Mark Nelson
2015-01-14 23:39     ` Blinick, Stephen L
2015-01-27 21:03       ` Mark Nelson
2015-01-28  1:23         ` Blinick, Stephen L
2015-01-28 21:51           ` Mark Nelson
2015-01-29 12:51             ` James Page
2015-02-20  9:07         ` James Page
2015-02-20  9:49           ` Blair Bethwaite
2015-02-20 10:09             ` Haomai Wang
2015-02-20 15:38             ` Mark Nelson
     [not found]               ` <524687337.1545267.1424448115086.JavaMail.zimbra@oxygem.tv>
2015-02-20 16:03                 ` Alexandre DERUMIER
2015-02-20 16:12                   ` Mark Nelson
     [not found]                     ` <298703592.1573873.1424506210041.JavaMail.zimbra@oxygem.tv>
2015-02-21  8:10                       ` Alexandre DERUMIER
     [not found]                         ` <1429598219.1574757.1424509359439.JavaMail.zimbra@oxygem.tv>
2015-02-21  9:02                           ` Alexandre DERUMIER
2015-02-20 18:38                   ` Stefan Priebe
2015-02-20 15:51           ` Mark Nelson
2015-02-20 15:58             ` James Page
2015-01-14 22:44   ` Somnath Roy
2015-01-14 23:37     ` Blinick, Stephen L
2015-01-15 10:43     ` Andreas Bluemle
2015-01-15 17:09       ` Sage Weil
2015-01-15 17:15       ` Sage Weil
2015-01-19  9:28         ` Andreas Bluemle
